RT full text indexing with PostgreSQL

Kenneth_Marshall · February 2, 2009, 3:41am

I just updated the PostgreSQLFullText pages on the wiki with
the triggers to keep the parsed document columns updated whenever
there is a change to the dependent columns.

Here are the commands for adding the triggers to keep the textsearchable
columns corresponding to attachments.subject/content and
objectcustomfieldvalues.largecontent up to date:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON attachments FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, ‘pg_catalog.english’, subject, content);

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON objectcustomfieldvalues FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, ‘pg_catalog.english’, largecontent);

Please let me know if there are any problems and bon voyage.

Ken

Kenneth_Marshall · February 2, 2009, 5:29pm

I just updated the PostgreSQLFullText pages on the wiki with
the triggers to keep the parsed document columns updated whenever
there is a change to the dependent columns.

Here are the commands for adding the triggers to keep the textsearchable
columns corresponding to attachments.subject/content and
objectcustomfieldvalues.largecontent up to date:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON attachments FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, ‘pg_catalog.english’, subject, content);

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON objectcustomfieldvalues FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, ‘pg_catalog.english’, largecontent);

Please let me know if there are any problems and bon voyage.

Ken

I have added a variation of the above triggers to the wiki page
to only pre-parse the first 1MB of attachments/customfields:

CREATE FUNCTION attachments_trigger() RETURNS trigger AS $$
begin
new.textsearchable :=
to_tsvector(‘pg_catalog.english’, substring(coalesce(new.subject, ‘’) || coalesce(new.content, ‘’) from 1 for 1000000));
return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON attachments FOR EACH ROW EXECUTE PROCEDURE attachments_trigger();

CREATE FUNCTION objectcustomfieldvalues_trigger() RETURNS trigger AS $$
begin
new.textsearchable :=
to_tsvector(‘pg_catalog.english’, substring(coalesce(new.largecontent, ‘’) from 1 for 1000000));
return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON objectcustomfieldvalues FOR EACH ROW EXECUTE PROCEDURE
objectcustomfieldvalues_trigger();

Please send me any feedback or suggestions.

Cheers,
Ken

PS. The fast full contents searches really rock!

Kenneth_Marshall · February 3, 2009, 11:45pm

I just updated the PostgreSQLFullText pages on the wiki with
the triggers to keep the parsed document columns updated whenever
there is a change to the dependent columns.

Here are the commands for adding the triggers to keep the textsearchable
columns corresponding to attachments.subject/content and
objectcustomfieldvalues.largecontent up to date:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON attachments FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, ‘pg_catalog.english’, subject, content);

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON objectcustomfieldvalues FOR EACH ROW EXECUTE PROCEDURE
tsvector_update_trigger(textsearchable, ‘pg_catalog.english’, largecontent);

Please let me know if there are any problems and bon voyage.

Ken

I have added a variation of the above triggers to the wiki page
to only pre-parse the first 1MB of attachments/customfields:

CREATE FUNCTION attachments_trigger() RETURNS trigger AS $$
begin
new.textsearchable :=
to_tsvector(‘pg_catalog.english’, substring(coalesce(new.subject, ‘’) || coalesce(new.content, ‘’) from 1 for 1000000));
return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON attachments FOR EACH ROW EXECUTE PROCEDURE attachments_trigger();

CREATE FUNCTION objectcustomfieldvalues_trigger() RETURNS trigger AS $$
begin
new.textsearchable :=
to_tsvector(‘pg_catalog.english’, substring(coalesce(new.largecontent, ‘’) from 1 for 1000000));
return new;
end
$$ LANGUAGE plpgsql;

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE
ON objectcustomfieldvalues FOR EACH ROW EXECUTE PROCEDURE
objectcustomfieldvalues_trigger();

Please send me any feedback or suggestions.

Cheers,
Ken

PS. The fast full contents searches really rock!

Dear RT Users,

I just finished testing the two index types for PostgreSQL text
indexing (GIST and GIN). Here are the index creation commands using
the GIN index type. I have tried both, and unless you are in an
extremely update intensive environment you will really want GIN
– very, very fast queries.

CREATE INDEX attachments_textsearch ON attachments
USING GIN (textsearchable );

CREATE INDEX largecontent_textsearch ON objectcustomfieldvalues
USING GIN (textsearchable );

Happy full text searching,
Ken

Rafael_Martinez · February 6, 2009, 10:01am

Kenneth Marshall wrote:> On Sun, Feb 01, 2009 at 09:41:42PM -0600, Kenneth Marshall wrote:

I just updated the PostgreSQLFullText pages on the wiki with
the triggers to keep the parsed document columns updated whenever
there is a change to the dependent columns.

Hello Kenneth

First of all thank you for this work. We have been waiting for this for
a long time but the work we thought had to be done with the RT-code has
stopped us to implement full text search.

After a close check of the wiki page:
PostgreSQLFullText - Request Tracker Wiki …

I think that the patch of DBIx::SearchBuilder is not 100%
complete/correct. All lines has the ‘-’ (minus) symbol. I suppose that
at least some of them should be ‘+’ and not ‘-’?

This is a tiny patch, would not be an idea to implement this in RT-core
functionality, maybe with some config parameters that
activate/deactivate full text search (when using postgresql/oracle, etc) .

regards
Rafael Martinez, r.m.guerrero@usit.uio.no
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

Kenneth_Marshall · February 6, 2009, 1:44pm

Kenneth Marshall wrote:

I just updated the PostgreSQLFullText pages on the wiki with
the triggers to keep the parsed document columns updated whenever
there is a change to the dependent columns.

Hello Kenneth

First of all thank you for this work. We have been waiting for this for
a long time but the work we thought had to be done with the RT-code has
stopped us to implement full text search.

After a close check of the wiki page:
PostgreSQLFullText - Request Tracker Wiki …

I think that the patch of DBIx::SearchBuilder is not 100%
complete/correct. All lines has the ‘-’ (minus) symbol. I suppose that
at least some of them should be ‘+’ and not ‘-’?

This is a tiny patch, would not be an idea to implement this in RT-core
functionality, maybe with some config parameters that
activate/deactivate full text search (when using postgresql/oracle, etc) .

regards

Rafael Martinez, r.m.guerrero@usit.uio.no
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

Rafael,

The diff I posted was to go from the FULLTEXT patch to the
original one. I agree that it would make more sense to go the
other way. I have updated the page. I do think that having it
as an option in RT-core would be nice. I know for PostgreSQL
versions 8.3 and higher, you can use the patch by default since
FULLTEXT indexing comes with the database as a standard feature.
PostgreSQL will simply convert the attachment field automatically
using to_tsvector() on the fly so it will just work as is. Of
course creating the needed index will greatly improve the performance.
I do not know if OracleText has the same fallback in the absence of
the FULLTEXT setup and index. I suspect that run-time tests for
a configuration option would be needed.

Cheers,
Ken

Kenneth_Marshall · February 6, 2009, 1:46pm

Kenneth Marshall wrote:

I just updated the PostgreSQLFullText pages on the wiki with
the triggers to keep the parsed document columns updated whenever
there is a change to the dependent columns.

Hello Kenneth

First of all thank you for this work. We have been waiting for this for
a long time but the work we thought had to be done with the RT-code has
stopped us to implement full text search.

After a close check of the wiki page:
PostgreSQLFullText - Request Tracker Wiki …

I think that the patch of DBIx::SearchBuilder is not 100%
complete/correct. All lines has the ‘-’ (minus) symbol. I suppose that
at least some of them should be ‘+’ and not ‘-’?

This is a tiny patch, would not be an idea to implement this in RT-core
functionality, maybe with some config parameters that
activate/deactivate full text search (when using postgresql/oracle, etc) .

regards

Rafael Martinez, r.m.guerrero@usit.uio.no
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/

Rafael,

The diff I posted was to go from the FULLTEXT patch to the
original one. I agree that it would make more sense to go the
other way. I have updated the page. I do think that having it
as an option in RT-core would be nice. I know for PostgreSQL
versions 8.3 and higher, you can use the patch by default since
FULLTEXT indexing comes with the database as a standard feature.
PostgreSQL will simply convert the attachment field automatically
using to_tsvector() on the fly so it will just work as is. Of
course creating the needed index will greatly improve the performance.
I do not know if OracleText has the same fallback in the absence of
the FULLTEXT setup and index. I suspect that run-time tests for
a configuration option would be needed.

Cheers,
Ken

Sorry about this post. It applied to the version that did not
use an additional column to store the processed attachments/largecontent
not the one I posted.

Ken