Fulltext Indexing in PostgreSQL 9.6/10

ktm · September 28, 2017, 3:08pm

Hi,

I have been looking at RT fulltext indexing. We have had reports that too many results were returned for a specific search. The problem was due to the fact that the function that is used to generate the database query for PostgreSQL is plainto_tsquery(‘string’). This function converts the string into individual tokens and then produces a query requesting any item that contains all of the listed tokens at any location. This means that searching for ‘bad word’ would also return results with ‘word bad’. In PostgreSQL 9.6 and 10, we can do better. They introduce the function phraseto_tsquery(‘string’). Like plainto_tsquery() it generates a list of tokens from the string however is also constrains the locations of the tokens such that token1 is before token2 which is before token3, … Would it be possible to use it instead if the PostgreSQL version is 9.6 or higher?

It looks like the only location that needs to be changed is:

— src/rt-4.4.2/lib/RT/Tickets.pm 2017-07-12 09:23:39.000000000 -0500
+++ lib/RT/Tickets.pm 2017-09-28 10:05:25.163603802 -0500
@@ -977,7 +977,7 @@
ALIAS => $alias,
FIELD => $index,
OPERATOR => ‘@@’,

           VALUE       => 'plainto_tsquery('. $dbh->quote($value) .')',

           VALUE       => 'phraseto_tsquery('. $dbh->quote($value) .')',
           QUOTEVALUE  => 0,
       );
   }

Regards,
Ken

ktm · January 24, 2019, 8:38pm

Hi again,

Since PostgreSQL 11 has been released, it would be nice to perhaps move to using the function websearch_to_tsquery() instead since it provides these nice options for refining your searches:

unquoted text: text not inside quote marks will be converted to terms separated by & operators, as if processed by plainto_tsquery.
“quoted text”: text inside quote marks will be converted to terms separated by <-> operators, as if processed by phraseto_tsquery.
OR: logical or will be converted to the | operator.
-: the logical not operator, converted to the the ! operator.

Regards,
Ken

Petr_Hanousek · January 29, 2019, 9:10am

I agree, thumbs up. Maybe BP support notices this thread

ktm · February 11, 2020, 10:08pm

One final followup. It looks like all that needs to happen is to replace the following line in lib/RT/Tickets.pm:

            VALUE       => 'plainto_tsquery('. $dbh->quote($value) .')',

with:

            VALUE       => 'websearch_to_tsquery('. $dbh->quote($value) .')',

I did testing against the original with the strings to specify the extra options for websearch and they pass through correctly to the DB, so effectively you will get the normal behavior and then if you know the syntax all of the additional options. It needs at least PostgreSQL 11+. This is easier than phraseto_tsquery() because there is no need to code the fallback to plainto_tsquery().

Regards,
Ken