RT Include substrings with Full-Text Indexing (PostgreSQL)

Hello all,

Is there any way to setup full-text indexing (with PostgreSQL) to index substrings?

For example, I want to be able to search on part of a string but it currently fails because it is not delimited by spaces

In ticket:
https://encrypted.google.com/search?q=rt

Search query:
google.com

Results:
No results

Using RT 4.0.5

Thanks,
Greg

Hello all,

Is there any way to setup full-text indexing (with PostgreSQL) to index substrings?

For example, I want to be able to search on part of a string but it currently fails because it is not delimited by spaces

In ticket:
https://encrypted.google.com/search?q=rt

Search query:
google.com

Results:
No results

Using RT 4.0.5

Thanks,
Greg

Hi Greg,

What exactly do you mean? Could you give an explicit example, please.

Cheers,
Ken

Hey Ken,

So right know everything is indexed and searching works correctly except that I cannot search on a substring.

For example,

I have a ticket with the following contained within.
https://encrypted.google.com/search?q=rt

I search for “google.com”…
No results are returned.

I search for “https://encrypted.google.com/search?q=rt” …
The ticket with the content is returned.

It looks as though the content within ticket is indexed based off of whitespaces into the attachments table.

Thanks,
Greg-----Original Message-----
From: ktm@rice.edu [mailto:ktm@rice.edu]
Sent: Thursday, July 26, 2012 3:06 PM
To: Betz, Gregory
Cc: rt-users@lists.bestpractical.com
Subject: EXTERNAL: Re: [rt-users] RT Include substrings with Full-Text Indexing (PostgreSQL)

On Thu, Jul 26, 2012 at 06:59:49PM +0000, Betz, Gregory wrote:

Hello all,

Is there any way to setup full-text indexing (with PostgreSQL) to index substrings?

For example, I want to be able to search on part of a string but it
currently fails because it is not delimited by spaces

In ticket:
https://encrypted.google.com/search?q=rt

Search query:
google.com

Results:
No results

Using RT 4.0.5

Thanks,
Greg

Hi Greg,

What exactly do you mean? Could you give an explicit example, please.

Cheers,
Ken

Hey Ken,

So right know everything is indexed and searching works correctly except that I cannot search on a substring.

For example,

I have a ticket with the following contained within.
https://encrypted.google.com/search?q=rt

I search for “google.com”…
No results are returned.

I search for “https://encrypted.google.com/search?q=rt” …
The ticket with the content is returned.

It looks as though the content within ticket is indexed based off of whitespaces into the attachments table.

Thanks,
Greg

Hi Greg,

Row I get it. I have had the same problem and it has been discussed on
the PostgreSQL mailing lists. This is caused by the parser/tokenizer that is
used by default in PostgreSQL. It only indexes the “full” hostname. There was
someone working to change some of these behaviors in the default parser. In
particular a domain/sub-domain part of a token would also be indexed. I do
not think that it has been finished and I have not seen any recent discussions.

Cheers,
Ken

It looks as though the content within ticket is indexed based off of
whitespaces into the attachments table.

It’s not necessarily just whitespace, but it’s however Pg internally
tokenizes the text. substring matching on tokens is a hard problem to
get right without bogging down to the same problems as LIKE.

If you search around there’s a lot of discussion about this, as Ken
suggests.

Hello all,

Is there any way to setup full-text indexing (with PostgreSQL) to index substrings?

I have setup this for RT 3.8.10:
http://requesttracker.wikia.com/wiki/PostgreSQLFullTextTrgm

This setup addresses exactly the problem you have.
My wiki-page is marked as outdated now. I did not try to upgrade to RT
4.x to say (I don’t have examined the data model of RT4.x if it
changed). This setup a bit wastes your database space of course. You
need to extract all possible trigrams from contents…

Zito

rt-mysql2pg (20.2 KB)