Case insensitive searching in Content?

Hi,
I’m preparing migration from RT 3.2.2 + Mysql 3.x to RT 3.8.8

  • Mysql or Pg.
    I just discovered a problem with the case insensitive searching in the
    column Attachments.Content (longblob) while playing with a new RT
    instance.

Ticket SQL query
Content like 'outlook’
on our current (old) RT instance, returns some result set. Column Content has
data-type longtext in the old database schema, so every letter case
combination are found.

On a new testing platform RT 3.8.8 + Mysql only partial set of tickets is
returned with the same query. I must to modify the query to
Content like ‘outlook’ or Content like 'Outlook’
at least to see the most of the tickets.

On the other hand RT 3.8.8 + Pg has Content column data-type text, so
case-insensitive searching works, but there is another issue with non
UTF-8 attachments encoded in quoted-printable, but this is a minor
problem. Pg database behaves better than Mysql in this aspect.

Is there any intention to detach non textual data from the Content
column into e.g. ContentB in future?

Have anyone some knowledge or suggestion about this?

Best Regards
Zito

… 3rd try to mail …On Sun, Jun 06, 2010 at 02:34:58PM -0500, Kenneth Marshall wrote:

Hi Zito,

You can look in the mailing list archives for several similar “discoveries” on
how MySQL works with content searches and the longblob data type. I would
definitely recommend using PostgreSQL for its full-text indexing available
in the wiki. Of course, I am partial… :slight_smile:

Hi Ken,
thanks for the suggestion. I red the wiki page
http://wiki.bestpractical.com/view/PostgreSQLFullText already.
I’m definitely going to try this way, also patching DBIx::SearchBuilder
is a bit hack, but ILIKE is too slow on Pg.
Regards
Zito

… 3rd try to mail …

Hi Zito,

You can look in the mailing list archives for several similar “discoveries” on
how MySQL works with content searches and the longblob data type. I would
definitely recommend using PostgreSQL for its full-text indexing available
in the wiki. Of course, I am partial… :slight_smile:

Hi Ken,
thanks for the suggestion. I red the wiki page
http://wiki.bestpractical.com/view/PostgreSQLFullText already.
I’m definitely going to try this way, also patching DBIx::SearchBuilder
is a bit hack, but ILIKE is too slow on Pg.
Regards

Zito

Hi Zito,

ILIKE is fine on PostgreSQL as long as the search is anchored on
the left and there is an index. Otherwise it will do a sequential
scan of the attachments table to find matching records. The wiki
entries for full-text indexing with PostgreSQL and Oracle can avoid
the sequential scan through the use of the database’s full-text
index support. I do agree that patching the actual DBIx::SearchBuilder
is not preferable. Unfortunately, since that is the API that is used
to provide a database abstraction layer and it does not yet include
the ability to specify full-text indexing details, you have to patch
it. If it is any consolation, the patch is much, much simpler (trivial)
for the PostgreSQL support versus the version that is needed for
OracleText. :slight_smile:

Regards,
Ken

Hi Ken,

ILIKE is fine on PostgreSQL as long as the search is anchored on
the left and there is an index. Otherwise it will do a sequential
scan of the attachments table to find matching records. The wiki
entries for full-text indexing with PostgreSQL and Oracle can avoid
the sequential scan through the use of the database’s full-text

thanks for your work on PostgreSQLFullText wiki page! Good starting
point. I’m playing with the fulltext on one instance already. I must
discuss the possibility to use the fulltext on Pg with my leader. The
results are different from those get from ILIKE. I tried to find 'cisco’
for example :). Normal search returned 25 tickets, but fulltext only 15.
Fulltext parser did tokens from URL and www.cisco.com was a lexem as is
(not broken further). I can’t say this is fine or not, it is difference.
Users must know this - how to write the query correctly.

index support. I do agree that patching the actual DBIx::SearchBuilder
is not preferable. Unfortunately, since that is the API that is used
to provide a database abstraction layer and it does not yet include
the ability to specify full-text indexing details, you have to patch
it. If it is any consolation, the patch is much, much simpler (trivial)
for the PostgreSQL support versus the version that is needed for
OracleText. :slight_smile:

I completely understand. I put the package libdbix-searchbuilder-perl
(Debian) on hold in aptitude, so sysadmin should remember, that this
package needs a special handling. I did a copy of SearchBuilder.pm file
into /local/lib/DBIx/ and did a modification. This perl path
is searched first, so a modification is beside other RT code…

I have prepared a script for converting data from Mysql to Pg and for
adding/removing fulltext support based on your wiki page. Maybe it can
be added to wiki for someone else to play with fulltext and Pg. I must
to test it a bit and switch some fixed parts into conditional blocks
controlled by command-line options.

Regards
Zito

Hi Ken,

ILIKE is fine on PostgreSQL as long as the search is anchored on
the left and there is an index. Otherwise it will do a sequential
scan of the attachments table to find matching records. The wiki
entries for full-text indexing with PostgreSQL and Oracle can avoid
the sequential scan through the use of the database’s full-text

thanks for your work on PostgreSQLFullText wiki page! Good starting
point. I’m playing with the fulltext on one instance already. I must
discuss the possibility to use the fulltext on Pg with my leader. The
results are different from those get from ILIKE. I tried to find 'cisco’
for example :). Normal search returned 25 tickets, but fulltext only 15.
Fulltext parser did tokens from URL and www.cisco.com was a lexem as is
(not broken further). I can’t say this is fine or not, it is difference.
Users must know this - how to write the query correctly.

Correct, the results depend on the parsing rules that are used
by the full-text configuration. These rules can be adapted to
your needs, but I do agree that some explanation about common
types of query problems would be worthwhile.

Regards,
Ken