Slow content queries

Another downside of my users’ penchant for attaching large files is that
content queries using the query builder are unusably slow now. I’ve
noticed that content queries blindly search all attachments, including
non-text attachments. Changing content searches to only search
text-type attachments results in significant performance increases (33
seconds down to 16 seconds in a test query – I ran each query multiple
times to get stable numbers). However, most of the time when I do a
content search, I really only expect to be searching the text of in-line
comments and replies, and it would really help if content queries could
have a content-type modifier, with normal in-line message content being
the default. If I restrict a content search to only search attachments
with no filenames (“and attachments.filename is null”), the speed of my
test query improves to 1.4 seconds. Where would be the right spot to
inject this additional condition into the RT code?

I’ve also been wondering whether it would someday be possible for for
DBIx::SearchBuilder to be modified to optionally support PostgreSQL’s de
facto official full-text search engine, tsearch2?

A query that generated SQL like this:

( (attachments.content ilike ‘%term1%’) and (attachments.content ilike
’%term2%’) )

would wind up looking like this when using tsearch2:

( attachments.content_tsearch2_idx @@ to_tsquery(‘default’, ‘term1 &
term2’) )

where content_tsearch2_idx is a new column (tsearch2 requires a
dedicated column for its indexing purposes).

I suppose the translation could also look like this:

( (attachments.content_tsearch2_idx @@ to_tsquery(‘default’, ‘term1’))
and (attachments.content_tsearch2_idx @@ to_tsquery(‘default’, ‘term2’)))

but it might not be as efficient.

Thanks,
Kevin Murphy

P.S. For posterity, I’m using RT 3.4.4, PostgreSQL 8.0.4,
DBIx::SearchBuilder v1.35, Perl 5.8.7, on Mac OS X 10.4.3.

I’ve also been wondering whether it would someday be possible for for
DBIx::SearchBuilder to be modified to optionally support PostgreSQL’s de
facto official full-text search engine, tsearch2?

A query that generated SQL like this:

( (attachments.content ilike ‘%term1%’) and (attachments.content ilike
‘%term2%’) )

would wind up looking like this when using tsearch2:

( attachments.content_tsearch2_idx @@ to_tsquery(‘default’, ‘term1 &
term2’) )

where content_tsearch2_idx is a new column (tsearch2 requires a
dedicated column for its indexing purposes).

I suppose the translation could also look like this:

( (attachments.content_tsearch2_idx @@ to_tsquery(‘default’, ‘term1’))
and (attachments.content_tsearch2_idx @@ to_tsquery(‘default’, ‘term2’)))
I made a modification to SearchBuilder to use the ConText option of
Oracle to search inside attachments. It improves performance a lot. One
thing to watch out for is that in the Querybuilder you select ‘Content’
contains ‘searchstring’. SearchBuilder translates this into 'where
content=“%searchstring%”. This kills any index you have on content
obviously so I modified SearchBuilder not to do that anymore.
Search the archives for my patch or if you can’t find it I’ll post is
again and time permitting add it to the Wiki.

Joop

Joop van de Wege JoopvandeWege@mococo.nl

Kevin Murphy wrote:

I’ve noticed that content queries blindly search all
attachments, including non-text attachments.
Where would be the right spot to inject this additional
condition into the RT code?

To repeat: does anyone know the right spot in the RT code to apply a
condition to all searches of content (attachments), such that the search
would be restricted to attachments whose filename is NULL?

Thanks,
Kevin Murphy

At Tuesday 12/6/2005 09:35 AM, Kevin Murphy wrote:

Kevin Murphy wrote:

I’ve noticed that content queries blindly search all
attachments, including non-text attachments.
Where would be the right spot to inject this additional
condition into the RT code?

To repeat: does anyone know the right spot in the RT code to apply a
condition to all searches of content (attachments), such that the search
would be restricted to attachments whose filename is NULL?

Thanks,
Kevin Murphy

Kevin,

I sent a patch for this to the devel list on 11/28 - I think I copied you.
Let me know if you didn’t receive it.

Steve

Stephen Turner wrote:

At Tuesday 12/6/2005 09:35 AM, Kevin Murphy wrote:

does anyone know the right spot in the RT code to apply a condition
to all searches of content (attachments), such that the search would
be restricted to attachments whose filename is NULL?

I sent a patch for this to the devel list on 11/28 - I think I copied
you. Let me know if you didn’t receive it.

Steve,

Did that cover query builder searches also? I thought it was just for
normal ticket display.

-Kevin

At Tuesday 12/6/2005 11:13 AM, Kevin Murphy wrote:

Stephen Turner wrote:

At Tuesday 12/6/2005 09:35 AM, Kevin Murphy wrote:

does anyone know the right spot in the RT code to apply a condition to
all searches of content (attachments), such that the search would be
restricted to attachments whose filename is NULL?

I sent a patch for this to the devel list on 11/28 - I think I copied
you. Let me know if you didn’t receive it.

Steve,

Did that cover query builder searches also? I thought it was just for
normal ticket display.

-Kevin

Ah no, sorry.

I haven’t played with this, but if you look in Tickets_Overlay.pm there’s a
method called _TransLimit which seems to form the SQL clauses necessary for
searching on transaction content. This would probably be the place to add
the ‘filename is null’ condition so that it would be in effect for all
ticket searches.

Steve

Stephen Turner
Senior Programmer/Analyst - Client Support Services
MIT Information Services and Technology (IS&T)

Stephen Turner wrote:

I haven’t played with this, but if you look in Tickets_Overlay.pm
there’s a method called _TransLimit which seems to form the SQL
clauses necessary for searching on transaction content. This would
probably be the place to add the ‘filename is null’ condition so that
it would be in effect for all ticket searches.

Thanks, Steve. In lib/RT/Tickets_Overlay.pm, in function _TransLimit, I
replaced:

   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => $field,
                    OPERATOR      => $op,
                    VALUE         => $value,
                    CASESENSITIVE => 0,
                    ENTRYAGGREGATOR => 'AND',
                    @rest
                   );

with this:

if ($field eq 'Content') {
   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => 'Filename',
                    OPERATOR      => 'IS',
                    VALUE         => 'NULL',
                    SUBCLAUSE     => 'contentquery',
                    ENTRYAGGREGATOR => 'AND',
                   );
   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => $field,
                    OPERATOR      => $op,
                    VALUE         => $value,
                    CASESENSITIVE => 0,
                    @rest,
                    ENTRYAGGREGATOR => 'AND',
                    SUBCLAUSE     => 'contentquery',
                   );
} else {
   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => $field,
                    OPERATOR      => $op,
                    VALUE         => $value,
                    CASESENSITIVE => 0,
                    ENTRYAGGREGATOR => 'AND',
                    @rest
                   );
}

This change makes my content queries much faster, because it skips the
enormous binary (and text, for that matter) file attachments when
performing the ILIKE on Attachments.Content. Note that I have little
idea of what I’m doing (the DBIx::SearchBuilder documentation is not
exactly complete), but the generated SQL seems OK. I guess I should
really put these changes in local/lib/RT/Tickets_Overlay.pm?

-Kevin Murphy

Kevin, if you create patch that makes this block optional (with RT
config option) then I think it could be applied to the mainline,
because this is behaviour that the most users want.On 12/7/05, Kevin Murphy murphy@genome.chop.edu wrote:

Stephen Turner wrote:

I haven’t played with this, but if you look in Tickets_Overlay.pm
there’s a method called _TransLimit which seems to form the SQL
clauses necessary for searching on transaction content. This would
probably be the place to add the ‘filename is null’ condition so that
it would be in effect for all ticket searches.

Thanks, Steve. In lib/RT/Tickets_Overlay.pm, in function _TransLimit, I
replaced:

   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => $field,
                    OPERATOR      => $op,
                    VALUE         => $value,
                    CASESENSITIVE => 0,
                    ENTRYAGGREGATOR => 'AND',
                    @rest
                   );

with this:

if ($field eq 'Content') {
   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => 'Filename',
                    OPERATOR      => 'IS',
                    VALUE         => 'NULL',
                    SUBCLAUSE     => 'contentquery',
                    ENTRYAGGREGATOR => 'AND',
                   );
   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => $field,
                    OPERATOR      => $op,
                    VALUE         => $value,
                    CASESENSITIVE => 0,
                    @rest,
                    ENTRYAGGREGATOR => 'AND',
                    SUBCLAUSE     => 'contentquery',
                   );
} else {
   $self->_SQLLimit(
                    ALIAS         => $self->{_sql_trattachalias},
                    FIELD         => $field,
                    OPERATOR      => $op,
                    VALUE         => $value,
                    CASESENSITIVE => 0,
                    ENTRYAGGREGATOR => 'AND',
                    @rest
                   );
}

This change makes my content queries much faster, because it skips the
enormous binary (and text, for that matter) file attachments when
performing the ILIKE on Attachments.Content. Note that I have little
idea of what I’m doing (the DBIx::SearchBuilder documentation is not
exactly complete), but the generated SQL seems OK. I guess I should
really put these changes in local/lib/RT/Tickets_Overlay.pm?

-Kevin Murphy


The rt-users Archives

Be sure to check out the RT Wiki at http://wiki.bestpractical.com

Download a free sample chapter of RT Essentials from O’Reilly Media at http://rtbook.bestpractical.com

WE’RE COMING TO YOUR TOWN SOON - RT Training in Amsterdam, Boston and
San Francisco - Find out more at http://bestpractical.com/services/training.html

Best regards, Ruslan.

Ruslan Zakirov wrote:

Kevin, if you create patch that makes this block optional (with RT
config option) then I think it could be applied to the mainline,
because this is behaviour that the most users want.

Thanks, Steve. In lib/RT/Tickets_Overlay.pm, in function _TransLimit, I
replaced:
[…]
with this:
[…]

OK, I will give this a shot. Is there a patch creation policy for RT
(diff options, etc)? This could be mentioned on the wiki.

-Kevin Murphy