Content searching takes a long time and runs multiple queries

Justin_Hayes1 · December 7, 2010, 5:31pm

Guys,

Searching for ticket content takes forever. I’ve done a bit of digging and for a single search in one of my queues over the last year, RT spawned 3 separate queries.

2 are counts (which appear to be identical), and 1 gets the actual content.

Is there anyway round this? Losing loads of time just to get counts seems rather counter-productive? The final select was actually pretty quick.

I’ve added the queries below.

Many thanks,

Justin

Time: 101207 17:24:09

User@Host: rt_support[rt_support] @ localhost []

Query_time: 57.722237 Lock_time: 0.000183 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742649;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:38

User@Host: rt_support[rt_support] @ localhost []

Query_time: 28.780620 Lock_time: 0.000510 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742678;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:42

User@Host: rt_support[rt_support] @ localhost []

Query_time: 4.492875 Lock_time: 0.000175 Rows_sent: 50 Rows_examined: 100799

SET timestamp=1291742682;
SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY main.id ASC LIMIT 50;

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

Kenneth_Marshall · December 7, 2010, 5:40pm

You need to use a DB backend that supports fulltext indexing for
content searchs to be fast. The actual query that you stated runs
quickly, is only for the first 50 tickets. I do agree that running
the same count() query twice for the same search is sub-optimal. I
do not see how you could avoid the count query totally if you are
paginating the results.

Cheers,
KenOn Tue, Dec 07, 2010 at 05:31:17PM +0000, Justin Hayes wrote:

Guys,

Searching for ticket content takes forever. I’ve done a bit of digging and for a single search in one of my queues over the last year, RT spawned 3 separate queries.

2 are counts (which appear to be identical), and 1 gets the actual content.

Is there anyway round this? Losing loads of time just to get counts seems rather counter-productive? The final select was actually pretty quick.

I’ve added the queries below.

Many thanks,

Justin

Time: 101207 17:24:09

User@Host: rt_support[rt_support] @ localhost

Query_time: 57.722237 Lock_time: 0.000183 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742649;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:38

User@Host: rt_support[rt_support] @ localhost

Query_time: 28.780620 Lock_time: 0.000510 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742678;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:42

User@Host: rt_support[rt_support] @ localhost

Query_time: 4.492875 Lock_time: 0.000175 Rows_sent: 50 Rows_examined: 100799

SET timestamp=1291742682;
SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY main.id ASC LIMIT 50;

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

Justin_Hayes1 · December 7, 2010, 5:49pm

Hi Ken,

I was just thinking the same about the counts - it has to do that for pagination. Though I guess it could have been written to run 1 query for all the data, and just display the first 50 etc.

Which DB backend would work faster?

Thanks,

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.comOn 7 Dec 2010, at 17:40, Kenneth Marshall wrote:

You need to use a DB backend that supports fulltext indexing for
content searchs to be fast. The actual query that you stated runs
quickly, is only for the first 50 tickets. I do agree that running
the same count() query twice for the same search is sub-optimal. I
do not see how you could avoid the count query totally if you are
paginating the results.

Cheers,
Ken

On Tue, Dec 07, 2010 at 05:31:17PM +0000, Justin Hayes wrote:

Guys,

Searching for ticket content takes forever. I’ve done a bit of digging and for a single search in one of my queues over the last year, RT spawned 3 separate queries.

2 are counts (which appear to be identical), and 1 gets the actual content.

Is there anyway round this? Losing loads of time just to get counts seems rather counter-productive? The final select was actually pretty quick.

I’ve added the queries below.

Many thanks,

Justin

Time: 101207 17:24:09

User@Host: rt_support[rt_support] @ localhost

Query_time: 57.722237 Lock_time: 0.000183 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742649;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:38

User@Host: rt_support[rt_support] @ localhost

Query_time: 28.780620 Lock_time: 0.000510 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742678;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:42

User@Host: rt_support[rt_support] @ localhost

Query_time: 4.492875 Lock_time: 0.000175 Rows_sent: 50 Rows_examined: 100799

SET timestamp=1291742682;
SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY main.id ASC LIMIT 50;

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

Kenneth_Marshall · December 7, 2010, 7:40pm

Hi Justin,

In the wiki, there are fulltext index modifications for Oracle and
PostgreSQL. I based the PostgreSQL version on the Oracle version
and we use it here. It works very well indeed. It looks like the
pre version rt-4, a.k.a rt-3.9.6 support Oracle and PostgreSQL
using their fulltext support and MySQL using sphinx, pretty cool.

Regards,
KenOn Tue, Dec 07, 2010 at 05:49:12PM +0000, Justin Hayes wrote:

Hi Ken,

I was just thinking the same about the counts - it has to do that for pagination. Though I guess it could have been written to run 1 query for all the data, and just display the first 50 etc.

Which DB backend would work faster?

Thanks,

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

On 7 Dec 2010, at 17:40, Kenneth Marshall wrote:

You need to use a DB backend that supports fulltext indexing for
content searchs to be fast. The actual query that you stated runs
quickly, is only for the first 50 tickets. I do agree that running
the same count() query twice for the same search is sub-optimal. I
do not see how you could avoid the count query totally if you are
paginating the results.

Cheers,
Ken

On Tue, Dec 07, 2010 at 05:31:17PM +0000, Justin Hayes wrote:

Guys,

Searching for ticket content takes forever. I’ve done a bit of digging and for a single search in one of my queues over the last year, RT spawned 3 separate queries.

2 are counts (which appear to be identical), and 1 gets the actual content.

Is there anyway round this? Losing loads of time just to get counts seems rather counter-productive? The final select was actually pretty quick.

I’ve added the queries below.

Many thanks,

Justin

Time: 101207 17:24:09

User@Host: rt_support[rt_support] @ localhost

Query_time: 57.722237 Lock_time: 0.000183 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742649;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:38

User@Host: rt_support[rt_support] @ localhost

Query_time: 28.780620 Lock_time: 0.000510 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742678;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:42

User@Host: rt_support[rt_support] @ localhost

Query_time: 4.492875 Lock_time: 0.000175 Rows_sent: 50 Rows_examined: 100799

SET timestamp=1291742682;
SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY main.id ASC LIMIT 50;

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

Justin_Hayes1 · December 8, 2010, 9:55am

Hmm we were looking at sphinx. Would you suggest plugging that in rather than migrating to postgres (which we’re also familiar with)?

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.comOn 7 Dec 2010, at 19:40, Kenneth Marshall wrote:

Hi Justin,

In the wiki, there are fulltext index modifications for Oracle and
PostgreSQL. I based the PostgreSQL version on the Oracle version
and we use it here. It works very well indeed. It looks like the
pre version rt-4, a.k.a rt-3.9.6 support Oracle and PostgreSQL
using their fulltext support and MySQL using sphinx, pretty cool.

Regards,
Ken

On Tue, Dec 07, 2010 at 05:49:12PM +0000, Justin Hayes wrote:

Hi Ken,

I was just thinking the same about the counts - it has to do that for pagination. Though I guess it could have been written to run 1 query for all the data, and just display the first 50 etc.

Which DB backend would work faster?

Thanks,

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

On 7 Dec 2010, at 17:40, Kenneth Marshall wrote:

You need to use a DB backend that supports fulltext indexing for
content searchs to be fast. The actual query that you stated runs
quickly, is only for the first 50 tickets. I do agree that running
the same count() query twice for the same search is sub-optimal. I
do not see how you could avoid the count query totally if you are
paginating the results.

Cheers,
Ken

On Tue, Dec 07, 2010 at 05:31:17PM +0000, Justin Hayes wrote:

Guys,

Searching for ticket content takes forever. I’ve done a bit of digging and for a single search in one of my queues over the last year, RT spawned 3 separate queries.

2 are counts (which appear to be identical), and 1 gets the actual content.

Is there anyway round this? Losing loads of time just to get counts seems rather counter-productive? The final select was actually pretty quick.

I’ve added the queries below.

Many thanks,

Justin

Time: 101207 17:24:09

User@Host: rt_support[rt_support] @ localhost

Query_time: 57.722237 Lock_time: 0.000183 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742649;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:38

User@Host: rt_support[rt_support] @ localhost

Query_time: 28.780620 Lock_time: 0.000510 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742678;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:42

User@Host: rt_support[rt_support] @ localhost

Query_time: 4.492875 Lock_time: 0.000175 Rows_sent: 50 Rows_examined: 100799

SET timestamp=1291742682;
SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY main.id ASC LIMIT 50;

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

Kenneth_Marshall · December 8, 2010, 1:54pm

Given that you are familiar with PostgreSQL already, I would use
it because the current versions of RT support the fulltext indexing
already and you have fewer moving pieces. If you are already running
the 4.9.x series, then you could certainly test the sphinx integration.

Cheers,
KenOn Wed, Dec 08, 2010 at 09:55:06AM +0000, Justin Hayes wrote:

Hmm we were looking at sphinx. Would you suggest plugging that in rather than migrating to postgres (which we’re also familiar with)?

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

On 7 Dec 2010, at 19:40, Kenneth Marshall wrote:

Hi Justin,

In the wiki, there are fulltext index modifications for Oracle and
PostgreSQL. I based the PostgreSQL version on the Oracle version
and we use it here. It works very well indeed. It looks like the
pre version rt-4, a.k.a rt-3.9.6 support Oracle and PostgreSQL
using their fulltext support and MySQL using sphinx, pretty cool.

Regards,
Ken

On Tue, Dec 07, 2010 at 05:49:12PM +0000, Justin Hayes wrote:

Hi Ken,

I was just thinking the same about the counts - it has to do that for pagination. Though I guess it could have been written to run 1 query for all the data, and just display the first 50 etc.

Which DB backend would work faster?

Thanks,

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

On 7 Dec 2010, at 17:40, Kenneth Marshall wrote:

You need to use a DB backend that supports fulltext indexing for
content searchs to be fast. The actual query that you stated runs
quickly, is only for the first 50 tickets. I do agree that running
the same count() query twice for the same search is sub-optimal. I
do not see how you could avoid the count query totally if you are
paginating the results.

Cheers,
Ken

On Tue, Dec 07, 2010 at 05:31:17PM +0000, Justin Hayes wrote:

Guys,

Searching for ticket content takes forever. I’ve done a bit of digging and for a single search in one of my queues over the last year, RT spawned 3 separate queries.

2 are counts (which appear to be identical), and 1 gets the actual content.

Is there anyway round this? Losing loads of time just to get counts seems rather counter-productive? The final select was actually pretty quick.

I’ve added the queries below.

Many thanks,

Justin

Time: 101207 17:24:09

User@Host: rt_support[rt_support] @ localhost

Query_time: 57.722237 Lock_time: 0.000183 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742649;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:38

User@Host: rt_support[rt_support] @ localhost

Query_time: 28.780620 Lock_time: 0.000510 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742678;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:42

User@Host: rt_support[rt_support] @ localhost

Query_time: 4.492875 Lock_time: 0.000175 Rows_sent: 50 Rows_examined: 100799

SET timestamp=1291742682;
SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY main.id ASC LIMIT 50;

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

Justin_Hayes1 · December 8, 2010, 4:31pm

I hadn’t actually realised it was possible to get hold of 4.9, though if that’s beta I guess I’d want to wait.

Postgres might be the way to go then. Will have to look into how difficult a migration would be.

http://requesttracker.wikia.com/wiki/PostgreSQLFullText

Is that the relevant wiki page for getting the searching to run fast Ken?

Thanks,

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.comOn 8 Dec 2010, at 13:54, Kenneth Marshall wrote:

Given that you are familiar with PostgreSQL already, I would use
it because the current versions of RT support the fulltext indexing
already and you have fewer moving pieces. If you are already running
the 4.9.x series, then you could certainly test the sphinx integration.

Cheers,
Ken

On Wed, Dec 08, 2010 at 09:55:06AM +0000, Justin Hayes wrote:

Hmm we were looking at sphinx. Would you suggest plugging that in rather than migrating to postgres (which we’re also familiar with)?

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

On 7 Dec 2010, at 19:40, Kenneth Marshall wrote:

Hi Justin,

In the wiki, there are fulltext index modifications for Oracle and
PostgreSQL. I based the PostgreSQL version on the Oracle version
and we use it here. It works very well indeed. It looks like the
pre version rt-4, a.k.a rt-3.9.6 support Oracle and PostgreSQL
using their fulltext support and MySQL using sphinx, pretty cool.

Regards,
Ken

On Tue, Dec 07, 2010 at 05:49:12PM +0000, Justin Hayes wrote:

Hi Ken,

I was just thinking the same about the counts - it has to do that for pagination. Though I guess it could have been written to run 1 query for all the data, and just display the first 50 etc.

Which DB backend would work faster?

Thanks,

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

On 7 Dec 2010, at 17:40, Kenneth Marshall wrote:

You need to use a DB backend that supports fulltext indexing for
content searchs to be fast. The actual query that you stated runs
quickly, is only for the first 50 tickets. I do agree that running
the same count() query twice for the same search is sub-optimal. I
do not see how you could avoid the count query totally if you are
paginating the results.

Cheers,
Ken

On Tue, Dec 07, 2010 at 05:31:17PM +0000, Justin Hayes wrote:

Guys,

Searching for ticket content takes forever. I’ve done a bit of digging and for a single search in one of my queues over the last year, RT spawned 3 separate queries.

2 are counts (which appear to be identical), and 1 gets the actual content.

Is there anyway round this? Losing loads of time just to get counts seems rather counter-productive? The final select was actually pretty quick.

I’ve added the queries below.

Many thanks,

Justin

Time: 101207 17:24:09

User@Host: rt_support[rt_support] @ localhost

Query_time: 57.722237 Lock_time: 0.000183 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742649;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:38

User@Host: rt_support[rt_support] @ localhost

Query_time: 28.780620 Lock_time: 0.000510 Rows_sent: 1 Rows_examined: 122794

SET timestamp=1291742678;
SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id);

Time: 101207 17:24:42

User@Host: rt_support[rt_support] @ localhost

Query_time: 4.492875 Lock_time: 0.000175 Rows_sent: 50 Rows_examined: 100799

SET timestamp=1291742682;
SELECT DISTINCT main.* FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) WHERE (Transactions_1.ObjectType = ‘RT::Ticket’) AND (main.Status != ‘deleted’) AND (main.Created > ‘2010-01-01 00:00:00’ AND main.Queue = ‘4’ AND Attachments_2.Content LIKE ‘%testing%’) AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY main.id ASC LIMIT 50;

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

Jesse_Vincent · December 8, 2010, 4:33pm

I hadn’t actually realised it was possible to get hold of 4.9, though if that’s beta I guess I’d want to wait.

3.9. And we’re aiming for a release candidate by 12/25. You can
download test builds from
http://download.bestpractical.com/pub/rt/devel/

Best,
Jesse

Justin_Hayes1 · December 8, 2010, 4:44pm

I was about to say that should have been 3.9

Been a while since I checked out bestpractical.com. Just saw the post about the dev release, and looks like things are going well which is great news.

The only downside is the time needed to migrate to postgres, and then the time it’s going to take to port all my custom code into the new version.

But that’s my problem, and one I have every time I upgrade, so can’t complain!

Looking forward to a production-ready version…

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.comOn 8 Dec 2010, at 16:33, Jesse Vincent wrote:

On Wed 8.Dec’10 at 16:31:11 +0000, Justin Hayes wrote:

I hadn’t actually realised it was possible to get hold of 4.9, though if that’s beta I guess I’d want to wait.

3.9. And we’re aiming for a release candidate by 12/25. You can
download test builds from
Index of /pub/rt/devel

Best,
Jesse

Jesse_Vincent · December 8, 2010, 4:48pm

I was about to say that should have been 3.9

Been a while since I checked out bestpractical.com. Just saw the post about the dev release, and looks like things are going well which is great news.

The only downside is the time needed to migrate to postgres,

Or you could try the sphinx-based fts for mysql that’s baked in

Justin_Hayes1 · December 10, 2010, 8:26am

Any view on which is faster Jesse (postgres or mysql/sphinx)?

Also how much faster than the old Content search are we talking? Orders of magnitude, or just ‘faster’?

Thanks again,

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.comOn 8 Dec 2010, at 16:48, Jesse Vincent wrote:

On Wed 8.Dec’10 at 16:44:57 +0000, Justin Hayes wrote:

I was about to say that should have been 3.9

Been a while since I checked out bestpractical.com. Just saw the post about the dev release, and looks like things are going well which is great news.

The only downside is the time needed to migrate to postgres,

Or you could try the sphinx-based fts for mysql that’s baked in

and then the time it’s going to take to port all my custom code into the new version.

But that’s my problem, and one I have every time I upgrade, so can’t complain!

Jesse_Vincent · December 10, 2010, 11:39am

Any view on which is faster Jesse (postgres or mysql/sphinx)?

I haven’t yet benchmarked em.

Also how much faster than the old Content search are we talking? Orders
of magnitude, or just ‘faster’?

Having played with one of the implementations, it’s night and day.

Kenneth_Marshall · December 10, 2010, 2:46pm

Any view on which is faster Jesse (postgres or mysql/sphinx)?

Also how much faster than the old Content search are we talking? Orders of magnitude, or just ‘faster’?

Thanks again,

Justin

Hi Justin,

I have not benchmarked the MySQL/Sphinx relative to the PostgreSQL
implementation but the few I have found using a search showed their
performance to be pretty comparable. I have not had a chance to look
at how RT integrates with Sphinx, but when I wanted to test it with
a PostgreSQL database, it did not keep its indexes up realtime, but
required a periodic job to run for updates and periodic reindexes.
The PostgreSQL fulltext indexes are kept in sync at all times. This
helps avoid the “the ticket is there but the search did not find it”
syndrome. As far as the speed with the fulltext indexing versus no
fulltext indexing, a sample search that I ran while testing took
20 minutes for the table scan versus a couple of seconds for the
search with the index support, 600X faster. Of course, the bigger
win is that your I/O system is not tapped out while you are
searching in the content and you database scales much, much more
gracefully.

Cheers,
Ken

Joop · December 10, 2010, 2:49pm

Jesse Vincent wrote:

Any view on which is faster Jesse (postgres or mysql/sphinx)?

I haven’t yet benchmarked em.

Also how much faster than the old Content search are we talking? Orders
of magnitude, or just ‘faster’?

Having played with one of the implementations, it’s night and day.
That is, IMHO, if you don’t prefix queries for content with % which will
kill any and all indices on your table.
I have removed the pre/post pending of % from Searchbuilder because of
this. I rather have that people complain they can’t find something then
that they kill the database. Now they know that if they want search for
content that they need to add a % at the end and if they really need it
at the front but they also know that they are gonna wait for it.

Regards,

Joop

Justin_Hayes1 · December 11, 2010, 9:35am

Thanks Ken. I certainly prefer the idea of using inbuilt DB server functionality to improve performance than an external tool hooked in.

I’ll probably look into a PostgreSQL migration first and see where that takes me. I hope it’s just a case of creating the schema by running a 3.8.8 install against it, dumping the data and loading it in

Justin

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.comOn 10 Dec 2010, at 14:46, Kenneth Marshall wrote:

On Fri, Dec 10, 2010 at 08:26:58AM +0000, Justin Hayes wrote:

Any view on which is faster Jesse (postgres or mysql/sphinx)?

Also how much faster than the old Content search are we talking? Orders of magnitude, or just ‘faster’?

Thanks again,

Justin

Hi Justin,

I have not benchmarked the MySQL/Sphinx relative to the PostgreSQL
implementation but the few I have found using a search showed their
performance to be pretty comparable. I have not had a chance to look
at how RT integrates with Sphinx, but when I wanted to test it with
a PostgreSQL database, it did not keep its indexes up realtime, but
required a periodic job to run for updates and periodic reindexes.
The PostgreSQL fulltext indexes are kept in sync at all times. This
helps avoid the “the ticket is there but the search did not find it”
syndrome. As far as the speed with the fulltext indexing versus no
fulltext indexing, a sample search that I ran while testing took
20 minutes for the table scan versus a couple of seconds for the
search with the index support, 600X faster. Of course, the bigger
win is that your I/O system is not tapped out while you are
searching in the content and you database scales much, much more
gracefully.

Cheers,
Ken

Justin Hayes
OpenBet Support Manager
justin.hayes@openbet.com

On 8 Dec 2010, at 16:48, Jesse Vincent wrote:

On Wed 8.Dec’10 at 16:44:57 +0000, Justin Hayes wrote:

I was about to say that should have been 3.9

Been a while since I checked out bestpractical.com. Just saw the post about the dev release, and looks like things are going well which is great news.

The only downside is the time needed to migrate to postgres,

Or you could try the sphinx-based fts for mysql that’s baked in

and then the time it’s going to take to port all my custom code into the new version.

But that’s my problem, and one I have every time I upgrade, so can’t complain!

Kenneth_Marshall · December 11, 2010, 3:19pm

Thanks Ken. I certainly prefer the idea of using inbuilt DB server functionality to improve performance than an external tool hooked in.

I’ll probably look into a PostgreSQL migration first and see where that takes me. I hope it’s just a case of creating the schema by running a 3.8.8 install against it, dumping the data and loading it in

Justin

The migration should be pretty straightforward. There is a wiki page on it:

http://requesttracker.wikia.com/wiki/MySQLToPg

There are a number of methods that people have used for the migration. I would
probably use the dump/restore method and then create a second version using the
connect to both DBs at once and transfer the data. Then I would compare the
resulting DBs. They should be the same.

Ken