*very* slow 'OR' searches

Cast: RT 3.4.1, perl 5.8.3, DBIx::SearchBuilder 1.22, MySQL 4.1.10a.
Setting: a test database with 50 tickets

Searching for “Subject LIKE ‘kerberos’” takes a few hundreths of a
second; same for “Content LIKE ‘slip’”. Searching for “Subject LIKE
’kerberos’ OR Content LIKE ‘slip’” takes 6 seconds (hundreds of times
longer than the component queries); with a realistic number of slips
it takes effectively forever (hours). The query produced is:

SELECT DISTINCT main.*
FROM Tickets main, Transactions Transactions_1, Attachments Attachments_2
WHERE
((Transactions_1.ObjectType = ‘RT::Ticket’)) AND
((main.EffectiveId = main.id)) AND
((main.Status != ‘deleted’)) AND
((main.Type = ‘ticket’)) AND
((main.Subject LIKE ‘%kerberos%’) OR
((Attachments_2.Content LIKE ‘%slip%’) AND
(Attachments_2.TransactionId = Transactions_1.id) AND
(main.id = Transactions_1.ObjectId) ) )
ORDER BY main.id ASC LIMIT 50;

Feeding that into “explain” says, in part (trimmed to 72 columns):

| table | type | possible_keys | key | rows |
| main | ALL | PRIMARY | NULL | 51 |
| Transactions_1 | ref | PRIMARY,Transactions1 | Transactions1 | 210 |
| Attachments_2 | ALL | Attachments2 | NULL | 91 |

which is not good. For the equivalent query:

SELECT DISTINCT main.* FROM Tickets main
JOIN Transactions Transactions_1 ON (main.id = Transactions_1.ObjectId)
JOIN Attachments Attachments_2
ON (Attachments_2.TransactionId = Transactions_1.id)
WHERE
(Transactions_1.ObjectType = ‘RT::Ticket’) AND
(main.EffectiveId = main.id) AND
(main.Status != ‘deleted’) AND
(main.Type = ‘ticket’) AND
((main.Subject LIKE ‘%kerberos%’) OR
(Attachments_2.Content LIKE ‘%slip%’))
ORDER BY main.id ASC LIMIT 50;

explain is happier:

| table | type | possible_keys | key | rows |
| main | ALL | PRIMARY | NULL | 51 |
| Transactions_1 | ref | PRIMARY,Transactions1 | Transactions1 | 2 |
| Attachments_2 | ref | Attachments2 | Attachments2 | 1 |

and the query runs in a few hundreths of a second.

I find it somewhat worrisome that it is so easy to stumble across a
simple search that takes essentially forever. Presumably this happens
because neither the query builder nor the MySQL “optimizer” notices
that the first term of the query doesn’t depend on Attachments or
Transactions, and so misses the opportunity to move the restrictions
from the second term into JOIN clauses. In general this is probably a
hard problem to solve, but couldn’t the query builder do better on the
simple “apples OR oranges” search?

I find it somewhat worrisome that it is so easy to stumble across a
simple search that takes essentially forever. Presumably this happens
because neither the query builder nor the MySQL “optimizer” notices

MySQL doesn’t have an optimizer, at all. Basically it takes a best guess
based on query term ordering and available indices. It’s reliable and
predictable but the programmer (or in this cas search builder) has to do
all the real work.

I find it somewhat worrisome that it is so easy to stumble across a
simple search that takes essentially forever. Presumably this happens
because neither the query builder nor the MySQL “optimizer” notices

MySQL doesn’t have an optimizer, at all. Basically it takes a best guess
based on query term ordering and available indices. It’s reliable and
predictable but the programmer (or in this cas search builder) has to do
all the real work.

Sounds like another good reason to use PgSQL instead. :slight_smile:

Cheers,
– jra
Jay R. Ashworth jra@baylink.com
Designer Baylink RFC 2100
Ashworth & Associates The Things I Think '87 e24
St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274

  If you can read this... thank a system administrator.  Or two.  --me

“Jay R. Ashworth” jra@baylink.com writes:

I find it somewhat worrisome that it is so easy to stumble across a
simple search that takes essentially forever. Presumably this happens
because neither the query builder nor the MySQL “optimizer” notices

MySQL doesn’t have an optimizer, at all. […]

I must need scarier “scare quotes” :wink:

Sounds like another good reason to use PgSQL instead. :slight_smile:

PostgreSQL 8.0.1 tries harder, but sadly doesn’t actually do any
better. Here’s the skeleton of the query plan for the query produced
by SearchBuilder for “Subject like ‘foo’ OR Content like ‘bar’”:

Limit (cost=750.26…750.33 rows=1 width=172)
-> Unique (cost=750.26…750.33 rows=1 width=172)
-> Sort (cost=750.26…750.27 rows=1 width=172)
-> Nested Loop (cost=11.88…750.25 rows=1 width=172)
-> Seq Scan on transactions transactions_1
(cost=0.00…48.40 rows=211 width=8)
-> Materialize (cost=11.88…12.97 rows=109 width=335)
-> Nested Loop
(cost=0.00…11.77 rows=109 width=335)
-> Seq Scan on tickets main
(cost=0.00…5.59 rows=1 width=172)
-> Seq Scan on attachments attachments_2
(cost=0.00…5.09 rows=109 width=163)

and the same for my rewrite:

Limit (cost=14.47…14.53 rows=1 width=172)
-> Unique (cost=14.47…14.53 rows=1 width=172)
-> Sort (cost=14.47…14.47 rows=1 width=172)
-> Nested Loop (cost=0.00…14.46 rows=1 width=172)
-> Nested Loop (cost=0.00…11.43 rows=1 width=176)
-> Seq Scan on tickets main
(cost=0.00…5.59 rows=1 width=172)
-> Index Scan using transactions1
(cost=0.00…5.82 rows=1 width=8)
-> Index Scan using attachments2
(cost=0.00…3.01 rows=1 width=163)

These were produced after a “vacuum analyze;” and running the queries
(PostgreSQL had an incorrectly optimistic estimate for the first query
until I vacuumed, after which it got more realistic). I haven’t done
any PostgreSQL performance tuning, so I won’t make any quantitative
comparisons between the DBs, but the same qualitative relationship
holds–the query produced by SearchBuilder is orders of magnitude
slower than the rewrite.

I found a post to the MySQL MaxDB mailing list from someone running RT
on MaxDB, complaining about the performance of ORs, with a response
from someone at SAP that this is a hard problem, and MaxDB currently
does not optimize ORs–that saved me from trying MaxDB. I could
try Oracle, but it’s probably easier to tell our users not to OR
dissimilar types.

-dan

I found a post to the MySQL MaxDB mailing list from someone running RT
on MaxDB, complaining about the performance of ORs, with a response
from someone at SAP that this is a hard problem, and MaxDB currently
does not optimize ORs–that saved me from trying MaxDB. I could
try Oracle, but it’s probably easier to tell our users not to OR
dissimilar types.

Rather than trying other databases, It makes more sense to spend a bit
of time on RT::Tickets_Overlay.pm’s logic :wink: Patches are most certainly
welcome.

“Jay R. Ashworth” jra@baylink.com writes:

I find it somewhat worrisome that it is so easy to stumble across a
simple search that takes essentially forever. Presumably this happens
because neither the query builder nor the MySQL “optimizer” notices

MySQL doesn’t have an optimizer, at all. […]

I must need scarier “scare quotes” :wink:

Careful; that’s not me saying that there; in fact, you didn’t keep
any of what I said. Whatever it was. :slight_smile:

Sounds like another good reason to use PgSQL instead. :slight_smile:

[ I will read everything before writing anything. I will read
everything… ]

PostgreSQL 8.0.1 tries harder, but sadly doesn’t actually do any
better. Here’s the skeleton of the query plan for the query produced
by SearchBuilder for “Subject like ‘foo’ OR Content like ‘bar’”:

Limit (cost=750.26…750.33 rows=1 width=172)
-> Unique (cost=750.26…750.33 rows=1 width=172)
-> Sort (cost=750.26…750.27 rows=1 width=172)
-> Nested Loop (cost=11.88…750.25 rows=1 width=172)
-> Seq Scan on transactions transactions_1
(cost=0.00…48.40 rows=211 width=8)
-> Materialize (cost=11.88…12.97 rows=109 width=335)
-> Nested Loop
(cost=0.00…11.77 rows=109 width=335)
-> Seq Scan on tickets main
(cost=0.00…5.59 rows=1 width=172)
-> Seq Scan on attachments attachments_2
(cost=0.00…5.09 rows=109 width=163)

and the same for my rewrite:

Limit (cost=14.47…14.53 rows=1 width=172)
-> Unique (cost=14.47…14.53 rows=1 width=172)
-> Sort (cost=14.47…14.47 rows=1 width=172)
-> Nested Loop (cost=0.00…14.46 rows=1 width=172)
-> Nested Loop (cost=0.00…11.43 rows=1 width=176)
-> Seq Scan on tickets main
(cost=0.00…5.59 rows=1 width=172)
-> Index Scan using transactions1
(cost=0.00…5.82 rows=1 width=8)
-> Index Scan using attachments2
(cost=0.00…3.01 rows=1 width=163)

These were produced after a “vacuum analyze;” and running the queries
(PostgreSQL had an incorrectly optimistic estimate for the first query
until I vacuumed, after which it got more realistic). I haven’t done
any PostgreSQL performance tuning, so I won’t make any quantitative
comparisons between the DBs, but the same qualitative relationship
holds–the query produced by SearchBuilder is orders of magnitude
slower than the rewrite.

I found a post to the MySQL MaxDB mailing list from someone running RT
on MaxDB, complaining about the performance of ORs, with a response
from someone at SAP that this is a hard problem, and MaxDB currently
does not optimize ORs–that saved me from trying MaxDB. I could
try Oracle, but it’s probably easier to tell our users not to OR
dissimilar types.

Yeah… but these are all ad-hoc queries, right? The built in stuff
has been optimized?

Cheers,
– jra
Jay R. Ashworth jra@baylink.com
Designer Baylink RFC 2100
Ashworth & Associates The Things I Think '87 e24
St Petersburg FL USA http://baylink.pitas.com +1 727 647 1274

  If you can read this... thank a system administrator.  Or two.  --me