Sphinx weirdness

Hi.

rt 4.0.10 + mysql 5.6.22 + sphinx 2.2.7

rt uses such query:

mysql> SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectType = ‘RT::Ticket’ ) AND ( Transactions_1.ObjectId = main.id ) LEFT JOIN Attachments Attachments_2
ON ( Attachments_2.TransactionId = Transactions_1.id ) LEFT JOIN AttachmentsIndex AttachmentsIndex_3 ON ( AttachmentsIndex_3.id = Attachments_2.id ) WHERE (main.IsMerged IS NULL) AND (main.Status != ‘deleted’) AND
(main.Type = ‘ticket’) AND ( ( ( main.Subject LIKE ‘%inconventus%’ OR ( AttachmentsIndex_3.query = ‘inconventus’ ) ) ) AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘stalled’ ) );
| COUNT(DISTINCT main.id) |
| 0 |
1 row in set (0.16 sec)

mysql> EXPLAIN SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectType = ‘RT::Ticket’ ) AND ( Transactions_1.ObjectId = main.id ) LEFT JOIN Attachments
Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) LEFT JOIN AttachmentsIndex AttachmentsIndex_3 ON ( AttachmentsIndex_3.id = Attachments_2.id ) WHERE (main.IsMerged IS NULL) AND (main.Status !=
‘deleted’) AND (main.Type = ‘ticket’) AND ( ( ( main.Subject LIKE ‘%inconventus%’ OR ( AttachmentsIndex_3.query = ‘inconventus’ ) ) ) AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘stalled’ ) );
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | main | range | PRIMARY,tickets_status | tickets_status | 195 | NULL | 2228 | Using index condition; Using where |
| 1 | SIMPLE | Transactions_1 | ref | Transactions1 | Transactions1 | 70 | const,rt3.main.id | 1 | Using where; Using index |
| 1 | SIMPLE | Attachments_2 | ref | Attachments2 | Attachments2 | 4 | rt3.Transactions_1.id | 4 | Using index |
| 1 | SIMPLE | AttachmentsIndex_3 | ALL | NULL | NULL | NULL | NULL | 20 | Using where; Using join buffer (Block Nested Loop) |
4 rows in set (0.00 sec)

and as you see above it finds nothing BUT if I remove
“main.Subject LIKE ‘%inconventus%’ OR” from query it finds one ticket, why? :

mysql> SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectType = ‘RT::Ticket’ ) AND ( Transactions_1.ObjectId = main.id ) LEFT JOIN Attachments Attachments_2
ON ( Attachments_2.TransactionId = Transactions_1.id ) LEFT JOIN AttachmentsIndex AttachmentsIndex_3 ON ( AttachmentsIndex_3.id = Attachments_2.id ) WHERE (main.IsMerged IS NULL) AND (main.Status != ‘deleted’) AND
(main.Type = ‘ticket’) AND ( ( ( ( AttachmentsIndex_3.query = ‘inconventus’ ) ) ) AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘stalled’ ) );
| COUNT(DISTINCT main.id) |
| 1 |
1 row in set (0.01 sec)

mysql> EXPLAIN SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectType = ‘RT::Ticket’ ) AND ( Transactions_1.ObjectId = main.id ) LEFT JOIN Attachments
Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) LEFT JOIN AttachmentsIndex AttachmentsIndex_3 ON ( AttachmentsIndex_3.id = Attachments_2.id ) WHERE (main.IsMerged IS NULL) AND (main.Status !=
‘deleted’) AND (main.Type = ‘ticket’) AND ( ( ( ( AttachmentsIndex_3.query = ‘inconventus’ ) ) ) AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘stalled’ ) );
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | AttachmentsIndex_3 | ref | query | query | 767 | const | 3 | Using where with pushed condition |
| 1 | SIMPLE | Attachments_2 | eq_ref | PRIMARY,Attachments2 | PRIMARY | 4 | rt3.AttachmentsIndex_3.id | 1 | Using where |
| 1 | SIMPLE | Transactions_1 | eq_ref | PRIMARY,Transactions1 | PRIMARY | 4 | rt3.Attachments_2.TransactionId | 1 | Using where |
| 1 | SIMPLE | main | eq_ref | PRIMARY,tickets_status | PRIMARY | 4 | rt3.Transactions_1.ObjectId | 1 | Using where |
4 rows in set (0.00 sec)

Any idea what is going on? Additional subject searching shouldn’t cause such problem
since it’s ORed with the rest of that part of query.

Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

mysql> EXPLAIN SELECT COUNT(DISTINCT main.id) FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectType = ‘RT::Ticket’ ) AND ( Transactions_1.ObjectId = main.id ) LEFT JOIN Attachments
Attachments_2 ON ( Attachments_2.TransactionId = Transactions_1.id ) LEFT JOIN AttachmentsIndex AttachmentsIndex_3 ON ( AttachmentsIndex_3.id = Attachments_2.id ) WHERE (main.IsMerged IS NULL) AND (main.Status !=
‘deleted’) AND (main.Type = ‘ticket’) AND ( ( ( main.Subject LIKE ‘%inconventus%’ OR ( AttachmentsIndex_3.query = ‘inconventus’ ) ) ) AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘stalled’ ) );
±—±------------±-------------------±------±-----------------------±---------------±--------±----------------------±-----±---------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±-------------------±------±-----------------------±---------------±--------±----------------------±-----±---------------------------------------------------+
| 1 | SIMPLE | main | range | PRIMARY,tickets_status | tickets_status | 195 | NULL | 2228 | Using index condition; Using where |
| 1 | SIMPLE | Transactions_1 | ref | Transactions1 | Transactions1 | 70 | const,rt3.main.id | 1 | Using where; Using index |
| 1 | SIMPLE | Attachments_2 | ref | Attachments2 | Attachments2 | 4 | rt3.Transactions_1.id | 4 | Using index |
| 1 | SIMPLE | AttachmentsIndex_3 | ALL | NULL | NULL | NULL | NULL | 20 | Using where; Using join buffer (Block Nested Loop) |
±—±------------±-------------------±------±-----------------------±---------------±--------±----------------------±-----±---------------------------------------------------+
4 rows in set (0.00 sec)

Ok, mysql is too smart! For sphinx to work mysql needs to first query
AttachmentsIndex_3 and then make joins to it. Otherwise sphinx won’t work.
That’s due to sphinx architecture.

Here mysql is too smart and differently optimizes query thus breaking
sphinx support in rt.

Now I’ve tried FORCE INDEX and such but wasn’t able to force mysql to first query
AttachmentsIndex_3.

There is STRAIGHT_JOIN that forces joins orders, so maybe that is some solution.

Other ideas?
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

Ok, mysql is too smart! For sphinx to work mysql needs to first query
AttachmentsIndex_3 and then make joins to it. Otherwise sphinx won’t work.
That’s due to sphinx architecture.

Here mysql is too smart and differently optimizes query thus breaking
sphinx support in rt.

Now I’ve tried FORCE INDEX and such but wasn’t able to force mysql to first
query AttachmentsIndex_3.

There is STRAIGHT_JOIN that forces joins orders, so maybe that is some
solution.

STRAIGHT_JOIN also won’t work since mysql still is able to make changes
and optimizations to the query.

Fortunately code below seems to be working - using UNION and separate,
simple sphinx subquery:

SELECT DISTINCT main.id FROM Tickets main JOIN Transactions Transactions_1 ON ( Transactions_1.ObjectType = ‘RT::Ticket’ ) AND ( Transactions_1.ObjectId = main.id ) JOIN Attachments Attachments_2 ON (
Attachments_2.TransactionId = Transactions_1.id ) JOIN ((SELECT ai.id FROM AttachmentsIndex ai WHERE ai.query = ‘inconventus’) UNION (SELECT ti.id FROM Tickets ti WHERE ti.Subject LIKE ‘%inconventus%’)) u ON
u.id=Attachments_2.id WHERE (main.IsMerged IS NULL) AND (main.Status != ‘deleted’) AND (main.Type = ‘ticket’) AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘stalled’ );

Devs, could you please change querying code, so that sphinx will
always get its own subquery?

That will always work because mysql query optimizer will not be able
to mess with sphinx query. Optimizer will be able only to optimize
combining sphinx results with the rest of query though and that’s ok and desired.

Sphinx subquery will always be simple, like
SELECT ai.id FROM AttachmentsIndex ai WHERE ai.query = ‘inconventus’

Thanks,
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )

Ok, mysql is too smart! For sphinx to work mysql needs to first
query AttachmentsIndex_3 and then make joins to it. Otherwise
sphinx won’t work. That’s due to sphinx architecture.

Here mysql is too smart and differently optimizes query thus
breaking sphinx support in rt.

Now I’ve tried FORCE INDEX and such but wasn’t able to force mysql
to first query AttachmentsIndex_3.

There is STRAIGHT_JOIN that forces joins orders, so maybe that is
some solution.

This limitation is unfortunately documented, and not easily fixable:
https://bestpractical.com/docs/rt/latest/full_text_indexing.html#Caveats1

STRAIGHT_JOIN also won’t work since mysql still is able to make
changes and optimizations to the query.

Fortunately code below seems to be working - using UNION and separate,
simple sphinx subquery:
[snip]

That query is incorrect; it unions Attachment ids (from the Sphinx
results table) with Ticket ids (from the “ti” subquery on Tickets).

Devs, could you please change querying code, so that sphinx will
always get its own subquery?

Please try the straight MySQL FTS, included in 4.2.10, instead. It is
much faster, and not nearly as fiddly.

  • Alex