SQL sorting issues with requestor in 5.0.2

Hi all,
I have recently upgrade a production server to 5.0.2 from 4.2.x
I have moved from mysql to MariaDB.

Whenever I do a search and I sort by any fields the results is pretty snappy, the only exception is the requestor field. Now instead of the usual 1-3 seconds I wait 15-30 seconds.

I think this may be a bug in RT query.

I compared old and new DB queries and what I was able to gather:
OLD system
show full processlist on mysql during the requestor field sort shows :
SELECT main.* FROM Tickets main JOIN Groups Groups_1 ON ( Groups_1.Domain = ‘RT::Ticket-Role’ ) AND ( Groups_1.Name = ‘Requestor’ ) AND ( Groups_1.Instance = main.id ) LEFT JOIN CachedGroupMembers CachedGroupMembers_2 ON ( CachedGroupMembers_2.Disabled = ‘0’ ) AND ( CachedGroupMembers_2.GroupId != CachedGroupMembers_2.MemberId ) AND ( CachedGroupMembers_2.GroupId = Groups_1.id ) LEFT JOIN Users Users_3 ON ( Users_3.id = CachedGroupMembers_2.MemberId ) WHERE (main.IsMerged IS NULL) AND (main.Status != ‘deleted’) AND (main.Type = ‘ticket’) AND (main.Queue = ‘108’ AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘review’ OR main.Status = ‘stalled’ OR main.Status = ‘crp’ ) ) GROUP BY main.id ORDER BY MIN(Users_3.EmailAddress) ASC LIMIT 50

NEW system
show full processlist on mariaDB during the requestor field sort shows :
SELECT main.* FROM Tickets main LEFT JOIN Groups Groups_2 ON ( Groups_2.Domain = ‘RT::Ticket-Role’ ) AND ( Groups_2.Name = ‘Requestor’ ) AND ( Groups_2.Instance = main.id ) JOIN Queues Queues_1 ON ( Queues_1.id = main.Queue ) LEFT JOIN CachedGroupMembers CachedGroupMembers_3 ON ( CachedGroupMembers_3.Disabled = ‘0’ ) AND ( CachedGroupMembers_3.GroupId != CachedGroupMembers_3.MemberId ) AND ( CachedGroupMembers_3.GroupId = Groups_2.id ) LEFT JOIN Groups Groups_4 ON ( Groups_4.id = CachedGroupMembers_3.MemberId ) LEFT JOIN Users Users_5 ON ( Users_5.id = CachedGroupMembers_3.MemberId ) WHERE (Groups_4.id IS NULL) AND (main.IsMerged IS NULL) AND (main.Status != ‘deleted’) AND (main.Type = ‘ticket’) AND (main.Queue = ‘124’ AND ( ( Queues_1.Lifecycle = ‘default’ AND main.Status IN (‘crp’, ‘new’, ‘open’, ‘review’, ‘stalled’) ) OR ( Queues_1.Lifecycle = ‘approvals’ AND main.Status IN (‘new’, ‘open’, ‘stalled’) ) ) ) GROUP BY main.id ORDER BY MAX(Users_5.Name) DESC LIMIT 50

That first join is prob the culprit:
FROM Tickets main JOIN Groups Groups_1
vs
FROM Tickets main LEFT JOIN Groups Groups_2

LEFT JOIN will return a bucket load more results and I can’t see why you’d have it, either.

Maybe someone knows more about this issue and has any recommendation ?

/regards
Filip

1 Like

the resulting recordset is the same, but it has to parse many more rows to get the same result in the end.

Anyone ?
No one else is experiencing the same slow sorting issue ?

/regards
Filip

this sounds remarkably like Speed rt5 after upgrade from rt 4.4.3

@gvvg @Sean_Cwiek did you end up finding anything concrete?

Basic navigation is triggering queries with execution plans that are just way off the scale:

explain SELECT DISTINCT main.* FROM Tickets main LEFT JOIN Groups Groups_1  ON ( Groups_1.Domain = 'RT::Ticket-Role' ) AND ( Groups_1.Name = 'Requestor' ) AND ( Groups_1.Instance = main.id ) LEFT JOIN CachedGroupMembers CachedGroupMembers_2  ON ( CachedGroupMembers_2.Disabled = '0' ) AND ( CachedGroupMembers_2.GroupId = Groups_1.id ) LEFT JOIN CachedGroupMembers CachedGroupMembers_3  ON ( CachedGroupMembers_3.Disabled = '0' ) AND ( CachedGroupMembers_3.GroupId = Groups_1.id )  WHERE (main.IsMerged IS NULL) AND (main.Status != 'deleted') AND (main.Type = 'ticket') AND ( ( CachedGroupMembers_2.MemberId IN ('1032992') )  or  ( CachedGroupMembers_3.MemberId IN ('0') ) )  ORDER BY main.id ASC

the first two rows say it all:

1	SIMPLE	main	ref	Status,Type	Type	67	const	126626	Using index condition; Using where; Using temporary
1	SIMPLE	Groups_1	ref	groups1,groups2,groups3,Name,Domain	Name	803	const	456660	Using where; Distinct

i.e. 57,825,029,160 rows to evaluate.

That’s a very different result set, comparing to the same operation from 4.x series. I’m not certain, but I don’t think it’s a MariaDB bug in how it’s indexing the tables - it’s actually executing a different query compared to the 4.x series. The 4.x query returns the same result set using a fraction of the memory / CPU / etc, compared with the query that 5.x is generating.

@Chris_Herrmann2 - Our speed issues were resolved with some VACUUM and ANALYZE operations on our Postgres DB after moving from 4.x to 5.x.

yeah we’re using MariaDB not postgres. Running optimize did improve things, as did tweaking memory settings for keybuffers / join buffers etc, but in short the new version of RT generates a different, hugely more expensive SQL query to return exactly the same record set as the old version.

Thanks for sharing this information. It was useful.

I’m really sorry it was quite a while ago and I can’t remember what I did. Perhaps I gave the VM more resources…
It’s quite snappy now…
I should have taken notes.

I’ve been trying to make more sense of this today (i.e. why is it using LEFT JOIN in scenarios where it doesn’t make sense). There’s a couple of other things that don’t make sense to me - for example - this only impacts users who have restricted access. A common scenario is that we give a group access to see only a single queue. For our main helpdesk team it runs OK. But for these restricted users, it renders it totally unusable, taking up to 10 minutes to login. If I grant the same user access to a wider set of permissions that the Helpdesk team have… BAM! fast again.

A few thoughts here.

Fundamentally, I think it’s because the system is incorrectly using LEFT JOIN for scenarios where it should be using an INNER JOIN. But more on that later.

I suspect it might be something to do with using FULLTEXT indexes… based upon Tickets.pm

 elsif ( $db_type eq 'mysql' and not $config->{Sphinx}) {
            my $dbh = $RT::Handle->dbh;
            $self->Limit(
                %rest,
                FUNCTION    => "MATCH($alias.Content)",
                OPERATOR    => 'AGAINST',
                VALUE       => "(". $dbh->quote($value) ." IN BOOLEAN MODE)",
                QUOTEVALUE  => 0,
            );
            # As with Oracle, above, this forces the LEFT JOINs into
            # JOINS, which allows the FULLTEXT index to be used.
            # Orthogonally, the IS NOT NULL clause also helps the
            # optimizer decide to use the index.
            $self->Limit(
                ENTRYAGGREGATOR => 'AND',
                ALIAS           => $alias,
                FIELD           => "Content",
                OPERATOR        => 'IS NOT',
                VALUE           => 'NULL',
                QUOTEVALUE      => 0,
            );

My issue is that it’s applying it in a case where you explicitly do not want a LEFT OUTER JOIN. When a user is logging in you only need an inner join to return an accurate recordset AND because it’s much faster / cheaper to execute.

For example this is the first of 2 queries that holds up the user logging in::

SELECT COUNT(DISTINCT main.id) FROM Tickets main LEFT JOIN Groups Groups_3  ON ( Groups_3.Domain = 'RT::Ticket-Role' ) AND ( Groups_3.Instance = main.id ) LEFT JOIN ObjectCustomFieldValues ObjectCustomFieldValues_1  ON ( ObjectCustomFieldValues_1.CustomField = '71' ) AND ( ObjectCustomFieldValues_1.Disabled = '0' ) AND ( ObjectCustomFieldValues_1.ObjectType = 'RT::Ticket' ) AND ( ObjectCustomFieldValues_1.ObjectId = main.id ) LEFT JOIN CachedGroupMembers CachedGroupMembers_4  ON ( CachedGroupMembers_4.Disabled = '0' ) AND ( CachedGroupMembers_4.MemberId IN ('564298', '1229834') ) AND ( CachedGroupMembers_4.GroupId = Groups_3.id ) LEFT JOIN CustomFieldValues CustomFieldValues_2  ON ( CustomFieldValues_2.Name = ObjectCustomFieldValues_1.Content ) AND ( CustomFieldValues_2.CustomField = ObjectCustomFieldValues_1.CustomField )  WHERE ( ( main.Queue IN ('3', '65') OR  ( CachedGroupMembers_4.MemberId IS NOT NULL AND Groups_3.Name = 'AdminCc' AND main.Queue IN ('2') )  OR  ( CachedGroupMembers_4.MemberId IS NOT NULL AND Groups_3.Name = 'Requestor' )  OR  ( main.Owner = '564298' AND main.Queue IN ('2') )  ) ) AND (main.IsMerged IS NULL) AND (main.Status != 'deleted') AND (main.Type = 'ticket') AND (main.Owner = '10' AND main.Status IN ('new', 'open'))

results in a query plan:

1	SIMPLE	main	range	Tickets1,Tickets2,Status,Type	Status	259		1295	Using index condition; Using where
1	SIMPLE	Groups_3	ref	groups1,groups2,groups3,Domain	groups1	259	const	521184	Using where
1	SIMPLE	ObjectCustomFieldValues_1	ref	TicketCustomFieldValues1,TicketCustomFieldValues2,ObjectCustomFieldValues2,SHREDDER_OCFV1,SHREDDER_OCFV2	TicketCustomFieldValues1	12	const,rtfet.main.id	1	Using where
1	SIMPLE	CachedGroupMembers_4	ref	DisGrouMem,CachedGroupMembers3,cachedgroupmembers1,SHREDDER_CGM1	DisGrouMem	5	rtfet.Groups_3.id	1	Using where; Using index
1	SIMPLE	CustomFieldValues_2	ref	CustomFieldValues1	CustomFieldValues1	4	rtfet.ObjectCustomFieldValues_1.CustomField	2	Using where

it should be returning a very small recordset:

1	SIMPLE	CachedGroupMembers_2	range	DisGrouMem,CachedGroupMembers3,cachedgroupmembers1,SHREDDER_CGM1	cachedgroupmembers1	5		1537	Using where; Using index
1	SIMPLE	Groups_1	eq_ref	PRIMARY,groups1,groups2,groups3,Name,Domain	PRIMARY	4	rtfet.CachedGroupMembers_2.GroupId	1	Using where
1	SIMPLE	main	eq_ref	PRIMARY,Tickets1,Tickets2,Status,Type	PRIMARY	4	rtfet.Groups_1.Instance	1	Using where

the TL;DR there is that version #1 is returning ~439,000X as many rows as are required to return the result. And it hurts.

I’ve also been down the CachedGroupMembers rabbit hole, running shrink-cgm-table but this hasn’t made any appreciable difference.

I’m happy to have someone hit me with a clue bat…

OK, fixed… (and hopefully someone can explain why this behaviour results?)

If you head to:
Modify Global Group Rights
Pick UserGroupOfInterest#1
GeneralRights
Assign “View Ticket Summaries” aka ShowTicket

it then changes the execution time from many, many minutes to a fraction of a second.

I’m still busy doing some testing to make sure that this doesn’t accidentally grant other access that I don’t want, but so far it looks good.

The Groups in question already had this same right on the queue(s) that I actually want them to be able to access.