'strange' performance bottlenecks

Hi.

We have a large RT system, consisting of nearly 400000 tickets.
Our set up is as follows:

We have 2 machines:

ticket1 (sun solaris 9)
apache 1.3.27
rt345
mysql4 client

ticket2 (sun X4100 solaris 10)
mysql5

The RT instance on ticket1 is talking successfully to ticket2 and
producing mostly very good speeds.
However we have a ‘strange problem’ occurring.
Every so often, twice a day, the mysql instance (on ticket2) maxes out
at 99% cpu use.
Then RT starts running very slowly.
This bottleneck can occur unpredictably and last for up to an hour.

I am currently trying to track down this problem but not having a lot of
success.
When the congestion occurs the mysql processlist shows around 10 similar
queries such as this one:
/~~~~~~~~~~~~~~~~~~~~~~~~~~~
SELECT COUNT(DISTINCT main.id) FROM Users main , Principals
Principals_1, CachedGroupMembers CachedGroupMembers_2, Groups Groups_3,
ACL ACL_4 WHERE ((ACL_4.PrincipalType = Groups_3.Type)) AND
((ACL_4.RightName = ‘OwnTicket’)) AND ((CachedGroupMembers_2.MemberId =
Principals_1.id)) AND ((Groups_3.id = CachedGroupMembers_2.GroupId)) AND
((Principals_1.PrincipalType = ‘User’)) AND ((Principals_1.id != ‘1’))
AND ((main.id = Principals_1.id)) AND ((ACL_4.ObjectType = ‘RT::Ticket’
AND ACL_4.ObjectId = 113141) OR (ACL_4.ObjectType = ‘RT::Queue’ AND
ACL_4.ObjectId = 5) OR (ACL_4.ObjectType = ‘RT::System’)) AND
((Groups_3.Domain = ‘RT::Ticket-Role’ AND Groups_3.Instance = ‘113141’)
OR (Groups_3.Domain = ‘RT::Queue-Role’ AND Groups_3.Instance = ‘5’) OR
(Groups_3.Domain = ‘RT::System-Role’));
___________________________

Each of these queries can take up to 100 or more seconds to complete
during the bottleneck.
If I run the query at ‘normal’ times, it returns almost immediately.
It seems that the bulk of the these queries together, stifles RT and
causes it to grind to a very slow pace. And maybe there is an underlying
query that could cause it.

My theory at present is that somewhere in the organisation someone is
doing a task that includes a specific query or group of queries that
causes RT to flounder.

I am running rt in debug mode, but have no clues as yet as to what is
causing the problem.

Any help would be appreciated.
Kind regards.

Luke

You wouldn’t find anything special in RT logs. It’s problem with MySQL
query optimizer/planner. Could you run EXPLAIN SELECT
from mysql shell several times and look if plan changes between.

Also you need to run:
ANALYZE TABLE ACL, CachedGroupMembers, GroupMembers, Groups, Principals, Users;
From MySQL shell to update server statistics for RT tables.On 4/11/06, Luke Vanderfluit lvanderf@internode.com.au wrote:

Hi.

We have a large RT system, consisting of nearly 400000 tickets.
Our set up is as follows:

We have 2 machines:

ticket1 (sun solaris 9)

apache 1.3.27
rt345
mysql4 client

ticket2 (sun X4100 solaris 10)

mysql5

The RT instance on ticket1 is talking successfully to ticket2 and
producing mostly very good speeds.
However we have a ‘strange problem’ occurring.
Every so often, twice a day, the mysql instance (on ticket2) maxes out
at 99% cpu use.
Then RT starts running very slowly.
This bottleneck can occur unpredictably and last for up to an hour.

I am currently trying to track down this problem but not having a lot of
success.
When the congestion occurs the mysql processlist shows around 10 similar
queries such as this one:
/~~~~~~~~~~~~~~~~~~~~~~~~~~~
SELECT COUNT(DISTINCT main.id) FROM Users main , Principals
Principals_1, CachedGroupMembers CachedGroupMembers_2, Groups Groups_3,
ACL ACL_4 WHERE ((ACL_4.PrincipalType = Groups_3.Type)) AND
((ACL_4.RightName = ‘OwnTicket’)) AND ((CachedGroupMembers_2.MemberId =
Principals_1.id)) AND ((Groups_3.id = CachedGroupMembers_2.GroupId)) AND
((Principals_1.PrincipalType = ‘User’)) AND ((Principals_1.id != ‘1’))
AND ((main.id = Principals_1.id)) AND ((ACL_4.ObjectType = ‘RT::Ticket’
AND ACL_4.ObjectId = 113141) OR (ACL_4.ObjectType = ‘RT::Queue’ AND
ACL_4.ObjectId = 5) OR (ACL_4.ObjectType = ‘RT::System’)) AND
((Groups_3.Domain = ‘RT::Ticket-Role’ AND Groups_3.Instance = ‘113141’)
OR (Groups_3.Domain = ‘RT::Queue-Role’ AND Groups_3.Instance = ‘5’) OR
(Groups_3.Domain = ‘RT::System-Role’));
___________________________

Each of these queries can take up to 100 or more seconds to complete
during the bottleneck.
If I run the query at ‘normal’ times, it returns almost immediately.
It seems that the bulk of the these queries together, stifles RT and
causes it to grind to a very slow pace. And maybe there is an underlying
query that could cause it.

My theory at present is that somewhere in the organisation someone is
doing a task that includes a specific query or group of queries that
causes RT to flounder.

I am running rt in debug mode, but have no clues as yet as to what is
causing the problem.

Any help would be appreciated.
Kind regards.


Luke


The rt-users Archives

Community help: http://wiki.bestpractical.com
Commercial support: sales@bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com

We’re hiring! Come hack Perl for Best Practical: Careers — Best Practical Solutions

Best regards, Ruslan.