RT SQL 'lockup' (mysql->99% cpu)

Two large postings to introduce myself to the list :stuck_out_tongue:
I noticed elsewhere on the maillists users watching their mysql server
lock up with various queries going on

(see below for horrible mysqladmin processlist example)

We eventually found we could replicate this by searching for requestor
email address containing (anything) and then another ‘contains’ search to
the search query produced is a good dozen lines long according to mysql
query log (i spent too long thinking this was a mysql error). Anyway the
query takes over 10 minutes to respond :stuck_out_tongue: during which time all the other
query threads lock, and RT appears to have ‘died’.

Around this time our users hit refresh a ton and hence totally drive
everything into the ground (sigh), forcing us to just restart the mysqld
a lot. The users then dont like the various error pages the
server generates, and resent even more that ‘back’ in IE doesn’t restore
their posted form data (mozilla does tho…)

My thinking at the moment is not to let my users near the search feature
in RT as its far too advanced and complicated for them, I appreciate by
its nature you can create horrendously complicated searches (which is
nice) however at the moment they are locking the db server up about 20
times a day.

would it be possible to timelimit the sql queries, or have a more simple
search page that just takes a ‘word’ and searches for it in emails,
usernames, ticket subjects etc (the search in the top right seems to do
subjects and ticket numbers only?)

Iain

| Id | User | Host | db | Command | Time | State |
Info
|
| 1 | rt_user | localhost | rt3 | Query | 130 | Locked |
SELECT DISTINCT main.* FROM Tickets main WHERE ((main.EffectiveId =
main.id)) AND ((main.Type = 'ti |
| 2 | rt_user | localhost | rt3 | Query | 25 | User lock |
SELECT GET_LOCK(‘Apache-Session-d1d4573829558297ddf09545e082d3a9’, 3600)
|
| 3 | rt_user | localhost | rt3 | Query | 618 | Copying to tmp table |
SELECT DISTINCT main.* FROM Tickets main, Groups Groups_1, Principals
Principals_2, CachedGroupMembe |
| 4 | rt_user | localhost | rt3 | Query | 81 | Locked |
SELECT * FROM Tickets WHERE id = ‘573’
|
| 5 | rt_user | localhost | rt3 | Query | 44 | Locked |
SELECT DISTINCT main.* FROM Tickets main WHERE ((main.EffectiveId =
main.id)) AND ((main.Type = 'ti |
| 6 | rt_user | localhost | rt3 | Query | 110 | User lock |
SELECT GET_LOCK(‘Apache-Session-d1d4573829558297ddf09545e082d3a9’, 3600)
|
| 7 | rt_user | localhost | rt3 | Query | 131 | Locked |
UPDATE Tickets SET LastUpdated=‘2003-07-10 09:20:38’ WHERE id=‘812’
|
| 8 | rt_user | localhost | rt3 | Query | 120 | User lock |
SELECT GET_LOCK(‘Apache-Session-d1d4573829558297ddf09545e082d3a9’, 3600)
|
| 11 | root | localhost | | Sleep | 231 | |
|
| 12 | rt_user | localhost | rt3 | Query | 70 | Locked |
SELECT DISTINCT main.* FROM Tickets main WHERE ((main.EffectiveId =
main.id)) AND ((main.Type = 'ti |
| 13 | rt_user | localhost | rt3 | Query | 63 | Locked |
SELECT DISTINCT main.* FROM Tickets main WHERE ((main.EffectiveId =
main.id)) AND ((main.Type = 'ti |

We eventually found we could replicate this by searching for requestor
email address containing (anything) and then another ‘contains’ search to
the search query produced is a good dozen lines long according to mysql
query log (i spent too long thinking this was a mysql error). Anyway the
query takes over 10 minutes to respond :stuck_out_tongue: during which time all the other
query threads lock, and RT appears to have ‘died’.

There was a post to, iirc, rt-users, yesterday discussing this
performance issue with a recommended index to add to the database. That
may fix it for you.

would it be possible to timelimit the sql queries, or have a more simple
search page that just takes a ‘word’ and searches for it in emails,
usernames, ticket subjects etc (the search in the top right seems to do
subjects and ticket numbers only?)

Anything is possible. The magic search box at the top right does a
whole bunch of different things, depending on context.

http://www.bestpractical.com/rt – Trouble Ticketing. Free.

This one time, at band camp, Jesse Vincent wrote:

Anything is possible. The magic search box at the top right does a
whole bunch of different things, depending on context.

Have you recently tested the magic search box behind mod_fastcgi? When
searching for a ticket number (i.e. entering only digits into the field) I
consitently get a:

Moved.
The server encountered an internal error or misconfiguration and was unable
to copmlete your request.

The server then proceeds to return 500 Internal Server Error for every
attempt at a page reload until apache is restarted.

It only happens with numerical input – text searches work as expected.

As I recall, this started happening after I switched from the world of pain
that was mod_perl2 to mod_fastcgi.

jaq@spacepants.org http://spacepants.org/jaq.gpg

This one time, at band camp, Jesse Vincent wrote:

Anything is possible. The magic search box at the top right does a
whole bunch of different things, depending on context.

Have you recently tested the magic search box behind mod_fastcgi? When
searching for a ticket number (i.e. entering only digits into the field) I
consitently get a:

Moved.
The server encountered an internal error or misconfiguration and was unable
to copmlete your request.

Are you seeing anything in your httpd, system or database error logs?

The server then proceeds to return 500 Internal Server Error for every
attempt at a page reload until apache is restarted.

It only happens with numerical input – text searches work as expected.

As I recall, this started happening after I switched from the world of pain
that was mod_perl2 to mod_fastcgi.

Glad you saw the light.


jaq@spacepants.org http://spacepants.org/jaq.gpg


rt-devel mailing list
rt-devel@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-devel

http://www.bestpractical.com/rt – Trouble Ticketing. Free.

This one time, at band camp, Jesse Vincent wrote:

Are you seeing anything in your httpd, system or database error logs?

Only the following in the httpd error log:

[error] [client 10.108.62.189] FastCGI: comm with server
"/usr/bin/mason_handler.fcgi" aborted: error parsing headers: malformed
header ‘Location="/Ticket/Display.html?id=1155"’, referrer:
https://rt.engineroom.anchor.net.au/
[error] [client 10.108.62.189] FastCGI: comm with server
"/usr/bin/mason_handler.fcgi" aborted: error parsing headers: malformed
header ‘Pragma=“no-cache”’, referrer:
https://rt.engineroom.anchor.net.au/Ticket/Update.html

The first occurs after searching for ticket 1155 in the magic search box,
and the server returns the Moved, the second occurs after trying to go back to the front page (the one with the MyTickets elements) and the server returns 500.

jaq@spacepants.org http://spacepants.org/jaq.gpg

We eventually found we could replicate this by searching for requestor
email address containing (anything) and then another ‘contains’ search to
the search query produced is a good dozen lines long according to mysql
query log (i spent too long thinking this was a mysql error). Anyway the
query takes over 10 minutes to respond :stuck_out_tongue: during which time all the other
query threads lock, and RT appears to have ‘died’.

There was a post to, iirc, rt-users, yesterday discussing this
performance issue with a recommended index to add to the database. That
may fix it for you.

Had a look around for this and found an index on cachedgroupmembers was
recommended. Have done a lot of investigation into this, (thanks to mysql
explain command phew), and while that index does help, its not enough for
our setup.

I performed a lot of tests on a query that takes 9256 seconds on our RT
database (2.5 hours :P) and eventually came up with these two:

create index user_MemberIdGroupId on CachedGroupMembers(MemberId,GroupId);

create index user_PrincipalType on Principals(PrincipalType);

Neither one alone speeds the query up faster than one minute (which is all
i tested when ensuring one index wasn’t doing all the work) and separate
indexes on memberid and groupid produced no extra speed increase.

However these two indexes together sped the query up to 12 seconds, about
770 times faster for me.

I’ve spent about 2 weeks tracking this down now :slight_smile: I originally thought it
was a bug in mysql, or something related to the dual processor machine
we’d moved to for the ‘release’ version.

Some stats on the size of our DB for reference:
mysql> select count() from CachedGroupMembers;
| 12772 |
mysql> select count(
) from Groups;
| 5264 |
mysql> select count() from Users;
| 527 |
mysql> select count(
) from Tickets;
| 1170 |
mysql> select count(*) from Principals;
| 5793 |

Our RT is version 3.0.2 at this time.

Iain

create index user_MemberIdGroupId on
CachedGroupMembers(MemberId,GroupId);

create index user_PrincipalType on Principals(PrincipalType);

However these two indexes together sped the query up to 12 seconds,
about

770 times faster for me.

Jesus, this really speeds things up.

I had RT time out on me with 200 OPEN Tickets in a queue by just clicking
on
the link to the Queue via “Quick Search” (i had set fcgi timeout to 300).

Now, when i click, it displays the 200 open tickets in around 7 seconds.
Could still be faster, but is much better as the previous timeout :wink:

Hehe glad its useful… Unfortunately no rest for me tho, my user base has
now taken to searching for 3 simultaneous requestors emails :frowning:
Presumably if i could speed that search up (no chance) they’d then break it
with four, so i guess i’ll have to find another approach such as bodging the
RT search page, or maybe remove some peoples limbs :stuck_out_tongue:
Has anyone created a simple search page for RT that doesn’t give users the
power to do stupid things, or has patched the existing page, save me digging
into that ?

Iain

PS: I think the issue is the users dont understand the search criteria dont
reset when you go back to the search page… maybe have two buttons,
‘search’ and ‘refine search’? dunno, i can see this issue in two lights - a)
stupid user, who cares, rt does what it does, or b) users can completely
halt RT functionality which is a disaster - completely opposing views,
Jesse, which approach do you take for RT - user or admin tool?

PPS: Imagine if we gave our non IT-staff users access :stuck_out_tongue:

PS: I think the issue is the users dont understand the search criteria dont
reset when you go back to the search page… maybe have two buttons,
‘search’ and ‘refine search’?

It used to be that way. It confused users too much.

dunno, i can see this issue in two lights - a)
stupid user, who cares, rt does what it does, or b) users can completely
halt RT functionality which is a disaster - completely opposing views,
Jesse, which approach do you take for RT - user or admin tool?

I train my users. If your users have a legitimate need for searching
for tickets with a combination of multiple requestor addresses, it’s
probably worth looking at the queries RT’s generating and seeing what
can be done indexwise to increase the performance.

-jesse

http://www.bestpractical.com/rt – Trouble Ticketing. Free.

PS: I think the issue is the users dont understand the search criteria
dont

reset when you go back to the search page… maybe have two buttons,
‘search’ and ‘refine search’?

It used to be that way. It confused users too much.

Just cant win eh :slight_smile:

dunno, i can see this issue in two lights - a)

stupid user, who cares, rt does what it does, or b) users can completely
halt RT functionality which is a disaster - completely opposing views,
Jesse, which approach do you take for RT - user or admin tool?

I train my users. If your users have a legitimate need for searching
for tickets with a combination of multiple requestor addresses, it’s
probably worth looking at the queries RT’s generating and seeing what
can be done indexwise to increase the performance.

Heh i dont think its intentional. Many of the searches are for the same
address repeatedly, otherwise when an initial search fails to find a result
they add another requestors email - which works, its an OR clause and all
that, but while 1 requestor email is semi-instant and 2 is slow, 3 is dead
(3 sepearate queried would work better heh)

We’ve tried educating the users, and are continuing to do so, i dont expect
to see this 3rd layer of the problem often, but its still there and has
already reared its head.

I need a user proof solution to this problem, and attempting to fix the
users generally doesn’t work in my experience. Also theres no easy way to
tie the search query to the user that performed it - in fact i HOPE you
can work it out by reading back through the sql thread that made the fatal
query, reading the last used session id and tracking that back to a session
set and reading the email address from the binary object shoved in the DB -
is there a better way than this? Will this method work? :stuck_out_tongue:

Iain

dunno, i can see this issue in two lights - a)
stupid user, who cares, rt does what it does, or b) users can completely
halt RT functionality which is a disaster - completely opposing views,
Jesse, which approach do you take for RT - user or admin tool?

I train my users. If your users have a legitimate need for searching
for tickets with a combination of multiple requestor addresses, it’s
probably worth looking at the queries RT’s generating and seeing what
can be done indexwise to increase the performance.

The answer is not user-training, surely it is to fix the query builder. It is
perfectly reasonable to assume a database can handle such a simple request and
if it can’t handle this, what other requests will it choke on?

I have found that a search for two requestor e-mail addresses will take a
reasonable amount of time when it is first issued, but exponentially more the
next time and so on … After adding the two indexes suggested by Iain the
problematic search returned in about 5 seconds (instead of the ten minutes I
had seen previously) so I was hopeful of it being fixed. But hitting reload
caused it to take about a minute. Hitting reload again … we are now at 17
minutes of CPU time on the mysqld and no sign of it finishing yet.

This is with RT 3.0.2, MySQL 3.23.55 Apache 1.3.28 and mod_perl 1.28

Ian Grant, Computer Lab., William Gates Building, JJ Thomson Ave., Cambridge
Phone: +44 1223 334420

dunno, i can see this issue in two lights - a)
stupid user, who cares, rt does what it does, or b) users can completely
halt RT functionality which is a disaster - completely opposing views,
Jesse, which approach do you take for RT - user or admin tool?

I train my users. If your users have a legitimate need for searching
for tickets with a combination of multiple requestor addresses, it’s
probably worth looking at the queries RT’s generating and seeing what
can be done indexwise to increase the performance.

The answer is not user-training, surely it is to fix the query builder. It is
perfectly reasonable to assume a database can handle such a simple request and
if it can’t handle this, what other requests will it choke on?

In this case, yes, there is an optimization we’re looking at for this
case that should dramatically improve simple searches on requestor
addresses , but it’s always going to be possible for users to string together
stupidly broken queries.

I have found that a search for two requestor e-mail addresses will take a
reasonable amount of time when it is first issued, but exponentially more the
next time and so on … After adding the two indexes suggested by Iain the
problematic search returned in about 5 seconds (instead of the ten minutes I
had seen previously) so I was hopeful of it being fixed. But hitting reload
caused it to take about a minute. Hitting reload again … we are now at 17
minutes of CPU time on the mysqld and no sign of it finishing yet.

This is with RT 3.0.2, MySQL 3.23.55 Apache 1.3.28 and mod_perl 1.28

First up, move up to mysql 4.0.13 or newer. really. It’s internal query
processor is WORLDS better. It won’t cure cancer, but it should
definitely help performance. Especially if you turn on query caching.


Ian Grant, Computer Lab., William Gates Building, JJ Thomson Ave., Cambridge
Phone: +44 1223 334420

http://www.bestpractical.com/rt – Trouble Ticketing. Free.

This is with RT 3.0.2, MySQL 3.23.55 Apache 1.3.28 and mod_perl 1.28

First up, move up to mysql 4.0.13 or newer. really. It’s internal query
processor is WORLDS better. It won’t cure cancer, but it should
definitely help performance. Especially if you turn on query caching.

First thing we did. Didn’t seem to improve much, although i’ve not tried
(or heard of to date :wink: query caching, although effectiveness depends whats
cached i guess, the users rarely execute these queries more than once cos i
have to go kill mysql connections there after and get annoyed to see the
query a second, third even fourth time turn up in the process list. I guess
the query has to complete once too?

I dont mean to doubt your views or knowledge on mysql etc but given how
little use mysql4 over mysql3 is to me (an uncompletable query still is…)
should i spend a day looking at query caching and performance (remember this
is works time not mine :wink: or are you just suggesting any ideas you can think
of rather than knowing that this will solve my issue.

I’ve become a firm believer the only solution is to better the generated
query or prevent users generating this particular search, since optimising
mysql seems to lead nowhere (except to even more impossible queries) and
having a user base that is 100% competent all the time with every entry
seems a pipe dream… You should see some of the e-mail addresses/typos they
type into RT (omg!), and i get to sort the bounces :S

Iain

This is with RT 3.0.2, MySQL 3.23.55 Apache 1.3.28 and mod_perl 1.28

First up, move up to mysql 4.0.13 or newer. really. It’s internal query
processor is WORLDS better. It won’t cure cancer, but it should
definitely help performance. Especially if you turn on query caching.

I dont mean to doubt your views or knowledge on mysql etc but given how
little use mysql4 over mysql3 is to me (an uncompletable query still is…)
should i spend a day looking at query caching and performance (remember this
is works time not mine :wink: or are you just suggesting any ideas you can think
of rather than knowing that this will solve my issue.

It’s not a magic bullet that will magically make all bogus queries
perform in negligable time, but its query processor has proven to be
much more advanced and to NOT DIE in situations where Mysql 3.23 curls
up into a little ball and refuses to deal with the world. I can’t say
whether you should spend a day of your work time on it. That would
depend on how much of a problem this issue is for your organization.

I’ve become a firm believer the only solution is to better the generated
query or prevent users generating this particular search, since optimising
mysql seems to lead nowhere (except to even more impossible queries) and
having a user base that is 100% competent all the time with every entry
seems a pipe dream… You should see some of the e-mail addresses/typos they
type into RT (omg!), and i get to sort the bounces :S

Great. I don’t know when we’re going to get to this query builderchange internally.
If it’s an issue that’s important to you, as it appears to be, I’d love to
see a patch.

Jesse

Iain

http://www.bestpractical.com/rt – Trouble Ticketing. Free.

This is with RT 3.0.2, MySQL 3.23.55 Apache 1.3.28 and mod_perl 1.28

First up, move up to mysql 4.0.13 or newer. really. It’s internal
query

processor is WORLDS better. It won’t cure cancer, but it should
definitely help performance. Especially if you turn on query caching.

I dont mean to doubt your views or knowledge on mysql etc but given how
little use mysql4 over mysql3 is to me (an uncompletable query still
is…)

should i spend a day looking at query caching and performance (remember
this

is works time not mine :wink: or are you just suggesting any ideas you can
think

of rather than knowing that this will solve my issue.

It’s not a magic bullet that will magically make all bogus queries
perform in negligable time, but its query processor has proven to be
much more advanced and to NOT DIE in situations where Mysql 3.23 curls
up into a little ball and refuses to deal with the world. I can’t say
whether you should spend a day of your work time on it. That would
depend on how much of a problem this issue is for your organization.

Just fyi, already using 4.0.13, first thing we changed to try fix the issue.
Made no noticable difference (except to make my mysql rpm something i have
to manually look after rather than rely on up2date hehe). Well in all
fairness it might have made some difference, but on a 6 hours query anything
short of several thousand times faster isn’t going to get spotted unless I
spend days profiling this (or create a smaller testing RT DB).

Anyway I have to work on other things, strings to decode and process, ya
know, usual stuff, so unless my user base continues to knacker RT (and i’ll
probably be the last to know now i’ve explained mysqladmin kill :P) I dont
think i’ll be investigating this further, I may get around to patchign the
search page to remove multiple-requestors.

The only true solution would apppear to be to play with the query generating
logic somehow, but its a 10 line query that comes out and i’ve not even
touched that part of RT yet (more a back-end coder - mainly because i’m
moving away from using perl to script everything and unifying it all into
one java application rather than several dozen disparate systems -
integration with other languages and systems is important for my job at the
moment).

Is this issue specific to Mysql? I appreciate a vastly superior query parser
on the server side should be able to (in theory) make queries so good theres
no point trying to optimise them but thats clearly not happening here - i
could write a sequential analyser of the tables to find ‘multiple
requestors’ quicker than this query does (a LOT quicker, like within a
couple of seconds, the ideal web app response rate rather than 12, or even
thousands)? anyone running RT on pgsql/oracle/db2/whatever else there is
(oh yeah ms sql lol) that could try this search at a non-crucial time for
their install and see if it responds in less than 10 minutes? (nb will lock
up your RT totally during the query).

Another approach (probably worse than the current situation) - is it
necessary to lock tables during a query? is this mysql’s doing or rt’s
doing? its this that causes all other RT’s to die eventually - they all wait
for some lock that this query holds, killing the query releases the other
threads… I appreciate the issues of referential integrity, but defensive
coding could avoid these problems since its a search not a write… Although
predicting my users, they would simply end up sending the same query 10
times until the server is just bogged down running a dozen several-hour
queries (which would then presumably take even longer)…

Oh last thing… when it locks up for the several hours mysql is ‘copying
data to tmp table’… I dont know /exactly/ what its doing (i.e. compiling
what data in to where) but i guess the join just makes something that is
tremendously large and takes a while to generate (??) - is this going to
chew gigabytes of ram/swap/disk store eventually? If its nasty-join related
then presumably this gets exponentially worse as the users/tickets increase
(users i’m thinking will hit it worse if i understand the back end DB
structure)

I’ll retreat slightly from this list now, i’ve posted many long essays in my
brief time, hope some of them are useful :slight_smile: i might be back, so dont get to
comfy evil laugh

Summary: dont search for more than one requestors email in one search from
the RTsearch page because the query occupies (at least) mysql for hours
(thousand+ tickets, few hundred users)…

Thanks for your help,
Iain

Just fyi, already using (mysql) 4.0.13, first thing we changed to try fix
the issue.
Made no noticable difference (except to make my mysql rpm something i have
to manually look after rather than rely on up2date hehe). Well in all
fairness it might have made some difference, but on a 6 hours query
anything
short of several thousand times faster isn’t going to get spotted unless I
spend days profiling this (or create a smaller testing RT DB).

More info - i ported our database to mssql for fun (!) (well as an
educational experience)… it suffered no problems with these kinds of
searches, it could even do a 4 requestor search reasonably quick… since i
hadn’t actually set up RT to use mssql i had trouble generating test queries
(hitting dont search on the search page still seems to generate a count()
type request for something that locks my mysql up eventually).

Iain