Severe performance problems suddenly in RT 3.8.2

We migrated our RT server to RT 3.8.2 a week ago, and all went well
until today, when suddenly we’ve been hit by a massive performance
problem. It seems that almost any query which touches the Attachments
table takes an absolute age to complete. For example, I’ve just
logged into the MySQL database itself directly, and done:

select count(*) from Attachments;

and I’ve been waiting for a couple of minutes, with nothing coming back.

Is there anything I ought to be looking at? The database server is
MySQL 5.0.32

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

We migrated our RT server to RT 3.8.2 a week ago, and all went well
until today,

From what version?

when suddenly we’ve been hit by a massive performance
problem. It seems that almost any query which touches the Attachments
table takes an absolute age to complete.

Anything in mysql’s error logs?

What does EXPLAIN SELECT COUNT(id) FROM Attachments say?

Is there anything I ought to be looking at? The database server is
MySQL 5.0.32

MySQL versions > 4.1 and < 5.0.45 are known to have a crippling query
optimizer bug that WILL hurt you on RT, but I’m not sure this is that.

Tim’s just got into his car to go home, but I can provide some
answers…

We migrated our RT server to RT 3.8.2 a week ago, and all went well
until today,
From what version?

3.4.2

And just to be sure, you ran the commands in UPGRADING.mysql?

Anything in mysql’s error logs?

No, only the messages in the slow log, e.g:

Query_time: 240 Lock_time: 0 Rows_sent: 4 Rows_examined: 27

SELECT DISTINCT main.Id AS id, main.Filename AS filename,
main.ContentType AS contenttype, main.Headers AS headers, main.Subject
AS subject, main.Parent AS parent, main.ContentEncoding AS
contentencoding, main.ContentType AS contenttype, main.TransactionId AS
transactionid, main.Created AS created FROM Attachments main JOIN
Transactions Transactions_1 ON ( Transactions_1.id = main.TransactionId
) JOIN Tickets Tickets_2 ON ( Tickets_2.id = Transactions_1.ObjectId )
WHERE (Tickets_2.EffectiveId = ‘97814’) AND (Transactions_1.ObjectType =
‘RT::Ticket’) ORDER BY main.id ASC;

Can you give me an ‘EXPLAIN’ on that query?

and

Query_time: 275 Lock_time: 0 Rows_sent: 1 Rows_examined: 806085

select count(id) from Attachments;

What does EXPLAIN SELECT COUNT(id) FROM Attachments say?

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
±—±------------±------------±------±--------------±--------±--------±-----±--------±------------+
| 1 | SIMPLE | Attachments | index | NULL | PRIMARY | 4 | NULL | 2698295 | Using index |

MySQL versions > 4.1 and < 5.0.45 are known to have a crippling query
optimizer bug that WILL hurt you on RT, but I’m not sure this is that.

Any pointers, please? I couldn’t see anything in the RT wiki, sorry.

“institutional wisdom”, sadly. But I’m assured “That issue isn’t this
one”

Can you easily optimize your tables? Perhaps first, it’s worth running
mysqltuner.pl (http://mysqltuner.pl) and posting the output.

And just to be sure, you ran the commands in UPGRADING.mysql?

I believe so (Tim did it).

Query_time: 240 Lock_time: 0 Rows_sent: 4 Rows_examined: 27

SELECT DISTINCT main.Id AS id, main.Filename AS filename,
main.ContentType AS contenttype, main.Headers AS headers, main.Subject
AS subject, main.Parent AS parent, main.ContentEncoding AS
contentencoding, main.ContentType AS contenttype, main.TransactionId AS
transactionid, main.Created AS created FROM Attachments main JOIN
Transactions Transactions_1 ON ( Transactions_1.id = main.TransactionId
) JOIN Tickets Tickets_2 ON ( Tickets_2.id = Transactions_1.ObjectId )
WHERE (Tickets_2.EffectiveId = ‘97814’) AND (Transactions_1.ObjectType =
‘RT::Ticket’) ORDER BY main.id ASC;

Can you give me an ‘EXPLAIN’ on that query?

Sure:

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | Tickets_2 | ref | PRIMARY,Tickets6 | Tickets6 | 4 | const | 1 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | Transactions_1 | ref | PRIMARY,Transactions1 | Transactions1 | 70 | const,rtdb.Tickets_2.id | 1 | Using where; Using index |
| 1 | SIMPLE | main | ref | Attachments2 | Attachments2 | 4 | rtdb.Transactions_1.id | 1 | |

Can you easily optimize your tables? Perhaps first, it’s worth running
mysqltuner.pl (http://mysqltuner.pl) and posting the output.

MySQLTuner 1.0.0 - Major Hayden major@mhtx.net
Bug reports, feature requests, and downloads at
http://mysqltuner.com/
Run with ‘–help’ for additional options and output filtering

-------- General Statistics --------------------------------------------------
[–] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.0.32-Debian_7etch8-log
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[–] Status: +Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[–] Data in MyISAM tables: 15M (Tables: 1)
[–] Data in InnoDB tables: 4G (Tables: 20)
[!!] Total fragmented tables: 1

-------- Performance Metrics -------------------------------------------------
[–] Up for: 1h 13m 11s (106K q [24.317 qps], 456 conn, TX: 228M, RX: 41M)
[–] Reads / Writes: 97% / 3%
[–] Total buffers: 1.3G global + 2.6M per thread (100 max threads)
[OK] Maximum possible memory usage: 1.6G (80% of installed RAM)
[OK] Slow queries: 0% (72/106K)
[OK] Highest usage of available connections: 56% (56/100)
[OK] Key buffer size / total MyISAM indexes: 256.0M/209.0M
[!!] Key buffer hit rate: 87.9% (13K cached / 1K reads)
[OK] Query cache efficiency: 53.8% (53K cached / 98K selects)
[OK] Query cache prunes per day: 0
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 2K sorts)
[!!] Temporary tables created on disk: 26% (1K on disk / 7K total)
[OK] Thread cache hit rate: 87% (57 created / 456 connections)
[OK] Table cache hit rate: 42% (89 open / 207 opened)
[OK] Open file limit used: 4% (43/1K)
[OK] Table locks acquired immediately: 99% (95K immediate / 95K locks)
[!!] InnoDB data size / buffer pool: 5.0G/1.0G

-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
MySQL started within last 24 hours - recommendations may be inaccurate
When making adjustments, make tmp_table_size/max_heap_table_size equal
Reduce your SELECT DISTINCT queries without LIMIT clauses
Variables to adjust:
tmp_table_size (> 64M)
max_heap_table_size (> 16M)
innodb_buffer_pool_size (>= 4G)

It’s a 2GB machine so we can’t up the innodb_buffer_pool_size (much)
more. I think the tmp_table_size/max_heap_table_size is the most recent
tweak we’ve tried.

thanks again,
Dave
** Dave Holland ** Systems Support – Infrastructure Management **
** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **
“Flattery is flattery, but chocolate gets results.”

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

I believe so (Tim did it).

Query_time: 240 Lock_time: 0 Rows_sent: 4 Rows_examined: 27

That just looks…crazy. Can you try an OPTIMIZE TABLE Attachments; ?
(It may take … a while.)

Tim’s just got into his car to go home, but I can provide some
answers…

We migrated our RT server to RT 3.8.2 a week ago, and all went well
until today,
From what version?

3.4.2

Anything in mysql’s error logs?

No, only the messages in the slow log, e.g:

Query_time: 240 Lock_time: 0 Rows_sent: 4 Rows_examined: 27

SELECT DISTINCT main.Id AS id, main.Filename AS filename,
main.ContentType AS contenttype, main.Headers AS headers, main.Subject
AS subject, main.Parent AS parent, main.ContentEncoding AS
contentencoding, main.ContentType AS contenttype, main.TransactionId AS
transactionid, main.Created AS created FROM Attachments main JOIN
Transactions Transactions_1 ON ( Transactions_1.id = main.TransactionId
) JOIN Tickets Tickets_2 ON ( Tickets_2.id = Transactions_1.ObjectId )
WHERE (Tickets_2.EffectiveId = ‘97814’) AND (Transactions_1.ObjectType =
‘RT::Ticket’) ORDER BY main.id ASC;

and

Query_time: 275 Lock_time: 0 Rows_sent: 1 Rows_examined: 806085

select count(id) from Attachments;

What does EXPLAIN SELECT COUNT(id) FROM Attachments say?

| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
| 1 | SIMPLE | Attachments | index | NULL | PRIMARY | 4 | NULL | 2698295 | Using index |

MySQL versions > 4.1 and < 5.0.45 are known to have a crippling query
optimizer bug that WILL hurt you on RT, but I’m not sure this is that.

Any pointers, please? I couldn’t see anything in the RT wiki, sorry.

This is on Debian Etch so I expect we can upgrade to 5.0.51 from
backports quite easily if need be.

thanks,
Dave
** Dave Holland ** Systems Support – Infrastructure Management **
** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **
“An infinite number of mathematicians walk into a bar…”

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

Tim’s just got into his car to go home, but I can provide some
answers…

We migrated our RT server to RT 3.8.2 a week ago, and all went well
until today,
From what version?

3.4.2

Anything in mysql’s error logs?

No, only the messages in the slow log, e.g:

Query_time: 240 Lock_time: 0 Rows_sent: 4 Rows_examined: 27

I hope you see that mysql examined only 27 rows and spend 240 seconds
to do that. I suspect that it’s related to very high IO activity that
were going parallel with this query and blocked mysql’s IO requests.

[snip]

Best regards, Ruslan.

And just to be sure, you ran the commands in UPGRADING.mysql?

I believe so (Tim did it).

Indeed. All done pukka according to the upgrade docs.

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

I believe so (Tim did it).

Query_time: 240 Lock_time: 0 Rows_sent: 4 Rows_examined: 27

That just looks…crazy. Can you try an OPTIMIZE TABLE Attachments; ?
(It may take … a while.)

Might as well do that now, since we’re out of hours. I’ve started
that off.

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

I believe so (Tim did it).

Query_time: 240 Lock_time: 0 Rows_sent: 4 Rows_examined: 27

That just looks…crazy. Can you try an OPTIMIZE TABLE Attachments; ?
(It may take … a while.)

OK, OPTIMIZE TABLE done, but performance on SELECT COUNT(id) FROM
Attachments is still terrible:

This does sort of smell like an optimiser bug - it says it’s using
indices, but the level of I/O being done by the machine suggests it’s
actually performing a full table scan.

mysql> optimize table Attachments;
| Table | Op | Msg_type | Msg_text |
| rtdb.Attachments | optimize | status | OK |
1 row in set (23 min 11.29 sec)

mysql> select count(id) from Attachments;
| count(id) |
| 806180 |
1 row in set (3 min 27.42 sec)

mysql> explain select count(id) from Attachments;
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
| 1 | SIMPLE | Attachments | index | NULL | PRIMARY |
4 | NULL | 3065551 | Using index |
1 row in set (0.00 sec)

Now, I may be being silly here, but how can a table with only 806180
id’s in its primary key have more than 3 million rows? And counting
them takes three minutes, which really isn’t many rows a second…

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

Now, I may be being silly here, but how can a table with only 806180
id’s in its primary key have more than 3 million rows? And counting
them takes three minutes, which really isn’t many rows a second…

Is this stored in InnoDB by chance? If it were me, I would mysqldump all
databases, nuke the back-end InnoDB files and reload from dump. But that’s
just me.

– ============================
Tom Lahti
BIT Statement LLC

(425)251-0833 x 117
http://www.bitstatement.net/
– ============================

OK, OPTIMIZE TABLE done, but performance on SELECT COUNT(id) FROM
Attachments is still terrible:

This does sort of smell like an optimiser bug - it says it’s using
indices, but the level of I/O being done by the machine suggests it’s
actually performing a full table scan.

Indeed. Before you dig deeply into what’s going on with this version of
mysql, coming up to a current 5.0.x probably makes sense.

mysql> optimize table Attachments;
±-----------------±---------±---------±---------+
| Table | Op | Msg_type | Msg_text |
±-----------------±---------±---------±---------+
| rtdb.Attachments | optimize | status | OK |
±-----------------±---------±---------±---------+
1 row in set (23 min 11.29 sec)

mysql> select count(id) from Attachments;
±----------+
| count(id) |
±----------+
| 806180 |
±----------+
1 row in set (3 min 27.42 sec)

mysql> explain select count(id) from Attachments;
±—±------------±------------±------±--------------±--------
±--------±-----±--------±------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
±—±------------±------------±------±--------------±--------
±--------±-----±--------±------------+
| 1 | SIMPLE | Attachments | index | NULL | PRIMARY | 4
| NULL | 3065551 | Using index |
±—±------------±------------±------±--------------±--------
±--------±-----±--------±------------+
1 row in set (0.00 sec)

Now, I may be being silly here, but how can a table with only 806180
id’s in its primary key have more than 3 million rows?

An excellent question. I have a sneaking suspicion there’s some DB
corruption in there somewhere :confused: But maybe it’s just a database bug.

I hope you see that mysql examined only 27 rows and spend 240 seconds
to do that. I suspect that it’s related to very high IO activity that
were going parallel with this query and blocked mysql’s IO requests.

That was my worry. I’ve shuffled the virtual machine onto a physical
server with no other VM’s running, and the performance problem remains
the same. The graph of disk I/O from the VMware control center shows
the system doing no I/O at all until I start the SELECT, and then it
goes to about 15 MB/sec for the duration of the query.

So if there is contention, it’s not at the level of the physical
server’s I/O. However, it could still be contention for the SAN
volume which hosts the virtual machine; I find this hard to believe
since the total I/O of all three physical servers is pretty low, but I
will try tomorrow to see what happens if I move the virtual machine to
its own piece of storage,

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

OK, OPTIMIZE TABLE done, but performance on SELECT COUNT(id) FROM
Attachments is still terrible:

This does sort of smell like an optimiser bug - it says it’s using
indices, but the level of I/O being done by the machine suggests it’s
actually performing a full table scan.

Indeed. Before you dig deeply into what’s going on with this version
of
mysql, coming up to a current 5.0.x probably makes sense.

OK, I’ve just moved up to 5.0.51a, as supplied in etch-backports, and
the problem persists. An ANALYZE TABLE on the Attachments table has
helped a bit with the ludicrous statistics from EXPLAIN, but they’re
now just very silly rather than ludicrous:

mysql> explain select count(id) from Attachments;
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
| 1 | SIMPLE | Attachments | index | NULL | PRIMARY |
4 | NULL | 2324446 | Using index |

A reload might be the way forward here.

There is a part of me wondering what suddenly broke this today. I’m
currently searching the table for suspiciously large looking
attachments.

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

First time I see something like that. May be it’s better to ask on
mysql lists. And sure you should keep digging into a reason. My
practice shows that there is no such thing as “suddenly” in IT. There
is always some change that turn world up side down.

optimize/analyze on InnoDB don’t do the same job as on myisam tables,
it’s documented in mysql’s docs. You can find way to rebuild indexes
on a table and as the last resort use full re-load.On Wed, Feb 18, 2009 at 2:32 AM, Tim Cutts tjrc@sanger.ac.uk wrote:

On 17 Feb 2009, at 10:52 pm, Jesse Vincent wrote:

OK, OPTIMIZE TABLE done, but performance on SELECT COUNT(id) FROM
Attachments is still terrible:

This does sort of smell like an optimiser bug - it says it’s using
indices, but the level of I/O being done by the machine suggests it’s
actually performing a full table scan.

Indeed. Before you dig deeply into what’s going on with this version
of
mysql, coming up to a current 5.0.x probably makes sense.

OK, I’ve just moved up to 5.0.51a, as supplied in etch-backports, and
the problem persists. An ANALYZE TABLE on the Attachments table has
helped a bit with the ludicrous statistics from EXPLAIN, but they’re
now just very silly rather than ludicrous:

mysql> explain select count(id) from Attachments;
±—±------------±------------±------±--------------±--------
±--------±-----±--------±------------+
| id | select_type | table | type | possible_keys | key |
key_len | ref | rows | Extra |
±—±------------±------------±------±--------------±--------
±--------±-----±--------±------------+
| 1 | SIMPLE | Attachments | index | NULL | PRIMARY |
4 | NULL | 2324446 | Using index |
±—±------------±------------±------±--------------±--------
±--------±-----±--------±------------+

A reload might be the way forward here.

There is a part of me wondering what suddenly broke this today. I’m
currently searching the table for suspiciously large looking
attachments.

Tim

Best regards, Ruslan.

First time I see something like that. May be it’s better to ask on
mysql lists. And sure you should keep digging into a reason. My
practice shows that there is no such thing as “suddenly” in IT. There
is always some change that turn world up side down.

optimize/analyze on InnoDB don’t do the same job as on myisam tables,
it’s documented in mysql’s docs. You can find way to rebuild indexes
on a table and as the last resort use full re-load.

I think that’s what I’m going to do - the slave of our RT instance,
which is also running MySQL 5.0.51, performs the same query in only 15
seconds, not four minutes…

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

First time I see something like that. May be it’s better to ask on
mysql lists. And sure you should keep digging into a reason. My
practice shows that there is no such thing as “suddenly” in IT. There
is always some change that turn world up side down.

optimize/analyze on InnoDB don’t do the same job as on myisam tables,
it’s documented in mysql’s docs. You can find way to rebuild indexes
on a table and as the last resort use full re-load.

We’ve had a flash of realisation, and we think we know what the
problem is. This virtual machine has a snapshot on it. This means
that all writes subsequent to the snapshot being taken have been going
to a replay log, rather than to the .vmdk file that the virtual
machine lives in. Presumably this is a much more complex scenario for
I/O to operate in than when snapshots are not present. I hadn’t
realised this was how VMware snapshots work.

So, I’m deleting the snapshots now, but this is a very time consuming
process because it’s having to replay a week’s worth of RT writes back
into the .vmdk file. We’ll let you know whether this solves the
problem.

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

Hello,

Some time ago I asked you for hints / clues regarding the setup of RT
3.8.1 running on RHEL5, using Vintela Authentication Services (VAS) and
Active Directory + Kerberos.
I just couldn’t stop thinking about it, so I kept digging …

I finally figured out how to configure it to work correctly, so I
decided to share it in case someone else would have similar issue.

first thing RT_SiteConfig.pm

just add these three lines:
Set($WebExternalAuth , 1);
Set($WebFallbackToInternalAuth , 1);
Set($WebExternalAuto , 1);

apache configuration:
in your virtual host definition
(within <Directory …> … )

AuthType VAS
AuthName "Systems Support Request Tracker"
AuthVasUseBasic On
AuthVasRemoteUserMap local
Require valid-user

after the you will need this:
<LocationMatch “/NoAuth”>
Satisfy Any
Allow from all

otherwise RT’s mailgate will not accept any emails to your RT system.

one final information is, that this setup assumes that you have
installed and configured the following module for your apache:
mod_auth_vas

which is available on http://rc.quest.com

Hope this will save some time to someone :slight_smile:

regards,
Marcin

An excellent question. I have a sneaking suspicion there’s some DB
corruption in there somewhere :confused: But maybe it’s just a database bug.

After chasing some red herrings we found something “interesting”:

mysql> select count() from Attachments;
| count(
) |
| 807113 |
1 row in set (1 min 47.01 sec)

mysql> select count() from Attachments ignore index(PRIMARY);
| count(
) |
| 807113 |
1 row in set (0.66 sec)

So I did a quick “alter table Attachments drop primary key, add primary
key (id);” and now we are seeing much more sensible behaviour:

mysql> select count() from Attachments;
| count(
) |
| 807568 |
1 row in set (0.00 sec)

There is still an occasional hiccup when it goes slower - possibly when
an Attachment has been added - but by and large it’s working as expected
now.

Thanks everyone for your help, I’ve learned rather a lot about MySQL in
the last day or so!

Cheers,
Dave
** Dave Holland ** Systems Support – Infrastructure Management **
** 01223 496923 ** The Sanger Institute, Hinxton, Cambridge, UK **
“I’ve learned so much from my mistakes, I think I’ll make some more.”

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

An excellent question. I have a sneaking suspicion there’s some DB
corruption in there somewhere :confused: But maybe it’s just a database bug.

After chasing some red herrings we found something “interesting”:

mysql> select count() from Attachments;
±---------+
| count(
) |
±---------+
| 807113 |
±---------+
1 row in set (1 min 47.01 sec)

mysql> select count() from Attachments ignore index(PRIMARY);
±---------+
| count(
) |
±---------+
| 807113 |
±---------+
1 row in set (0.66 sec)

So I did a quick “alter table Attachments drop primary key, add
primary
key (id);” and now we are seeing much more sensible behaviour:

mysql> select count() from Attachments;
±---------+
| count(
) |
±---------+
| 807568 |
±---------+
1 row in set (0.00 sec)

There is still an occasional hiccup when it goes slower - possibly
when
an Attachment has been added - but by and large it’s working as
expected
now.

Thanks everyone for your help, I’ve learned rather a lot about MySQL
in
the last day or so!

Should add that any form of full text search of Content is still
taking so long that the web interface times out, and in some browsers
this can then have a nasty effect on the session and you can’t open
any more RT windows without shutting down your browser (behaviour seen
on both IE and Firefox)

Tim

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.