RT Scalability

We use RT for a significant chunk of the admissions, enquiries and
support queries. We now have over 1/3 of a million tickets, 300 queues
and 600 privileged users (not to mention a whole batch of users as the
result of spam).

We’ve been running this off a dual Xeon box running MySQL (with a MySQL
slave backup server that can be swapped in at short notice), but as you
might imagine, with that many users (a significant chunk of whom use
the web interface), it’s been getting a little slow of late.

So we started looking at clustering options.

On the database side, MySQL Cluster seems to insist on holding the
entire database in memory, which with a 10 GB and growing database just
isn’t feasible. Oracle is just a little bit too much black magic, and
our DBAs seem a little reluctant to play the clustering game. So I
started playing with PGCluster, and I now have a load balanced setup in
the final stages of testing prior to deployment.

For the web side, Apache combined with Cisco switch SLB has worked a treat.

I’m pretty impressed with just how well RT has scaled, however, I’m
curious as to how many other people have tried pushing RT this far.

Has anybody looked into the possibility of “archiving” parts of the RT
database, leaving only a stub with no attachments content?
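A rough, untested sketch of what that might look like, assuming the stock RT schema where message bodies live in the Attachments table’s Content column; the id cut-off below is a made-up placeholder, and you’d want a verified backup before trying anything like this:

```sql
-- Hypothetical sketch only: blank out the bodies of old attachments,
-- keeping the rows (and so the transaction history) intact.
-- The id cut-off is a placeholder; back up the table first.
UPDATE Attachments
   SET Content = ''
 WHERE id < 100000;
```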

Mark
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead.
– RFC 1925

We use RT for a significant chunk of the admissions, enquiries and
support queries. We now have over 1/3 of a million tickets, 300 queues
and 600 privileged users (not to mention a whole batch of users as the
result of spam).

While “big”, that’s by no means “huge” for MySQL.

We’ve been running this off a dual Xeon box running MySQL (with a MySQL
slave backup server that can be swapped in at short notice), but as you
might imagine, with that many users (a significant chunk of whom use
the web interface), it’s been getting a little slow of late.

Can you talk about what you’ve done to tune your MySQL server? MySQL
performance tuning can take you a long way beyond where you are now.
Having enough RAM, and actually using it, are both important for keeping
MySQL purring.
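As a quick first check on that point, a couple of SHOW STATUS queries give a rough picture of whether the key buffer is actually earning its RAM (counter names are from the MySQL 4.1/5.0 era):

```sql
-- Key buffer effectiveness: the ratio Key_reads / Key_read_requests
-- (disk reads over lookups) should ideally be well under 1%.
SHOW STATUS LIKE 'Key_read%';

-- Query cache usage: compare Qcache_hits against Qcache_inserts.
SHOW STATUS LIKE 'Qcache%';
```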

I’m pretty impressed with just how well RT has scaled, however, I’m
curious as to how many other people have tried pushing RT this far.

I know of a VoIP provider with (a locally hacked and improved) RT with
16 million tickets in it, a social network with six million tickets in
their RT (though I’ve never talked to them about it, just reported
bugs in their site ;)), and at least one company running a very stock RT
3.4, plus a few custom MySQL indexes, with well over a million tickets
and pretty impressive performance.

-jesse

Jesse Vincent wrote:

Can you talk about what you’ve done to tune your MySQL server? MySQL
performance tuning can take you a long way beyond where you are now.
Having enough RAM, and actually using it, are both important for keeping
MySQL purring.

The bulk of the tuning was done before my time; it’s mostly a stock RT
with some minor my.cnf changes:

# Performance tuning
key_buffer = 384M
table_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
myisam_sort_buffer_size = 64M
thread_cache = 8

# Try number of CPUs * 2 for thread_concurrency
thread_concurrency = 8

query_cache_size = 64M
#query_cache_type = 1

innodb_data_home_dir = /var/lib/mysql/
innodb_data_file_path = ibdata1:10M:autoextend
innodb_log_group_home_dir = /var/lib/mysql/
innodb_log_arch_dir = /var/lib/mysql/

# You can set …_buffer_pool_size up to 50-80% of RAM,
# but beware of setting memory usage too high
innodb_buffer_pool_size = 384M
innodb_additional_mem_pool_size = 20M

# Set …_log_file_size to 25% of buffer pool size
innodb_log_file_size = 64M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 50

MySQL seems to be munching on about a third of the available memory
(2 GB), and splitting the CPU load about 50/50 with Apache. During the
day, load averages of >5 aren’t that uncommon at the minute.

Last time I looked on a test server with just me on it, a sizeable chunk
of time seemed to be eaten by ACL queries, which we played with but
couldn’t seem to get optimised.
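One way to chase those ACL queries is to capture the real SQL from the slow query log and feed it to EXPLAIN. The query below is purely illustrative (the ones RT generates are far hairier), though the column names do match the stock RT ACL table:

```sql
-- Illustrative only: see whether the optimiser finds a usable index
-- for a simple rights lookup on the ACL table.
EXPLAIN
SELECT * FROM ACL
 WHERE RightName = 'ShowTicket'
   AND ObjectType = 'RT::Queue'
   AND ObjectId = 1;
```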

With a handful of us testing the Postgres-based cluster, things seem
to be moving much faster, but then it’s supposed to be able to tune
itself… With 3 sites, we can also now offer automatic failover, which
management types seem to be interested in.

I’m pretty impressed with just how well RT has scaled, however, I’m
curious as to how many other people have tried pushing RT this far.

I know of a VoIP provider with (a locally hacked and improved) RT with
16 million tickets in it.

I was curious, since most of the random RT databases I’ve seen appear to
be handing out tickets in the tens-of-thousands range, and at peak times
of the year we were churning through about 1,000 tickets a day.

Database tuning: definitely a black art, but pure magic when you get it
right…

Mark
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead.
– RFC 1925

We use RT for a significant chunk of the admissions, enquiries and support
queries. We now have over 1/3 of a million tickets, 300 queues and 600
privileged users (not to mention a whole batch of users as the result of
spam).

We’ve been running this off a dual Xeon box running MySQL (with a MySQL
slave backup server that can be swapped in at short notice), but as you
might imagine, with that many users (a significant chunk of whom use the
web interface), it’s been getting a little slow of late.

So we started looking at clustering options.

On the database side, MySQL Cluster seems to insist on holding the entire
database in memory, which with a 10 GB and growing database just isn’t
feasible. Oracle is just a little bit too much black magic, and our DBAs
seem a little reluctant to play the clustering game. So I started playing
with PGCluster, and I now have a load balanced setup in the final stages of
testing prior to deployment.

For the web side, Apache combined with Cisco switch SLB has worked a treat.

I’m pretty impressed with just how well RT has scaled, however, I’m curious
as to how many other people have tried pushing RT this far.

Has anybody looked into the possibility of “archiving” parts of the RT
database, leaving only a stub with no attachments content?

Mark

Mark,

We only have about half the number of tickets in our RT instance, but
are using PostgreSQL with Slony-I replication to a failover database
instance. I would love to see a few more details of your PGCluster
setup, since we are in the process of upgrading to the 3.6 release on
replacement hardware and would like to increase our redundancy, if
possible. Since you are using PostgreSQL, have you looked at constraint
exclusion to partition your tables on the backend? Also, if you recluster
the tables appropriately, the older information will automatically be
segregated from the new active tickets – just some ideas.

Regards,
Ken

Mark;

One thing I would recommend is to split the web from the DB and get them
running on 2 different servers; we found that made a huge difference to
the front-end usability.
Regards;
Roy
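On the RT side that split is a small config change; assuming a stock install, RT_SiteConfig.pm just needs to point at the remote database host (both hostnames below are placeholders):

```perl
# RT_SiteConfig.pm -- run the web frontend against a separate DB box.
# Both hostnames are placeholders.
Set($DatabaseHost,   'db.example.internal');  # where the database lives
Set($DatabaseRTHost, 'rt.example.internal');  # host RT connects from (used in MySQL grants)
```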

Mark Chappell wrote:

Kenneth Marshall wrote:

I would love to see a few more details of your PGCluster
setup, since we are in the process of upgrading to the 3.6 release on
replacement hardware and would like to increase our redundancy, if
possible.

PGCluster is basically a shared-nothing architecture, although they’re
spinning off PGCluster II, which would have a shared backing store.
However, we seem to be so read-heavy that the write-speed benefits of a
shared backing store aren’t really needed.

2 machines dedicated to RT.

Basic layout:
3 replicators, 1 on a totally separate box so that quorum can be
attained if one of the ‘RT’ machines disappears, and the other 2 running
on the 2 ‘RT’ machines.

2 cluster database nodes, one on each of the ‘RT’ machines.

Apache is running on each of the ‘RT’ machines, and queries the database
node on that machine.

We’re contemplating rolling out another pair of DB nodes and then using
the pglb component of PGCluster, but for now we’re not using them.

Since you are using PostgreSQL, have you looked at constraint
exclusion to partition your tables on the backend? Also, if you recluster
the tables appropriately, the older information will automatically be
segregated from the new active tickets – just some ideas.

And some of those magic words that make searching for useful information
so much simpler start turning up. Thank you.
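For anyone else following those keywords: with PostgreSQL 8.1+, constraint exclusion pairs with inheritance-based partitioning, roughly as below. The table split and id boundary are hypothetical, not the actual RT schema layout:

```sql
-- Hypothetical sketch: split Transactions into archive/current children.
-- The boundary value is a placeholder.
CREATE TABLE transactions_archive (
    CHECK (id < 500000)
) INHERITS (Transactions);

CREATE TABLE transactions_current (
    CHECK (id >= 500000)
) INHERITS (Transactions);

-- With constraint exclusion on, a query filtering on id only touches
-- the child tables whose CHECK constraints can match.
SET constraint_exclusion = on;
SELECT count(*) FROM Transactions WHERE id >= 900000;
```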

Mark
With sufficient thrust, pigs fly just fine. However, this is not
necessarily a good idea. It is hard to be sure where they are going to
land, and it could be dangerous sitting under them as they fly
overhead.
– RFC 1925

Jesse Vincent wrote:

Can you talk about what you’ve done to tune your MySQL server? MySQL
performance tuning can take you a long way beyond where you are now.
Having enough RAM, and actually using it, are both important for keeping
MySQL purring.

The bulk of the tuning was done before my time; it’s mostly a stock RT
with some minor my.cnf changes.

You likely want to have a look at the “my-innodb-heavy-4G.cnf” sample
config file shipped with MySQL. And then turn on the slow query log and
have a look at indexing.

-jesse
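For reference, turning on the slow query log in that era of MySQL is a couple of my.cnf lines (the log path is a placeholder; the option names changed in later 5.x releases):

```ini
# my.cnf additions, MySQL 4.1/5.0-era option names
log_slow_queries = /var/log/mysql/mysql-slow.log
long_query_time  = 2
log-queries-not-using-indexes
```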