Tables, database size, backups

Alex_Howells · January 28, 2008, 7:23pm

Hi All,

First post to the list, although I’ve been an avid reader for months;
we’ve a reasonably sized RT3 installation at work, probably doing around
15,000 tickets per annum at the moment, with 5-6 staff working with it.
It’s just being moved onto new hardware, which brings me to:

Our ‘sessions’ table is rather huge, at 1.5GB
and most of the timestamps contained within are
literally 1-3 years old. Do I need to keep this
data for any reason, or can it be purged to save
space on the DB host + time when backing up?
Ditto for our attachments table, although I’m able
to see more of a reason here why the data is needed!
Anyone have any hints / tips on this?
I’ve been doing backups with mysqldump or similar
for a while now, and came across mk-parallel-dump [1]
plus mylvmbackup[2]; both look nice, does anyone have
“real world” experience using either with RT?
Obvious downside to mysqldump is it’s rather slow,
and you may as well go for a cuppa when it runs.
We had a ‘problem’ where a lot of connections
to MySQL would remain open and idle for ~6 hours
despite nobody using the application. Reducing
wait_timeout seemed to make this go away, but
seems to have the unexpected side-effect of RT
getting its panties in a bunch and not sending out
correspondance + comments via e-mail properly. Is
there a “safe” limit for wait_timeout with RT?
Furthermore, is there anything I should be looking
out for in particular when my install stops mailing?
It seems to stop silently too, which is frustrating.
We have a dedicated box for ‘web applications’ and
another for running the databases. I’ve made some
attempts to tune MySQL but my experience there mostly
lies with MyISAM tables; our RT3 install seems to be
exclusively InnoDB. Anyone got some home remedies?

innodb_additional_mem_pool_size = 64M
innodb_buffer_pool_size = 1G

innodb_flush_method = O_DIRECT

innodb_log_buffer_size = 8M

innodb_thread_concurrency = 8

webapps is a Opteron 1218HE with 8GB RAM, plus
2 x 320GB 7200rpm SATA HDDs on a 3ware RAID controller;
database server is a Opteron 170HE with 4GB RAM, and
2 x 150GB 10000rpm SATA HDDs on a 3ware controller.
Happy to post other bits of my config, if that’d help

Sorry for being a bit vague with all this grin If my InnoDB settings
make anyone lose their lunch, can I profusely apologise in advance?

Alex

[1] http://maatkit.sourceforge.net/
[2] Blog - Mikula Beutl

Ruslan_Zakirov1 · January 28, 2008, 8:02pm

Hi All,

First post to the list, although I’ve been an avid reader for months;
we’ve a reasonably sized RT3 installation at work, probably doing around
15,000 tickets per annum at the moment, with 5-6 staff working with it.
It’s just being moved onto new hardware, which brings me to:

Our ‘sessions’ table is rather huge, at 1.5GB
and most of the timestamps contained within are
literally 1-3 years old. Do I need to keep this
data for any reason, or can it be purged to save
space on the DB host + time when backing up?
clean it! it’s slow down your system. you don’t need data, only schema
definitions in backups.

Ditto for our attachments table, although I’m able
to see more of a reason here why the data is needed!
Anyone have any hints / tips on this?
no tips. it’s content of tickets. so there is no way to avoid dumping,
however you can compress backups if space is your problem.

[snip]

We had a ‘problem’ where a lot of connections
to MySQL would remain open and idle for ~6 hours
despite nobody using the application. Reducing
wait_timeout seemed to make this go away, but
seems to have the unexpected side-effect of RT
getting its panties in a bunch and not sending out
correspondance + comments via e-mail properly. Is
there a “safe” limit for wait_timeout with RT?
Furthermore, is there anything I should be looking
out for in particular when my install stops mailing?
It seems to stop silently too, which is frustrating.
hm. that’s odd we have code that pings DB and reconnect when it’s not alive.

[snip]

Best regards, Ruslan.

Alex_Howells · January 30, 2008, 1:22am

Ruslan Zakirov wrote:

clean it! it’s slow down your system. you don’t need data, only schema
definitions in backups.

Excellent, I’ve truncated that table and things seem to be going
smoothly. Total size was 1.7GB so it’s nice to be free of that

Ditto for our attachments table, although I’m able
to see more of a reason here why the data is needed!
Anyone have any hints / tips on this?

no tips. it’s content of tickets. so there is no way to avoid dumping,
however you can compress backups if space is your problem.

We’ll live with this one. Does seem somewhat awkward that you’ve got
potentially 5+ year old data lying around, but I suppose storage mediums
do get exponetially bigger, eh? Not a massive concern!

We had a ‘problem’ where a lot of connections
to MySQL would remain open and idle for ~6 hours
despite nobody using the application. Reducing
wait_timeout seemed to make this go away, but
seems to have the unexpected side-effect of RT
getting its panties in a bunch and not sending out
correspondance + comments via e-mail properly. Is
there a “safe” limit for wait_timeout with RT?
Furthermore, is there anything I should be looking
out for in particular when my install stops mailing?
It seems to stop silently too, which is frustrating.
hm. that’s odd we have code that pings DB and reconnect when it’s not alive.

Yeah, I didn’t think it was wait_timeout in MySQL but needed to be sure.
Our “bug” revolves around something getting screwed up and then
responses to tickets by staff are entered into the database but no
outgoing mail is sent; customer doesn’t get a response, gets frustrated.

Several issues for us here:

 *  noticing the problem has happened. Usually this is when a
    customer complains, or we happen to be *looking* at whether
    or not RT is sending mail for some other reason.

 *  once we have a vague timeframe when RT didn't send replies
    it's not trivial to take 'existing' responses to tickets within
    that window and dump them back into the "Send mail" queue.

Questions from all that:

We’ve got fairly verbose logging, should I be looking for anything in
particular with regards to it not sending mail? Any tell-tale signs?

Hows about automatically monitoring whether outgoing e-mails were
generated for comments/replies and popping up some kind of visual
notification if things haven’t succeeded?

Finally, is there any easy way to say “Show me all staff replies on
Friday 41st February 2050 between 2:00pm and 4:00pm” with a view to
feeding that through the system and generating all the mail properly?

Thanks for your response earlier, it was helpful

Alex Howells

Alex_Howells · January 30, 2008, 12:04pm

Further to my previous email…

Several issues for us here:

 *  noticing the problem has happened. Usually this is when a
    customer complains, or we happen to be *looking* at whether
    or not RT is sending mail for some other reason.

Actually seeing a lot of error messages on user-crit via syslog,
generated by RT in response to not being able to send mail:

RT: rt-3.6.1-5511-1201691639-297.49825-5-0@bytemark.co.ukCould not
send mail: Couldn't run /usr/lib/sendmail: Cannot allocate memory at
/home/rt/rt-3.6.1/lib/RT/Action/SendEmail.pm line 274. Stack:
[/home/rt/rt-3.6.1/lib/RT/Action/SendEmail.pm:274]
[/home/rt/rt-3.6.1/lib/RT/Action/SendEmail.pm:103]
[/home/rt/rt-3.6.1/lib/RT/ScripAction_Overlay.pm:240]
[/home/rt/rt-3.6.1/lib/RT/Scrip_Overlay.pm:506]
[/home/rt/rt-3.6.1/lib/RT/Scrips_Overlay.pm:193]
[/home/rt/rt-3.6.1/lib/RT/Transaction_Overlay.pm:179]
[/home/rt/rt-3.6.1/lib/RT/Record.pm:1446]
[/home/rt/rt-3.6.1/lib/RT/Ticket_Overlay.pm:2442]
[/home/rt/rt-3.6.1/lib/RT/Ticket_Overlay.pm:2356]
[/home/rt/rt-3.6.1/lib/RT/Interface/Web.pm:570]
[/home/rt/rt-3.6.1/share/html/Ticket/Display.html:140]
[/home/rt/rt-3.6.1/share/html/Ticket/Update.html:216]
[/home/rt/rt-3.6.1/share/html/autohandler:279]
(/home/rt/rt-3.6.1/lib/RT/Action/SendEmail.pm:289)

Now I can accept that’s a valid response for OOM situations; the
“problem” is that we were out of memory at ~2am, and it takes a full
restart of Apache + FastCGI processes for mail to start going again.

From 01:52 through 11:47 no mail got sent by our RT

 *  once we have a vague timeframe when RT didn't send replies
    it's not trivial to take 'existing' responses to tickets within
    that window and dump them back into the "Send mail" queue.

According to our loghost, that stack trace was thrown ~177 times.
Granted, a lot of those will be comments to CC’s and AdminCC’s on a
queue but it still means finding potentially 30-40 tickets which we’ve
responded to since 9am, may have resolved, and re-sending the message!

We’ll try to solve this with more regular restarts of RT, and throwing
more RAM into that server, but do you have any ideas?

Alex