Isolated MySQL Server has gone away problem

Hi Everyone,

First, some background information. Since we upgraded to RT 3.4.4, we
have been seeing the “MySQL server has gone away” message, which results
in tickets not being written to the database. All told, this has
happened to us about 50 times since December, 2005. While the failure
rate is less than .1%, this still posed problems for the parties
involved. That is, the one submitting a ticket thought the ticket made
it in the queue; they get an e-mail. Also, the watchers of the queue
also get an e-mail message, but in the end no ticket makes it in the
database. Finally, the only time that this problem has occurred is when
a user submits a ticket via the web interface.

In researching the MySQL web site, I found that the “MySQL server has
gone away message” could be the results of several things, i.e.
incorrect parameters set in my.cnf, incorrect host name resolution (bad
entry in /etc/resolv.conf), closing the MySQL connection prematurely or
not opening up the connection at all. After investigating the first
three possibilities over time, I turned my attention to the final
possibility. Research indicated that my.cnf was fine (timeouts really
were set to large numbers and the transaction failure was immediate),
/etc/resolv.conf was cleaned up because the third nameserver entry was
no longer valid (not likely this would have caused the problem and this
issue was caught after about 20 occurrences of the problem)), premature
disconnection (usually occurrs with a MySQL timeout; again not likely).

With that said, I decided to turn on MySQL General Logging to capture
everything that is sent to MySQL, but in text format (this generates
about 1 gb/day). It took about a month, but I finally was able to
capture the elusive error. I also decided to capture a good web based
transaction, as well. After comparing the two, I determined that in the
errant situation, the transactions required to create a ticket, are sent
to MySQL before the MySQL “Connect” is issued. Thus, in a normal log one
would see:

060824 9:26:34 Connect rt_user@localhost on rt3

                        Query        set autocommit=1

Some housekeeping is done and an Apache Mason session is created

Next, the transaction is processed, about 300 or so lines of Query,
Insert and Update MySQL commands.

Housekeeping to end the transaction.

In the case of when the error occurs, the sequence is:

Some housekeeping is done and an Apache Mason session is created

The transaction is processed, about 300 or so lines of Query, Insert and
Update MySQL commands.

060823 12:38:00 Connect rt_user@localhost on rt3

                         Query        set autocommit=1

Housekeeping to end the transaction.

When one views syslog, each RT transaction to MySQL fails, because it
cannot write to the database. The MySQL logs do not show a problem has
occurred, nor does the end user see any problems, either. However, when
a user enters a ticket, a correctly created ticket is displayed back to
the user; if the ticket was not correctly created they are displayed an
error message, which is probably ignored. As they receive an e-mail, the
user thinks the ticket actually did make it in the database and ignores
the error returned by the web interface. Some user education is a good
work around, but this problem should not be happening in the first

Request Tracker is a rather complicated code base and this problem looks
like that under certain conditions the subroutine to connect to the
database is being executed out of proper order; possibly a flag not
being set properly. This could be in Request Tracker proper or in the
DBIx::SearchBuilder module, as this module is the main interface between
Request Tracker and MySQL.

While this message does not provide a solution, it at least provides an
explanation. Also, one hopes that this issue has been fixed in RT3.6.1.

Take care!


Nick Metrowsky

Consulting System Administrator

303-684-4785 Office

303-684-4100 Fax

DigitalGlobe (r), An Imaging and Information Company