RT/Apache suddenly hangs

System:
Red Hat Enterprise Linux WS release 3 (Taroon Update 8)
RT 3.6.1
Apache v2.0.59
Perl 5.8.7
mod_fcgi-2.4.2
Postgres 8.1.4

Approximately 80,000 tickets.

Problem:
RT/Apache suddenly becomes unavailable/hangs (normally once a day), and
requires an Apache restart before RT works again.

We are not sure what causes the problem, and if others have similar
problems, we would be glad to hear about it!

List of processes running and load on server:

ps aux | grep apache

root 22066 0.0 0.0 7964 3360 ? S Feb01 0:08 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 13898 0.0 0.0 7964 3320 ? S 03:59 0:00 /local/opt/apache2/bin/fcgi- -k start -DSSL
nobody 21699 0.0 0.0 8216 3832 ? S 14:02 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22628 0.0 0.0 8216 3840 ? S 14:14 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22648 0.0 0.0 8216 3840 ? S 14:15 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22650 0.0 0.0 8216 3820 ? S 14:15 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22939 0.0 0.0 8252 3756 ? S 14:18 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22941 0.0 0.0 8216 3848 ? S 14:18 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22945 0.0 0.0 8216 3804 ? S 14:18 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22953 0.0 0.0 8216 3756 ? S 14:18 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22955 0.0 0.0 8216 3796 ? S 14:18 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22959 0.0 0.0 8216 3812 ? S 14:19 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22961 0.0 0.0 8216 3800 ? S 14:19 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22962 0.0 0.0 8216 3788 ? S 14:19 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22965 0.0 0.0 8236 3804 ? S 14:19 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22966 0.0 0.0 8216 3788 ? S 14:19 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 22967 0.0 0.0 8216 3812 ? S 14:19 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23217 0.0 0.0 8228 3792 ? S 14:21 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23218 0.0 0.0 8228 3744 ? S 14:21 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23219 0.0 0.0 8244 3740 ? S 14:21 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23224 0.0 0.0 8232 3768 ? S 14:21 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23225 0.0 0.0 8216 3752 ? S 14:21 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23230 0.0 0.0 8228 3776 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23240 0.0 0.0 8220 3780 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23241 0.0 0.0 8220 3740 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23242 0.0 0.0 8248 3728 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23250 0.0 0.0 8216 3732 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23254 0.0 0.0 8216 3744 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23255 0.0 0.0 8216 3732 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23286 0.0 0.0 8216 3772 ? S 14:22 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23290 0.0 0.0 8216 3760 ? S 14:23 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23292 0.0 0.0 8248 3724 ? S 14:23 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23294 0.0 0.0 8216 3764 ? S 14:23 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23299 0.0 0.0 8108 3696 ? S 14:23 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23326 0.0 0.0 8108 3672 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23327 0.0 0.0 8108 3708 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23328 0.0 0.0 8216 3696 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23336 0.1 0.0 8248 3744 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23337 0.0 0.0 8108 3692 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23338 0.0 0.0 8108 3680 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23339 0.0 0.0 8108 3692 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23352 0.0 0.0 8236 3712 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23353 0.0 0.0 8236 3720 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23354 0.0 0.0 8236 3712 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23355 0.0 0.0 8236 3712 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23356 0.0 0.0 8236 3716 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23358 0.0 0.0 8236 3716 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23557 0.1 0.0 8108 3708 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23558 0.0 0.0 8108 3656 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23559 0.0 0.0 8108 3656 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23560 0.0 0.0 8108 3692 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23561 0.0 0.0 8108 3672 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23562 0.0 0.0 8108 3660 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
nobody 23563 0.0 0.0 8108 3660 ? S 14:25 0:00 /local/opt/apache2/bin/httpd -k start -DSSL
root 23569 0.0 0.0 1616 468 pts/0 S 14:26 0:00 grep apache

uptime

14:26:08 up 45 days, 22:31, 2 users, load average: 8.14, 8.23, 8.01

It normally spawns 5 Apache processes when starting. Here, there are
unusually many processes.

Apache keeps logging messages as before, and even rt.log keeps logging as if
nothing had happened.

Apache is normally restarted once a night, due to a memory leak for which
Mason/Perl/FastCGI is responsible in some strange way. But this should
not be the problem here.

I noticed that there was a mail loop caused by a spam message, which looped
in the same time frame as the server suddenly stopped. But I cannot establish
a connection between the two problems. I cannot find anything in the
logs that points to problems or alerts with Apache. It just hangs and needs a
restart.

How can I debug this and find out what’s wrong? Could some kind of vague
search in RT cause the hang (a search bug) that may have been fixed in
3.6.3? The release of 3.6.3 came quite soon after 3.6.2.

Sincerely,
Tomas

Tomas A. P. Olaj, email: tomas.olaj@usit.uio.no, web: folk.uio.no/tomaso
University of Oslo / USIT (Center for Information Technology Services)
System- and Application Management / Applications Management Group

Hi Tomas,

Yes, we have exactly the same problem under RHAS4 with Apache, FastCGI and
MySQL, but we have to restart four times a day :-(

The rt-users Archives

Community help: http://wiki.bestpractical.com
Commercial support: sales@bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com

Best regards,

Torsten Brumm

http://www.torsten-brumm.de

Don’t know if this helps or not, but do you have debug logging set in your
RT_SiteConfig.pm file?

e.g.

Set($LogToSyslog , 'debug');
Set($LogToScreen , 'error');
Set($LogToFile , 'debug');
Set($LogDir, '/opt/rt/var/log');
Set($LogToFileNamed , "rt.log"); #log to rt.log

It has helped me out on several occasions!

Alison

Alison Downie, Computing Officer
School of Informatics, University of Edinburgh
Room B21, 5 Forrest Hill, EDINBURGH, EH1 2QL

Tel: 650 3095

The only thing I sometimes find are messages saying “MySQL server has gone
away”, but this does not cause the problems in all cases. At night, without
RT usage, the connections from FastCGI time out and don’t reconnect
automatically, but not during business hours ;-(
Best regards,

Torsten Brumm

http://www.torsten-brumm.de

Torsten Brumm wrote:

The only thing I sometimes find are messages saying “MySQL server has gone
away”, but this does not cause the problems in all cases. At night, without
RT usage, the connections from FastCGI time out and don’t reconnect
automatically, but not during business hours ;-(

If it fails outside business hours, that suggests “the mysql morning
bug”, which might be a reasonable Google search term.

I’m having what might be a related issue. My setup:

RHEL4
Perl 5.8.6 (compiled ourselves)
RT 3.6.3
Apache 2.0.58
mod_perl 2.0.3
mysql-4.1.20-1.RHEL4.1

Our hang happens whenever anyone authenticates to the SelfService interface.
The page header displays, down to “RT Self Service / Open Tickets” header, and
then the web page sits there spinning forever. While it is spinning, mysqld
is using 100% of CPU, and other RT requests are delayed or also hang indefinitely.

The only solution is to restart mysqld. Sometimes that causes the server to
return an error (attached below), but sometimes I also see the runaway Apache
behavior that Tomas Olaj is seeing, and I need to restart Apache as well.

DBD::mysql::st execute failed: MySQL server has gone away at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/Apache/Session/Lock/MySQL.pm
line 70.

Trace begun at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/Exceptions.pm
line 129
HTML::Mason::Exceptions::rethrow_exception(‘DBD::mysql::st execute failed:
MySQL server has gone away at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/Apache/Session/Lock/MySQL.pm
line 70.^J’) called at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/Apache/Session/Lock/MySQL.pm
line 70
Apache::Session::lock::MySQL::release_read_lock(‘Apache::Session::lock::MySQL=HASH(0xa37e170)’)
called at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/Apache/Session/Lock/MySQL.pm
line 81
Apache::Session::lock::MySQL::release_all_locks(‘Apache::Session::lock::MySQL=HASH(0xa37e170)’)
called at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/Apache/Session/Lock/MySQL.pm
line 87
Apache::Session::lock::MySQL::DESTROY(‘Apache::Session::lock::MySQL=HASH(0xa37e170)’)
called at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/Request.pm
line 1250
eval {…} at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/Request.pm
line 1250
HTML::Mason::Request::comp(undef, undef, undef) called at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/Request.pm
line 460
eval {…} at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/Request.pm
line 460
eval {…} at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/Request.pm
line 412
HTML::Mason::Request::exec(‘HTML::Mason::Request::ApacheHandler=HASH(0xa335574)’)
called at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/ApacheHandler.pm
line 168
HTML::Mason::Request::ApacheHandler::exec(‘HTML::Mason::Request::ApacheHandler=HASH(0xa335574)’)
called at
/local/rh/rhel4/depot/perl-5.8.6/lib/site_perl/5.8.6/HTML/Mason/ApacheHandler.pm
line 826
HTML::Mason::ApacheHandler::handle_request(‘HTML::Mason::ApacheHandler=HASH(0x94b8c18)’,
‘Apache2::RequestRec=SCALAR(0x936c180)’) called at
/local/rh/rhel4/depot/rt-3.6.3/bin/webmux.pl line 123
eval {…} at /local/rh/rhel4/depot/rt-3.6.3/bin/webmux.pl line 123
RT::Mason::handler(‘Apache2::RequestRec=SCALAR(0x936c180)’) called at -e line 0
eval {…} at -e line 0

Tom Holub (tom_holub@LS.Berkeley.EDU, 510-642-9069)
Director of Computing, College of Letters & Science
249 Campbell Hall
http://LS.berkeley.edu/lscr/

Your hang is of a different sort: it’s a performance problem. Search the
archives for a message from Dirk Pape from just a few days ago; you can also
grab a patch from SVN.

Best regards, Ruslan.

Ruslan Zakirov wrote:

Your hang is of a different sort: it’s a performance problem. Search the
archives for a message from Dirk Pape from just a few days ago; you can also
grab a patch from SVN.

Thanks, you’re right, Dirk Pape’s fix worked for me.
(It’s still a bit slow, but it doesn’t hang indefinitely.)

Tom Holub (tom_holub@LS.Berkeley.EDU, 510-642-9069)
Director of Computing, College of Letters & Science
249 Campbell Hall
http://LS.berkeley.edu/computing/

Tom Holub wrote:

Thanks, you’re right, Dirk Pape’s fix worked for me.
(It’s still a bit slow, but it doesn’t hang indefinitely.)

I’m working on a fix that will make that query, and several other queries,
fast.

Best regards, Ruslan.

On the marvelous Fri, 2 Feb 2007, Torsten Brumm wrote kindly to me …

Yes, we have exactly the same problem under RHAS4 with Apache, FastCGI and
MySQL, but we have to restart four times a day :-(

Torsten

Apache is normally restarted once a night, due to a memory leak for which
Mason/Perl/FastCGI is responsible in some strange way. But this should
not be the problem here.

But I don’t think the problem we are experiencing is due to the memory leak
caused by Mason/Perl/FastCGI. I read on the wiki that FastCGI is still
the recommended choice, since the alternative, fcgid, is still unstable.

RT/Apache just suddenly hangs during business hours.

Regards,
Tomas

Tomas A. P. Olaj, email: tomas.olaj@usit.uio.no, web: folk.uio.no/tomaso
University of Oslo / USIT (Center for Information Technology Services)
System- and Application Management / Applications Management Group

On the marvelous Fri, 2 Feb 2007, Alison Downie wrote kindly to me …

Set($LogToSyslog , 'debug');
Set($LogToScreen , 'error');
Set($LogToFile , 'debug');
Set($LogDir, '/opt/rt/var/log');
Set($LogToFileNamed , "rt.log"); #log to rt.log

It has helped me out on several occasions!

Alison

Yes, but not LogToFile.

Also:
@LogToSyslogConf = (socket => 'inet') unless (@LogToSyslogConf);

is set.

Sincerely,
Tomas

Tomas A. P. Olaj, email: tomas.olaj@usit.uio.no, web: folk.uio.no/tomaso
University of Oslo / USIT (Center for Information Technology Services)
System- and Application Management / Applications Management Group

On the marvelous Fri, 2 Feb 2007, Torsten Brumm wrote kindly to me …

The only thing I sometimes find are messages saying “MySQL server has gone
away”, but this does not cause the problems in all cases. At night, without
RT usage, the connections from FastCGI time out and don’t reconnect
automatically, but not during business hours ;-(

We are very satisfied with our Postgres installation. :-)

Our Postgres admins have created an administration framework used to
administer PostgreSQL installations at the University of Oslo.
This framework is a set of scripts, web pages, C code, SQL definitions
and standards that define storage, backup, maintenance and administration
procedures. We can run this framework on a standalone server or in a
Serviceguard (SG) cluster from HP. Our RT production instance runs on a
two-node Serviceguard Postgres cluster. Still far away from the sizes of all
of our Oracle installations. ;-)

Thanks, Torsten, good to hear that I am not alone with this problem.

Cheers,
Tomas

Tomas A. P. Olaj, email: tomas.olaj@usit.uio.no, web: folk.uio.no/tomaso
University of Oslo / USIT (Center for Information Technology Services)
System- and Application Management / Applications Management Group

Hi,

We have checked the problem more closely, and the reason our
RT/Apache server stops is different variants of mail loops.

We still have problems with other loop variants consuming all resources:
extended mailgate errors leading to loops, and spam going into a loop. How
can we make mailgate a bit smarter?

Does anyone have a mail loop detector in front of their RT installation?

This is now a high-severity issue for our production RT. The load is
increasing all the time.

The cron script we commonly use to remove some mail loops:

#!/site/perl-5.8.7/bin/perl
#
# Author: Petter Reinholdtsen
# Date: 2006-01-13
# License: GNU Public License v2 or later
#
# Look at all tickets, and remove all queue addresses from requestor,
# cc and admincc. This reduces the amount of bounce emails sent to
# the RT admins.

use warnings;
use strict;
use Getopt::Std;

# Location of RT's libs and scripts
# Remember to change to correct path on current RT instance
use lib ("/site/rt3/local/lib", "/site/rt3/lib");

package RT;

use RT::Interface::CLI qw(CleanEnv);
use RT::Queue;
use RT::Queues;
use RT::Tickets;

# -d enables debug output, -n does a dry run without changing tickets
my %opts;
Getopt::Std::getopts("dn", \%opts);

my $debug = $opts{'d'} || 0;
my $dryrun = $opts{'n'} || 0;

$| = 1;

# Find all queue addresses of enabled queues
my @queueaddrs =
    (
     # Aliases for e-mail lists are listed here:
     'e-mail-list@domain.com',
    );

my $ticketcount = 0;
my $starttime = time();

CleanEnv(); # Clean out the environment
RT::LoadConfig(); # Load the RT configuration
RT::Init(); # Initialise RT

my $user = RT::User->new( $RT::SystemUser );

# Merge static list with dynamic list
@queueaddrs = (load_queue_addresses(), @queueaddrs);

# Loop over all addresses, remove them from the tickets where they are
# registered as watchers
for my $address ( sort @queueaddrs ) {
    print "Removing address '$address' from all tickets\n" if $debug;
    my $tickets = new RT::Tickets($RT::SystemUser);
    $tickets->FromSQL( "Watcher.EmailAddress = '$address'" );
    while( my $ticket = $tickets->Next ) {
        $ticketcount++;
        my $id = $ticket->Id;
        if ($dryrun) {
            print "Want to remove $address as watcher from ticket #$id\n";
        } else {
            $RT::Logger->info("Removed queue address $address as watcher ".
                              "from ticket #$id");
            $ticket->DeleteWatcher(Email => $address, Type => $_ )
                for( qw(Cc AdminCc Requestor) );
        }
    }
}

my $duration = time() - $starttime;
$RT::Logger->info("Processing of $ticketcount tickets took $duration seconds");

# Collect the correspond and comment addresses of all enabled queues
sub load_queue_addresses {
    my $queues = new RT::Queues($RT::SystemUser);
    $queues->LimitToEnabled();
    my @queueaddrs;
    foreach my $queue (@{$queues->ItemsArrayRef()}) {
        for my $address ($queue->CorrespondAddress,
                         $queue->CommentAddress) {
            next unless $user->LoadByEmail( $address );
            push @queueaddrs, $address;
            print "Found queue address '$address'\n" if $debug;
        }
    }
    return @queueaddrs;
}

Tomas A. P. Olaj, email: tomas.olaj@usit.uio.no, web: folk.uio.no/tomaso
University of Oslo / USIT (Center for Information Technology Services)
System- and Application Management / Applications Management Group
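
On the question of a mail loop detector in front of RT, here is a minimal sketch of the idea, not a tested production filter. The helper name loop_reason and the sample headers are made up for illustration. X-RT-Loop-Prevention is the header RT puts on its own outgoing mail, Auto-Submitted comes from RFC 3834, and Precedence: bulk/junk is a common convention. RT runs similar checks itself once mail reaches it, so this only illustrates holding such mail back earlier.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of RT): return a short reason if the
# header block looks machine-generated, or nothing if it looks safe.
sub loop_reason {
    my ($headers) = @_;

    # Unfold continuation lines so every header sits on a single line.
    $headers =~ s/\r?\n[ \t]+/ /g;

    # RT marks its own outgoing mail with this header.
    return 'sent by an RT instance'
        if $headers =~ /^X-RT-Loop-Prevention:/mi;

    # RFC 3834: any value other than "no" marks an automatic message.
    return 'Auto-Submitted header'
        if $headers =~ /^Auto-Submitted:[ \t]*(?!no\b)\S/mi;

    # Conventional marker for list traffic and auto-responders.
    return 'bulk or junk precedence'
        if $headers =~ /^Precedence:[ \t]*(?:bulk|junk)\b/mi;

    # Bounces usually carry the null sender or a mailer-daemon address.
    return 'bounce sender'
        if $headers =~ /^Return-Path:[ \t]*<>/mi
        || $headers =~ /^From:.*\bmailer-daemon\@/mi;

    return;
}

# Two made-up header blocks to show how the helper is called.
my %samples = (
    'vacation reply' => "From: someone\@example.com\nAuto-Submitted: auto-replied\n",
    'normal request' => "From: someone\@example.com\nSubject: Printer broken\n",
);

for my $name ( sort keys %samples ) {
    my $reason = loop_reason( $samples{$name} );
    printf "%s: %s\n", $name,
        defined $reason ? "hold back ($reason)" : 'pass to rt-mailgate';
}
```

To use something like this as a real filter, one would read the message from STDIN and only pipe it on to rt-mailgate when no reason is returned. How the MTA should treat a held-back message is site-specific and left open here.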

Have you seen the RTAddressRegexp option in the RT config?
Read about it on the Request Tracker Wiki.
Best regards, Ruslan.
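
For illustration, a minimal sketch of what such a setting might look like in RT_SiteConfig.pm. The addresses are placeholders and not taken from this thread. The pattern should match every address that delivers mail into RT, so that RT can recognise its own addresses and avoid adding itself as a ticket Cc.

```perl
# RT_SiteConfig.pm (fragment)
# Placeholder addresses: replace them with the real correspond and
# comment addresses of all queues on this RT instance.
Set($RTAddressRegexp, '^(?:rt|rt-comment|helpdesk)\@example\.com$');
```

Apache needs a restart after the change so that RT picks up the new configuration.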