We’re seeing something very similar, but with lighttpd instead of Apache. For us it always seems to be a FastCGI process that hangs. Instead of returning a 500 error, though, our FastCGI process starts eating up 99% of the CPU until we restart lighttpd.
We’re using 3.6.3. We’re hoping that upgrading RT to a more recent version will solve this problem.

-----Original Message-----
Sent: Tuesday, June 17, 2008 1:09pm
To: “RT Users” email@example.com
Subject: [rt-users] RT goes down with 500 error - “(2)No such file or directory: FastCGI: failed to connect to server “/opt/rt3/bin/mason_handler.fcgi”:”
My users started calling and emailing today; apparently the RT3 web
site was down with a 500 error. I am currently running 3.6.6 on RHEL4.
The error in the Apache log was:
[Tue Jun 17 14:15:48 2008] [error] [client 10.127.5.6] (2)No such file
or directory: FastCGI: failed to connect to server
"/opt/rt3/bin/mason_handler.fcgi": connect() failed, referer:
The file “/opt/rt3/bin/mason_handler.fcgi” exists, is executable,
and is readable by the Apache user.
Restarting Apache worked around the problem, and the site is
back up and running now. However, I would like to know what the root
cause was. Other than monitoring for it, is there any way to
prevent it from causing downtime again?
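
On the monitoring side, here is a minimal sketch of a check that could run from cron and flag a runaway FastCGI process like the one described in the reply above. It is only an illustration, not something from RT itself: the function names find_runaway and current_ps_output, the 90% threshold, and the match on "mason_handler.fcgi" are all assumptions. Note that the %CPU column from ps is averaged over the lifetime of the process, so a freshly hung process may take a while to cross the threshold.

```python
import subprocess


def find_runaway(ps_output, name="mason_handler.fcgi", cpu_limit=90.0):
    """Return PIDs whose command line contains `name` and whose %CPU
    exceeds `cpu_limit`. Expects output of `ps -eo pid,pcpu,args`.
    (Hypothetical helper; name and threshold are assumptions.)"""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        parts = line.split(None, 2)          # pid, %cpu, full command line
        if len(parts) < 3:
            continue
        pid, pcpu, args = parts
        try:
            cpu = float(pcpu)
        except ValueError:
            continue                         # skip rows we cannot parse
        if name in args and cpu > cpu_limit:
            pids.append(int(pid))
    return pids


def current_ps_output():
    """Run ps and return its text output (Linux procps syntax assumed)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,pcpu,args"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    for pid in find_runaway(current_ps_output()):
        print(f"runaway FastCGI process: {pid}")
```

What to do with a flagged PID (send an alert, kill the process, or restart the web server) is a site decision. This only detects the CPU-bound symptom; it does not explain the root cause, and it would not catch the "failed to connect" case, which would need an HTTP check against the RT URL instead.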