RT flaky under FastCGI?

I’ve searched the docs and archives, and can’t seem to find anyone else
experiencing this problem, so I’m afraid it might just be somehow
peculiar to our setup – but here goes:

We recently switched from running RT via mod_perl to running it as a
FastCGI, so that we could have two different instances on the same
server. Everything is configured right and it works fine, most of the
time. But when first accessed every morning, and occasionally at random
times during the day, we just get a 500 Internal Server Error when
trying to pull up the Web page, and the only thing that fixes it is an
Apache restart. It’s the same problem both in a very old version and a
relatively recent version of RT, though not the most recent.

Just killing the mason_handler.fcgi process has no effect. Running it
as a dynamic vs. static FastCGI has no effect. And we get no errors
anywhere I can see, either in the Apache error log or in the RT log
under /tmp.

The system:
FreeBSD 4.9
RT 3.4.2 (first instance), RT 2.0.14 (second instance)
Apache 1.3.33
Perl 5.8.3
CGI.pm 3.17
mod_perl 1.29 (if that matters)
mod_fastcgi 2.4.2

The recurrence every morning must be related either to the cleaning of
old session data or the stop and start of Apache every night for log
rotation, except we can’t replicate it via playing with the session data
and Apache stops/starts during the day, so I’m at a loss. Are people
running RT under FastCGI with no problems? Does anyone happen to know
of some bug that would explain this problem, or at least have ideas as
to how to pin it down further?

I’d be much obliged. Thanks!

  ab

But when first accessed every morning, and occasionally at random
times during the day, we just get a 500 Internal Server Error when
trying to pull up the Web page, and the only thing that fixes it is
an Apache restart.

This reminds me of a somewhat similar (but less severe) problem with
the RT install I work with. I always use it with IE (displaying via
remote desktop, in case that matters), and, the first time through, I
log in and get the RT-at-a-glance page normally. When I browser-back
to that page, or pick it off the history dropdown next to the Back
button, the first time, I get IE’s “this page cannot be displayed”
page. More precisely, I get a page whose body says “The page cannot be
displayed” with a boilerplate list of suggestions, and the title bar says
“Cannot find server”. I hit the reload button and it comes up
normally, and for the rest of that session, I can Back to that page, or
jump to it off the history dropdown, and it works fine. Start a new
IE (with or without killing the old), though, and log in again, and the
first time I Back or history-dropdown to the RT-at-a-glance page, the
same symptom occurs.

Now, this might actually be a problem with IE - goodness knows it
wouldn’t be the first time Microsoft broke something - but with it
manifesting only with RT in my (admittedly very limited) experience,
there presumably is something RT is doing that’s responsible for, at
least, provoking the problem.

Versions:
Windows Server 2003 Enterprise Edition
IE 6.0.3790.1830
RT 3.4.2
"Server: Apache/1.3.33 (Unix) PHP/5.0.4 mod_perl/1.29 mod_ssl/2.8.22 OpenSSL/0.9.7f"

If there are any other relevant version numbers, tell me how to get
them and I’ll be happy to supply them. I don’t really know very much
about this stuff yet…

/~\ The ASCII                           der Mouse
\ / Ribbon Campaign
 X  Against HTML               mouse@rodents.montreal.qc.ca
/ \ Email!           7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B

page. More precisely, I get a page whose body says “The page cannot be
displayed” with boilerplate list of suggestions, and the title bar says
“Cannot find server”. I hit the reload button and it comes up
normally, and for the rest of that session, I can Back to that page, or
jump to it off the history dropdown, and it works fine. Start a new
IE (with or without killing the old), though, and log in again, and the
first time I Back or history-dropdown to the RT-at-a-glance page, the
same symptom occurs.

I’ve seen that with IE, especially under SSL. If I recall correctly,
there are Apache settings one can tweak to work around the issue.

Adam Bernstein said the following on 3/10/2006 5:52 PM:

I’ve searched the docs and archives, and can’t seem to find anyone else
experiencing this problem, so I’m afraid it might just be somehow
peculiar to our setup – but here goes:

I just asked this question the other day… :-) What seems to work for me
is adding:

-appConnTimeout 240 -startDelay 20

to the FastCGI options stanza in httpd.conf.
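
For readers unsure where these options go, here is a minimal sketch of one possible placement. It assumes the dynamic-application setup, where options are set globally with FastCgiConfig; as far as I know -startDelay is accepted only there and not by FastCgiServer, so treat the placement as an assumption and check the mod_fastcgi documentation for your version.

```apache
# Sketch only: assumes mod_fastcgi is already loaded and RT runs as a
# dynamic FastCGI application.
# -appConnTimeout: seconds to wait for a connection to the FastCGI app.
# -startDelay: seconds to wait for a dynamic app to come up (assumed to be
#              a FastCgiConfig-only option).
FastCgiConfig -appConnTimeout 240 -startDelay 20
```

With a static server, -appConnTimeout can instead be appended to the existing FastCgiServer line.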

Best,
–Glenn
"They that can give up essential liberty to obtain a little temporary
safety deserve neither liberty nor safety."
~Benjamin Franklin, Historical Review of Pennsylvania, 1759

The recurrence every morning must be related either to the cleaning
of old session data or the stop and start of Apache every night for
log rotation, except we can’t replicate it via playing with the
session data and Apache stops/starts during the day, so I’m at a
loss. Are people running RT under FastCGI with no problems? Does
anyone happen to know of some bug that would explain this problem,
or at least have ideas as to how to pin it down further?

There used to be this effect called “the morning bug” which was often
observed with apache and mod_perl. It is observed when you have a DB
that times out connections and a DB connection cache layer in your
app that doesn’t notice the DB went away. Apache::DBI got
smarter and the problem mostly vanished.

I’d look to that timeout type of thing since you say it happens in
the morning.

Also, FWIW, I run RT 3.4.4 from FreeBSD ports on a FreeBSD 6.0 system
and apache2, perl 5.8.7. The mod_fastcgi in apache 1 was not all
that great. If you can, try upgrading it all to the latest of
everything, or at least move to apache2 and a newer mod_fastcgi.

Our setup is completely stable and pretty darned fast.

I had similar sounding problems when running RT under FastCGI coupled
with ‘incomplete header’ errors in Apache logs.

I fixed this by increasing the number of FastCGI processes in the
Apache configuration, i.e.

FastCgiServer /usr/local/rt3/bin/mason_handler.fcgi -idle-timeout 120 -processes 6

rather than ‘-processes 2’ or ‘-processes 4’. I was also advised to add
‘-init-start-delay 30’, although I don’t know how delaying start-up
times would really help…

I had similar sounding problems when running RT under FastCGI coupled
with ‘incomplete header’ errors in Apache logs.

I fixed this by increasing the number of FastCGI processes in the
Apache configuration. i.e.

FastCgiServer /usr/local/rt3/bin/mason_handler.fcgi -idle-timeout 120 -processes 6

You want to bump up the fastcgi timeout to be longer than the RT
timeout.

For me, RT’s timeout (or is it the web browser?) is 300 and I set the
FastCgiServer timeout to 305, and that pretty much solved all of my
FastCGI problems. I run with 8 processes, but that’s just me :-)
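
As a rough illustration of that advice, the line below reuses the handler path from the earlier post. It assumes the "timeout" meant here is mod_fastcgi's -idle-timeout option, which the post does not actually name, so treat it as a sketch, not the poster's exact configuration.

```apache
# Sketch only: handler path copied from the earlier post in this thread.
# -idle-timeout 305 is assumed to be the "FastCgiServer timeout" described
# above, set just above the 300 quoted for RT.
# -processes 8 matches the process count mentioned above.
FastCgiServer /usr/local/rt3/bin/mason_handler.fcgi -idle-timeout 305 -processes 8
```

The idea is that FastCGI should not abort a request before RT (or the browser) would have given up on it anyway.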