RT 3.6.5 causes connection aborts resulting in 500 error

Kage · May 15, 2009, 9:17pm

First and foremost, I use 3.6.5 since that’s what exists in Ubuntu
Hardy’s repository.

Essentially what happens is I can use RT for an extended period of
time (from 1 hour to 10 hours), and eventually, it’ll stop working,
resulting in a 500 Internal Server Error.

Error log from Apache2 via debug mode:

[Fri May 15 16:07:28 2009] [notice] Apache/2.2.8 (Ubuntu)
mod_ssl/2.2.8 OpenSSL/0.9.8g mod_perl/2.0.3 Perl/v5.8.8 configured –
resuming normal operations
[Fri May 15 20:11:10 2009] [crit]: Apache2::RequestIO::rflush: (103)
Software caused connection abort at
/usr/share/perl5/HTML/Mason/ApacheHandler.pm line 1035
(/usr/share/request-tracker3.6/libexec/webmux.pl:127)
[Fri May 15 20:11:11 2009] [crit]: Apache2::RequestIO::rflush: (103)
Software caused connection abort at
/usr/share/perl5/HTML/Mason/ApacheHandler.pm line 1035
(/usr/share/request-tracker3.6/libexec/webmux.pl:127)
[Fri May 15 20:11:13 2009] [crit]: Apache2::RequestIO::rflush: (103)
Software caused connection abort at
/usr/share/perl5/HTML/Mason/ApacheHandler.pm line 1035
(/usr/share/request-tracker3.6/libexec/webmux.pl:127)
[Fri May 15 20:11:16 2009] [crit]: Apache2::RequestIO::rflush: (103)
Software caused connection abort at
/usr/share/perl5/HTML/Mason/ApacheHandler.pm line 1035
(/usr/share/request-tracker3.6/libexec/webmux.pl:127)
[Fri May 15 20:11:19 2009] [crit]: Apache2::RequestIO::rflush: (103)
Software caused connection abort at
/usr/share/perl5/HTML/Mason/ApacheHandler.pm line 1035
(/usr/share/request-tracker3.6/libexec/webmux.pl:127)
[Fri May 15 20:11:53 2009] [crit]: Apache2::RequestIO::rflush: (104)
Connection reset by peer at
/usr/share/perl5/HTML/Mason/ApacheHandler.pm line 1035
(/usr/share/request-tracker3.6/libexec/webmux.pl:127)

~ Kage
http://vitund.com

Kage · May 18, 2009, 2:14pm

Also sent this mail to rt-devel…

~ Kage
http://vitund.com

Kage · May 19, 2009, 9:23pm

I hate to bump my own thread, but I kind of need to find a fix for
this extremely soon. My bosses are breathing down my neck. Thanks!

~ Kage
http://vitund.com

Tom_Lahti · May 19, 2009, 9:31pm

Kage wrote:

Essentially what happens is I can use RT for an extended period of
time (from 1 hour to 10 hours), and eventually, it’ll stop working,
resulting in a 500 Internal Server Error.

Sounds like resource exhaustion of some kind, perhaps a memory or some other
type of leak in mason, perl, apache, or RT. I hate to be vague, but it
could be anything. You probably need to step outside “what is in hardy’s
repository” and start upgrading things, probably starting with perl itself.

But I would start by looking for more clues when the system is in the “not
working” state. Look at memory usage, CPU usage, and the like. See if
apache is responding to other non-RT page requests. Doing so will help you
narrow it down.
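
One way to record the memory and load figures mentioned above, while RT is in the failed state, is a small snapshot script. This is only a sketch: it assumes a Linux host that exposes `/proc/meminfo` and `/proc/loadavg`, and the function names are made up for illustration.

```python
# Sketch: capture memory and load figures while RT is failing.
# Assumes Linux; parse_meminfo, parse_loadavg and snapshot are
# illustrative names, not part of RT or Apache.

def parse_meminfo(text):
    """Return /proc/meminfo fields as a dict of integers (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages as floats."""
    return tuple(float(value) for value in text.split()[:3])


def snapshot(meminfo_path="/proc/meminfo", loadavg_path="/proc/loadavg"):
    """Read both files once and return a small summary dict."""
    with open(meminfo_path) as handle:
        mem = parse_meminfo(handle.read())
    with open(loadavg_path) as handle:
        load = parse_loadavg(handle.read())
    return {
        "mem_total_kb": mem.get("MemTotal"),
        "mem_free_kb": mem.get("MemFree"),
        "swap_free_kb": mem.get("SwapFree"),
        "load": load,
    }
```

Calling `snapshot()` once right after Apache starts and again when the 500 errors appear gives two dicts that can be compared directly.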

– ============================
Tom Lahti
BIT Statement LLC

(425)251-0833 x 117
http://www.bitstatement.net/
– ============================

Nick_Geron · May 19, 2009, 10:21pm

Kage,

I’m seeing similar issues with 3.8.2. Take a look at my post
"apache/mason software caused connection abort."

Running with Tom’s thought, maybe we can compare setups. My two systems
are identical builds of a gentoo stage 4 on VMWare ESX3i. My default vm
resources are pretty low. Each host has 256M and a single, virtual cpu
which the systems see as a Xeon E5410. Now I don’t see how a nearly
idle system (in testing) could have resource issues with regard to the
CPU, but looking at my puny allotted memory, I could see how that might
cause a crunch.

What’s your memory capacity and usage look like?

-Nick


Kage · May 20, 2009, 1:13pm

Memory capacity is currently set to 512MB on our Hardy RT VM. CPU is
capped to a whole core for itself (so, something like 2.8GHz). Usage
is practically none. I’m not so sure memory is the issue, but I’ll
bump the VM’s memory up and start pounding the Hardy RT VM and see if
that fixes it.

~ Kage
http://vitund.com

Kage · May 20, 2009, 1:41pm

Same error is occurring with 1GB of memory on the VM. Everything else
in Apache works just fine, but RT is dead until I restart Apache2.

~ Kage
http://vitund.com

Tom_Lahti · May 20, 2009, 7:51pm

Kage wrote:

Same error is occurring with 1GB of memory on the VM. Everything else
in Apache works just fine, but RT is dead until I restart Apache2.

As I said before:

But I would start by looking for more clues when the system is in the “not
working” state. Look at memory usage, CPU usage, and the like. See if
apache is responding to other non-RT page requests. Doing so will help
you narrow it down.

In other words, when it breaks next, DON’T just restart apache2. Log into
the system and poke around while it’s broken. Try to load a web page
through apache that is not RT-related while it’s broken. Look at the
memory usage while it’s broken. Look at CPU load while it’s broken. Poke
around in all the logs you have in /var/log for recent messages. See if you
can narrow it down any.
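
For the log part of that checklist, a short script can tally the `rflush` failures by error number so that bursts stand out. This is a sketch only: the pattern is written against the entries quoted in the first post, it tolerates the line wrapping seen there, and the log path in the usage note is an assumption about a default Ubuntu layout.

```python
import re
from collections import Counter

# Matches entries such as:
#   [Fri May 15 20:11:10 2009] [crit]: Apache2::RequestIO::rflush: (103)
#   Software caused connection abort at /usr/share/perl5/...
# The message may be wrapped onto the next line, hence re.DOTALL.
RFLUSH = re.compile(
    r"\[(?P<when>[^\]]+)\]\s+\[crit\]:\s+"
    r"Apache2::RequestIO::rflush:\s+\((?P<errno>\d+)\)\s+"
    r"(?P<message>.+?)\s+at\s",
    re.DOTALL,
)


def tally_rflush_errors(log_text):
    """Count rflush failures, keyed by (errno, message)."""
    counts = Counter()
    for match in RFLUSH.finditer(log_text):
        # Collapse any wrapped whitespace inside the message.
        message = " ".join(match.group("message").split())
        counts[(int(match.group("errno")), message)] += 1
    return counts
```

A possible use, assuming the log lives at `/var/log/apache2/error.log`: read the file into a string, pass it to `tally_rflush_errors`, and print the resulting counter. On Linux, 103 and 104 are the error numbers for ECONNABORTED and ECONNRESET.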

Taking wild stabs and guesses at stuff is a pet peeve of mine; it is not
“problem-solving”. Be deterministic rather than guessing and you’ll be more
efficient (and learn to be more self-sufficient at the same time).

– ============================
Tom Lahti
BIT Statement LLC

(425)251-0833 x 117
http://www.bitstatement.net/
– ============================

Kage · May 20, 2009, 9:31pm

Well, to reiterate what I said, I did try other Apache2 pages while it
was broken. They load just fine with no errors, including Perl
scripts. CPU load is 0%, Load is 0.01 or around there across the
board. Memory is about the same as after the VM boots up (about 100MB
in use). The logs say exactly the same thing as in my first E-Mail.
I’m not sure how else to narrow it down. Nothing else is
dysfunctional in the VM except for RT. I have also rebuilt this VM
from scratch about 4 times now trying to see if perhaps that is an
issue in and of itself, and the error is recurring.

Any other ideas? I can’t seem to narrow it down any more using these methods.

~ Kage
http://vitund.com

Nick_Geron · May 20, 2009, 10:15pm

I know you said you don’t suspect a memory issue on your end, but I have
to report, once I upped our 3.8.2 VMs from 256M to 1G per, I have yet to
see the error repeated. Something that may be quite different between
our systems is user load. I’m the only one poking around on ours.
Therefore our test systems only have to support one live session.

I suspect from your first posts that systems are live (lots of users)?
I can’t speak beyond my own anecdotal evidence, but maybe someone on the
list can give us a quick calculation for the average memory required per
live user/session. If so, you could at least use that to verify that
you’re not hitting a resource limit.
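
Short of a per-session figure, the resident memory of each Apache child can at least be measured directly. The sketch below assumes a Linux `/proc` filesystem and a process named `apache2` (the name used on Ubuntu). The function names are illustrative. Note that resident set sizes include pages shared between children, so their sum overstates real usage.

```python
import os


def parse_status(text):
    """Return (process name, resident set size in kB) from /proc/<pid>/status text."""
    name, rss_kb = None, 0
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key == "Name":
            name = value.strip()
        elif key == "VmRSS":
            rss_kb = int(value.split()[0])
    return name, rss_kb


def rss_by_name(process_name, proc_root="/proc"):
    """List the resident set size (kB) of every process with the given name."""
    sizes = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a PID directory
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                name, rss_kb = parse_status(handle.read())
        except OSError:
            continue  # process exited between listdir and open
        if name == process_name:
            sizes.append(rss_kb)
    return sizes
```

With `sizes = rss_by_name("apache2")`, the value `sum(sizes) / len(sizes)` (when the list is non-empty) gives a rough per-child figure to set against the VM’s total memory.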

-Nick


Tom_Lahti · May 20, 2009, 10:58pm

Kage wrote:

Well, to reiterate what I said, I did try other Apache2 pages while it
was broken. They load just fine with no errors, including Perl
scripts. CPU load is 0%, Load is 0.01 or around there across the
board. Memory is about the same as after the VM boots up (about 100MB
in use). The logs say exactly the same thing as in my first E-Mail.
I’m not sure how else to narrow it down. Nothing else is
dysfunctional in the VM except for RT. I have also rebuilt this VM
from scratch about 4 times now trying to see if perhaps that is an
issue in and of itself, and the error is recurring.

Any other ideas? I can’t seem to narrow it down any more using these methods.

OK, that’s excellent. It means it’s confined to one of RT or the RT/Apache
interface you are using. Are you using fastcgi or …? What version is it?

You could also try upgrading to RT 3.8.2.

– ============================
Tom Lahti
BIT Statement LLC

(425)251-0833 x 117
http://www.bitstatement.net/
– ============================