Possibly OT: RT's FCGI server randomly fails, no log

Alex_Hall · October 11, 2016, 4:55pm

Hello list,
This may be off-topic, but I’m serving RT with Nginx and FCGI. Randomly, it
seems, the FCGI server is failing. Nginx works, but users see “error 502:
bad gateway”. I see the same in the logs, with connect() failing. All I
have to do is run the spawn-fcgi command to get things back.

Why this is happening, with some frequency, is the question. My Nginx, RT,
and system logs all show nothing, and to my knowledge, there are no FCGI
logs at all. The first error for today in Nginx is when a client failed to
connect after the server went down; there’s nothing that says what the
actual problem was. This happened Saturday, then again today.

The server has the latest updates for Debian 8.6, and has 4GB of RAM. It’s
serving a few dozen users at most, so the load can’t be the problem. I’m
using Nginx 1.6.2 with four workers and 768 threads per worker. Users see
nothing unusual before this happens, just a 502 instead of the page they
expected.

If anyone else is using Nginx and has ever seen this, I’d love some input.
As this could be considered off topic, feel free to respond directly to
ahall@autodist.com. If I need to provide more details, please let me know.
Thank you.

Alex Hall
Automatic Distributors, IT department
ahall@autodist.com

Kenneth_Marshall · October 11, 2016, 5:59pm

Hi Alex,

You will get the 502 error when there are no more RT backends running. I
tracked down various errors in the RT logs that resulted in backend
exits. Most were of the ‘cannot believe I did that’ type, made by people
setting up the system, i.e. not really fixable with a distributed
management environment. We ended up using ‘multiwatch’ on RHEL6
and systemd on RHEL7 to keep an appropriate number of backends always
available.
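
As a rough illustration of the "keep a backend always available" idea (this is not the multiwatch or systemd setup itself), here is a minimal Python sketch. It assumes the FCGI backend listens on a TCP host and port; the names backend_alive, ensure_backend, and watch, and the respawn command you pass in, are all hypothetical.

```python
import socket
import subprocess
import time


def backend_alive(host, port, timeout=2.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or unreachable: treat as down.
        return False


def ensure_backend(host, port, respawn_cmd, run=subprocess.call):
    """Run respawn_cmd if the backend is down.

    Returns True if a respawn was attempted, False if the backend was up.
    `run` is injectable so the logic can be exercised without spawning anything.
    """
    if backend_alive(host, port):
        return False
    run(respawn_cmd)
    return True


def watch(host, port, respawn_cmd, interval=10.0):
    """Poll forever, respawning whenever the backend stops accepting connections."""
    while True:
        if ensure_backend(host, port, respawn_cmd):
            print("backend was down; ran", respawn_cmd)
        time.sleep(interval)
```

Calling watch() with the same spawn-fcgi command line you currently run by hand would automate the manual restart. A real process supervisor is still the better choice, since it notices the exit immediately and can log the exit status, which a polling loop like this cannot.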

Regards,
Ken

Alex_Hall · October 11, 2016, 8:12pm

Don’t ask me why, but /var/log/messages recorded the problem during this
most recent crash. Does this mean anything to anyone?

Oct 11 14:47:24 RTServer RT: [6632] Argument “myork” isn’t numeric in numeric ne (!=) at /usr/share/request-tracker4/lib/RT/Interface/Web.pm line 2949.
Oct 11 14:47:41 RTServer RT: [6632] Argument “myork” isn’t numeric in numeric ne (!=) at /usr/share/request-tracker4/lib/RT/Interface/Web.pm line 2949.

I didn’t paste that twice, it appeared twice in the log. This is still
4.2.8.

Alex Hall
Automatic Distributors, IT department
ahall@autodist.com