Why does RT's FCGI server not handle SIGPIPE?

Alex_Hall · March 22, 2021, 9:45am

Hello all,

TL;DR: We run RT 4.4.x using the FCGI server. Workers die all the time, and monitoring restarts RT when the last one is gone. We think the problem is unhandled SIGPIPE signals, so I added a line of Perl to rt-server.fcgi to ignore SIGPIPE, in the hope that the workers will stop dying when they get this signal. Is there any reason I should not have added this line, or am I okay to keep it in? Does anyone know why it’s never been added to RT? Even RT5 is missing it.

We’ve used RT for years now, always through its FCGI server. We’ve always had our FCGI workers die, one by one, until a monitoring job restarted them all after the last one went away. Finally, we figured out why (we think), thanks to help from this post.
When the FCGI server receives SIGPIPE, which can happen from time to time based on user activity, there’s nothing to handle it. Thus, the worker dies.
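To make the mechanism concrete, here is a minimal sketch of what happens to a process that writes to a pipe whose reader has gone away, once with the default SIGPIPE disposition and once with it ignored. It is not RT code: `run_worker` is a hypothetical stand-in for an FCGI worker, and the pipe stands in for a client connection that has been closed.

```perl
use strict;
use warnings;
use POSIX qw(SIGPIPE);

# Fork a stand-in "worker" that writes to a pipe whose reader is gone,
# then report how that child ended. $disposition is 'DEFAULT' or 'IGNORE'.
sub run_worker {
    my ($disposition) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $SIG{PIPE} = $disposition;
        pipe(my $reader, my $writer) or POSIX::_exit(2);
        close $reader;                     # the "client" has disconnected
        syswrite($writer, "response\n");   # the kernel raises SIGPIPE here
        POSIX::_exit(0);                   # only reached if the signal did not kill us
    }
    waitpid($pid, 0);
    my $signal = $? & 127;                 # low 7 bits: terminating signal, if any
    return $signal
        ? "killed by signal $signal"
        : 'exited with status ' . ($? >> 8);
}

printf "%-8s %s\n", $_, run_worker($_) for qw(DEFAULT IGNORE);
```

With the default disposition the child is terminated by the signal before it can do anything else, which matches the "worker just disappears" symptom. With `IGNORE` the write simply fails and the process carries on.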

I copied rt-server.fcgi, and added a snippet I found online to ignore SIGPIPE, then pointed RT at that server file instead. Twelve hours later, there have been no problems, though of course, the test will be once our users start accessing the site today.

Is there a reason that ignoring SIGPIPE hasn’t been made part of the source code yet, after all this time? I imagine it’s just something no one really considered. In the back of my mind is a concern that, somehow, including this could cause serious problems that no one was able to work around, so the decision was to let the worker die rather than deal with whatever issues adding the SIGPIPE handler introduced. I doubt this is the case, but modifying the source of files in sbin makes me nervous. Thanks for any thoughts on this.

Rob_Lister · April 1, 2021, 1:13pm

Interesting. It falls over for me a few times a day, most days. It only takes a few seconds for it to restart, but it is annoying. It only started to occur when we upgraded to RT 4.4.3 + nginx, and since that was quite a big upgrade, wasn’t sure if it was something to do with our nginx config (the previous RT install was on Apache, which seems slightly better supported by RT than nginx.)

Are you able to say if your SIGPIPE ignore mod seems to have worked yet?

Alex_Hall · April 1, 2021, 2:35pm

Hello,

So far, so good. Rather than our ten workers failing one by one until Monit has to restart spawn-fcgi, the workers have stayed alive pretty much indefinitely. I’ve had to manually restart them a few times because of increasing memory usage, but they haven’t died on their own the way they used to. Of course, I didn’t notice the memory usage problem until they started staying alive long enough for it to appear, so I guess this is progress. Still, better a scheduled restart that takes two seconds than a random crash that Monit might take 90 seconds to detect.

In short, yes, ignoring SIGPIPE did seem to help a lot.

Rob_Lister · April 1, 2021, 2:39pm

Thanks,

Can you confirm whether this is along the right lines? In rt-server.fcgi:

$SIG{PIPE} = 'IGNORE';

R.

Alex_Hall · April 1, 2021, 2:59pm

That’s what I have. I put the line just above the SIGINT handler, though I doubt placement matters too much. Note that I don’t really speak Perl, so please don’t take my word for it. I can confirm that your code matches mine, and where I put it has worked so far.
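For readers who also don't speak much Perl, a rough sketch of what that one line changes: with SIGPIPE ignored, a write to a connection that has gone away no longer kills the process; it fails with the `EPIPE` error, which the calling code can check. The helper name `send_or_give_up` is hypothetical and is not part of RT or its FCGI server, and `local` is used here only to keep the example self-contained (the patch discussed in this thread sets `$SIG{PIPE}` once for the whole process).

```perl
use strict;
use warnings;
use Errno qw(EPIPE);

# Hypothetical helper (not part of RT): try to send $data on $fh and
# return 1 on success, or 0 if the peer has gone away.
sub send_or_give_up {
    my ($fh, $data) = @_;
    local $SIG{PIPE} = 'IGNORE';     # same effect as the one-line patch, scoped to this sub
    my $written = syswrite($fh, $data);
    return 1 if defined $written;
    return 0 if $! == EPIPE;         # reader closed its end: drop this response
    die "write failed: $!";          # anything else is unexpected
}

pipe(my $reader, my $writer) or die "pipe failed: $!";
print "reader open:   ", send_or_give_up($writer, "hello\n"), "\n";
close $reader;                       # simulate the client disconnecting
print "reader closed: ", send_or_give_up($writer, "hello\n"), "\n";
```

Since `%SIG` applies to the whole process, where the assignment sits in the script should matter little, as long as it runs before the server starts handling requests.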

Andrew_Ruthven · December 1, 2021, 8:59am

I’ve submitted a bug report, with a patch, to Best Practical.

My intention is to include this in future releases in Debian until it is resolved by BP.

ktm · February 7, 2022, 7:45pm

We ended up using ‘multiwatch’ here to work around the problem of the fcgi processes exiting.

Regards,
Ken

Andrew_Ruthven · September 21, 2023, 12:53pm

Good news: the patch for this issue is in 4.4.7beta1 and 5.0.5beta1.

(It has been in the Debian packages of both request-tracker4 and request-tracker5 for a while now.)