rt4-fcgi processes are dying after some Debian package updates

Gary_Mason · August 5, 2014, 1:13pm

Hi,

I’m running the following:
Debian Wheezy, RT 4.0.7, nginx, fcgi

Up until the end of last week, RT had been very stable when running with
three rt4-fcgi backends. I would find that about every 3-4 months I would
have to restart RT as the fcgi processes had died and I was getting “502 bad
gateway” error messages.

I then did some Debian package updates and since then, the fcgi processes
have been dying within an hour or so, with users getting the 502 Bad gateway
error message. I am now running with 10 fcgi processes which is giving me a
little breathing space as they seem to last for up to an hour before all
dying and needing RT to be restarted, but that isn’t always the case -
sometimes it can be 20 minutes or 90 minutes.

It’s pretty obvious that one or more of the package updates I did last week
has upset RT but I can’t see anything in any log to indicate why the fcgi
processes are dying like they are. I have the RT log set to debug level and
even that holds no clues, neither does syslog or the nginx log.

The packages I updated are as follows:
libcups2 libcupsimage2 libdatetime-timezone-perl libdbi-perl
libdevmapper1.02.1 libjpeg8 liblcms2-2 libperl-dev libperl5.14 perl
perl-base perl-modules openssh-client openssh-server snmpd tzdata librsvg2-2
libsnmp-base libsnmp15 libapr1 initscripts sysv-rc sysvinit sysvinit-utils
base-files postgresql-9.2 postgresql-client-9.2 postgresql-client-common
postgresql-common libpq-dev libpq5

I’m guessing one or more of the perl related packages is to blame, but
without any kind of log content to give me any clues, I’m at a loss to
understand what to try and fix.

Anyone got any suggestions as to what I can do to firstly get some useful
log feedback regarding the dying fcgi processes, and secondly what I could
do to resolve this.

Thanks,
Gary

View this message in context: http://requesttracker.8502.n7.nabble.com/rt4-fcgi-processes-are-dying-after-some-debian-package-updates-tp58212.html

Kenneth_Marshall · August 5, 2014, 1:27pm

Hi Gary,

As far as figuring out where the problem is, you will need to bump up
your logging level across the systems involved. To fix your immediate
problem of service outages caused when all the fcgi processes exit I
would recommend something like multiwatch to manage them and fire up
a new process automatically as needed:

We use it with spawn-fcgi here and it works well.
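
To illustrate the idea of a supervisor that replaces dead workers, here is a minimal Python sketch. It is not how multiwatch is implemented; the names `supervise` and `describe_exit` are hypothetical, and the restart cap exists only so the example terminates. Recording each child's exit status may also help with the logging question, since it shows whether a worker exited on its own or was killed by a signal.

```python
import signal
import subprocess
import sys
import time


def describe_exit(returncode):
    """Turn a Popen return code into a readable reason."""
    if returncode < 0:
        # On POSIX, a negative code means the child was killed by that signal.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def supervise(command, workers=3, max_restarts=5, poll_interval=0.1):
    """Keep `workers` copies of `command` running.

    Returns a list of (pid, reason) tuples, one per child that ended.
    Stops once all children have exited and the restart budget is spent.
    """
    children = [subprocess.Popen(command) for _ in range(workers)]
    events = []
    restarts = 0
    while children:
        for child in list(children):
            code = child.poll()
            if code is None:
                continue  # still running
            children.remove(child)
            events.append((child.pid, describe_exit(code)))
            if restarts < max_restarts:
                restarts += 1
                children.append(subprocess.Popen(command))
        time.sleep(poll_interval)
    return events


if __name__ == "__main__":
    # Placeholder worker that exits immediately; a real setup would run
    # the FastCGI backend command here instead.
    demo = [sys.executable, "-c", "import sys; sys.exit(3)"]
    for pid, reason in supervise(demo, workers=2, max_restarts=3):
        print(f"worker {pid} {reason}")
```

A real supervisor would restart without a cap and write these events to a log, but the loop is the same: poll, record why the child ended, start a replacement.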

Regards,
Ken

Gary_Mason · August 6, 2014, 4:00pm

I’ve turned up the logging in both RT and nginx to debug level but still
nothing to see that indicates why these fcgi processes are quietly dying.

So having looked back at my original installation process, I came across the
DBD::Pg module. At the time, I had installed the latest version, which I am
still running, which is 2.19.3. I notice that this is now two years old and
a number of versions behind, including a major release point.

Reading up on what this module hooks in to, it seems to be heavily related
to the libpq packages that I updated.

So, before I go ahead and update the CPAN module for DBD::Pg, anyone have
any thoughts/opinions on whether this is a possible suspect for my issues?

Thanks,
Gary

Alex_Vandiver · August 6, 2014, 4:21pm

So, before I go ahead and update the CPAN module for DBD::Pg, anyone have
any thoughts/opinions on whether this is a possible suspect for my issues?

It’s possible, though it would surprise me slightly. The only thing to
note is that DBD::Pg 3.3.0 is incompatible with all current releases of
RT, due to changes in how it handles UTF-8 data – non-ASCII data will
be corrupted when inserted into the database. If you upgrade, 3.2.1 is
the latest safe version.
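
A rough way to check an installed version against that cutoff, sketched in Python. The helper names are hypothetical, and treating everything above 3.2.1 as "not known safe" is an assumption: the message above only names 3.3.0 as broken. The `perl` one-liner assumes perl and DBD::Pg are installed.

```python
import re
import subprocess

# Latest DBD::Pg version described as safe in this thread.
LAST_KNOWN_SAFE = (3, 2, 1)


def parse_version(text):
    """Extract up to three numeric components, e.g. 'v3.2.1' -> (3, 2, 1)."""
    parts = re.findall(r"\d+", text)
    return tuple(int(p) for p in parts[:3])


def is_known_safe(version_text):
    """True if the version is at or below the last known safe release."""
    return parse_version(version_text) <= LAST_KNOWN_SAFE


def installed_dbd_pg_version():
    """Ask perl for the installed DBD::Pg version string."""
    result = subprocess.run(
        ["perl", "-MDBD::Pg", "-e", "print $DBD::Pg::VERSION"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_dbd_pg_version()
    verdict = "known safe" if is_known_safe(version) else "not known safe"
    print(f"DBD::Pg {version}: {verdict} with RT")
```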

Alex

Gary_Mason · August 6, 2014, 4:38pm

Thanks for that info Alex.

I’m on RT release 4.0.7 (from the Debian backports repo), with no imminent
upgrade in mind, so that should be fine.

Alex_Vandiver · August 6, 2014, 4:43pm

I’m on RT release 4.0.7 (from the Debian backports repo), with no imminent
upgrade in mind, so that should be fine.

No, it shouldn’t be; re-read my message. DBD::Pg 3.3.0 breaks all
versions of RT. You will have data corruption if you install DBD::Pg
3.3.0 on your version of RT.

Alex

Gary_Mason · August 8, 2014, 4:12pm

I’ve upgraded to DBD::Pg 3.2.1 but still no joy. fcgi backends regularly die.

Having upped the logging for both nginx and RT to its maximum level, I still
found nothing. Couldn’t find anything in any system emails or logs that
might have been generated by the fcgi processes dying and spluttering to
stdout or stderr.

There must be something else I have missed - I can’t believe that
RT4.0.7/PG9.2 and nginx on Debian Wheezy is such an unusual system to want
to run RT on. Indeed, it was fine before those package updates last week.

Anyone got any other clues/thoughts ?

Dominic_Hargreaves2 · August 11, 2014, 9:23pm

I’m on RT release 4.0.7 (from the Debian backports repo), with no imminent
upgrade in mind, so that should be fine.

No, it shouldn’t be; re-read my message. DBD::Pg 3.3.0 breaks all
versions of RT. You will have data corruption if you install DBD::Pg
3.3.0 on your version of RT.

This is a pretty serious issue. I don’t see any sign of a bug against
DBD::Pg in the CPAN bugtracker, and Debian now has 3.3.0 in unstable
and testing. Could you say a bit more about the problem and what plans
there are to fix/workaround it for RT? Forcing a lower version of
DBD::Pg isn’t a practical option in a packaged environment like
Debian.

Cheers,
Dominic.

Dominic Hargreaves, Systems Development and Support Section
IT Services, University of Oxford


Alex_Vandiver · August 11, 2014, 10:15pm

I don’t see any sign of a bug against DBD::Pg in the CPAN bugtracker,
and Debian now has 3.3.0 in unstable and testing.

The bug isn’t DBD::Pg’s fault – hence why there’s nothing we’ve
reported – but rather a case of it becoming more correct, and there
being lurking code in the bowels of DBIx::SearchBuilder that was
incorrect, and now interacts poorly. Specifically:

https://github.com/bestpractical/dbix-searchbuilder/blob/master/lib/DBIx/SearchBuilder/Handle.pm#L577-L579

    sub SimpleQuery {

…which takes characters that we’re trying to insert into the database
and encodes them in UTF-8[1] – which is then double encoded when
DBD::Pg 3.3.0 realizes that the database column is textual. Previous to
3.3.0, it accepted bytes and inserted bytes, which we would later read
out as characters. Now, it accepts bytes and attempts to insert them as
character codepoints, so that the data round-trips and we get the same
character codepoints out. This is more correct, as 3.2.1 relied on the
“UTF-8” flag to guess if the incoming data was codepoints or bytes,
which was a false premise.
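
The double-encoding effect described above can be reproduced outside Perl. The following Python sketch is only an illustration of the general mechanism, not the code path in DBIx::SearchBuilder or DBD::Pg; `double_encode` and `repair` are hypothetical names.

```python
def double_encode(text):
    """Simulate text that gets UTF-8 encoded twice on its way to storage."""
    # Step 1: the application encodes characters to UTF-8 bytes.
    first_pass = text.encode("utf-8")
    # Step 2: the next layer treats each byte as a character codepoint...
    as_codepoints = first_pass.decode("latin-1")
    # ...and encodes those codepoints as UTF-8 again.
    return as_codepoints.encode("utf-8")


def repair(double_encoded):
    """Undo exactly one extra round of encoding (works only in simple cases)."""
    return double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")


if __name__ == "__main__":
    original = "caf\u00e9"
    stored = double_encode(original)
    print(stored)                  # raw bytes as they would be stored
    print(stored.decode("utf-8"))  # what a reader sees: mojibake
    print(repair(stored))          # round-trips back in this simple case
```

Pure ASCII text passes through unchanged, which is why the corruption only shows up with non-ASCII data. The `repair` step works here because exactly one extra encoding round was applied; mixed or repeated corruption in a real database is much harder to undo.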

Those lines are, unfortunately, only part of the problem. Other places
exist in RT which blindly pass bytes (not characters) to textual
columns, which need to be resolved in order for RT to work properly with
DBD::Pg. In other words, the internals of RT are riddled with places
that make the same false assumptions about the “UTF-8” flag as DBD::Pg
3.2.1 did, which mostly canceled each other out.

Could you say a bit more about the problem and what plans there are
to fix/workaround it for RT? Forcing a lower version of DBD::Pg isn’t
a practical option in a packaged environment like Debian.

I’ve pushed https://github.com/bestpractical/rt/tree/4.0/utf8-reckoning
which addresses the deeper issues needed for RT to work. It is
currently in review, and will be merged in as short order as a branch of
that size can be. It passes all tests on both versions of DBD::Pg, but
further testing (carefully, as it might cause data corruption with
non-ASCII characters) would be appreciated.

This is a pretty serious issue.

Fixing this is indeed high priority for us, as mostly-unrecoverable data
corruption is never a good thing. Once the branch gets merged, I expect
we’ll roll release candidates in short order.

Alex

[1] This is a slight lie, due to perl internals. In some rare cases,
for strings which contain only codepoints which exist in ISO-8859-1, it
instead encodes them in ISO-8859-1 before treating those bytes as
codepoints and double-encoding in UTF-8, for all of your mojibake needs.
Wonderful, no?