Fcgi read timeout error in 4.2.5

Hello everyone,

I have been struggling with a consistent error in the 4.2.x line since it was released. On my development server I consistently get this error:

==> /var/log/httpd/rt4devel_error_log <==
[Mon Jun 16 09:01:17 2014] [warn] [client 137.238.60.9] mod_fcgid: read data timeout in 40 seconds, referer: https://rtdevel.geneseo.edu/index.html?HomeRefreshInterval=120
[Mon Jun 16 09:01:17 2014] [error] [client 137.238.60.9] Premature end of script headers: rt-server.fcgi, referer: https://rtdevel.geneseo.edu/index.html?HomeRefreshInterval=120

It occurs sporadically. When navigating around I would say it around 1 in 10 clicks. When I leave it on the home page it occurs after 5-10 reloads. In my testing it seems able to happen on just about any page.

I am running on RHEL 6.5
Apache 2.2.15
mod_fcgid-2.3.9
Oracle Instant Client 11.2 against a 11g database.

I ran the development server on the same code as our production (4.0.17) for a day before upgrading and did not experience this error so I feel it is related to 4.2.x

I enabled debug logging and stack traces but when this error occurs nothing is logged other than the http error log above.

Does anyone have any idea what could be causing this or ideas on how to troubleshoot this further?

Thanks you!

Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | Computing & Information Technology | SUNY Geneseo

Does anyone have any idea what could be causing this or ideas on how to
troubleshoot this further?

Look for slow queries in your database (there are oracle specific
reports for this) or try turning on MasonX::Profiler in development
and see what RT is doing when it times out.

-kevin

Look for slow queries in your database (there are oracle specific
reports for this) or try turning on MasonX::Profiler in development
and see what RT is doing when it times out.

I enabled MasonX::Profiler as you suggested. Here is the relevant log entries

RT::Config::ANON(‘RT::Config=HASH(0x14eab98)’, undef) called at /opt/rt4devel/sbin/…/lib/RT/Config.pm line 1159
RT::Config::PostLoadCheck(‘RT::Config=HASH(0x14eab98)’) called at /opt/rt4devel/sbin/…/lib/RT.pm line 199
RT::Init(‘RT’, ‘Heavy’, 1) called at /opt/rt4devel/sbin/rt-server.fcgi line 126

==> /var/log/httpd/error_log <==
=Mason= localhost - /index.html BEGINS {{{
=Mason= localhost - /autohandler {{{
=Mason= localhost - /Elements/SetupSessionCookie {{{

At this point there is a 3 minute (I upped it to see if it would ever finish) wait until fcgi times out and throws the internal server error.

==> /var/log/httpd/rt4devel_error_log <==
[Tue Jun 24 14:17:06 2014] [warn] [client 137.238.60.9] mod_fcgid: read data timeout in 180 seconds, referer: https://rtdevel.geneseo.edu/index.html?HomeRefreshInterval=120
[Tue Jun 24 14:17:06 2014] [error] [client 137.238.60.9] Premature end of script headers: rt-server.fcgi, referer: https://rtdevel.geneseo.edu/index.html?HomeRefreshInterval=120

==> /var/log/httpd/error_log <==
=Mason= localhost - /favicon.ico BEGINS {{{
=Mason= localhost - /autohandler {{{
=Mason= localhost - /Elements/SetupSessionCookie }}} 180.1433
TONS more Mason logs

Not sure what this really tells me.

Our DBA is looking into the database side. Apparently It is blocking itself going against the 11g database. Sessions are blocking other sessions. We are looking into any possible database parameters new to 11g that cause this behavior to be different from our production 10g RT database.

Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | Computing & Information Technology (CIT) | SUNY Geneseo

Not sure what this really tells me.

Our DBA is looking into the database side. Apparently It is blocking
itself going against the 11g database. Sessions are blocking other
sessions. We are looking into any possible database parameters new to
11g that cause this behavior to be different from our production 10g RT
database.

Sounds like it’s indeed related to session locking. You can try
switching to on-disk sessions and see if it resolves the issue:

Set($WebSessionClass, "Apache::Session::File");

However, we’d also be interested to hear what your DBA discovers about
the deadlocks, as this issue may well bite other Oracle users, and we’d
like to resolve it.

  • Alex

[snip]

Please keep replies on-list.

  • Alex

I have done some more digging and am unable to determine the code causing the FOR UPDATE to be issued.

Does anyone have rt 4.2.x running against Oracle successfully?

Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | http://www.geneseo.edu/citOn Jun 24, 2014, at 4:24 PM, Alex Vandiver alexmv@bestpractical.com wrote:

On 06/24/2014 04:03 PM, Shawn Plummer wrote:

[snip]

Please keep replies on-list.

  • Alex

Sounds like it’s indeed related to session locking. You can try
switching to on-disk sessions and see if it resolves the issue:

Set($WebSessionClass, "Apache::Session::File”);

Running this way did indeed prevent the error from occurring. We
switched back to database sessions to perform more testing.

However, we’d also be interested to hear what your DBA discovers about
the deadlocks, as this issue may well bite other Oracle users, and we’d
like to resolve it.

Our DBA has been doing some research and believes FOR UPDATE in this
statement is the issue

SELECT a_session
FROM sessions
WHERE id = :p1 FOR UPDATE

From Oracle Doc:
How Oracle Database Locks Data

Locks are mechanisms that prevent destructive interaction between
transactions accessing the same resource—either user objects such as
tables and rows or system objects not visible to users, such as shared
data structures in memory and data dictionary rows.

In all cases, Oracle Database automatically obtains necessary locks when
executing SQL statements, so users need not be concerned with such
details. Oracle Database automatically uses the lowest applicable level
of restrictiveness to provide the highest degree of data concurrency yet
also provide fail-safe data integrity. Oracle Database also allows the
user to lock data manually.

More
details: Data Concurrency and Consistency

There is also a doc showing that SELECT FOR UPDATE behavior is different
from 10g to 11g. Doc ID 858518.1

And further information on FOR
UPDATE Unintended Consequences « Oracle Explorations and Investigations


Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | Computing & Information Technology (CIT) | SUNY Geneseo

Doing some more research I see that RT using Oracle used to use File Sesssions and switched in 4.2 to using Oracle. Then a change was made to make sessions default to the MaxAttachment size which, from reading DBD::Oracle documentation, could trigger FOR UPDATE being used causing the locking issue.

I guess the answer for me is to just use Apache Sesssion Files.

I would be happy to test any solutions for using Oracle for sessions on my devel instance but don’t want to hold up my upgrade any longer.

The two commits in question

I did confirm this bug occurs against a 10g and 11g Oracle database using 4.2

Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | http://www.geneseo.edu/citOn Jul 29, 2014, at 10:49 AM, Shawn Plummer plummer@geneseo.edu wrote:

I have done some more digging and am unable to determine the code causing the FOR UPDATE to be issued.

Does anyone have rt 4.2.x running against Oracle successfully?


Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | Computing & Information Technology (CIT) | SUNY Geneseo

On Jun 24, 2014, at 4:24 PM, Alex Vandiver alexmv@bestpractical.com wrote:

On 06/24/2014 04:03 PM, Shawn Plummer wrote:

[snip]

Please keep replies on-list.

  • Alex

Sounds like it’s indeed related to session locking. You can try
switching to on-disk sessions and see if it resolves the issue:

Set($WebSessionClass, "Apache::Session::File”);

Running this way did indeed prevent the error from occurring. We
switched back to database sessions to perform more testing.

However, we’d also be interested to hear what your DBA discovers about
the deadlocks, as this issue may well bite other Oracle users, and we’d
like to resolve it.

Our DBA has been doing some research and believes FOR UPDATE in this
statement is the issue

SELECT a_session
FROM sessions
WHERE id = :p1 FOR UPDATE

From Oracle Doc:
How Oracle Database Locks Data

Locks are mechanisms that prevent destructive interaction between
transactions accessing the same resource—either user objects such as
tables and rows or system objects not visible to users, such as shared
data structures in memory and data dictionary rows.

In all cases, Oracle Database automatically obtains necessary locks when
executing SQL statements, so users need not be concerned with such
details. Oracle Database automatically uses the lowest applicable level
of restrictiveness to provide the highest degree of data concurrency yet
also provide fail-safe data integrity. Oracle Database also allows the
user to lock data manually.

More
details: Data Concurrency and Consistency

There is also a doc showing that SELECT FOR UPDATE behavior is different
from 10g to 11g. Doc ID 858518.1

And further information on FOR
UPDATE Unintended Consequences « Oracle Explorations and Investigations


Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | Computing & Information Technology (CIT) | SUNY Geneseo

Doing some more research I see that RT using Oracle used to use File
Sesssions and switched in 4.2 to using Oracle. Then a change was made to
make sessions default to the MaxAttachment size which, from reading
DBD::Oracle documentation, could trigger FOR UPDATE being used causing
the locking issue.

I guess the answer for me is to just use Apache Sesssion Files.

I would be happy to test any solutions for using Oracle for sessions on
my devel instance but don�t want to hold up my upgrade any longer.

The two commits in question
Default to storing sessions in the database on Oracle · bestpractical/rt@36be526 · GitHub
Increase maximum session size to match maximum attachment size · bestpractical/rt@fd66493 · GitHub

I did confirm this bug occurs against a 10g and 11g Oracle database
using 4.2

What steps do you use to reproduce the error? If we can trigger it
locally (we test against 10g), we can more easily fix the error.

  • Alex

I did confirm this bug occurs against a 10g and 11g Oracle database
using 4.2

What steps do you use to reproduce the error? If we can trigger it
locally (we test against 10g), we can more easily fix the error.

  • Alex

I didn’t need to do anything specific to reproduce it. I just upgraded to 4.2 and it started. It is somewhat intermittent though. I usually login to the main dashboard in a tab and set it to refresh every 10 minutes and it happens within about half an hour. I also can make it happen by refreshing the main page over and over.

If I can provide anything else let me know.

Shawn Plummer
Systems Manager | SUNY Geneseo
South Hall 119 | 585-245-5577 | Computing & Information Technology (CIT) | SUNY Geneseo

What steps do you use to reproduce the error? If we can trigger it
locally (we test against 10g), we can more easily fix the error.

I didn�t need to do anything specific to reproduce it. I just upgraded
to 4.2 and it started. It is somewhat intermittent though. I usually
login to the main dashboard in a tab and set it to refresh every 10
minutes and it happens within about half an hour. I also can make it
happen by refreshing the main page over and over.

If I can provide anything else let me know.

I’m ideally looking for a reproduction strategy against a clean RT 4.2.
Can you show the results of (filling in … for the value of your RT
session cookie):

SELECT id, LENGTH(a_session) FROM sessions
WHERE id = ‘…’

…as well as the value that your $MaxAttachmentSize is set to?

  • Alex