RT 2.0.14 hangs on certain queries

We have a Debian “sarge” (testing) system that was working for some time,
using the Debian package “request-tracker” 2.0.14-2 that has been orphaned
to make way for RT3. We did not upgrade to RT3 mainly because the system
has been working for us and we were concerned about the migration path for
the existing database. Can I install RT3 and have it share access to the
same database used for RT2, or is there a one-way migration required?

We were primarily using the web interface with Apache 1.3.27, mod_perl
1.27, HTML::Mason 1.21, and Perl 5.8.0. This was a working system, but
since we are running on the Debian testing distribution, it is possible
that the package manager swapped something subtle that we didn’t notice.

The only slightly unusual thing about our installation is that we access
the database over the network. As explained, this has been working for
some time as well. The database server machine is running the Debian
“woody” (stable) distribution, with package “postgresql” 7.2.1-2woody2.

Yesterday, some queries from the web interface started hanging. For
example, if a user tries to log in with the wrong password, the web site
(correctly) returns an error immediately after consulting the database.
However, if the user logs in with the correct password, the web site hangs
and the server shows the relevant instance of Apache pegged at close to
100% CPU utilization. In order to stop the runaway instance, we need
to kill and restart Apache.

The “rt-mailgate” tool continues to work, and users continue to be able to
submit tickets and get responses via e-mail.

Looking into this further, I discovered that the “rt” command line utility
hangs on some types of queries and not others. For example, a relatively
complicated query with a bunch of constraints like this succeeds:

./rt --limit-owner=mikebw --limit-status=new --limit-status=open \
   --limit-last-update=20031009- --summary

However, adding a constraint for requestor causes a hang until killed with
Ctrl-C. In fact, even a constraint for only the requestor hangs:

./rt '--limit-requestor=mike@bilow.com' --summary

My inference is that the web site hangs after logging in because one of
the regions that is displayed to a user on their RT home is the list of
tickets for which they are the requestor.

The fact that the “rt” command line tool hangs, combined with the fact that
the web site works in some cases, indicates to me that the problem is
somewhere inside the Perl code and not in the Apache-specific stuff such as
mod_perl and Mason, but I suppose I could be wrong about this. Regardless,
since I have a failure at the command line, I can use tools such as strace
to examine what is going on.
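
Because the failure reproduces at the command line, the hang can also be
detected mechanically instead of waiting for a manual Ctrl-C. What follows
is a minimal Python sketch, not part of RT: the function name `hangs` and
the 30-second default are assumptions chosen for illustration, on the
premise that a healthy query returns well within that time.

```python
import subprocess
import sys


def hangs(cmd, timeout_s=30.0):
    """Return True if cmd is still running after timeout_s seconds.

    Hypothetical helper: the name and the default timeout are assumptions.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so no
        # runaway process is left behind.
        return True
    return False
```

Calling `hangs(["./rt", "--limit-requestor=mike@bilow.com", "--summary"])`
should return True on a machine showing the behavior described here, and
False for the owner/status query shown earlier.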

I used strace to compare a run that hangs with one that does not. Both
open the log files in the standard place used by the Debian package,
/var/log/request-tracker/. Oddly, in both cases, the log files are
zero-length. The last activity the hanging and non-hanging runs have in
common is a pair of queries related to the owner constraint:

rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
send(3, "QSELECT * FROM Users WHERE lower(Name) = 'mikebw'\0", 51, 0) = 51
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
select(4, [3], [], [3], NULL) = 1 (in [3])
recv(3, "Pblank\0T\0\"id\0\0\0\0\27\0\4\377\377\377\377name\0\0\0\4\23\377\377\0\0\0|password\0\0\0\4\23\377\377\0\0\0,comments\0\0\0\0\31\377\377\377\377\377\377signature\0\0\0\0\31\377\377\377\377\377\377emailaddress\0\0\0\4\23\377\377\0\0\0|freeformc"..., 16384, 0) = 951
rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
send(3, "QSELECT * FROM Users WHERE id = '4'\0", 37, 0) = 37
rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
select(4, [3], [], [3], NULL) = 1 (in [3])
recv(3, "Pblank\0T\0\"id\0\0\0\0\27\0\4\377\377\377\377name\0\0\0\4\23\377\377\0\0\0|password\0\0\0\4\23\377\377\0\0\0,comments\0\0\0\0\31\377\377\377\377\377\377signature\0\0\0\0\31\377\377\377\377\377\377emailaddress\0\0\0\4\23\377\377\0\0\0|freeformc"..., 16384, 0) = 951
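
Since each statement shows up in the trace as a send() whose string starts
with "Q", the SQL can be pulled out of a saved strace log to see the last
query issued before a hang. The following Python sketch is an illustration
only: the names `SEND_QUERY` and `queries` are hypothetical, and the
pattern assumes the strace line layout shown above, with no escaped double
quotes inside the SQL.

```python
import re

# Matches lines such as:  send(3, "QSELECT ...\0", 51, 0) = 51
# An optional leading "-" is allowed so that "diff -u" output also works.
# Assumption: the SQL itself contains no escaped double quote (\").
SEND_QUERY = re.compile(r'^-?send\(\d+, "Q(.*?)(?:\\0)?"(\.\.\.)?, \d+')


def queries(strace_lines):
    """Yield (sql, truncated) for every query-carrying send() line.

    truncated is True when strace abbreviated the string with "...".
    """
    for line in strace_lines:
        match = SEND_QUERY.match(line)
        if match:
            yield match.group(1), match.group(2) is not None
```

Running `list(queries(open("hung.strace")))[-1]` on a saved trace (the
file name is an assumption) would give the final statement that reached
the database server before the process stopped making system calls.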

Following this, both instances write the summary header to stdout:

brk(0x87f3000) = 0x87f3000
brk(0) = 0x87f3000
write(1, " id Stat Queue Subject Requestor \n", 81) = 81

But then the non-hanging version begins sending and receiving database
queries (the leading “-” on each line comes from “diff -u” used for this
comparison of the strace logs):

-rt_sigaction(SIGPIPE, {SIG_IGN}, {SIG_DFL}, 8) = 0
-send(3, "QSELECT main.* FROM Tickets main WHERE ((main.EffectiveId = main.id)) AND ((main.Owner = '4')) AND ((main.Status = 'new')OR(ma"..., 151, 0) = 151
-rt_sigaction(SIGPIPE, {SIG_DFL}, {SIG_IGN}, 8) = 0
-select(4, [3], [], [3], NULL) = 1 (in [3])
-recv(3, "Pblank\0T\0\30id\0\0\0\0\27\0\4\377\377\377\377effectiveid\0\0\0\0\27\0\4\377\377\377\377queue\0\0\0\0\27\0\4\377\377\377\377type\0\0\0\4\23\377\377\0\0\0\24issuestatement\0\0\0\0\27\0\4\377\377\377\377resolution\0\0\0\0\27\0\4\377\377\377\377own"..., 16384, 0) = 1448
-select(4, [3], [], [3], NULL) = 1 (in [3])
-recv(3, "49:05+00\0\0\0\0321970-01-01 00:00:00+00\0\0\0\0322003-07-15 14:58:29+00\0\0\0\0322003-07-16 14:30:24+00\0\0\0\0322003-08-13 07:01:25+00\0\0\0\0051\0\0\0\0322003-08"..., 16271, 0) = 742

By comparison, the hanging version is, well, hung: nothing further is
emitted into the strace log and it just sits there until Ctrl-C is hit.
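
The comparison above was done with "diff -u" on the two strace logs. The
Python sketch below shows one possible way to do the same while masking
hexadecimal addresses, such as the brk() values, so that they do not show
up as spurious differences. The names `normalize` and `compare` and the
labels ok.strace and hung.strace are assumptions for illustration.

```python
import difflib
import re

# Heap addresses such as 0x87f3000 vary from run to run.
HEX = re.compile(r"0x[0-9a-f]+")


def normalize(line):
    """Mask values that differ between runs for uninteresting reasons."""
    return HEX.sub("0xADDR", line.rstrip("\n"))


def compare(ok_lines, hung_lines):
    """Return a unified diff (list of lines) of two normalized traces.

    Lines present only in the non-hanging trace get a leading "-",
    matching the orientation of the excerpt above.
    """
    a = [normalize(line) for line in ok_lines]
    b = [normalize(line) for line in hung_lines]
    return list(
        difflib.unified_diff(a, b, "ok.strace", "hung.strace", lineterm="")
    )
```

With both logs read into lists of lines, `print("\n".join(compare(ok,
hung)))` produces output in the same shape as the excerpt quoted above.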

We are kind of up the creek here without being able to use the web UI,
even if we could live without ever searching by requestor. Replying to
tickets using the command line interface and a Unix editor is less than
ideal. (When doing that, is there any way to quote in the reply?) I am
hoping that someone very familiar with the code will recognize this
problem as relatively simple and point me in the right direction for a
patch. As explained, the Debian RT2 package stopped updating at 2.0.14
because the package maintainers switched focus to RT3, so upgrading to
2.0.15 would require pulling the system out from under package management,
which I am keen to avoid.

– Mike

Here is some follow-up on my earlier message of 2003-10-11.

In trying to diagnose this problem, we put our old RT test server back
into operation. This is a separate server which runs Debian “woody”
(stable) with an older version of the Debian “request-tracker”
package, 2.0.13-4. We used this system to test RT before rolling it out
into production, where it was subsequently upgraded to 2.0.14-2. The
older server is running Apache 1.3.26, mod_perl 1.26, HTML::Mason 1.04,
and Perl 5.6.1.

The RT installations on both the old and new servers access exactly the
same database and the same live data, which is critically important to
understanding the significance of this test.

On the old server, both the command line interface and web UI work fine,
just as they did until Friday on the new server. Searches using
constraints that fail on the new server (such as “--limit-requestor”)
complete normally on the old server. This rules out, in my opinion, any
possibility of database corruption or other problems.
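
If the data is the same on both machines, the remaining suspects are the
software versions that differ between them. As a rough sketch, assuming a
"name version" package listing has been saved from each host (for example
with "dpkg-query -W", where that command is available), the differences
can be listed with a few lines of Python. The function names are
hypothetical.

```python
def parse_versions(text):
    """Parse lines of the form 'name<whitespace>version' into a dict."""
    versions = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # skip blank or malformed lines
            versions[parts[0]] = parts[1]
    return versions


def differing(old, new):
    """Return {package: (old_version, new_version)} where the hosts differ.

    A package missing on one host is reported with None for that side.
    """
    names = sorted(set(old) | set(new))
    return {
        name: (old.get(name), new.get(name))
        for name in names
        if old.get(name) != new.get(name)
    }
```

Feeding it the two listings narrows the search to packages that changed,
which is where a quietly swapped dependency would show up.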

– Mike