Possible bug in RT 3.8.7 SelfService?

Hi,

I upgraded RT from 3.6.6 to 3.8.7. Everything went fine, but when I
connect to SelfService as a non-privileged user, there is a loop:
the open/closed ticket pages load forever, and a process on the server
sits at 100% CPU. It doesn’t happen when I click on “create
ticket”.

I worked around the problem by granting the “show ticket” right (and
“reply ticket”, though I don’t think that one helped) to requestor and cc.
But I don’t think this loop is normal in a more or less default
configuration.

L.B.

I can’t replicate this with an unpriv user on 3.8.7 who has no rights
but is the requestor of a ticket entered in the system.
I think we’re going to need a clearer picture of your apache/rt
configuration and some apache logs of what is happening.

-kevin

Hmm, ok, maybe it’s my configuration.

My unprivileged user is in Cc of one ticket (closed now).

I did some tests:

If neither requestor nor cc has the ShowTicket right, I have the problem.
If either or both of them have the right, it works normally.

I’ll keep investigating…
L.B.

To be precise, my unprivileged user is in Cc of that one (now closed)
ticket, and that’s all.

I enabled debug mode in RT and Apache, but got nothing interesting.
I suspect /SelfService/Elements/MyRequests, since it’s only used by
index.html and closed.html, the two pages with the problem.

L.B.

I ran wireshark to check where the page stopped loading (or started looping).

When I have the problem, the page looks like this:

[…]

RT Self Service / Open tickets

     

and nothing else.

When I don’t have the problem, it looks like this:

[…]

RT Self Service / Open tickets

     

237

[...]

It looks like the problem starts in share/html/Elements/PageLayout,
which is loaded by html/SelfService/Elements/Tabs, at these lines:
% $m->callback( %ARGS, CallbackName => 'BeforeBody' );
% $m->flush_buffer(); # we've got the page laid out, let's flush the buffer;

(I don’t have such a callback, AFAIK.)

I have no idea where this 237 comes from, but it might be the source
of the problem (or I might be completely wrong too :) )
L.B.

That is unlikely to be the problem…
There are some more things you can try, though:

a) run wireshark and check if queries are being executed on the
database (and what queries)

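b) modify /opt/rt3/bin/webmux.pl and add this code:

BEGIN {
    # Dump the Perl call stack to the RT log when the process receives USR1.
    $SIG{USR1} = sub {
        my $i = 0;
        $RT::Logger->debug("Received USR1 signal; dumping stack!");
        while (my @c = caller($i)) {
            # caller() can return undef fields; map them to '' to avoid warnings.
            $RT::Logger->debug("STACK: $i: " . join(" & ", map { defined $_ ? $_ : '' } @c));
            $i++;
        }
        $RT::Logger->debug("Stack dumped");
        die "the end";
    };
}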

When the process is looping:
kill -USR1 process_id_of_process_that_is_looping

That should show the stack in the rt log file…

Best regards,

Bram

Hi,

I tried a) by enabling query logging in Postgres, but it’s
difficult to debug (many queries), and I don’t see any taking too much
time.
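
One way to narrow down a busy query log is to filter it by duration. This is only a minimal sketch: it assumes log_min_duration_statement = 0 is set in postgresql.conf, so that each logged statement carries a "duration: N ms" field. slow_queries is a hypothetical helper name, and the log file name in the usage comment is a placeholder.

```shell
#!/bin/sh
# Print the log lines whose "duration: N ms" value exceeds a threshold.
# $1 is the threshold in milliseconds; the log is read from stdin.
slow_queries() {
    awk -v limit="$1" '
        {
            # Find the "duration:" field wherever the log prefix puts it.
            for (i = 1; i < NF; i++) {
                if ($i == "duration:" && ($(i + 1) + 0) > (limit + 0)) {
                    print
                    break
                }
            }
        }
    '
}

# Hypothetical usage (the file name is a placeholder, not from this thread):
# slow_queries 1000 < postgresql.log
```

If nothing shows up above a reasonable threshold, that suggests the time is being spent in Perl code rather than in the database.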

I tried b) but I don’t get the dump in the logs. All the log options
are set to debug in the RT config. I added your code at the beginning of
webmux.pl, just after the comments, restarted RT, and ran tail -f
/var/log/rt.log, but after a kill -USR1 pid_which_is_looping, nothing
is dumped in the logs. Are you sure of the code, or did I do something
wrong?

L.B.

The problem is that Apache itself also handles the USR1 signal (it uses
it for graceful restarts)…

Update the code to use USR2 instead of USR1:

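BEGIN {
    # Dump the Perl call stack to the RT log when the process receives USR2.
    $SIG{USR2} = sub {
        my $i = 0;
        $RT::Logger->debug("Received USR2 signal; dumping stack!");
        while (my @c = caller($i)) {
            # caller() can return undef fields; map them to '' to avoid warnings.
            $RT::Logger->debug("STACK: $i: " . join(" & ", map { defined $_ ? $_ : '' } @c));
            $i++;
        }
        $RT::Logger->debug("Stack dumped");
        die "the end";
    };
}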

and then use kill -USR2 pid_of_process
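
To find the PID of the looping process, something along these lines may help. It is a rough sketch, assuming the Apache children are named httpd (the usual name on RHEL, but not confirmed in this thread). busiest_pid is a hypothetical helper, not part of RT.

```shell
#!/bin/sh
# Read "PID %CPU COMMAND" lines on stdin and print the PID with the
# highest CPU usage among processes whose command name matches $1.
busiest_pid() {
    awk -v name="$1" '
        BEGIN { max = -1 }
        $3 ~ name && ($2 + 0) > max { max = $2 + 0; pid = $1 }
        END { if (pid != "") print pid }
    '
}

# Hypothetical usage ("httpd" is an assumption about the process name):
# pid=$(ps -e -o pid= -o pcpu= -o comm= | busiest_pid httpd)
# [ -n "$pid" ] && kill -USR2 "$pid"
```

Watching the RT log with tail -f while sending the signal should then show the STACK lines as they are written.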

Best regards,

Bram

Bram,

First, thanks for your help.

Attached is the stack content.

L.B.

stack.txt (9.76 KB)

This is an old problem, but it still exists in my configuration.

I found that if I remove the “UseSQLForACLChecks” option, it works. I see
in the RT wiki that “this option is beta. In some cases it result in
performance improvements but some setups can not handle it”.

I remember I enabled this option because, when searching for tickets, I
would get a result of, for example, 30 tickets with 8 on the first page,
4 on the second page, 12 on the third, etc. With this option, the search
returned the right number of pages with the right number of tickets per
page.

Do you have any details about the “setups that can not handle it”?

I’m running 3.8.7 on RHEL with PostgreSQL 8.3.

L.B.

This gets stranger and stranger, since I don’t know anybody who has the
pagination problem. Was it introduced when you upgraded to 3.8.7?

On 2010-05-07 10:38:26AM +0200, L B wrote:

Old problem, but still existing in my configuration.

I found that if I remove “UseSQLForACLChecks” option, it works. I see
in RT wiki that “this option is beta. In some cases it result in
performance improvements but some setups can not handle it”.

I remember I enabled this option because when searching for tickets, I
had a results of for example 30 tickets with 8 on the first page, 4 in
the second page, 12 on the third etc… and this option returned the
good number of pages with the good number of tickets per page.

Do you have some details about the “setups that can not handle it” ?

I’m running 3.8.7 on RHEL with postgres 8.3


L.B.


List info: The rt-devel Archives

Peter C. Lai | Bard College at Simon’s Rock
Systems Administrator | 84 Alford Rd.
Information Technology Svcs. | Gt. Barrington, MA 01230 USA
peter AT simons-rock.edu | (413) 528-7428

I don’t have this pagination problem anymore. Before running 3.8.7, I
was running 3.8.6 for some instances and 3.6.6 for others. I’m sure
this parameter fixed the problem on an older version.

I removed UseSQLForACLChecks on 3.8.7, and pagination looks good.

I did several tests, and it was clear that enabling this option made
SelfService loop while loading for non-privileged users (as I said
above, without any rights), with processes running at 100% CPU until
the server crashed. Disabling it fixed the problem, no doubt.

L.B.