RT 4 Upgrade Slow Performance - CustomFields?

I’m going to try and separate this thread since my issue doesn’t seem to be
related to the Search page or the SelectOwner field. Using the Mason
Profiler did give me some info though, it looks like it might be due to
some custom fields:

=Mason= localhost -
/Ticket/Elements/ShowCustomFields {{{
=Mason= localhost -
/Elements/ShowCustomFields {{{
=Mason= localhost -
/Elements/ShowCustomFields }}} 2.0221
=Mason= localhost -
/Ticket/Elements/ShowCustomFields }}} 2.0261

=Mason= localhost - /autohandler }}} 3.2073

Is it normal that having 1-2 custom fields with about 10-20 options each
would cause this extra delay? I know 2 seconds doesn’t seem like much, but
it can sometimes be longer. Can anyone give me any tips as to what the
issue could be? Is Mason maybe caching that differently?

Thanks,
Nate


Nate,

Separating threads is the right move, but omitting details about your
system is not so good.

I think it’s either one slow query or many moderately fast queries. If you
collect all the queries for this page, it shouldn’t be too hard to isolate
the ones executed by ShowCustomFields.

The fact that subsequent requests take less time suggests that the queries
end up in the MySQL query cache and are fast until the cache is flushed. To
confirm, you can disable the cache in MySQL for a while and check whether
the problem is repeatable on every request.
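To test that, something like the following should work on MySQL 5.x (a sketch; note the query cache variables were removed entirely in MySQL 8.0):

```sql
-- Check the current query cache settings
SHOW VARIABLES LIKE 'query_cache%';

-- Disable the cache at runtime by shrinking it to zero; restore the old
-- value (or restart with the original my.cnf) when the test is done.
SET GLOBAL query_cache_size = 0;
```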


Best regards, Ruslan.

Ruslan,

I actually started the other thread, so my details are in the first post.
That thread was sort of hijacked (no big deal) and getting messy, so I
wanted to separate them. I’m using Postgresql, not MySQL, and had already
turned on the SQL statement log and all queries seemed to be completing
quickly.

I think I might have been too quick to assume it’s the custom fields
though, I’ve been testing it for a few hours now and I don’t see that part
being slow anymore. To be honest the only part I see being consistently
slow right now is:

=Mason= localhost - /Elements/SetupSessionCookie {{{
=Mason= localhost - /Elements/SetupSessionCookie }}} 1.9105

That seems to take about 2 seconds on about 30% of page loads. Is that
normal?

I will try to turn off caching in Postgresql though to see if that verifies
your theory about slow queries.

Thanks!
Nate


Ruslan,

I actually started the other thread, so my details are in the first post.
That thread was sort of hijacked (no big deal) and getting messy, so I
wanted to separate them. I’m using Postgresql, not MySQL, and had already
turned on the SQL statement log and all queries seemed to be completing
quickly.

I know that your thread was hijacked, but this one lacks the history, and
it’s hard for me to jump between threads to piece together the background.
Anyway, see comments below.

Do you log SQL statements on the RT side or in Pg? In your case I would
recommend enabling both methods for a while.

I think I might have been too quick to assume it’s the custom fields though,
I’ve been testing it for a few hours now and I don’t see that part being
slow anymore. To be honest the only part I see being consistently slow
right now is:

=Mason= localhost - /Elements/SetupSessionCookie {{{
=Mason= localhost - /Elements/SetupSessionCookie }}} 1.9105

That seems to take about 2 seconds on about 30% of page loads. Is that
normal?

Nope. It’s either a big session size, or you aren’t cleaning old records
out of the sessions table with rt-clean-sessions.

SELECT COUNT(1) FROM sessions;

The above should tell you how many records you have in the table.

Also, sessions are locked, so if you click a link or refresh while a page
is still loading, the time in SetupSessionCookie can include the time it
spent waiting for the previous request to release the lock.

I’m not sure how you’re testing. If your DB is actively used by many RT
users then it’s hard to tell what’s slow.

I will try to turn off caching in Postgresql though to see if that verifies
your theory about slow queries.

Ah. You migrated to Pg. Caching in Pg is different from MySQL’s.

Thanks!
Nate

Best regards, Ruslan.

Ruslan,

I wasn’t aware that sessions had to be cleared, but now that you mentioned
it I looked and there were almost 10k sessions in our table. I cleared
that out and it does not seem to be slow in that section anymore. I’ve
also added that command to crontab to run daily.
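For reference, the crontab entry might look something like this (the install path and the `--older` value here are assumptions for a Debian-style install; adjust both for your system):

```
# Purge sessions idle for more than a day, every night at 03:00
0 3 * * * /usr/sbin/rt-clean-sessions --older 1D
```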

It seems much better after doing that, I don’t see any pages taking over
3-4 seconds to load anymore. I do still see some taking about 2-3 seconds
to load. That is more usable, although I would like to get it better if
possible. I turned on logging of SQL statements in RT only for now, I’m
not as familiar with Postgresql as I am with MySQL, but I could look into
logging in there if necessary.

We only have about 5 staff using the RT web interface. The longest queries
I see are still fairly quick I think, but here are a couple examples of the
longer ones for Display.html:

[Wed May 30 17:52:44 2012] [debug]: SQL(0.136027s): SELECT * FROM Groups
WHERE LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?); [ bound values:
'SystemInternal' 'Privileged' ]
(/usr/share/request-tracker4/lib/RT/Interface/Web.pm:1115)

[Wed May 30 17:49:20 2012] [debug]: SQL(0.387888s): SELECT DISTINCT main.*
FROM Users main JOIN Principals Principals_1 ON ( Principals_1.id =
main.id ) JOIN CachedGroupMembers CachedGroupMembers_2 ON (
CachedGroupMembers_2.MemberId = Principals_1.id ) WHERE
(Principals_1.Disabled = '0') AND (CachedGroupMembers_2.GroupId = '25472')
AND (CachedGroupMembers_2.Disabled = '0') AND
(LOWER(Principals_1.PrincipalType) = 'user') ORDER BY main.Name ASC ;
(/usr/share/request-tracker4/lib/RT/Interface/Web.pm:1115)

Here is an example summary of a page load (with SQL logging on and Mason
profiling off) for Display.html:
48 Queries
4.1 Seconds page load
Longest query:
[Wed May 30 17:56:19 2012] [debug]: SQL(0.118095s): SELECT * FROM Groups
WHERE LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?); [ bound values:
'SystemInternal' 'Unprivileged' ]
(/usr/share/request-tracker4/lib/RT/Interface/Web.pm:1115)

All other queries < 0.1 seconds, most are < 0.01 seconds

Thanks,
Nate


Ruslan,

I wasn’t aware that sessions had to be cleared, but now that you mentioned
it I looked and there were almost 10k sessions in our table. I cleared that
out and it does not seem to be slow in that section anymore. I’ve also
added that command to crontab to run daily.

It seems much better after doing that, I don’t see any pages taking over 3-4
seconds to load anymore. I do still see some taking about 2-3 seconds to
load. That is more usable, although I would like to get it better if
possible. I turned on logging of SQL statements in RT only for now, I’m not
as familiar with Postgresql as I am with MySQL, but I could look into
logging in there if necessary.

We only have about 5 staff using the RT web interface. The longest queries
I see are still fairly quick I think, but here are a couple examples of the
longer ones for Display.html:

[Wed May 30 17:52:44 2012] [debug]: SQL(0.136027s): SELECT * FROM Groups
WHERE LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?); [ bound values:
'SystemInternal' 'Privileged' ]
(/usr/share/request-tracker4/lib/RT/Interface/Web.pm:1115)

This one is sort of a “known issue”. We have code paths where LOWER is
used and others where it is not, but the Pg indexes on the Groups table
don’t account for this. The proper way would be to collect as many queries
as possible that select from or join the Groups table, analyze them, and
come up with a set of indexes for the table. I would love to take a look
at such a log.

For this particular query, an index on LOWER(Domain) would be enough, or
the (LOWER(Domain), LOWER(Type)) pair.
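For reference, the pair Ruslan describes would be a Pg expression index along these lines (the index name is made up):

```sql
-- Expression index matching the LOWER(...) predicates in the slow query,
-- so Pg can use an index scan instead of a sequential scan over Groups.
CREATE INDEX groups_lower_domain_type
    ON Groups (LOWER(Domain), LOWER(Type));
```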

[Wed May 30 17:49:20 2012] [debug]: SQL(0.387888s): SELECT DISTINCT main.*
FROM Users main JOIN Principals Principals_1 ON ( Principals_1.id = main.id
) JOIN CachedGroupMembers CachedGroupMembers_2 ON (
CachedGroupMembers_2.MemberId = Principals_1.id ) WHERE
(Principals_1.Disabled = ‘0’) AND (CachedGroupMembers_2.GroupId = ‘25472’)
AND (CachedGroupMembers_2.Disabled = ‘0’) AND
(LOWER(Principals_1.PrincipalType) = ‘user’) ORDER BY main.Name ASC ;
(/usr/share/request-tracker4/lib/RT/Interface/Web.pm:1115)

Hard to tell anything without EXPLAIN ANALYZE… This query selects users
who are not disabled and are recursive members of a group. Select from
Groups by id to see which group it is and how important this query is.
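A sketch of what that would look like in psql, using the query and the GroupId value from the log line above:

```sql
-- Run the slow query under EXPLAIN ANALYZE to see the actual plan and timings
EXPLAIN ANALYZE
SELECT DISTINCT main.*
FROM Users main
JOIN Principals Principals_1 ON Principals_1.id = main.id
JOIN CachedGroupMembers CachedGroupMembers_2
  ON CachedGroupMembers_2.MemberId = Principals_1.id
WHERE Principals_1.Disabled = '0'
  AND CachedGroupMembers_2.GroupId = '25472'
  AND CachedGroupMembers_2.Disabled = '0'
  AND LOWER(Principals_1.PrincipalType) = 'user'
ORDER BY main.Name ASC;

-- And to see which group 25472 is:
SELECT * FROM Groups WHERE id = 25472;
```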

Here is an example summary of a page load (with SQL logging on and Mason
profiling off) for Display.html:
48 Queries
4.1 Seconds page load
Longest query:
[Wed May 30 17:56:19 2012] [debug]: SQL(0.118095s): SELECT * FROM Groups
WHERE LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?); [ bound values:
'SystemInternal' 'Unprivileged' ]
(/usr/share/request-tracker4/lib/RT/Interface/Web.pm:1115)

All other queries < 0.1 seconds, most are < 0.01 seconds

Use the following command to calculate how much time all 48 queries took:

cat part.of.rt.log | perl -ne '$res += (/SQL\(([0-9.]+)s\)/)[0] || 0;
END { print "$res\n" }'

The question is how close the sum is to 4 seconds.
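For anyone without perl handy, an equivalent sketch in Python; the two sample log lines are copied from earlier in the thread:

```python
import re

# Sum the SQL(...s) timings from an RT debug log excerpt.
log = """\
[Wed May 30 17:52:44 2012] [debug]: SQL(0.136027s): SELECT * FROM Groups ...
[Wed May 30 17:49:20 2012] [debug]: SQL(0.387888s): SELECT DISTINCT main.* ...
"""
total = sum(float(m) for m in re.findall(r"SQL\(([0-9.]+)s\)", log))
print(f"{total:.6f}")  # 0.523915
```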

Thanks,
Nate

Best regards, Ruslan.

Ruslan,

I guess what I was getting at is I don’t think the SQL queries are the
problem here. The sum (by using your perl code) was 0.324813 seconds, much
less than 4 seconds.

I’m still trying different Apache settings, Postgresql settings, etc., but
here’s a different way to explain the issue:

I can click on Home, then click on a ticket, then Home, then a ticket,
etc., for as long as I want and it is very fast (page load time is about
0.2-0.3 seconds).

If I wait for about 5 seconds, I can continue this without any change.

If I wait for even 10 seconds, or 30 seconds, or anything longer, when I
click on either Home or a ticket, the page takes about 2-4 seconds to load.
This will happen for both the Home page and a ticket Display page the first
time they are each displayed, and after that they are very fast again.
This is why it seems to me it has to be an Apache or Mason cache/timeout
issue. It’s almost like it’s compiling or building something again, which
is why I originally was drawn to the Javascript minifier suggestion.

I really appreciate your advice on this.

-Nate


Okay…just an update. This is definitely directly tied to the Apache
KeepAliveTimeout setting. The default was 15 seconds for my installation,
and if I change it to 10 seconds or 60 seconds that is exactly how long of
a wait is required to make it “slow” again.

So from here it looks like the options are:

  1. Something is rebuilding after the KeepAliveTimeout expires that
     shouldn’t be, or
  2. Something should be rebuilding and my server is not doing it fast
     enough.

The server I’m using has one Xeon 2.4GHz CPU and 2GB of memory (it is
virtual). Any pointers on what’s happening when the KeepAliveTimeout
expires would be greatly appreciated.

-Nate

Nathan,

It could be caused by granting wholesale permissions globally, for
Everyone, to queues/tickets and custom fields; that would make RT spend a
lot of time checking permissions.

just a thought.

Kenn


Thanks Kenn. I checked and didn’t see any permissions set globally for
everyone, except that the Create Ticket right is granted to Everyone on
each of our queues.

I made a few more changes though and am considering the problem fixed at
this point. I found that the system was doing a lot of memory swapping,
even though I increased the memory from 512MB with RT 3.8.8 (and MySQL) to
2GB with 4.0.5 (and Postgresql). I disabled all debugging and heavy
logging, and adjusted the Apache configuration to increase the
KeepAliveTimeout to 60 and reduce the MinSpareServers and MaxSpareServers.
The Apache processes were using between 60-100MB each (because of
mod_perl, I think), so if you have 15 Apache processes running that’s
potentially 1.5GB. After making that change the system is “lightning fast”
again. I
still might add 1-2GB of memory just to be safe, I just didn’t think that
much should be necessary.
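A quick sanity check of that arithmetic, using the 60-100MB per-process figures quoted above:

```python
# Back-of-the-envelope: resident memory for mod_perl Apache workers
# at roughly 60-100 MB each, with up to 15 of them running.
workers = 15
low_gb = 60 * workers / 1024    # best case
high_gb = 100 * workers / 1024  # worst case
print(f"{low_gb:.2f}-{high_gb:.2f} GB")  # 0.88-1.46 GB, tight on a 2 GB box
```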

I also have rt-clean-sessions running every night, which should help some.

Thank you everyone that helped!


It shouldn’t be necessary if you know how to fit things in.

You don’t want KeepAliveTimeout to be very high. A keep-alive of 60
seconds means that once a user touches an Apache process, that process is
held back from serving other users for 60 seconds even if the user does
nothing. If 10 users hit the server within a minute, you need an 11th
Apache process to serve the next user. Your deployment is not configured
for such values.
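Ruslan’s worker arithmetic, spelled out (a trivial sketch of the reasoning, not RT code):

```python
# Each user who touches the server pins one Apache worker for the whole
# KeepAliveTimeout window, even while idle, so the next visitor needs one
# more worker than the number of users seen within that window.
keepalive_timeout = 60   # seconds a worker stays pinned
users_within_window = 10
workers_needed = users_within_window + 1
print(workers_needed)  # 11
```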

For big keep-alive values you need two-step processing, with a light
frontend and a heavy backend. The frontend keeps connections open and can
hold many of them with a low footprint. For example, take a look at the
following blog post, especially the memory footprint chart:
http://blog.webfaction.com/a-little-holiday-present

As the backend, you can use an FCGI server running RT, your current Apache
setup, or something else.

Take a look at the following extension:

RT::Extension::Nginx - optimized request tracker within minutes - metacpan.org

It generates a config for nginx in which a few features of the server,
plus knowledge of RT, are used to lower the memory footprint, increase
concurrency, and lower page load times.


Best regards, Ruslan.

Ruslan,

I agree with your recommendation in general for most installations,
especially ones larger than ours. I don’t think increasing the
KeepAliveTimeout is necessary anymore now that I fixed the swapping issue,
because the initial page load does not take a long time anymore. However,
for our environment I know that there will never be more than 5 people
accessing the site at once, so we will never run into the scenario you were
giving as an example. The reason I left the KeepAliveTimeout at 60 is that
I’d like it to be as fast as possible once a user starts doing something.
I think a user could look at a ticket for more than 15 seconds (the
default KeepAliveTimeout in my Apache configuration) and then continue on,
but it’s less likely they would take longer than 60 seconds between
clicks. For our environment it would be fine to have 5 Apache processes
dedicated to 5 simultaneous users.

Thanks for the information on the lightweight front-end, it looks like it
would help a lot, although I think it might be overkill for our relatively
small installation.

-Nate
