Poor performance in upgrade from RT2 to RT3

Carl_Gibbons · August 27, 2003, 5:36am

I am trying to migrate an 8,000+ ticket system.

Existing system:
Red Hat 7.1
perl 5.6.0 installed from RPMs
RT 2.0.13
Apache 1.3.23/mod_perl 1.23
PostgreSQL 7.1.3, installed from RPMs
New system:
Red Hat ES 2.1
perl 5.8.0 (compiled from source)
RT 3.0.4
Apache 1.3.28/mod_perl 1.28
PostgreSQL 7.3.4, compiled from source

Both systems are dual-processor 700MHz PIII with 1024MB RAM.
(I’m not sure, but I think both have the same SCSI disk and
RAID configuration, too.) The rt2-to-rt3-v1.20 tool successfully
moved users and tickets from the existing system to the new one.

Logging into the RT2 system and clicking on “Home” to display the
"25 highest priority tickets I own" takes a fraction of a second.

Logging into the RT3 system and clicking on “Home” to display the
"10 highest priority tickets I own" takes about 50 seconds.

Running “top” on the new RT3 system while I click on “Home” shows
that a PostgreSQL “postmaster” process pegs one of the CPUs at 99%
throughout those 50 seconds.

(I tried increasing the shared_buffers parameter in postgresql.conf,
but it did not help.) Any ideas on how to make RT3 give me the
fraction-of-a-second response to which I am accustomed?

– Carl Gibbons, Network Security Engineer, University of Denver

Jim_Rowan · August 27, 2003, 2:51pm

I don’t know how much it will help, but Postgresql’s defaults aren’t
appropriate for a production machine with 1G. This page is the best
I’ve found for tuning instructions:

http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

I’d start there.
Jim Rowan-----Original Message-----
From: Carl Gibbons [mailto:cgibbons@du.edu]
Sent: Wednesday, August 27, 2003 12:36 AM
To: rt-users@lists.fsck.com
Subject: [rt-users] Poor performance in upgrade from RT2 to RT3

I am trying to migrate an 8,000+ ticket system.

Existing system:
Red Hat 7.1
perl 5.6.0 installed from RPMs
RT 2.0.13
Apache 1.3.23/mod_perl 1.23
PostgreSQL 7.1.3, installed from RPMs
New system:
Red Hat ES 2.1
perl 5.8.0 (compiled from source)
RT 3.0.4
Apache 1.3.28/mod_perl 1.28
PostgreSQL 7.3.4, compiled from source

Both systems are dual-processor 700MHz PIII with 1024MB RAM.
(I’m not sure, but I think both have the same SCSI disk and
RAID configuration, too.) The rt2-to-rt3-v1.20 tool successfully
moved users and tickets from the existing system to the new one.

Logging into the RT2 system and clicking on “Home” to display the
“25 highest priority tickets I own” takes a fraction of a second.

Logging into the RT3 system and clicking on “Home” to display the
“10 highest priority tickets I own” takes about 50 seconds.

Running “top” on the new RT3 system while I click on “Home” shows
that a PostgreSQL “postmaster” process pegs one of the CPUs at 99%
throughout those 50 seconds.

(I tried increasing the shared_buffers parameter in postgresql.conf,
but it did not help.) Any ideas on how to make RT3 give me the
fraction-of-a-second response to which I am accustomed?

– Carl Gibbons, Network Security Engineer, University of Denver

rt-users mailing list
rt-users@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-users

Have you read the FAQ? The RT FAQ Manager lives at http://fsck.com/rtfm

Vivek_Khera · August 27, 2003, 3:11pm

“JR” == Jim Rowan jim.rowan@starcore-dsp.com writes:

JR> I don’t know how much it will help, but Postgresql’s defaults aren’t
JR> appropriate for a production machine with 1G. This page is the best
JR> I’ve found for tuning instructions:

Here’s my 2GB RAM server config. I haven’t figured out what I’m gonna
do with my replacement which has 4GB…

The most important settings for me have been the max_fsm_* settings.
for RT, you probably don’t need as big max_fsm_pages, but this DB
server does many other things… for 1GB you may wish to lower
sort_mem a bit depending on what other things you’re doing.
everything else is pretty much minor tweaking for my main application.

tcpip_socket = true

max_connections = 64

shared_buffers = 60000
max_fsm_relations = 1000
max_fsm_pages = 1000000

sort_mem = 8192
vacuum_mem = 131702

effective_cache_size = 25600 # about 200M
random_page_cost = 2

syslog = 2 # range 0-2
syslog_facility = ‘LOCAL0’
syslog_ident = ‘postgres’

stats_command_string = true
stats_row_level = true
stats_block_level = true

Andrew_J_Korty · August 28, 2003, 12:35pm

Carl Gibbons cgibbons@du.edu writes:

Logging into the RT2 system and clicking on “Home” to display the
“25 highest priority tickets I own” takes a fraction of a second.

Logging into the RT3 system and clicking on “Home” to display the
“10 highest priority tickets I own” takes about 50 seconds.

I have the same problem only slightly worse. Looks like the offending
query is

SELECT DISTINCT main.* FROM ((((Tickets main JOIN Groups as
Groups_1 ON ( main.id = Groups_1.Instance)) JOIN Principals as
Principals_2 ON ( Groups_1.id = Principals_2.ObjectId)) JOIN
CachedGroupMembers as CachedGroupMembers_3 ON ( Principals_2.id =
CachedGroupMembers_3.GroupId)) JOIN Users as Users_4 ON (
CachedGroupMembers_3.MemberId = Users_4.id)) WHERE
((main.EffectiveId = main.id)) AND ((main.Type = 'ticket')) AND (
( ( (lower(Users_4.EmailAddress) =
'w2kadmin@exchange.indiana.edu')AND(Groups_1.Domain =
'RT::Ticket-Role')AND(Groups_1.Type =
'Requestor')AND(Principals_2.PrincipalType = 'Group') ) ) AND (
(main.Status = 'new')OR(main.Status = 'open') ) ) ORDER BY
main.Priority DESC LIMIT 10

A similar query is done when displaying an individual ticket so it can
generate the list of ten most interesting tickets by the same
originator.

EXPLAIN ANALYZE gives me the attached query plan. The only way I
could avoid the sequential scan (apart from disabling them altogether)
was to create the very specific index

CREATE INDEX groups_domain_role_type_requestor ON groups (domain, type)
WHERE domain = 'RT::Ticket-Role' AND type = 'Requestor';

That index reduces the duration of the query from 75 to 50 seconds for
me. Still not good enough, but it helps. I couldn’t get the query
planner to use a full index on (domain, type) or even simple indexes
on (domain) and (type). Not sure why.

I’ve tried other permutations of indexes but haven’t found one that
contributes more than 1%.

Andrew J. Korty, Principal Security Engineer, GCIA, GCFA
Office of the Vice President for Information Technology
Indiana University

query_plan (3.79 KB)

Andrew_J_Korty · August 29, 2003, 7:26pm

I just discovered this query is used by the Mason components that list
the highest priority tickets requested by you (on the Home page) or
by the requestor (on a Ticket page). We don’t really need these, so I
just commented them out.

Unfortunately, loading a ticket still takes 5 seconds or so.

Andrew J. Korty, Principal Security Engineer, GCIA, GCFA
Office of the Vice President for Information Technology
Indiana University

Carl_Gibbons · September 17, 2003, 4:56pm

Thanks for the replies. I have since upgraded to RT 3.0.5, and I have
tuned a few more things in postgresql.conf. Still very slow. I set the
following in postgresql.conf:

log_statement = true
log_duration = true
log_timestamp = true

to reveal the inefficient RT3 query, copied below. I suspect that some
“create index…” hacking might improve this horrible query, but I’m not
a DBA and I don’t know how to analyze this one. Would somebody on this
list please help me figure out what to do about this?

Carl

2003-09-17 16:19:16 LOG: query: SELECT DISTINCT main.* FROM Tickets
main , Groups Groups_1, Principals Principals_2, CachedGroupMembers
CachedGroupMembers_3, Users Users_4 WHERE ((main.EffectiveId =
main.id)) AND ((main.Type = ‘ticket’)) AND ( ( (
(lower(Users_4.EmailAddress) = ‘cgibbons@du.edu’)AND(Groups_1.Domain =
‘RT::Ticket-Role’)AND(Groups_1.Type =
‘Requestor’)AND(Principals_2.PrincipalType = ‘Group’) ) ) AND (
(main.Status = ‘new’)OR(main.Status = ‘open’) ) ) AND main.id =
Groups_1.Instance AND Groups_1.id = Principals_2.ObjectId AND
Principals_2.id = CachedGroupMembers_3.GroupId AND
CachedGroupMembers_3.MemberId = Users_4.id ORDER BY main.Priority DESC
LIMIT 10
2003-09-17 16:20:27 LOG: duration: 71.308111 sec

About a couple of weeks ago, Carl Gibbons wrote:

wesley.sheldahl · September 17, 2003, 5:24pm

First, run that query inside of psql, then run the same query with EXPLAIN
ANALYZE added to the front; it should look like:
EXPLAIN ANALYZE SELECT DISTINCT main.* FROM Tickets main …

with the rest the same. The output should suggest which parts of the query
are taking longest, which indexes are being used (if any), and so on.
Indexing fields that are used in WHERE and ORDER BY clauses generally makes
SELECT statements faster.

If you need more detailed query analysis after doing that, I would post the
query and the output from EXPLAIN ANALYZE to the postgresql users mailing
list. They’ll be able to make more specific suggestions based on that
information.

Wes Sheldahl
iHigh NetOpsFrom: Carl Gibbons [mailto:cgibbons@du.edu]
Sent: Wednesday, September 17, 2003 12:57 PM
To: rt-users@lists.fsck.com
Cc: bshafer@du.edu
Subject: Re: [rt-users] Poor performance in upgrade from RT2 to RT3

Thanks for the replies. I have since upgraded to RT 3.0.5, and I have
tuned a few more things in postgresql.conf. Still very slow. I set the
following in postgresql.conf:

log_statement = true
log_duration = true
log_timestamp = true

to reveal the inefficient RT3 query, copied below. I suspect that some
“create index…” hacking might improve this horrible query, but I’m not
a DBA and I don’t know how to analyze this one. Would somebody on this
list please help me figure out what to do about this?

Carl

2003-09-17 16:19:16 LOG: query: SELECT DISTINCT main.* FROM Tickets
main , Groups Groups_1, Principals Principals_2, CachedGroupMembers
CachedGroupMembers_3, Users Users_4 WHERE ((main.EffectiveId =
main.id)) AND ((main.Type = ‘ticket’)) AND ( ( (
(lower(Users_4.EmailAddress) = ‘cgibbons@du.edu’)AND(Groups_1.Domain =
‘RT::Ticket-Role’)AND(Groups_1.Type =
‘Requestor’)AND(Principals_2.PrincipalType = ‘Group’) ) ) AND (
(main.Status = ‘new’)OR(main.Status = ‘open’) ) ) AND main.id =
Groups_1.Instance AND Groups_1.id = Principals_2.ObjectId AND
Principals_2.id = CachedGroupMembers_3.GroupId AND
CachedGroupMembers_3.MemberId = Users_4.id ORDER BY main.Priority DESC
LIMIT 10
2003-09-17 16:20:27 LOG: duration: 71.308111 sec

About a couple of weeks ago, Carl Gibbons wrote:

I am trying to migrate an 8,000+ ticket system.

Existing system:
Red Hat 7.1
perl 5.6.0 installed from RPMs
RT 2.0.13
Apache 1.3.23/mod_perl 1.23
PostgreSQL 7.1.3, installed from RPMs
New system:
Red Hat ES 2.1
perl 5.8.0 (compiled from source)
RT 3.0.4
Apache 1.3.28/mod_perl 1.28
PostgreSQL 7.3.4, compiled from source

Both systems are dual-processor 700MHz PIII with 1024MB RAM.
(I’m not sure, but I think both have the same SCSI disk and
RAID configuration, too.) The rt2-to-rt3-v1.20 tool successfully
moved users and tickets from the existing system to the new one.

Logging into the RT2 system and clicking on “Home” to display the
“25 highest priority tickets I own” takes a fraction of a second.

Logging into the RT3 system and clicking on “Home” to display the
“10 highest priority tickets I own” takes about 50 seconds.

Running “top” on the new RT3 system while I click on “Home” shows
that a PostgreSQL “postmaster” process pegs one of the CPUs at 99%
throughout those 50 seconds.

(I tried increasing the shared_buffers parameter in postgresql.conf,
but it did not help.) Any ideas on how to make RT3 give me the
fraction-of-a-second response to which I am accustomed?

– Carl Gibbons, Network Security Engineer, University of Denver

rt-users mailing list
rt-users@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-users

Have you read the FAQ? The RT FAQ Manager lives at http://fsck.com/rtfm

rt-users mailing list
rt-users@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-users

Have you read the FAQ? The RT FAQ Manager lives at http://fsck.com/rtfm

Jesse_Vincent · September 17, 2003, 5:52pm

Thanks for the replies. I have since upgraded to RT 3.0.5, and I have
tuned a few more things in postgresql.conf. Still very slow. I set the
following in postgresql.conf:

log_statement = true
log_duration = true
log_timestamp = true

to reveal the inefficient RT3 query, copied below. I suspect that some
“create index…” hacking might improve this horrible query, but I’m not
a DBA and I don’t know how to analyze this one. Would somebody on this
list please help me figure out what to do about this?

Ok. It appears that the fix for infinite looping in postgres with
DBIx::SearchBuilder 0.92 broke some of the optimization we do there.

Jesse

Carl

2003-09-17 16:19:16 LOG: query: SELECT DISTINCT main.* FROM Tickets
main , Groups Groups_1, Principals Principals_2, CachedGroupMembers
CachedGroupMembers_3, Users Users_4 WHERE ((main.EffectiveId =
main.id)) AND ((main.Type = ‘ticket’)) AND ( ( (
(lower(Users_4.EmailAddress) = ‘cgibbons@du.edu’)AND(Groups_1.Domain =
‘RT::Ticket-Role’)AND(Groups_1.Type =
‘Requestor’)AND(Principals_2.PrincipalType = ‘Group’) ) ) AND (
(main.Status = ‘new’)OR(main.Status = ‘open’) ) ) AND main.id =
Groups_1.Instance AND Groups_1.id = Principals_2.ObjectId AND
Principals_2.id = CachedGroupMembers_3.GroupId AND
CachedGroupMembers_3.MemberId = Users_4.id ORDER BY main.Priority DESC
LIMIT 10
2003-09-17 16:20:27 LOG: duration: 71.308111 sec
Request Tracker... So much more than a help desk — Best Practical Solutions – Trouble Ticketing. Free.

Ruslan_U_Zakirov · September 18, 2003, 9:24am

Jesse Vincent wrote:

Thanks for the replies. I have since upgraded to RT 3.0.5, and I have
tuned a few more things in postgresql.conf. Still very slow. I set the
following in postgresql.conf:

log_statement = true
log_duration = true
log_timestamp = true

to reveal the inefficient RT3 query, copied below. I suspect that some
“create index…” hacking might improve this horrible query, but I’m not
a DBA and I don’t know how to analyze this one. Would somebody on this
list please help me figure out what to do about this?

Ok. It appears that the fix for infinite looping in postgres with
DBIx::SearchBuilder 0.92 broke some of the optimization we do there.

Jesse

This request is one of major perfomance issues for me now.
Search by requestor email is slow under mySQL too, but if I add another
email then mySQL hangs for ages.
It’s very offten when our workers do misstake in email and then without
deleting condition add another one and mySQL try to do its job. This
search locks user’s session for long time(~15 hours, one of this
requests has finished ) Same things happen with other search
conditions when you add another one of same type and they joins on OR
clause.
Best regards. Ruslan.

Jesse_Vincent · September 18, 2003, 2:18pm

This request is one of major perfomance issues for me now.
Search by requestor email is slow under mySQL too, but if I add another
email then mySQL hangs for ages.

Interesting. The problem manifested there was something postgres
specific. RT 3.0.5 was supposed to have largely fixed the perf issue
with searching on multiple requestors. Can you capture one of the
queries it generates and show us?

It’s very offten when our workers do misstake in email and then without
deleting condition add another one and mySQL try to do its job. This
search locks user’s session for long time(~15 hours, one of this
requests has finished )

Interesting. Session lockfiles are supposed to time out a good deal more
quickly than that, I thought.

Same things happen with other search
conditions when you add another one of same type and they joins on OR
clause.
Best regards. Ruslan.

Request Tracker... So much more than a help desk — Best Practical Solutions – Trouble Ticketing. Free.

Ruslan_U_Zakirov · September 18, 2003, 2:27pm

Jesse Vincent wrote:

This request is one of major perfomance issues for me now.
Search by requestor email is slow under mySQL too, but if I add another
email then mySQL hangs for ages.

Interesting. The problem manifested there was something postgres
specific. RT 3.0.5 was supposed to have largely fixed the perf issue
with searching on multiple requestors. Can you capture one of the
queries it generates and show us?

You are saying about wrong additional alias that was removed in 3.0.5?
If so then it’s not an issue. Now I’m writing more corect bug report for
this. Just one second.

It’s very offten when our workers do misstake in email and then without
deleting condition add another one and mySQL try to do its job. This
search locks user’s session for long time(~15 hours, one of this
requests has finished )

Interesting. Session lockfiles are supposed to time out a good deal more
quickly than that, I thought.

No. Timeout of ‘select get_lock’ is ok, but query is spinning and thus
session is still locked.

Carl_Gibbons · September 25, 2003, 9:54pm

Thanks again for the replies. I tried “EXPLAIN ANALYZE” on the
postgresql query, and it showed two costly sequential scans:

                                   ->  Seq Scan on tickets main

(cost=0.00…578.76 rows=1 width=176) (actual time=35.04…97.15 rows=133
loops=1)
Filter: ((effectiveid = id)
AND (“type” = ‘ticket’::character varying) AND ((status =
‘new’::character varying) OR (status = ‘open’::character varying)))
→ Seq Scan on groups groups_1
(cost=0.00…749.70 rows=4077 width=11) (actual time=12.57…493.43
rows=8011 loops=133)
Filter: ((“domain” =
‘RT::Ticket-Role’::character varying) AND (“type” =
‘Requestor’::character varying))

So I created a couple of indexes:

CREATE INDEX tickets_status_type ON tickets (status, type);
CREATE INDEX groups_domain_type ON groups (domain, type);

This reduces the query time from 71 seconds to 34 seconds, and EXPLAIN
ANALYZE shows that the two sequential scans replaced by index scans on
the two new indexes. Unfortunately, this is still about 34 seconds
slower than the snappy response I get from RT2. Please help me figure
out what to try next?

Here’s a complete copy of the EXPLAIN ANALYZE output, in case it helps.

Carl

Limit (cost=1047.83…1047.90 rows=1 width=208) (actual
time=34893.55…34893.62 rows=1 loops=1)
→ Unique (cost=1047.83…1047.90 rows=1 width=208) (actual
time=34893.55…34893.61 rows=1 loops=1)
→ Sort (cost=1047.83…1047.84 rows=1 width=208) (actual
time=34893.54…34893.54 rows=2 loops=1)
Sort Key: main.priority, main.id, main.effectiveid,
main.queue, main.“type”, main.issuestatement, main.resolution,
main.“owner”, main.subject, main.initialpriority, main.finalpriority,
main.timeestimated, main.timeworked, main.status, main.timeleft,
main.told, main.starts, main.started, main.due, main.resolved,
main.lastupdatedby, main.lastupdated, main.creator, main.created,
main.disabled
→ Nested Loop (cost=0.00…1047.82 rows=1 width=208)
(actual time=18305.49…34892.70 rows=2 loops=1)
→ Nested Loop (cost=0.00…1041.00 rows=2
width=204) (actual time=228.24…34878.69 rows=411 loops=1)
→ Nested Loop (cost=0.00…1036.77 rows=1
width=196) (actual time=227.89…34852.07 rows=133 loops=1)
→ Nested Loop (cost=0.00…1033.74
rows=1 width=188) (actual time=227.43…34831.60 rows=133 loops=1)
Join Filter: ((“outer”.id)::text
= (“inner”.instance)::text)
→ Index Scan using
tickets_status_type, tickets_status_type on tickets main
(cost=0.00…224.77 rows=1 width=177) (actual time=0.67…15.71 rows=133
loops=1)
Index Cond: (((status =
‘new’::character varying) AND (“type” = ‘ticket’::character varying)) OR
((status = ‘open’::character varying) AND (“type” = ‘ticket’::character
varying)))
Filter: ((effectiveid = id)
AND (“type” = ‘ticket’::character varying) AND ((status =
‘new’::character varying) OR (status = ‘open’::character varying)))
→ Index Scan using
groups_domain_type on groups groups_1 (cost=0.00…718.32 rows=6044
width=11) (actual time=0.29…198.54 rows=8011 loops=133)
Index Cond: ((“domain” =
‘RT::Ticket-Role’::character varying) AND (“type” =
‘Requestor’::character varying))
→ Index Scan using principals2 on
principals principals_2 (cost=0.00…3.02 rows=1 width=8) (actual
time=0.09…0.10 rows=1 loops=133)
Index Cond: (“outer”.id =
principals_2.objectid)
Filter: (principaltype =
‘Group’::character varying)
→ Index Scan using cachedgroupmembers3 on
cachedgroupmembers cachedgroupmembers_3 (cost=0.00…4.09 rows=11
width=8) (actual time=0.05…0.07 rows=3 loops=133)
Index Cond: (“outer”.id =
cachedgroupmembers_3.groupid)
→ Index Scan using users_pkey on users users_4
(cost=0.00…3.02 rows=1 width=4) (actual time=0.03…0.03 rows=0 loops=411)
Index Cond: (“outer”.memberid = users_4.id)
Filter: (lower((emailaddress)::text) =
‘cgibbons@du.edu’::text)
Total runtime: 34895.20 msec
(23 rows)

Carl Gibbons wrote: