Slow page load times in RT 4.4.1 with PostgreSQL


I am investigating a performance issue with RT 4.4.1. Occasionally, page loads are very slow when replying to or displaying a ticket. When a slow display occurs, the query below is what is running on the database; I have included its EXPLAIN ANALYZE output as well:

rt4_test=# EXPLAIN ANALYZE SELECT DISTINCT main.* FROM Users main CROSS JOIN ACL ACL_3 JOIN Principals Principals_1 ON ( Principals_1.id = main.id ) JOIN CachedGroupMembers CachedGroupMembers_2 ON ( CachedGroupMembers_2.MemberId = Principals_1.id ) JOIN CachedGroupMembers CachedGroupMembers_4 ON ( CachedGroupMembers_4.MemberId = Principals_1.id ) WHERE ((ACL_3.ObjectType = 'RT::Queue') OR (ACL_3.ObjectType = 'RT::System' AND ACL_3.ObjectId = 1)) AND (ACL_3.PrincipalId = CachedGroupMembers_4.GroupId) AND (ACL_3.PrincipalType = 'Group') AND (ACL_3.RightName = 'OwnTicket') AND (CachedGroupMembers_2.Disabled = '0') AND (CachedGroupMembers_2.GroupId = '4') AND (CachedGroupMembers_4.Disabled = '0') AND (Principals_1.Disabled = '0') AND (Principals_1.PrincipalType = 'User') AND (Principals_1.id != '1') ORDER BY main.Name ASC;

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Unique  (cost=118.29..118.37 rows=1 width=345) (actual time=38516.262..38583.701 rows=327 loops=1)
   ->  Sort  (cost=118.29..118.30 rows=1 width=345) (actual time=38516.260..38522.666 rows=58691 loops=1)
         Sort Key: main.name, main.id, main.password, main.comments, main.signature, main.emailaddress, main.freeformcontactinfo, main.organization, main.realname, main.nickname, main.lang, main.gecos, main.homephone, main.workphone, main.mobilephone, main.pagerphone, main.address1, main.address2, main.city, main.state, main.zip, main.country, main.timezone, main.creator, main.created, main.lastupdatedby, main.lastupdated, main.authtoken, main.smimecertificate
         Sort Method: quicksort  Memory: 25779kB
         ->  Nested Loop  (cost=2.00..118.28 rows=1 width=345) (actual time=0.307..38021.526 rows=58691 loops=1)
               Join Filter: (principals_1.id = main.id)
               ->  Nested Loop  (cost=1.58..117.83 rows=1 width=12) (actual time=0.300..37859.702 rows=58691 loops=1)
                     ->  Nested Loop  (cost=1.29..107.02 rows=2 width=16) (actual time=0.066..824.111 rows=607238 loops=1)
                           ->  Nested Loop  (cost=0.86..103.11 rows=1 width=8) (actual time=0.050..3.874 rows=345 loops=1)
                                 ->  Index Only Scan using disgroumem on cachedgroupmembers cachedgroupmembers_2  (cost=0.43..26.93 rows=9 width=4) (actual time=0.031..0.459 rows=347 loops=1)
                                       Index Cond: ((groupid = 4) AND (disabled = '0'::smallint))
                                       Heap Fetches: 347
                                 ->  Index Scan using principals_pkey on principals principals_1  (cost=0.43..8.46 rows=1 width=4) (actual time=0.009..0.009 rows=1 loops=347)
                                       Index Cond: (id = cachedgroupmembers_2.memberid)
                                       Filter: ((id <> 1) AND (disabled = '0'::smallint) AND ((principaltype)::text = 'User'::text))
                                       Rows Removed by Filter: 0
                           ->  Index Scan using cachedgroupmembers1 on cachedgroupmembers cachedgroupmembers_4  (cost=0.43..3.32 rows=58 width=8) (actual time=0.010..2.069 rows=1760 loops=345)
                                 Index Cond: (memberid = principals_1.id)
                                 Filter: (disabled = '0'::smallint)
                                 Rows Removed by Filter: 0
                     ->  Index Only Scan using acl1 on acl acl_3  (cost=0.29..5.39 rows=2 width=4) (actual time=0.061..0.061 rows=0 loops=607238)
                           Index Cond: ((rightname = 'OwnTicket'::text) AND (principaltype = 'Group'::text) AND (principalid = cachedgroupmembers_4.groupid))
                           Filter: (((objecttype)::text = 'RT::Queue'::text) OR (((objecttype)::text = 'RT::System'::text) AND (objectid = 1)))
                           Heap Fetches: 58691
               ->  Index Scan using users_pkey on users main  (cost=0.42..0.44 rows=1 width=345) (actual time=0.002..0.002 rows=1 loops=58691)
                     Index Cond: (id = cachedgroupmembers_4.memberid)
 Planning time: 2.543 ms
 Execution time: 38585.128 ms
(28 rows)

Time: 38588.790 ms

The page display time under v4.4.1 is 77.918598s.

The same page under v3.8.13 takes only 4.160656s.

Wow. It does not appear to be caused by the SQL ACL checks, since for the problem tickets the slowness occurs independently of that setting. Does anyone have any ideas for debugging this further?
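One thing that stands out in the plan above is how far off the row estimates are (e.g. rows=1 estimated vs. 58691 actual on the inner nested loops), which usually points at stale planner statistics. A quick check, assuming the standard statistics views are enabled (table names taken from the plan), is:

```sql
-- Show when the planner statistics for the tables in the slow query
-- were last refreshed, and how much has changed since.
SELECT relname, last_analyze, last_autoanalyze,
       n_live_tup, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('users', 'principals', 'acl', 'cachedgroupmembers');
```

If `last_analyze` and `last_autoanalyze` are both NULL (or very old) for a large table, the planner is working from default or outdated statistics.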


I am a bit embarrassed: I thought the table had been analyzed by autovacuum, but this was a test system and it did not receive enough updates to trigger an automatic analyze. I ran an ANALYZE and a VACUUM FREEZE, and performance is now very good across the board. I just wanted to record this in the thread.
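For anyone hitting the same thing, this is a sketch of the commands involved, plus an optional per-table tweak so autoanalyze fires sooner on a mostly idle system (the table name and threshold below are illustrative, not recommendations):

```sql
-- Refresh planner statistics and freeze old tuples database-wide
-- (what resolved the slowdown here).
VACUUM FREEZE;
ANALYZE;

-- Optionally, make autoanalyze trigger after ~50 modified rows
-- instead of the default 10% of the table + 50 rows, so a
-- low-traffic table still gets analyzed promptly.
ALTER TABLE cachedgroupmembers
  SET (autovacuum_analyze_scale_factor = 0,
       autovacuum_analyze_threshold = 50);
```

The per-table storage parameters only matter if the workload is too quiet to cross the default thresholds, as was the case on this test system.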