Performance Issues after 3.8.0 upgrade -- PostgreSQL delays

Hello everyone,

We recently upgraded from 3.6.6 to 3.8.0 and after the upgrade, we
immediately noticed delays loading pages within RT-3.8.0 website.
Currently, the web interface is hosted on a Gentoo Linux system
running apache2-2.8 with modperl 2.0.3 (portage) and Postfix MTA.
RT3.8 was built from source.
The database server is on a separate Gentoo Linux system running
Postgresql-8.1.11 (portage). We have roughly 6 concurrent users at any
time using RT.
The database server shows us the process postmaster running at 85-95%
CPU (5min avg load is around 3.88 - 4.5) during RT web clicks.
Being relatively naive to postgres performance tuning, I’m not sure if
this post is relevant to this list or not.

Additional Server Details:

tickets web/postfix has about 1GB RAM and is only running
Postfix/Apache. The system load is always under 1, about 0.10 average
RT-3.8.0 uses the following build options:

#!/bin/bash
./configure --with-bin-owner=root --with-libs-owner=root
–with-libs-group=bin
–with-db-type=Pg --with-db-host=copper.example.org
–with-db-port=5432
–with-db-rt-host=tickets.example.org
–with-db-dba=******** --with-db-database=rt3
–with-db-rt-user=rt --with-db-rt-pass=**********
–with-web-user=apache --with-web-group=apache --with-rt-group=rt
–enable-graphviz --enable-gd --enable-gpg

postgres “copper” server has 1GB RAM and is only running Postgres for
our RT3 installation. Prior to the RT-3.8 upgrade the server load
around 0.15

Here’s our postgresql.conf:

max_connections = 100
shared_buffers = 1000
lc_messages = ‘C’
lc_monetary = ‘C’
lc_numeric = ‘C’
lc_time = ‘C’
stats_start_collector = true
stats_row_level = true
listen_addresses = ‘*’
shared_buffers = 48000
max_prepared_transactions = 64
work_mem = 4096
maintenance_work_mem = 32767
max_stack_depth = 7168
fsync = off
effective_cache_size = 4000
autovacuum = on
autovacuum_naptime = 300
lc_messages = ‘en_US’
lc_monetary = ‘en_US’
lc_numeric = ‘en_US’
lc_time = ‘en_US’

Here’s a ‘vmstat 1’ from “copper” / postresql box:

copper /var/log# vmstat 1
procs -----------memory---------- —swap-- -----io---- --system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 41080 50316 19004 704144 0 0 1 0 1 1 4 1 95 0
0 0 41080 50316 19004 704144 0 0 0 0 145 59 0 0 100 0
0 0 41080 50316 19004 704144 0 0 0 0 105 18 0 0 100 0
1 0 41080 50316 19004 704212 0 0 0 0 617 540 5 2 93 0
2 0 41080 50316 19012 704204 0 0 0 1228 367 330 57 13 30 1
1 0 41080 50316 19012 704204 0 0 0 0 490 750 64 11 25 0
1 0 41080 49944 19012 704340 0 0 0 0 387 711 69 10 21 0
1 0 41080 49944 19012 704340 0 0 0 0 213 230 23 15 62 0
1 0 41080 49944 19012 704340 0 0 0 0 211 217 25 15 59 0
1 0 41080 49944 19020 704332 0 0 0 584 253 259 25 17 58 0
0 0 41080 49884 19020 704400 0 0 60 0 205 219 24 16 58 2
1 0 41080 49824 19020 704400 0 0 48 0 211 222 19 22 58 0
1 0 41080 49764 19020 704468 0 0 72 0 197 198 28 13 58 1
0 0 41080 49704 19020 704604 0 0 56 0 217 228 18 16 66 0
0 0 41080 49704 19028 704596 0 0 0 112 113 46 0 0 100 0
0 0 41080 49704 19028 704596 0 0 0 0 104 22 0 0 100 0
0 0 41080 49704 19028 704596 0 0 0 0 105 20 0 0 100 0

copper log /var/log#

We have a 2nd pair of identical servers running a copy of the
Postgresql RT-3.6.6 database (for testing purposes) and do not see
these slow downs (same hardware,software).

Does anyone have any suggestions where to tackle this? Is downgrading an option?

Thanks everyone,

-Jessie

Hello everyone,

We recently upgraded from 3.6.6 to 3.8.0 and after the upgrade, we
immediately noticed delays loading pages within RT-3.8.0 website.
Currently, the web interface is hosted on a Gentoo Linux system
running apache2-2.8 with modperl 2.0.3 (portage) and Postfix MTA.
RT3.8 was built from source.
The database server is on a separate Gentoo Linux system running
Postgresql-8.1.11 (portage). We have roughly 6 concurrent users at any
time using RT.
The database server shows us the process postmaster running at 85-95%
CPU (5min avg load is around 3.88 - 4.5) during RT web clicks.
Being relatively naive to postgres performance tuning, I’m not sure if
this post is relevant to this list or not.

Are you doing automated vacuum analyzes? Do you have Postgres’ slow
query log on? If so, what does it report?

Are you doing automated vacuum analyzes? Do you have Postgres’ slow query
log on? If so, what does it report?

AFAICT, postgresql.conf show me that autovacuum is on:

autovacuum = on
autovacuum_naptime = 300

I don’t see anything about slow query logs (is that set via
log_min_duration_statement ?)
If I need to set log_min_duration_statement, any recommendation value?

Update-

Ok, after several hours of logging, what I see most common in the slow
query (1s) log is:

LOG: duration: 3151.786 ms statement: EXECUTE [PREPARE:
SELECT * FROM Groups WHERE LOWER(Domain) = LOWER($1) AND LOWER(Type)
= LOWER($2)]

Several hundred of these queries in the slow query log data.

Update-

Ok, after several hours of logging, what I see most common in the slow
query (1s) log is:

LOG: duration: 3151.786 ms statement: EXECUTE [PREPARE:
SELECT * FROM Groups WHERE LOWER(Domain) = LOWER($1) AND LOWER(Type)
= LOWER($2)]

Several hundred of these queries in the slow query log data.

Any suggestions - or more information I can provide? the slow query
log is pretty lengthy…
It’s most common slow query is listed above, as well as many of these below:

<2008-07-17 12:49:03 PDT> LOG: duration: 3737.479 ms statement:
EXECUTE [PREPARE: SELECT * FROM GroupMembers WHERE
GroupId = $1 AND MemberId = $2]

<2008-07-17 17:20:43 PDT> LOG: duration: 8930.945 ms statement:
SELECT DISTINCT main.* FROM Users main CROSS JOIN ACL ACL_4 JOIN
Principals Principals_1 ON ( Principals_1.id = main.id ) JOIN
CachedGroupMembers CachedGroupMembers_2 ON (
CachedGroupMembers_2.MemberId = Principals_1.id ) JOIN Groups Groups_3
ON ( Groups_3.id = CachedGroupMembers_2.GroupId ) WHERE
(Principals_1.Disabled = ‘0’) AND (ACL_4.PrincipalType =
Groups_3.Type) AND (Principals_1.id != ‘1’) AND
(Principals_1.PrincipalType = ‘User’) AND (ACL_4.RightName =
‘OwnTicket’) AND (Groups_3.Domain = ‘RT::System-Role’) AND
((ACL_4.ObjectType = ‘RT::Queue’) OR (ACL_4.ObjectType =
‘RT::System’)) ORDER BY main.Name ASC

<2008-07-17 13:42:06 PDT> LOG: duration: 9115.251 ms statement:
SELECT DISTINCT main.* FROM Tickets main CROSS JOIN Users Users_3 JOIN
Groups Groups_1 ON ( Groups_1.Domain = ‘RT::Ticket-Role’ ) AND (
Groups_1.Type = ‘Requestor’ ) AND ( Groups_1.Instance = main.id ) JOIN
CachedGroupMembers CachedGroupMembers_2 ON (
CachedGroupMembers_2.MemberId = Users_3.id ) AND (
CachedGroupMembers_2.GroupId = Groups_1.id ) WHERE (Users_3.id =
‘5698’) AND (main.Status != ‘deleted’) AND ( ( CachedGroupMembers_2.id
IS NOT NULL ) AND ( main.Status = ‘open’ OR main.Status = ‘new’ ) )
AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY
main.Priority DESC LIMIT 10

<2008-07-17 13:42:12 PDT> LOG: duration: 9128.360 ms statement:
SELECT DISTINCT main.* FROM Tickets main CROSS JOIN Users Users_3 JOIN
Groups Groups_1 ON ( Groups_1.Domain = ‘RT::Ticket-Role’ ) AND (
Groups_1.Type = ‘Requestor’ ) AND ( Groups_1.Instance = main.id ) JOIN
CachedGroupMembers CachedGroupMembers_2 ON (
CachedGroupMembers_2.MemberId = Users_3.id ) AND (
CachedGroupMembers_2.GroupId = Groups_1.id ) WHERE (Users_3.id =
‘5698’) AND (main.Status != ‘deleted’) AND ( ( CachedGroupMembers_2.id
IS NOT NULL ) AND ( main.Status = ‘open’ OR main.Status = ‘new’ ) )
AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY
main.Priority DESC LIMIT 10

I’m not big expert on optimizing Pg queries using indexes. Below
you’ll find suggestions for some.

But please run EXPLAIN for each query before creating any index and
after. Without explains we can not help you precisely, just guessing.
Also, please use some real constants in EXPLAIN, random strings and
numbers will generate plans far from reality.

Update-

Ok, after several hours of logging, what I see most common in the slow
query (1s) log is:

LOG: duration: 3151.786 ms statement: EXECUTE [PREPARE:
SELECT * FROM Groups WHERE LOWER(Domain) = LOWER($1) AND LOWER(Type)
= LOWER($2)]

EXPLAIN …
CREATE INDEX RUZ_G1 ON Groups(LOWER(Domain), LOWER(Type));
EXPLAIN …

Several hundred of these queries in the slow query log data.

Any suggestions - or more information I can provide? the slow query
log is pretty lengthy…
It’s most common slow query is listed above, as well as many of these below:

<2008-07-17 12:49:03 PDT> LOG: duration: 3737.479 ms statement:
EXECUTE [PREPARE: SELECT * FROM GroupMembers WHERE
GroupId = $1 AND MemberId = $2]
EXPLAIN …
CREATE INDEX RUZ_GM1 ON GroupMembers(GroupId, MemberId);
EXPLAIN …

<2008-07-17 17:20:43 PDT> LOG: duration: 8930.945 ms statement:
SELECT DISTINCT main.* FROM Users main CROSS JOIN ACL ACL_4 JOIN
Principals Principals_1 ON ( Principals_1.id = main.id ) JOIN
CachedGroupMembers CachedGroupMembers_2 ON (
CachedGroupMembers_2.MemberId = Principals_1.id ) JOIN Groups Groups_3
ON ( Groups_3.id = CachedGroupMembers_2.GroupId ) WHERE
(Principals_1.Disabled = ‘0’) AND (ACL_4.PrincipalType =
Groups_3.Type) AND (Principals_1.id != ‘1’) AND
(Principals_1.PrincipalType = ‘User’) AND (ACL_4.RightName =
‘OwnTicket’) AND (Groups_3.Domain = ‘RT::System-Role’) AND
((ACL_4.ObjectType = ‘RT::Queue’) OR (ACL_4.ObjectType =
‘RT::System’)) ORDER BY main.Name ASC
Generate an explain, pretty hard to guess what’s wrong.

<2008-07-17 13:42:06 PDT> LOG: duration: 9115.251 ms statement:
SELECT DISTINCT main.* FROM Tickets main CROSS JOIN Users Users_3 JOIN
Groups Groups_1 ON ( Groups_1.Domain = ‘RT::Ticket-Role’ ) AND (
Groups_1.Type = ‘Requestor’ ) AND ( Groups_1.Instance = main.id ) JOIN
CachedGroupMembers CachedGroupMembers_2 ON (
CachedGroupMembers_2.MemberId = Users_3.id ) AND (
CachedGroupMembers_2.GroupId = Groups_1.id ) WHERE (Users_3.id =
‘5698’) AND (main.Status != ‘deleted’) AND ( ( CachedGroupMembers_2.id
IS NOT NULL ) AND ( main.Status = ‘open’ OR main.Status = ‘new’ ) )
AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id) ORDER BY
main.Priority DESC LIMIT 10

This one is request for “More about user XXX” box on the main page of a ticket.
EXPLAIN …
CREATE INDEX RUZ_CGM1 ON CachedGroupMembers(MemberId, GroupId, Disabled);
EXPLAIN …

[snip]


The rt-users Archives

Community help: http://wiki.bestpractical.com
Commercial support: sales@bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com

Best regards, Ruslan.

I’m not big expert on optimizing Pg queries using indexes. Below
you’ll find suggestions for some.

But please run EXPLAIN for each query before creating any index and
after. Without explains we can not help you precisely, just guessing.
Also, please use some real constants in EXPLAIN, random strings and
numbers will generate plans far from reality.

Thanks for the guidance Ruslan, I’ll see what I can find…

For what its worth, I believe this is a big performance hit on RT
3.6.x too, so I’d recommend fixing rather than downgrading. :slight_smile:

I witnessed this query with a high execution rate on RT 3.6.1.On 17/07/2008, at 1:28 PM, Jessie Bryan wrote:

Update-

Ok, after several hours of logging, what I see most common in the slow
query (1s) log is:

LOG: duration: 3151.786 ms statement: EXECUTE [PREPARE:
SELECT * FROM Groups WHERE LOWER(Domain) = LOWER($1) AND LOWER(Type)
= LOWER($2)]

Several hundred of these queries in the slow query log data.

For what its worth, I believe this is a big performance hit on RT
3.6.x too, so I’d recommend fixing rather than downgrading. :slight_smile:

I witnessed this query with a high execution rate on RT 3.6.1.

Hello,
We are running RT 3.6 with a database with over 190 thousand tickets
on PgSQL.
We added the following indexes to the databse:

CREATE INDEX groupstest1 ON Groups (lower(Domain));
CREATE INDEX groupstest2 ON Groups (lower(Type));
CREATE INDEX groupstest3 ON Groups
(lower(Domain),Instance,lower(Type),id, Name);
CREATE INDEX groumemberstest1 ON groupmembers (groupid);
CREATE INDEX groumemberstest2 ON groupmembers (memberid);
CREATE INDEX userstest1 ON Users (lower(name));

PS: You should change the INDEX name, I just copied this from the test
server.

Paulo F. Andrade
pauloandrade@ist.utl.pt