Strange custom field performance issue after upgrading to RT 3.6.4

We are in the process of upgrading our RT installation from 3.4.2 (to 3.6.4).
As part of the upgrade, we were planning to restructure RT to allow additional
groups within our lab use RT (previously we just used it within our own group).
Part of that plan involved moving from having multiple queues for our group to
using just a single queue and using a custom field to store what was previously
the queue name. I performed that change with no problem – however, performance
on SOME queries that include custom fields is so slow that it is unusable.

Here is the system background:
RT 3.6.4
RHEL 5 (with all updates through last week)
Postgres 8.1.9-1.el5
Apache 2.2.3.7.el5
mod_perl 2.0.2-6.3.el5
SearchBuilder 1.49

The system has only about 7000 tickets and about 100 or so users (only about 10
have rights to RT).

The following query takes something like 165 seconds and returns three records:

SELECT DISTINCT main.* FROM Tickets main JOIN ObjectCustomFields ObjectCustomFields_1
ON ( ObjectCustomFields_1.ObjectId = ‘0’ ) OR ( ObjectCustomFields_1.ObjectId = main.Queue )
JOIN CustomFields CustomFields_2 ON ( CustomFields_2.id = ObjectCustomFields_1.CustomField )
JOIN ObjectCustomFieldValues ObjectCustomFieldValues_3 ON ( ObjectCustomFieldValues_3.ObjectId = main.id )
AND ( ObjectCustomFieldValues_3.Disabled = ‘0’ )
AND ( ObjectCustomFieldValues_3.ObjectType = ‘RT::Ticket’ )
AND ( ObjectCustomFieldValues_3.CustomField = CustomFields_2.id )
WHERE (CustomFields_2.Name = ‘Category’) AND (main.Status != ‘deleted’) AND (main.Owner = ‘368’
AND ( main.Status = ‘new’ OR main.Status = ‘open’ OR main.Status = ‘stalled’ )
AND main.Queue = ‘5’ AND ( ObjectCustomFieldValues_3.Content = ‘General’ ) )
AND (main.Type = ‘ticket’) AND (main.EffectiveId = main.id)
ORDER BY main.id ASC

Here is the Postgres explain plan output:

QUERY PLAN
Unique (cost=17.16…17.23 rows=1 width=620) (actual time=165267.134…165267.161 rows=3 loops=1)
→ Sort (cost=17.16…17.17 rows=1 width=620) (actual time=165267.127…165267.133 rows=3 loops=1)
Sort Key: main.id, main.effectiveid, main.queue, main.“type”, main.issuestatement, main.resolution, main.“owner”, main.subject, main.initialpriority, main.finalpriority, main.priority, main.timeestimated, main.timeworked, main.status, main.timeleft, main.told, main.starts, main.started, main.due, main.resolved, main.lastupdatedby, main.lastupdated, main.creator, main.created, main.disabled
→ Nested Loop (cost=5.02…17.15 rows=1 width=620) (actual time=32924.858…165266.929 rows=3 loops=1)
Join Filter: (((“outer”.objectid = 0) OR (“outer”.objectid = “inner”.queue)) AND (“outer”.objectid = “inner”.id))
→ Nested Loop (cost=0.00…8.09 rows=1 width=8) (actual time=0.124…555.995 rows=53712 loops=1)
→ Nested Loop (cost=0.00…2.07 rows=1 width=12) (actual time=0.032…0.180 rows=9 loops=1)
Join Filter: (“outer”.id = “inner”.customfield)
→ Seq Scan on customfields customfields_2 (cost=0.00…1.02 rows=1 width=4) (actual time=0.018…0.036 rows=1 loops=1)
Filter: ((name)::text = ‘Category’::text)
→ Seq Scan on objectcustomfields objectcustomfields_1 (cost=0.00…1.02 rows=2 width=8) (actual time=0.005…0.063 rows=17 loops=1)
→ Index Scan using objectcustomfieldvalues2 on objectcustomfieldvalues objectcustomfieldvalues_3 (cost=0.00…6.00 rows=1 width=8) (actual time=0.044…35.073 rows=5968 loops=9)
Index Cond: ((objectcustomfieldvalues_3.customfield = “outer”.id) AND ((objectcustomfieldvalues_3.objecttype)::text = ‘RT::Ticket’::text))
Filter: ((disabled = 0) AND ((content)::text = ‘General’::text))
→ Bitmap Heap Scan on tickets main (cost=5.02…9.05 rows=1 width=620) (actual time=3.002…3.052 rows=3 loops=53712)
Recheck Cond: ((“owner” = 368) AND (queue = 5))
Filter: (((status)::text <> ‘deleted’::text) AND (((status)::text = ‘new’::text) OR ((status)::text = ‘open’::text) OR ((status)::text = ‘stalled’::text)) AND ((“type”)::text = ‘ticket’::text) AND (effectiveid = id))
→ BitmapAnd (cost=5.02…5.02 rows=1 width=0) (actual time=2.513…2.513 rows=0 loops=53712)
→ Bitmap Index Scan on tickets2 (cost=0.00…2.39 rows=110 width=0) (actual time=0.148…0.148 rows=910 loops=53712)
Index Cond: (“owner” = 368)
→ Bitmap Index Scan on tickets1 (cost=0.00…2.39 rows=110 width=0) (actual time=2.325…2.325 rows=19050 loops=53712)
Index Cond: (queue = 5)
Total runtime: 165267.372 ms
(23 rows)

At this point, I think this could be a Postgres bug. If I change the part of the query that
specifies the custom field (the “ObjectCustomFieldValues_3.Content = ‘General’”) to instead
EXCLUDE all non-matching fields, the query executes almost immediately – e.g. using something
like:
(ObjectCustomFieldValues_3.Content != ‘FY07’ AND
ObjectCustomFieldValues_3.Content != ‘Upgrades’ AND
[etc.])

Should I simply try upgrading Postgres, or does anyone have any ideas about what
the REAL problem might be?

Thanks,
Matt Goheen

We are in the process of upgrading our RT installation from 3.4.2
(to 3.6.4).
As part of the upgrade, we were planning to restructure RT to allow
additional
groups within our lab use RT (previously we just used it within our
own group).
Part of that plan involved moving from having multiple queues for
our group to
using just a single queue and using a custom field to store what
was previously
the queue name. I performed that change with no problem –
however, performance
on SOME queries that include cust

Have you done a “FULL VACUUM ANALYZE” recently? It looks like the
statistics engine is…confused. That’s the best way to reset it.

PGP.sig (186 Bytes)

Have you done a “FULL VACUUM ANALYZE” recently? It looks like the
statistics engine is…confused. That’s the best way to reset it.

I didn’t think that would help as the data was just imported a week ago, and
almost nothing has been done to the database since.

However, this morning I tried upgrading to the latest Postgres (8.2.5) and
that totally fixed the problem. I will try going back to the old database
just to ensure that the “FULL VACUUM ANALYZE” will NOT fix it, but at this
point I’m fairly certain there must have been some sort of database issue
in the Postgres that ships with RHEL5.

I’ll send a follow-up with the final result of that test.

- Matt

Well, I take that back – I guess I should have tried the "vacuum full analyze"
first, as that ALSO fixed the problem in the original (RHEL5 Postgres) database.

I guess I’ll go back to the old database version.

I don’t know why the NEW Postgres could import the data just fine and work
well whereas the old version had problems…

- Matt