RT 3.8.9 + PostgreSQL 8.4.7 - missing/bad indexes

Vaclav_Ovsik · April 14, 2011, 1:26pm

Hi,
because of horrible performance of spreadsheet export of tickets I found
some badly created or missing indexes. I didn’t investigate if other
RDBMS schemas has the same problem.
To discover problems I rewrote Results.tsv into command-line shape and
run it with DBI_TRACE=2 finally.

For every row of Tickets the following additional problematic queries are
executed:

SELECT * FROM Groups WHERE Instance = ? AND LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?)

e.g. one my bind params: 106431 ‘RT::Ticket-Role’ ‘Requestor’

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER(‘RT::Ticket-Role’) AND LOWER(Type) = LOWER(‘Requestor’);
QUERY PLAN
Seq Scan on groups (cost=0.00…12925.34 rows=1 width=66) (actual time=64.672…64.904 rows=1 loops=1)
Filter: ((instance = 106431) AND (lower((domain)::text) = ‘rt::ticket-role’::text) AND (lower((type)::text) = ‘requestor’::text))
Total runtime: 64.936 ms
(3 rows)

Existing indexes:
CREATE UNIQUE INDEX Groups1 ON Groups (Domain,Instance,Type,id, Name);
CREATE INDEX Groups2 On Groups (Type, Instance, Domain);

Both indexes above are case sensitive. Either problematic query should be
case sensitive (without lower()) or a new index with lower() on fields should
be created:

CREATE INDEX Groups_zito1 ON Groups (lower(Type), Instance, lower(Domain));

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER(‘RT::Ticket-Role’) AND LOWER(Type) = LOWER(‘Requestor’);
QUERY PLAN
Index Scan using groups_zito1 on groups (cost=0.01…8.38 rows=1 width=66) (actual time=0.084…0.084 rows=1 loops=1)
Index Cond: ((lower((type)::text) = ‘requestor’::text) AND (instance = 106431) AND (lower((domain)::text) = ‘rt::ticket-role’::text))
Total runtime: 0.108 ms
(3 rows)

SELECT main.* FROM ObjectCustomFieldValues main WHERE (main.Disabled = ‘0’) AND (main.ObjectType = ‘RT::Ticket’) AND (main.ObjectId = ‘106431’)

rt=# EXPLAIN ANALYZE SELECT main.* FROM ObjectCustomFieldValues main WHERE (main.Disabled = ‘0’) AND (main.ObjectType = ‘RT::Ticket’) AND (main.ObjectId = ‘106431’);
QUERY PLAN
Seq Scan on objectcustomfieldvalues main (cost=0.00…12298.75 rows=3 width=457) (actual time=142.497…142.887 rows=3 loops=1)
Filter: ((disabled = 0) AND ((objecttype)::text = ‘RT::Ticket’::text) AND (objectid = 106431))
Total runtime: 142.924 ms
(3 rows)

Existing indexes:
CREATE INDEX ObjectCustomFieldValues1 ON ObjectCustomFieldValues (CustomField,ObjectType,ObjectId,Content);
CREATE INDEX ObjectCustomFieldValues2 ON ObjectCustomFieldValues (CustomField,ObjectType,ObjectId);

The above query selects based on fields ObjectType and ObjectId,
unfortunately ObjectCustomFieldValues2 contains additional field
CustomField. To speedup the search index with good selectivity
with only field ObjectId can suffice.

CREATE INDEX ObjectCustomFieldValues_zito1 ON ObjectCustomFieldValues (ObjectId);

rt=# EXPLAIN ANALYZE SELECT main.* FROM ObjectCustomFieldValues main WHERE (main.Disabled = ‘0’) AND (main.ObjectType = ‘RT::Ticket’) AND (main.ObjectId = ‘106431’);
QUERY PLAN
Index Scan using objectcustomfieldvalues_zito1 on objectcustomfieldvalues main (cost=0.00…8.58 rows=3 width=457) (actual time=0.059…0.063 rows=3 loops=1)
Index Cond: (objectid = 106431)
Filter: ((disabled = 0) AND ((objecttype)::text = ‘RT::Ticket’::text))
Total runtime: 0.090 ms
(4 rows)

or index with two corresponding fields:
CREATE INDEX ObjectCustomFieldValues_zito2 ON ObjectCustomFieldValues (ObjectType, ObjectId);
My planner uses always objectcustomfieldvalues_zito1, because I have
only one ObjectType value:

rt=# select distinct objecttype from objectcustomfieldvalues;
objecttype
RT::Ticket
(1 row)

And performance benefit?

Original state, without a new indexes:

zito@rt2:~/migration$ time ./rt-search2 /opt/RT >/dev/null 2>&1

real 3m12.235s
user 0m11.713s
sys 0m0.576s

After creation of the indexes:
CREATE INDEX Groups_zito1 ON Groups (lower(Type), Instance, lower(Domain));
CREATE INDEX ObjectCustomFieldValues_zito1 ON ObjectCustomFieldValues (ObjectId);

zito@rt2:~/migration$ time ./rt-search2 /opt/RT >/dev/null 2>&1

real 0m14.171s
user 0m9.545s
sys 0m0.456s

That is speed up by a factor of circa 14 in my case.

A testing script rt-search2 attached (rewrite of Results.tsv). A TSQL
query at the start of the script should be changed to something
appropriate of course.

Can be new indexes added into schema.Pg dist file?
Thanks
Best Regards
Zito

rt-search2 (2.98 KB)

Jesse_Vincent · April 15, 2011, 2:09am

Hi,
because of horrible performance of spreadsheet export of tickets I found
some badly created or missing indexes. I didn’t investigate if other
RDBMS schemas has the same problem.
To discover problems I rewrote Results.tsv into command-line shape and
run it with DBI_TRACE=2 finally.

For every row of Tickets the following additional problematic queries are
executed:

########################################################################

SELECT * FROM Groups WHERE Instance = ? AND LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?)

e.g. one my bind params: 106431 ‘RT::Ticket-Role’ ‘Requestor’

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER(‘RT::Ticket-Role’) AND LOWER(Type) = LOWER(‘Requestor’);
QUERY PLAN

I’d actually consider this a code bug. We shouldn’t be LOWER()ing either
of those parameters. Please open a ticket by mailing
rt-bugs@bestpractical.com

Thanks!

Dominic_Hargreaves · April 15, 2011, 9:01am

SELECT * FROM Groups WHERE Instance = ? AND LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?)

e.g. one my bind params: 106431 ‘RT::Ticket-Role’ ‘Requestor’

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER(‘RT::Ticket-Role’) AND LOWER(Type) = LOWER(‘Requestor’);
QUERY PLAN

I’d actually consider this a code bug. We shouldn’t be LOWER()ing either
of those parameters. Please open a ticket by mailing
rt-bugs@bestpractical.com

There are already a couple of related bugs:

http://issues.bestpractical.com/Ticket/Display.html?id=8568
http://issues.bestpractical.com/Ticket/Display.html?id=13056

(linked from
http://requesttracker.wikia.com/wiki/DatabaseIndexes where I
documented the same index change that I found useful).

More in the thread starting
http://lists.bestpractical.com/pipermail/rt-users/2009-November/062329.html

Dominic Hargreaves, Systems Development and Support Team
Computing Services, University of Oxford

signature.asc (198 Bytes)

Vaclav_Ovsik · April 15, 2011, 9:07am

…
I’d actually consider this a code bug. We shouldn’t be LOWER()ing either
of those parameters. Please open a ticket by mailing
rt-bugs@bestpractical.com

I did it. It have received ticket [rt3 #17121].

About the second problem - selecting from ObjectCustomFieldValues - I’m
not certain the adding a new index is the best solution.
I have red about multicolumn indexes in Pg docs yesterday and found for
me, that multicolumn indexes can be used for lower number of fields, but
these must be the leftmost fields. So we have already:

CREATE INDEX ObjectCustomFieldValues1 ON ObjectCustomFieldValues (CustomField,ObjectType,ObjectId,Content);
CREATE INDEX ObjectCustomFieldValues2 ON ObjectCustomFieldValues (CustomField,ObjectType,ObjectId);

So instead of adding a new index I did
CREATE INDEX ObjectCustomFieldValues_zito1 ON ObjectCustomFieldValues (ObjectId);
alternative approach can be to reorder index columns of ObjectCustomFieldValues2?
It depends on a shape of every other query on ObjectCustomFieldValues and
decision can be difficult (analyze of all possible query shapes).

Should I send it to rt-bugs also?

Best Regards
Zito

Kenneth_Marshall · April 15, 2011, 12:44pm

Hi,
because of horrible performance of spreadsheet export of tickets I found
some badly created or missing indexes. I didn’t investigate if other
RDBMS schemas has the same problem.
To discover problems I rewrote Results.tsv into command-line shape and
run it with DBI_TRACE=2 finally.

For every row of Tickets the following additional problematic queries are
executed:

########################################################################

SELECT * FROM Groups WHERE Instance = ? AND LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?)

e.g. one my bind params: 106431 ‘RT::Ticket-Role’ ‘Requestor’

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER(‘RT::Ticket-Role’) AND LOWER(Type) = LOWER(‘Requestor’);
QUERY PLAN

I’d actually consider this a code bug. We shouldn’t be LOWER()ing either
of those parameters. Please open a ticket by mailing
rt-bugs@bestpractical.com

Thanks!

Okay, but does removing the LOWER()'s make the product harder to use?
In other words, when writing a Scrip that uses Type would it now be
case sensitive? i.e. Were AdminCc and admincc both acceptable? Some
of my most painful bug hunts were case-only related. Maybe the RT
code already handles that. My two cents.

Ken

Jesse_Vincent · April 15, 2011, 12:48pm

Okay, but does removing the LOWER()'s make the product harder to use?
In other words, when writing a Scrip that uses Type would it now be
case sensitive? i.e. Were AdminCc and admincc both acceptable? Some
of my most painful bug hunts were case-only related. Maybe the RT
code already handles that. My two cents.

Those bits are internals and the API always drops the content into the
database in one specific way. If you have your own code that’s touching
Type or Domain, you’ve already voided your warranty, so to speak. The
fact that there’s case-smashing on that search is a straight-up bug.

Kevin_Falcone · April 15, 2011, 12:52pm

Hi,
because of horrible performance of spreadsheet export of tickets I found
some badly created or missing indexes. I didn’t investigate if other
RDBMS schemas has the same problem.
To discover problems I rewrote Results.tsv into command-line shape and
run it with DBI_TRACE=2 finally.

For every row of Tickets the following additional problematic queries are
executed:

########################################################################

SELECT * FROM Groups WHERE Instance = ? AND LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?)

e.g. one my bind params: 106431 ‘RT::Ticket-Role’ ‘Requestor’

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER(‘RT::Ticket-Role’) AND LOWER(Type) = LOWER(‘Requestor’);
QUERY PLAN

I’d actually consider this a code bug. We shouldn’t be LOWER()ing either
of those parameters. Please open a ticket by mailing
rt-bugs@bestpractical.com

Okay, but does removing the LOWER()'s make the product harder to use?
In other words, when writing a Scrip that uses Type would it now be
case sensitive? i.e. Were AdminCc and admincc both acceptable? Some
of my most painful bug hunts were case-only related. Maybe the RT
code already handles that. My two cents.

I’ve also been bitten by case-sensitive problems, but in this
particular query, RT inserts the Domain and Type from static internal
strings and builds queries against them in a similar way and you
should never be able to end up with a lowercased version in the DB.
should, famous last words, etc.

-kevin

Ruslan_Zakirov1 · April 15, 2011, 10:23pm

SELECT * FROM Groups WHERE Instance = ? AND LOWER(Domain) = LOWER(?) AND LOWER(Type) = LOWER(?)

e.g. one my bind params: 106431 ‘RT::Ticket-Role’ ‘Requestor’

rt=# EXPLAIN ANALYZE SELECT * FROM Groups WHERE Instance = 106431 AND LOWER(Domain) = LOWER(‘RT::Ticket-Role’) AND LOWER(Type) = LOWER(‘Requestor’);
QUERY PLAN

I’d actually consider this a code bug. We shouldn’t be LOWER()ing either
of those parameters. Please open a ticket by mailing
rt-bugs@bestpractical.com

There are already a couple of related bugs:

http://issues.bestpractical.com/Ticket/Display.html?id=8568
http://issues.bestpractical.com/Ticket/Display.html?id=13056

Updated tickets. Linked them. You can find more tickets over there.

(linked from
http://requesttracker.wikia.com/wiki/DatabaseIndexes where I
documented the same index change that I found useful).

Vivek Khera posted his list long time ago and still uses it:
http://issues.bestpractical.com/Ticket/Display.html?id=6059

May be it’s on the wiki too, if it’s not then may be you can merge it.

As well I’ve done some review and created a code branch for that:

As you can see from changes some indexes goes into 4.2, but some are not there.

More in the thread starting
http://lists.bestpractical.com/pipermail/rt-users/2009-November/062329.html

–
Dominic Hargreaves, Systems Development and Support Team
Computing Services, University of Oxford

Best regards, Ruslan.