Another sql improvement recommendation for rt3.4.0

Another point regarding slow database access and RT 3.4.0rc1…

I’ve been going through the queries being run, and I think I can
make one simple recommendation to improve the performance of
RT significantly (having spent a fair bit of time with various
flavours of databases over the years).

If possible, avoid using functions on a column in a query.

ie: select * from users where lower(email)=‘abc@bar.com’;

In all the databases I’ve used, the optimizer will not use
any indexes associated with that column as soon as you wrap
it in a function and will do a full table scan.

In the above example, an index defined as:

“create index on users(email)” would not be used for that
query.

There are ways around this (ie: Oracle’s function-based indexes,
but I don’t know if that’s supported by all databases and that’s
a bit of band-aid solution).

The better way is to store the data in a known format (all lowercase
in this example) and do something like this instead:

select * from users where email=lower(‘abc@bar.com’);

I point this out because loading up a ticket used this query:

SELECT DISTINCT main.*
FROM Tickets main, Groups Groups_1, CachedGroupMembers CachedGroupMembers_2,
Users Users_3
WHERE (CachedGroupMembers_2.MemberId = Users_3.id) AND
(Groups_1.id = CachedGroupMembers_2.GroupId) AND
(main.EffectiveId = main.id) AND
(main.Status != ‘deleted’) AND
(main.id = Groups_1.Instance) AND
(((LOWER(Users_3.EmailAddress) = ‘user@abc.com’) AND
(Groups_1.Type = ‘Requestor’)) AND
((main.Status =‘new’) OR (main.Status =‘open’)))
ORDER BY main.Priority DESC LIMIT 10;

It took close to 2 seconds to run on my database (27k users).

Remove the ‘LOWER(Users_3.EmailAddress)’ and the query takes < 10ms because
it uses all the indexes.

David Kerry

I’ve been going through the queries being run, and I think I can
make one simple recommendation to improve the performance of
RT significantly (having spent a fair bit of time with various
flavours of databases over the years).

If possible, avoid using functions on a column in a query.

If you look a bit deeper, you’ll see that that’s being done by
SearchBuilder when a case-insensitive search is called for on a database
that doesn’t natively support case-insensitive queries.

In all the databases I’ve used, the optimizer will not use
any indexes associated with that column as soon as you wrap
it in a function and will do a full table scan.

Postgres and Oracle both support functional indexes.

The better way is to store the data in a known format (all lowercase
in this example) and do something like this instead:

Except that would mean storing usernames in all lowercase, as well as
all correspondence related to a ticket.

I point this out because loading up a ticket used this query:

That query sure looks like a search for all tickets with user@abc.com,
not loading a single ticket…What happens if you add a functional
index?

If possible, avoid using functions on a column in a query.

If you look a bit deeper, you’ll see that that’s being done by
SearchBuilder when a case-insensitive search is called for on a database
that doesn’t natively support case-insensitive queries.

Ah… ok.

In all the databases I’ve used, the optimizer will not use
any indexes associated with that column as soon as you wrap
it in a function and will do a full table scan.

Postgres and Oracle both support functional indexes.

True - but it’s nice to avoid such things if they’re not really
necessary.

The better way is to store the data in a known format (all lowercase
in this example) and do something like this instead:

Except that would mean storing usernames in all lowercase, as well as
all correspondence related to a ticket.

I didn’t mean all columns had to be done this way, just cases where
it can be done easily. For example - I see no reason to not store
user’s email addresses in all-lowercase. Email addresses are naturally
case-insensitive anyways, so no problems there. That would remove
the one slow query I found already.

I point this out because loading up a ticket used this query:

That query sure looks like a search for all tickets with user@abc.com,
not loading a single ticket…What happens if you add a functional
index?

This was for loading up a single ticket - I presume it was looking for
any other tickets created by the same user to list in one of the related
ticket display boxes.

For what it’s worth, adding a function based index didn’t seem to make
any difference in postgres.

ie: create index users5dk on users(lower(emailaddress));

Query time was the same (2-4 seconds) with or without this index.

I’ve also found another one:

SELECT * FROM Groups WHERE LOWER(Domain) = ‘systeminternal’ AND LOWER(Type) = ‘privileged’;

500ms (with LOWER()) vs 36 ms (without LOWER()).

This seems to be an internal query - RT should know the values in its
own internal tables, no? Why the case-insensitive search for this?

If the lower() is being added to columns automatically, I think
this is probably a major cause of some of the RT slowness. Overzealous
use of that is going to kill your performance (unless you make all your
indexes case-insensitive function-based indexes, that is).

Also, note that Oracle (haven’t tried postgres) won’t allow you to index
a column more than once (with a single-column index), so you either have
a case-sensitive index (lower()), or a case-insensitive one, but you
can’t have both, so you either have to do all your queries with a matching
lower() or none at all (or as few as possible).

Anyways… I’m just pointing out that you can get some major performance
boosts out of RT if you fine tune some of the queries a bit.

David Kerry

Except that would mean storing usernames in all lowercase, as well as
all correspondence related to a ticket.

I didn’t mean all columns had to be done this way, just cases where
it can be done easily. For example - I see no reason to not store
user’s email addresses in all-lowercase. Email addresses are naturally
case-insensitive anyways, so no problems there. That would remove
the one slow query I found already.

No, they’re not. You may not use a mail system that’s case sensitive.
But email address local parts are, at the option of the local system,
case sensitive. From RFC 2821:

Mailbox = Local-part "@" Domain

 	 Local-part = Dot-string / Quoted-string ; MAY be case-sensitive

I’ve also found another one:

SELECT * FROM Groups WHERE LOWER(Domain) = ‘systeminternal’ AND LOWER(Type) = ‘privileged’;

500ms (with LOWER()) vs 36 ms (without LOWER()).

This seems to be an internal query - RT should know the values in its
own internal tables, no? Why the case-insensitive search for this?

Likely it needs to be hinted. Open a ticket, ideally with a patch that
adds CASESENITIVE => 1, to the calls to seachbuilder->Limit.

Postgres and Oracle both support functional indexes.

True - but it’s nice to avoid such things if they’re not really
necessary.

Why? First, they are necessary in this case (although I confess
that I have the distinct impression some cases are because of MySQL’s
non-standard understanding of “=” in SQL). But second, why would
anyone implement functional indexes when they’re not really
necessary?

A

Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
–Alexander Hamilton

Umm, are you seriously asking why a software company would add a
feature to their software that may or may not really be needed??

Blink Blink

Matt

Umm, are you seriously asking why a software company would add a
feature to their software that may or may not really be needed??

Given that PostgreSQL is not “a software company”, yes, I am asking
that.

A

Andrew Sullivan | ajs@crankycanuck.ca
Information security isn’t a technological problem. It’s an economics
problem.
–Bruce Schneier