Redundant Postgres indexes

Hi All,

I’m in the middle of migrating from RT 3.0.10 to 3.2.2 as well as from
postgres 7.3 to 7.4. While doing this, I noticed that there appear to be
some indexes created that are not necessary. Specifically, according to
the Postgres docs
(http://www.postgresql.org/docs/7.4/interactive/indexes-multicolumn.html),
a multicolumn index on (a,b,c) can also be used by the query planner as
an index for a or (a,b). In addition, there’s an implicit index on the
primary key of any table. This leads to the following changes to
schema.Pg.

Remove:
CREATE INDEX Attachments1 ON Attachments (Parent); – another
index exists on (Parent,TransactionId)
CREATE INDEX CachedGroupMembers3 on CachedGroupMembers (GroupId);
– ditto on (GroupID,MemberID,disabled)
CREATE INDEX Users2 ON Users (Name); – there’s already a unique
index on this column
CREATE INDEX TicketCustomFieldValues2 ON TicketCustomFieldValues
(CustomField,Ticket); – another index exists on
(CustomField,Ticket,Content)

Change:
CREATE INDEX Users3 ON Users (id, EmailAddress);
CREATE INDEX Tickets5 ON Tickets (id, EffectiveId);
CREATE INDEX Tickets4 ON Tickets (id, Status);
Into:
CREATE INDEX Users3 ON Users (EmailAddress, id); – id available
via primary key index, and allows removal of index below.
CREATE INDEX Tickets5 ON Tickets (EffectiveId, id); – ditto
CREATE INDEX Tickets4 ON Tickets (Status, id); – id available via
primary key index, may as well also speed up anything that only refers
to status.

Remove:
CREATE INDEX Users4 ON Users (EmailAddress); – redundant with the
above column swaps
CREATE INDEX Tickets3 ON Tickets (EffectiveId); – ditto

I’m not sure what the performance impact is of maintaining these
indexes; it’s probably not that high.

So, do these changes make sense? Is there something I’m missing? (eg.
Are earlier postgres versions not as smart? Has someone found that
having the duplicate indexes makes things quicker despite what the docs
suggest?)

Cheers,

  • Andrew
    PS. There’s also a stray #-character in schema.Pg that needs to be
    nuked in the definition for the attributes table.

the Postgres docs
(http://www.postgresql.org/docs/7.4/interactive/indexes-multicolumn.html),
a multicolumn index on (a,b,c) can also be used by the query planner as
an index for a or (a,b). In addition, there’s an implicit index on the
primary key of any table. This leads to the following changes to
schema.Pg.

The general rule is that any leftwards subset of the index is also
usable in the index. So this is right, but you do have to be aware
of selectivity issues (sometimes a,b is selective enough to give you
an index scan but a from the index is not).

I’m not sure what the performance impact is of maintaining these
indexes; it’s probably not that high.

Multi-column indexes are pretty expensive to maintain at write time,
actually.

Andrew Sullivan | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well.
–Dennis Ritchie

I’m in the middle of migrating from RT 3.0.10 to 3.2.2 as well as from
postgres 7.3 to 7.4. While doing this, I noticed that there appear to
be

See the list archives for my alternate set of indexes, which I came to
by profiling the queries RT actually does. There are some indexes for
which I see no evidence of ever being used.

Also you can use Pg’s own stat collector to see how often the various
indexes get used. That’s my next step to see if I need to remove more
indexes or adjust some.

smime.p7s (2.42 KB)