Postgres performance issues and some "odd" ways to get IDs

Matthew_Sullivan · August 14, 2006, 3:24pm

Hey All,

Preface: I’m a complete newb and l4m3r when it comes to DBs… however I
do have some very good friends who work as DB programmers… I am also
designing a new DB and learning lots… I am also running RT with around
12m rows…originally on MySQL, now on Postgres…

Performance on MySQL was ok until we broken the 10m row mark now it
sucks big time…

Performance on Postgres … not hot but very consistent and usable…

Performance on Postgres a week later (even with performing VACUUM FULL
ANALYSE) … sucks - not as bad as MySQL but not good… seems there are
a number of improvements that can be made … the first of which I
noticed when I remember the golden rule a colleague drummed into me when
he saw one of my tables like:

                                     Table "public.users"
    Column         |            Type             |

Modifiers
id | integer | not null default
nextval(‘users_id_seq’::text)

… he said “don’t do it, at best it’s bad practice, at worst it creates
race conditions”… when you try to do things like get the last insert
id… so when I was trying to get to he bottom of the doggyness in the
RT DB here I noticed a SELECT following each INSERT and that it was
taking average around 15 seconds and upto 45 seconds and seemed to be
the only slow queries… needless to say I was surprised when I saw what
it was doing (eg: SELECT id from Transactions where oid = …
further inspection lead me to DBIx::SearchBuilder::Handle::Pg::Insert()
… and the following:

sub Insert {
my $self = shift;
my $table = shift;

my $sth = $self->SUPER::Insert($table, @_ );

unless ($sth) {
        return ($sth);
}

#Lets get the id of that row we just inserted
my $oid = $sth->{'pg_oid_status'};
my $sql = "SELECT id FROM $table WHERE oid = ?";
my @row = $self->FetchResult($sql, $oid);
# TODO: Propagate Class::ReturnValue up here.
unless ($row[0]) {
        print STDERR "Can't find $table.id  for OID $oid";
        return(undef);
}
$self->{'id'} = $row[0];

return ($self->{'id'});

}

… is there any reason why you’re not SELECTing the next val from the
sequence and passing that to the insert and returning it … that way
you could remove the whole reliance on the OIDs in Postgres and improve
large DB performance (substantially in my case)…?

I figure as you are naming the sequences as _id_seq,
hardcoding it is as bad as hardcoding column the SELECT id FROM
$table… and it seems to be the only place that RT relies on OIDs…

Something like

SELECT nextval AS id from nextval(’$table_id_seq’::text);

will get the correct value of id before the insert - discard if the
insert fails … Postgres has a very large number for sequences I doubt
you’ll run out… Of course this code would be Pg specific as MySQL
still doesn’t support sequences properly … but that’s another story.

Comments/flames welcome…

Regards,

Mat

PS: It’s 1:15am here after 6 hours of database performance debugging,
apologies if the above offends, it is not meant to.

Jesse_Vincent · August 14, 2006, 4:30pm

Hey All,

the only slow queries… needless to say I was surprised when I saw what
it was doing (eg: SELECT id from Transactions where oid = …
further inspection lead me to DBIx::SearchBuilder::Handle::Pg::Insert()
… and the following:

Why don’t you try upgrading DBIx::SearchBuilder:

http://search.cpan.org/src/JESSE/DBIx-SearchBuilder-1.43/SearchBuilder/Handle/Pg.pm

sub Insert {
…
my $sql = “SELECT id FROM $table WHERE oid = ?”;
my @row = $self->FetchResult($sql, $oid);

TODO: Propagate Class::ReturnValue up here.

Comments/flames welcome…

Regards,

Mat

PS: It’s 1:15am here after 6 hours of database performance debugging,
apologies if the above offends, it is not meant to.

But maybe only after you get a bit of rest. Best luck. Please tell us if
this helps.

Jesse

Joby_Walker · August 14, 2006, 5:03pm

Jesse,

The new behavior seems very broken:

Insert row (sequence nextval returns x)
get current value of sequence (returns x)

As far as I can see SearchBuilder doesn’t wrap a transaction around the
Insert sub so you could end up with unpredictable results:

Process 1: Insert row (sequence nextval returns x)
Process 2: Insert row (sequence nextval returns x+1)
Process 1: get current value of sequence (x+1)
Process 2: get current value of sequence (x+1)

Wrapping a transaction around it should solve the problem, but you would
be adding transaction overhead for no gain. What Matt wrote is the best
behavior (it is also what other DB object models use such as PHP’s Propel):

get nextval of sequence (x)
insert row with id of x.

This cannot be munged by multiple processes:

Process 1: get nextval of sequence (x)
Process 2: get nextval of sequence (x+1)
Process 1: insert row (id = x)
Process 2: insert row (id = x+1)

While it is unlikely that the SearchBuilder behavior would actually end
up with a wrong result, the probability is non-zero. Very little code
change would be necessary (see attached – and I haven’t tested
this…). The only downside would be a gap in the sequence when the
insert failed.

Joby Walker
C&C SSG, University of Washington

Jesse Vincent wrote:

Pg.pm.diff (1.16 KB)

Andrew_Sullivan · August 14, 2006, 5:14pm

Jesse,

The new behavior seems very broken:

Insert row (sequence nextval returns x)

get current value of sequence (returns x)

As far as I can see SearchBuilder doesn’t wrap a transaction around the
Insert sub so you could end up with unpredictable results:

No, you can’t, even without explicit BEGIN; . . . COMMIT; Please
see the Postgres docs on nextval() an currval(). But the basic thing
is that currval() is private to your connection, not to your
transaction.

A

Andrew Sullivan | ajs@crankycanuck.ca
Information security isn’t a technological problem. It’s an economics
problem.
–Bruce Schneier

Andrew_Sullivan · August 14, 2006, 4:53pm

it was doing (eg: SELECT id from Transactions where oid = …

I’m pretty sure SearchBuilder was fixed of that mistake some time
ago. (You can also probably solve this by indexing the oid column,
but I strongly suggest against that). Of course, this suggests
you’re not using the latest Postgres, because it no longer creates
tables with oid by default; so this query would just fail. I
strongly suggest you upgrade to 8.1 if you can, because it has a
number of performance improvements. And you need the SearchBuilder
upgrade for sure.

… is there any reason why you’re not SELECTing the next val from the
sequence and passing that to the insert and returning it … that way

Better in Postgres to do the INSERT and then select currval() on the
sequence. No, there’s no race condition on this; check the
PostgreSQL docs to see why.

A

Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
–Alexander Hamilton

Joby_Walker · August 14, 2006, 5:38pm

Andrew Sullivan wrote:> On Mon, Aug 14, 2006 at 10:03:36AM -0700, Joby Walker wrote:

Jesse,

The new behavior seems very broken:

Insert row (sequence nextval returns x)

get current value of sequence (returns x)

As far as I can see SearchBuilder doesn’t wrap a transaction around the
Insert sub so you could end up with unpredictable results:

No, you can’t, even without explicit BEGIN; . . . COMMIT; Please
see the Postgres docs on nextval() an currval(). But the basic thing
is that currval() is private to your connection, not to your
transaction.

A

Your absolutely right. Thanks for correcting me! Looks like currval is
safer than I thought…

Matthew_Sullivan · August 14, 2006, 10:52pm

Jesse Vincent wrote:

Hey All,

the only slow queries… needless to say I was surprised when I saw what
it was doing (eg: SELECT id from Transactions where oid = …
further inspection lead me to DBIx::SearchBuilder::Handle::Pg::Insert()
… and the following:

Why don’t you try upgrading DBIx::SearchBuilder:

SearchBuilder/Handle/Pg.pm - metacpan.org

sub Insert {

…

my $sql = “SELECT id FROM $table WHERE oid = ?”;
my @row = $self->FetchResult($sql, $oid);

TODO: Propagate Class::ReturnValue up here.

Comments/flames welcome…

Regards,

Mat

PS: It’s 1:15am here after 6 hours of database performance debugging,
apologies if the above offends, it is not meant to.

But maybe only after you get a bit of rest. Best luck. Please tell us if
this helps.

My apologies - yes I should have got some sleep … helps to actually
install the newly build module before checking the live code and getting
cranky about it. After actually running the install it looks
considerably more sane

/ Mat