Slow ticket loading

Alex_Vandiver · May 13, 2014, 7:07pm

[snip]

This message was rejected from rt-users because it was over 1M of
logfiles. Please don’t send large messages to the list; you’re welcome
to provide a link to complete logs for those interested in them, of course.

Here’s the result of what RT4 does to the DB (note: RT3.8 does NOT
suffer the same problems - as am running that and just trying to
upgrade… Also note: RT 4.0.19 on Apache 2.2 with ModPerl2 running
against some very fast PostgreSQL 8.4.10 DBs (cluster of 4 servers) -
round trip to them is around 25-50ms)… the query log for a SINGLE
ticket follows…

This only occurs on the first request; DBIx::SearchBuilder calls
DBI->column_info to determine the set of columns for each RT table, to
be able to use them in the GROUP BY. It then caches the information;
none of the queries to pg_constraints should happen after the first request.

There are two fixes to this; one is to find a way to not call
DBIx::SearchBuilder->Fields, which is what calls DBI->column_info, which
causes the slow queries. RT is currently using the set of columns so
obtained to pass to GROUP BY to perform both a DISTINCT and ORDER BY; Pg
9.1 is smart enough to know that grouping by the primary key is enough
to imply the remaining columns as well[1]. As such, the attached patch
(which has only been briefly tested) may suffice.

This is only slightly papering around the problem, however, as any later
call to ->Fields will incur the same cost. RT does not currently
contain any such calls, so I expect it to be sufficient. A more
robust solution would be to call ->Fields at server initialization time,
before the first request. This would require also modifying
DBIx::SearchBuilder::Handle to cache the information by DSN, as it
currently only lasts until the dbh gets replaced – which happens at
server startup.

This has to be a bug of RT4 or of the DBIx modules it uses so cross
posting to rt-bugs@ as well…

Please don’t cross-post to rt-bugs@; it leads to duplicate tickets.
Once discussion has achieved consensus on the bug is the right time to
create a bug ticket.

Alex

[1] PostgreSQL: Documentation: 9.1: Release 9.1

0001-We-can-omit-the-heavy-on-Pg-Fields-call-on-9.1-and-a.patch (1.41 KB)

Alex_Vandiver · May 13, 2014, 7:28pm

[snip] As such, the attached patch (which has only been briefly tested) may suffice.

Updated version, with one more close paren, attached.

Alex

0001-We-can-omit-the-heavy-on-Pg-Fields-call-on-9.1-and-a.patch (1.41 KB)

Michelle_Sullivan · May 14, 2014, 3:17am

Alex Vandiver wrote:

[snip]

This message was rejected from rt-users because it was over 1M of
logfiles. Please don’t send large messages to the list; you’re welcome
to provide a link to complete logs for those interested in them, of course.

I actually didn’t know the size (though I knew it might be bigger than
acceptable by the list software - figured some intelligent person would
choose to deal or allow it )

Here’s the result of what RT4 does to the DB (note: RT3.8 does NOT
suffer the same problems - as am running that and just trying to
upgrade… Also note: RT 4.0.19 on Apache 2.2 with ModPerl2 running
against some very fast PostgreSQL 8.4.10 DBs (cluster of 4 servers) -
round trip to them is around 25-50ms)… the query log for a SINGLE
ticket follows…

This only occurs on the first request; DBIx::SearchBuilder calls
DBI->column_info to determine the set of columns for each RT table, to
be able to use them in the GROUP BY. It then caches the information;
none of the queries to pg_constraints should happen after the first request.

Yeah - i’m seeing this every time a ticket is loaded - so RT/DBIx is not
caching it… and that’s probably because the IDs are different - so
even pgpool/memcached setup can’t cache it…

There are two fixes to this; one is to find a way to not call
DBIx::SearchBuilder->Fields, which is what calls DBI->column_info, which
causes the slow queries. RT is currently using the set of columns so
obtained to pass to GROUP BY to perform both a DISTINCT and ORDER BY; Pg
9.1 is smart enough to know that grouping by the primary key is enough
to imply the remaining columns as well[1]. As such, the attached patch
(which has only been briefly tested) may suffice.

I’m on 8.4 will that work as well, or should I not bother? (I can’t
upgrade to 9.1 yet, as much as I want to - the software on the system is
in the hands of the Operations team and controlled by puppet so 8.x →
9.x is impossible without corrupting everything or Operations input).

This is only slightly papering around the problem, however, as any later
call to ->Fields will incur the same cost. RT does not currently
contain any such calls, so I expect it to be sufficient. A more
robust solution would be to call ->Fields at server initialization time,
before the first request.
That sounds sane (calling at init) … as for the calls to ->Fields I
think you do call it in the custom fields code… because that’s where I
first saw this issue 6 months ago (and it was dismissed.)
This would require also modifying
DBIx::SearchBuilder::Handle to cache the information by DSN, as it
currently only lasts until the dbh gets replaced – which happens at
server startup.

This has to be a bug of RT4 or of the DBIx modules it uses so cross
posting to rt-bugs@ as well…

Please don’t cross-post to rt-bugs@; it leads to duplicate tickets.
Once discussion has achieved consensus on the bug is the right time to
create a bug ticket.

Ok my apologies - it’s been a while since I’ve been here - maybe a cross
post to rt-devel would have been more appropriate - anyhow, my apologies.

Michelle Sullivan
http://www.mhix.org/

Michelle_Sullivan · May 14, 2014, 10:36pm

Michelle Sullivan wrote:

Alex Vandiver wrote:

[snip]

This message was rejected from rt-users because it was over 1M of
logfiles. Please don’t send large messages to the list; you’re welcome
to provide a link to complete logs for those interested in them, of course.

I actually didn’t know the size (though I knew it might be bigger than
acceptable by the list software - figured some intelligent person would
choose to deal or allow it )

Here’s the result of what RT4 does to the DB (note: RT3.8 does NOT
suffer the same problems - as am running that and just trying to
upgrade… Also note: RT 4.0.19 on Apache 2.2 with ModPerl2 running
against some very fast PostgreSQL 8.4.10 DBs (cluster of 4 servers) -
round trip to them is around 25-50ms)… the query log for a SINGLE
ticket follows…

This only occurs on the first request; DBIx::SearchBuilder calls
DBI->column_info to determine the set of columns for each RT table, to
be able to use them in the GROUP BY. It then caches the information;
none of the queries to pg_constraints should happen after the first request.

Yeah - i’m seeing this every time a ticket is loaded - so RT/DBIx is not
caching it… and that’s probably because the IDs are different - so
even pgpool/memcached setup can’t cache it…

There are two fixes to this; one is to find a way to not call
DBIx::SearchBuilder->Fields, which is what calls DBI->column_info, which
causes the slow queries. RT is currently using the set of columns so
obtained to pass to GROUP BY to perform both a DISTINCT and ORDER BY; Pg
9.1 is smart enough to know that grouping by the primary key is enough
to imply the remaining columns as well[1]. As such, the attached patch
(which has only been briefly tested) may suffice.

I’m on 8.4 will that work as well, or should I not bother? (I can’t
upgrade to 9.1 yet, as much as I want to - the software on the system is
in the hands of the Operations team and controlled by puppet so 8.x →
9.x is impossible without corrupting everything or Operations input).

FYI it appears every new connection to the DB is causing
DBI->column_info to be called. On the ‘in test’ system this happens a
lot … page loads when it opens a new connection is +4 minutes…
regardless of the page.

It’s quite usable with and without pgpool (it’s faster without) - except
when a new connection is opened.

This is only slightly papering around the problem, however, as any later
call to ->Fields will incur the same cost. RT does not currently
contain any such calls, so I expect it to be sufficient. A more
robust solution would be to call ->Fields at server initialization time,
before the first request.

That sounds sane (calling at init) … as for the calls to ->Fields I
think you do call it in the custom fields code… because that’s where I
first saw this issue 6 months ago (and it was dismissed.)

This might be a misnomer, as it’s connection opening that causes it…
and a call to customfields should be using the already open handle… in
theory…

Michelle

This would require also modifying
DBIx::SearchBuilder::Handle to cache the information by DSN, as it
currently only lasts until the dbh gets replaced – which happens at
server startup.

This has to be a bug of RT4 or of the DBIx modules it uses so cross
posting to rt-bugs@ as well…

Please don’t cross-post to rt-bugs@; it leads to duplicate tickets.
Once discussion has achieved consensus on the bug is the right time to
create a bug ticket.

Ok my apologies - it’s been a while since I’ve been here - maybe a cross
post to rt-devel would have been more appropriate - anyhow, my apologies.

Michelle Sullivan
http://www.mhix.org/

Michelle_Sullivan · June 24, 2014, 10:43am

Michelle Sullivan wrote:

Michelle Sullivan wrote:

Alex Vandiver wrote:

[snip]

This message was rejected from rt-users because it was over 1M of
logfiles. Please don’t send large messages to the list; you’re welcome
to provide a link to complete logs for those interested in them, of course.

I actually didn’t know the size (though I knew it might be bigger than
acceptable by the list software - figured some intelligent person would
choose to deal or allow it )

Here’s the result of what RT4 does to the DB (note: RT3.8 does NOT
suffer the same problems - as am running that and just trying to
upgrade… Also note: RT 4.0.19 on Apache 2.2 with ModPerl2 running
against some very fast PostgreSQL 8.4.10 DBs (cluster of 4 servers) -
round trip to them is around 25-50ms)… the query log for a SINGLE
ticket follows…

This only occurs on the first request; DBIx::SearchBuilder calls
DBI->column_info to determine the set of columns for each RT table, to
be able to use them in the GROUP BY. It then caches the information;
none of the queries to pg_constraints should happen after the first request.

Yeah - i’m seeing this every time a ticket is loaded - so RT/DBIx is not
caching it… and that’s probably because the IDs are different - so
even pgpool/memcached setup can’t cache it…

There are two fixes to this; one is to find a way to not call
DBIx::SearchBuilder->Fields, which is what calls DBI->column_info, which
causes the slow queries. RT is currently using the set of columns so
obtained to pass to GROUP BY to perform both a DISTINCT and ORDER BY; Pg
9.1 is smart enough to know that grouping by the primary key is enough
to imply the remaining columns as well[1]. As such, the attached patch
(which has only been briefly tested) may suffice.

I’m on 8.4 will that work as well, or should I not bother? (I can’t
upgrade to 9.1 yet, as much as I want to - the software on the system is
in the hands of the Operations team and controlled by puppet so 8.x →
9.x is impossible without corrupting everything or Operations input).

FYI it appears every new connection to the DB is causing
DBI->column_info to be called. On the ‘in test’ system this happens a
lot … page loads when it opens a new connection is +4 minutes…
regardless of the page.

It’s quite usable with and without pgpool (it’s faster without) - except
when a new connection is opened.

So, I have my individual ticket load times running 4 times faster
(minimum) with the following patch to DBIx::SearchBuilder:

— DBIx/SearchBuilder/Handle.pm.orig 2014-06-24 05:21:54.000000000 -0500
+++ DBIx/SearchBuilder/Handle.pm 2014-06-24 05:21:41.000000000 -0500
@@ -1429,14 +1429,16 @@
sub Fields {
my $self = shift;
my $table = shift;

my $lctn = lc $table;

unless ( keys %FIELDS_IN_TABLE ) {

   my $sth = $self->dbh->column_info( undef, '', '%', '%' )

unless ( keys %FIELDS_IN_TABLE && exists $FIELDS_IN_TABLE{$lctn} ) {

   my $sth = $self->dbh->column_info( undef, '', $table, '%' )
       or return ();
   my $info = $sth->fetchall_arrayref({});
   foreach my $e ( @$info ) {

       push @{ $FIELDS_IN_TABLE{ lc $e->{'TABLE_NAME'} } ||= [] },

lc $e->{‘COLUMN_NAME’};

       push @{ $FIELDS_IN_TABLE{ $lctn } ||= [] }, lc

$e->{‘COLUMN_NAME’};
}

$FIELDS_IN_TABLE{ $lctn } ||= ; # empty list so we don’t call
column_info() again.
}

return @{ $FIELDS_IN_TABLE{$lctn } || };

I don’t know how much this will break (if anything) but the 1000’s of
queries is dropped to just a few when calling Fields().

What I know it will break is if you call Fields with a table name not
matching the case of the table in the DB (for the first call) it will
not return the table and it will block returning the table after that -
even if you subsequently call with the correct case… which is because
of using the “lc $table” (in the original: lc $e->{‘TABLE_NAME’}) …
which should not be done as in PostgreSQL you can have Table1, table1,
and taBLE1 as valid tables (though they are probably a bad idea) in the
same DB and schema… It should be case sensitive so dropping all the
'lc’s would be a good idea in my opinion (if nothing else because having
2 table names the same but different case and definition will result in
Fields being the wrong values for at least one of the tables.)

Jesse, thoughts?

Michelle Sullivan
http://www.mhix.org/