Huge number of database queries

RT seems to be generating a huge amount of database traffic. After
investigating, we found that most of the queries were because the static
content (e.g. .css files, image files, .js files) was also generating
database traffic, about 19 queries each. The total number was about 700
queries for one page load!

So, of course, we reconfigured so that static content is not handled by
RT. However, a single dynamic page (e.g. ticket view) still generates
119 queries, which still seems like a huge number to me.

Here is the breakdown for one dynamic page load (displaying a ticket):

119 queries total
    067 statements
        049 select
            005 ping test
            044 other
        016 deallocate
        002 set
    052 parse/bind/execute

Is this expected? Is there a way to bring this to a more reasonable
number? It’s hard for me to imagine that we need 44 normal SELECT
statements to view a ticket. The 16 additional prepared statements,
which are then immediately DEALLOCATEd, are probably excessive as well.

Keep in mind this is a page reload so the page has already been loaded
once.

Regards,
Jeff Davis

RT seems to be generating a huge amount of database traffic. After
investigating, we found that most of the queries were because the static
content (e.g. .css files, image files, .js files) was also generating
database traffic, about 19 queries each.

So, RT 3.6.6 should significantly improve the state of the static
content. We’ve finally deployed CSS::Squish to force only a single CSS
file for your RT instance and we’ve improved the browser hinting that
should force caching so you only load it once unless you’re forcing the
browser’s cache to wipe itself on reload. But do note that even 3.6.5
should be caching those files.
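As a sketch of the server-side half of this (Apache with mod_alias and mod_expires; the paths are assumptions about a typical RT 3.x layout, not canonical):

```apache
# Serve RT's static files directly from disk and hint browsers to cache
# them, so they never reach the RT handler at all. Paths are illustrative.
Alias /NoAuth/images/ /opt/rt3/share/html/NoAuth/images/
<Directory /opt/rt3/share/html/NoAuth/images/>
    ExpiresActive On
    ExpiresDefault "access plus 1 week"
</Directory>
```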

However, a single dynamic page (e.g. ticket view) still generates
119 queries, which still seems like a huge number to me.

When I saw your blog post about this, I thought you were talking about
SQL queries.

Here is the breakdown for one dynamic page load (displaying a ticket):

How many custom fields and transactions on the ticket?

119 queries total
    067 statements
        049 select
            005 ping test
            044 other
        016 deallocate
        002 set
    052 parse/bind/execute

Is this expected? Is there a way to bring this to a more reasonable
number?

Do you just find it “not-right” or are you seeing poor performance?
Either way, I’d like to see what the queries are, so as to be able to
give you a decent answer there.

It’s hard for me to imagine that we need 44 normal SELECT
statements to view a ticket.

Can you pull out what they are?

The 16 additional prepared statements, which are then immediately
DEALLOCATEd, are probably excessive as well.

I suspect that the issue here stems from the fact that until very
recently, DBD::mysql and DBD::Pg didn’t support server-side prepared
statements - everything was emulated on the client. Putting together a
cache at the ORM layer should be doable. But I’ve never seen those sorts
of queries actually generate measurable overhead.
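A cache along those lines can be sketched abstractly (hypothetical names, not DBIx::SearchBuilder’s actual API; the idea is the same one DBI’s prepare_cached uses, keying statement handles by SQL text so repeats skip the parse/deallocate round-trips):

```python
# Hypothetical sketch of a prepared-statement cache keyed by SQL text,
# in the spirit of DBI's prepare_cached. Handle stands in for a real
# server-side prepared statement; the interesting part is the reuse.
class Handle:
    def __init__(self, sql):
        self.sql = sql

class StatementCache:
    def __init__(self, prepare_fn):
        self._prepare = prepare_fn  # the call that costs a round-trip
        self._handles = {}
        self.prepares = 0           # PARSE round-trips actually issued

    def prepare(self, sql):
        """Return a cached handle, preparing only on first use."""
        if sql not in self._handles:
            self.prepares += 1
            self._handles[sql] = self._prepare(sql)
        return self._handles[sql]

cache = StatementCache(Handle)
# The 16 parse/deallocate pairs in the breakdown above collapse to one.
for _ in range(16):
    cache.prepare("SELECT * FROM Tickets WHERE id = $1")
```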

Best,
Jesse

So, RT 3.6.6 should significantly improve the state of the static
content. We’ve finally deployed CSS::Squish to force only a single CSS
file for your RT instance and we’ve improved the browser hinting that
should force caching so you only load it once unless you’re forcing the
browser’s cache to wipe itself on reload. But do note that even 3.6.5
should be caching those files.

Hmm… perhaps Ctrl-R forced it to re-request everything. I’m not quite
as concerned about this, because there’s a workaround, but I’m glad to
hear it’s being improved.

However, a single dynamic page (e.g. ticket view) still generates
119 queries, which still seems like a huge number to me.

When I saw your blog post about this, I thought you were talking about
SQL queries.

What I mean, very specifically, is that when I set
log_min_duration_statement = 0 in postgresql.conf, it generates 119
lines that contain “duration:” in the log. These are primarily SQL
queries, although I suppose that depends on your definition.

Here is the breakdown for one dynamic page load (displaying a ticket):

How many custom fields and transactions on the ticket?

No custom fields and 3 transactions, as far as I can tell. We have an
almost empty database.

119 queries total
    067 statements
        049 select
            005 ping test
            044 other
        016 deallocate
        002 set
    052 parse/bind/execute

Is this expected? Is there a way to bring this to a more reasonable
number?

Do you just find it “not-right” or are you seeing poor performance?
Either way, I’d like to see what the queries are, so as to be able to
give you a decent answer there.

I would say that the problem is more that it’s “not right” than any
serious performance problem now.

We haven’t deployed it to production yet, but we will have a substantial
number of users, and most of those users will be using RT
constantly through the workday. People will avoid using the ticketing
system if every minor update or action interrupts them with a delay
(even a small delay), so I’d like the system to be essentially
instantaneous.

I’m more worried about the stability of performance than anything else.
With so many network round-trips to the database, I’m worried that any
minor slowdown would make RT unavailable. We don’t notice any serious
problem right now, because our network ping time is ~0.2 ms, and our
processor usage on the database server is also low. Also, we have
essentially an empty database now, so what will the performance be like
as it grows? Will the number of queries per page load change as the
dataset changes?

It also just makes it harder to administer. If I turn on query logging
to diagnose a problem, the signal to noise ratio makes it almost
useless.

It’s hard for me to imagine that we need 44 normal SELECT
statements to view a ticket.

Can you pull out what they are?

I have to scrub the data before putting it in a public place (yeah, I
know there’s nothing sensitive, but I still have to look to be sure), so
this is just a little more detail to the summary:

016 parse
    001 "SELECT * FROM Users WHERE LOWER(Gecos) = LOWER($1)"
    002 "SELECT * FROM Users WHERE LOWER(Gecos) = LOWER($1)"
    001 "SELECT * FROM Tickets WHERE id = $1"
    002 "SELECT * FROM Queues WHERE id = $1"
    001 "SELECT * FROM Transactions WHERE id = $1"
    009 other
018 bind
018 execute
016 deallocate
049 select
    005 ping test
    011 "SELECT … from ACL, Groups, Principals, CachedGroupMembers …"
    004 "SELECT main.* FROM ( SELECT main.id FROM CustomFields …"
    004 "SELECT DISTINCT main.* FROM Users main JOIN Principals …"
    006 "SELECT main.* FROM Tickets main WHERE …"
    006 "SELECT main.* FROM Links main WHERE …"
    002 "…Users main CROSS JOIN ACL…"
    011 other

The 16 additional prepared statements, which are then immediately
DEALLOCATEd, are probably excessive as well.

I suspect that the issue here stems from the fact that until very
recently, DBD::mysql and DBD::Pg didn’t support server-side prepared
statements - everything was emulated on the client. Putting together a
cache at the ORM layer should be doable. But I’ve never seen those sorts
of queries actually generate measurable overhead.

I’m not worried about optimizing away a few protocol round-trips that
are only generated by DBD::Pg. I’m worried about the big numbers. If you
count just the executes, and ignore parse/bind/deallocate (which is fine
by me), that’s still 18+49 = 67 queries.

So I guess the numbers are: 119 round trips to the DB server, and 67
queries. It would be nice if we could get both of those numbers down to
something reasonable.

Regards,
Jeff Davis

So I guess the numbers are: 119 round trips to the DB server, and 67
queries. It would be nice if we could get both of those numbers down to
something reasonable.

Is this considered a bug, or expected behavior?

Regards,
Jeff Davis

So I guess the numbers are: 119 round trips to the DB server, and 67
queries. It would be nice if we could get both of those numbers down to
something reasonable.

Is this considered a bug, or expected behavior?

You got my message from yesterday, where I asked for some more detail
about the exact queries, right? I can’t tell exactly what’s biting you
until I see what’s going on.

Jeff Davis wrote:

Hmm… perhaps Ctrl-R forced it to re-request everything. I’m not quite
as concerned about this, because there’s a workaround, but I’m glad to
hear it’s being improved.

Yeah, it generally does.

However, a single dynamic page (e.g. ticket view) still generates
119 queries, which still seems like a huge number to me.
When I saw your blog post about this, I thought you were talking about
SQL queries.

What I mean, very specifically, is that when I set
log_min_duration_statement = 0 in postgresql.conf, it generates 119
lines that contain “duration:” in the log. These are primarily SQL
queries, although I suppose that depends on your definition.

*nod* FWIW, enabling “log_statement” on 8.1 didn’t do that for me.

Here is the breakdown for one dynamic page load (displaying a ticket):
How many custom fields and transactions on the ticket?

No custom fields and 3 transactions, as far as I can tell. We have an
almost empty database.

We haven’t deployed it to production yet, but we will have a substantial
number of users, and most of those users will be using RT
constantly through the workday. People will avoid using the ticketing
system if every minor update or action interrupts them with a delay
(even a small delay), so I’d like the system to be essentially
instantaneous.

As would we.

I’m more worried about the stability of performance than anything else.
With so many network round-trips to the database, I’m worried that any
minor slowdown would make RT unavailable. We don’t notice any serious
problem right now, because our network ping time is ~0.2 ms, and our
processor usage on the database server is also low. Also, we have
essentially an empty database now, so what will the performance be like
as it grows? Will the number of queries per page load change as the
dataset changes?

It shouldn’t change significantly, but if you want to stress-test, the
results can help us to improve RT.

It also just makes it harder to administer. If I turn on query logging
to diagnose a problem, the signal to noise ratio makes it almost
useless.

I think that a little experience about what to grep out will go a long
way toward that. We’ve had a fair amount of experience and luck tuning
RT on Postgres with those logs.
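As a rough illustration of that kind of grepping, a small summarizer can reduce the log to a breakdown like the ones in this thread (this assumes statements are logged via log_min_duration_statement = 0 and a log format close to the 8.x default; the regex is a guess at that format, not a tested recipe):

```python
# Summarize a PostgreSQL statement log by leading SQL keyword, to cut
# the signal-to-noise problem down to a table like the ones above.
import re
from collections import Counter

# Matches "statement: SELECT ..." or "execute <name>: SELECT ..." and
# captures the first SQL keyword. The log format is an assumption based
# on PostgreSQL 8.x defaults; adjust for your log_line_prefix.
STMT_RE = re.compile(r'(?:statement|execute [^:]*):\s*(\w+)')

def summarize(lines):
    """Count logged statements by their leading SQL keyword."""
    counts = Counter()
    for line in lines:
        m = STMT_RE.search(line)
        if m:
            counts[m.group(1).upper()] += 1
    return counts

sample = [
    "LOG:  duration: 0.2 ms  statement: SELECT * FROM Tickets WHERE id = 1",
    "LOG:  duration: 0.1 ms  statement: DEALLOCATE dbdpg_3",
    "LOG:  duration: 0.1 ms  execute dbdpg_1: SELECT 1",  # ping test
]
for keyword, n in summarize(sample).most_common():
    print(f"{n:03d} {keyword.lower()}")
```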

It’s hard for me to imagine that we need 44 normal SELECT
statements to view a ticket.
Can you pull out what they are?

I have to scrub the data before putting it in a public place (yeah, I
know there’s nothing sensitive, but I still have to look to be sure), so
this is just a little more detail to the summary:

016 parse
001 "SELECT * FROM Users WHERE LOWER(Gecos) = LOWER($1)"
002 "SELECT * FROM Users WHERE LOWER(Gecos) = LOWER($1)"

LOWER(Gecos)? Really? Can you pull out the values there? I wouldn’t
expect to see that in normal operation unless you’re playing external
authentication tricks.

001 "SELECT * FROM Tickets WHERE id = $1"
002 "SELECT * FROM Queues WHERE id = $1"

All these seem pretty reasonable.

001 "SELECT * FROM Transactions WHERE id = $1"

This seems a little surprising. That should get pulled in from a bigger
prefetch of all txns related to that ticket.

009 other
018 bind
018 execute
016 deallocate

I talked about those in my last mail. I’d love to see a patch to
SearchBuilder to reuse prepared statements, but I’m not sure it’s a
high-priority target.

049 select
005 ping test
011 “SELECT … from ACL, Groups, Principals, CachedGroupMembers …”

11 ACL checks seems slightly high but not insanely high, since RT is
checking queue, ticket and custom field ACLs. Though you should pay
attention to Ruslan’s recent mail about refactoring the ACL code to be a
bit smarter and cut down on the number of queries.

004 “SELECT main.* FROM ( SELECT main.id FROM CustomFields …”

Your custom fields.

004 “SELECT DISTINCT main.* FROM Users main JOIN Principals …”

Requestors, Ccs, AdminCcs and Owners.

006 “SELECT main.* FROM Tickets main WHERE …”

Unsure

006 “SELECT main.* FROM Links main WHERE …”

Six different kinds of links on tickets

002 "…Users main CROSS JOIN ACL…"
011 other

What falls into “other”?

So I guess the numbers are: 119 round trips to the DB server, and 67
queries. It would be nice if we could get both of those numbers down to
something reasonable.

*nod* Improving query count has generally taken a back-seat to improving
overall query performance, since quite often there are one or two
queries in that set that dwarf the run time of all the rest. That said,
I’d love to make fewer queries and get the same results ;) Would you
maybe be up for helping to improve things? Setting up a global
prepared-query cache seems like it might be a good, well-contained
project that would have a pretty direct impact on what’s made your
spidey-sense tingle.

There’s also interesting work to be done with caching and invalidating
caches of collection search results, possibly using a tool like MemCache
that I’d be happy to ramble about if somebody is interested in hacking
on it.
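To make the invalidation side of that concrete, a minimal sketch (hypothetical, nothing RT-specific) of a collection-result cache:

```python
# Minimal sketch (hypothetical, not RT code) of a cache for collection
# search results, with the invalidation step that makes caching hard to
# get right: every write to a table must drop cached searches on it.
class SearchCache:
    def __init__(self):
        self._results = {}  # (table, query) -> cached rows

    def search(self, table, query, run_search):
        """Return cached rows, running the real search only on a miss."""
        key = (table, query)
        if key not in self._results:
            self._results[key] = run_search()
        return self._results[key]

    def invalidate(self, table):
        """Drop every cached search touching `table` after a write."""
        for key in [k for k in self._results if k[0] == table]:
            del self._results[key]
```

Getting invalidate() right for searches that join several tables is exactly the part that is a challenge to get right.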

Best,

Jesse

*nod* FWIW, enabling “log_statement” on 8.1 didn’t do that for me.

PostgreSQL logging has many options, and many of them are
interdependent. I mentioned log_min_duration_statement=0 because that’s
an easy way to make sure that the statements are logged.
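For reference, the relevant postgresql.conf knobs (values as used here):

```ini
# Log every statement with its duration; -1 disables, and a positive
# value logs only statements slower than that many milliseconds.
log_min_duration_statement = 0

# Alternative: log statement text without timing ('none', 'ddl', 'mod', 'all').
#log_statement = 'all'
```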

I’m more worried about the stability of performance than anything else.
With so many network round-trips to the database, I’m worried that any
minor slowdown would make RT unavailable. We don’t notice any serious
problem right now, because our network ping time is ~0.2 ms, and our
processor usage on the database server is also low. Also, we have
essentially an empty database now, so what will the performance be like
as it grows? Will the number of queries per page load change as the
dataset changes?

It shouldn’t change significantly, but if you want to stress-test, the
results can help us to improve RT.

We’ll certainly provide our results.

It also just makes it harder to administer. If I turn on query logging
to diagnose a problem, the signal to noise ratio makes it almost
useless.

I think that a little experience about what to grep out will go a long
way toward that. We’ve had a fair amount of experience and luck tuning
RT on Postgres with those logs.

Ok.

It’s hard for me to imagine that we need 44 normal SELECT
statements to view a ticket.
Can you pull out what they are?

I have to scrub the data before putting it in a public place (yeah, I
know there’s nothing sensitive, but I still have to look to be sure), so
this is just a little more detail to the summary:

016 parse
001 "SELECT * FROM Users WHERE LOWER(Gecos) = LOWER($1)"
002 "SELECT * FROM Users WHERE LOWER(Gecos) = LOWER($1)"

LOWER(Gecos)? Really? Can you pull out the values there? I wouldn’t
expect to see that in normal operation unless you’re playing external
authentication tricks.

We set WebExternalAuth and we unset WebFallbackToInternalAuth. I don’t
think we do any tricks aside from that. We use a .htaccess file to
authenticate through LDAP.

001 "SELECT * FROM Transactions WHERE id = $1"

This seems a little surprising. That should get pulled in from a bigger
prefetch of all txns related to that ticket.

009 other
018 bind
018 execute
016 deallocate

I talked about those in my last mail. I’d love to see a patch to
SearchBuilder to reuse prepared statements, but I’m not sure it’s a
high-priority target.

I’ll have to look into it to see how many statements can be effectively
reused.

049 select
005 ping test
011 “SELECT … from ACL, Groups, Principals, CachedGroupMembers …”

11 ACL checks seems slightly high but not insanely high, since RT is
checking queue, ticket and custom field ACLs. Though you should pay
attention to Ruslan’s recent mail about refactoring the ACL code to be a
bit smarter and cut down on the number of queries.

Ok.

004 “SELECT main.* FROM ( SELECT main.id FROM CustomFields …”

Your custom fields.

“customfield” and “customfieldvalues” are both empty tables.

004 “SELECT DISTINCT main.* FROM Users main JOIN Principals …”

Requestors, Ccs, AdminCcs and Owners.

006 “SELECT main.* FROM Tickets main WHERE …”

Unsure

006 “SELECT main.* FROM Links main WHERE …”

Six different kinds of links on tickets

002 "…Users main CROSS JOIN ACL…"
011 other

What falls into “other”?

I’ll send the logs to you off-list.

*nod* Improving query count has generally taken a back-seat to improving
overall query performance, since quite often there are one or two
queries in that set that dwarf the run time of all the rest. That said,

Good performance and also stable/scalable performance that doesn’t
drastically change when the conditions change are the main goals for me.
I’m not implying that RT does that currently, but that’s where I’m
coming from.

I’d love to make fewer queries and get the same results ;) Would you
maybe be up for helping to improve things? Setting up a global
prepared-query cache seems like it might be a good, well-contained
project that would have a pretty direct impact on what’s made your
spidey-sense tingle.

I’ve been looking into it as I learn RT. The deadline for deploying RT
is approaching, but afterward when RT is in maintenance I may be able to
contribute (employer permitting).

There’s also interesting work to be done with caching and invalidating
caches of collection search results, possibly using a tool like MemCache
that I’d be happy to ramble about if somebody is interested in hacking
on it.

That’s always an interesting subject, but I’m cautious about introducing
more caching because it’s a challenge to get it right. I’d hate to give
up real stability for hypothetical performance ;)

Regards,
Jeff Davis

*nod* Improving query count has generally taken a back-seat to improving
overall query performance, since quite often there are one or two
queries in that set that dwarf the run time of all the rest. That said,

Good performance and also stable/scalable performance that doesn’t
drastically change when the conditions change are the main goals for me.
I’m not implying that RT does that currently, but that’s where I’m
coming from.

Understood. Seeing a significant rise in query count per page load as
the system scales… shouldn’t happen. And Postgres, with a bit of tuning,
is very good at keeping performance scaling nicely. But there’s no
substitute for testing to make me eat my words ;)

That’s always an interesting subject, but I’m cautious about introducing
more caching because it’s a challenge to get it right. I’d hate to give
up real stability for hypothetical performance ;)

:)

Here is the log of the SQL queries. I meant to send this earlier today.

Note that I am sending this off-list. I have redacted some of the
information, even though it may seem innocuous, but I don’t think it
will get in your way. I didn’t actually delete any of the entries
though.

[re-cced to rt-devel now that I’ve removed the SQL log]

There is one thing I notice in your log that’s “not right” and is likely
RT’s fault. Of the 51 queries to build a ticket history page, a large
number of them are for “Transaction custom fields”. Those could all be
done with a single query and pre-loaded. That may be worth looking at if
you see a performance issue there. But it’s well-indexed and shouldn’t
be much of a performance hit. (Famous last words, I know.)
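For what it’s worth, the pre-loading idea can be sketched like this (sqlite3 stands in for Postgres, and the table and column names are illustrative, not necessarily RT’s schema):

```python
# Sketch of the pre-loading idea: one IN (...) query for all of a
# ticket's transactions' custom-field values, instead of one query per
# transaction. sqlite3 stands in for Postgres; schema is illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TxnCustomFieldValues (txn_id INTEGER, content TEXT)")
conn.executemany("INSERT INTO TxnCustomFieldValues VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (2, "c"), (3, "d")])

def preload_cf_values(conn, txn_ids):
    """One round-trip for all transactions instead of len(txn_ids)."""
    placeholders = ",".join("?" for _ in txn_ids)
    rows = conn.execute(
        "SELECT txn_id, content FROM TxnCustomFieldValues"
        " WHERE txn_id IN (%s)" % placeholders, txn_ids)
    by_txn = {}
    for txn_id, content in rows:
        by_txn.setdefault(txn_id, []).append(content)
    return by_txn

values = preload_cf_values(conn, [1, 2, 3])
```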