AW: Re: clean up user table

Torsten_Brumm · November 27, 2007, 8:42am

Hi,
Make sure to delete the users in a safe way with rtx-shredder and not directly from the db. Btw, about how many users we talk? We have NO performance problems with 30.000 active and a half a million inactive users…

TorstenFrom: rt-users-bounces@lists.bestpractical.com rt-users-bounces@lists.bestpractical.com
To: rt-users@lists.bestpractical.com rt-users@lists.bestpractical.com
Sent: Mon Nov 26 23:43:26 2007
Subject: Re: [rt-users] clean up user table

Todd Chapman wrote:

Are you having preformance problems related to the users? If not why
clean it up? Do you like to play with fire?

Yes, I am. RT3 makes multiple queries per ticket displayed that join
the very large principals and cachedgroupmembers tables. It causes
tickets to display somewhat slowly. There’s a small but notable delay
with RT3 3.6.5 and Pg 8.1. The delay was intolerably long with some
earlier minor revision of 3.6 and MySQL 5.0.

I’m tired of running into these problems when I update some component of
the system, and would like to prune out the tens of thousands of “user”
records created by spam through the mail gateway. I’d be grateful for
any help that the list can offer.
http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-users

SAVE THOUSANDS OF DOLLARS ON RT SUPPORT:

If you sign up for a new RT support contract before December 31, we’ll take
up to 20 percent off the price. This sale won’t last long, so get in touch today.
Email us at sales@bestpractical.com or call us at +1 617 812 0745.

Community help: http://wiki.bestpractical.com
Commercial support: sales@bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com

Sven_Sternberger · November 27, 2007, 9:07am

Hello!

I also want to see only real users in my DB, it is not about
performance, but it would be nice number to show the management.

Make sure to delete the users in a safe way with rtx-shredder and not
directly from the db. Btw, about how many users we talk? We have NO
performance problems with 30.000 active and a half a million inactive
users…

But rtx-shredder don’t give you a simple way to delete all
users without a ticket, so you have to combine the select
statement from Gordon, and then call rtx-shredder

best regards

sven

Sven_Sternberger · November 27, 2007, 9:42am

oops!On Di, 2007-11-27 at 10:07 +0100, Sven Sternberger wrote:

But rtx-shredder don’t give you a simple way to delete all
users without a ticket, so you have to combine the select
statement from Gordon, and then call rtx-shredder

so Geordons sql statemaent is of course not correct
(for most of us) we have a lot of valid users who never
owned a ticket.
So replace Ticket.owner with Transaction.creator,
which gives us all users where we deleted
their tickets via rtx-shredder.

regards!

sven

Gordon_Messmer2 · November 27, 2007, 4:21pm

Ham MI-ID, Torsten Brumm wrote:

Hi,
Make sure to delete the users in a safe way with rtx-shredder and not
directly from the db.

I’d prefer to, but as I said before, it doesn’t work. The newest
version of rtx-shredder has a User plugin that has a no_tickets option,
but when I use that, it says that there’s nothing to remove.

Btw, about how many users we talk? We have NO
performance problems with 30.000 active and a half a million inactive
users…

GroupMembers (87456 rows), Groups (95138 rows), Principals (110714
rows), and Users (15574 rows) tables.

It doesn’t seem like much, but it is mighty slow. How do your tables
compare?

Gordon_Messmer2 · November 27, 2007, 4:27pm

Sven Sternberger wrote:

oops!

But rtx-shredder don’t give you a simple way to delete all
users without a ticket, so you have to combine the select
statement from Gordon, and then call rtx-shredder

so Geordons sql statemaent is of course not correct
(for most of us) we have a lot of valid users who never
owned a ticket.
So replace Ticket.owner with Transaction.creator,
which gives us all users where we deleted
their tickets via rtx-shredder.

Yep, I noticed that later. So far, the best I’ve got is this:

SELECT users.id
FROM users
WHERE users.id NOT IN
(SELECT users.id
FROM users
INNER JOIN principals ON users.id = principals.id
LEFT JOIN tickets ON principals.id = tickets.Owner
OR principals.id = tickets.creator
WHERE Tickets.id IS NOT NULL);

I still don’t know how other relationships are tracked, like CCs?

Kenneth_Marshall · November 27, 2007, 4:29pm

Here is our instance data, not slow at all:

rt36=# select count(*) from CachedGroupMembers;
count
1163547
(1 row)

rt36=# select count(*) from GroupMembers;
count
353086
(1 row)

rt36=# select count(*) from Groups;
count
473803
(1 row)

rt36=# select count(*) from Principals;
count
490777
(1 row)

rt36=# select count(*) from Users;
count
16971
(1 row)

KenOn Tue, Nov 27, 2007 at 08:21:51AM -0800, Gordon Messmer wrote:

Ham MI-ID, Torsten Brumm wrote:

Hi,
Make sure to delete the users in a safe way with rtx-shredder and not
directly from the db.

I’d prefer to, but as I said before, it doesn’t work. The newest version
of rtx-shredder has a User plugin that has a no_tickets option, but when I
use that, it says that there’s nothing to remove.

Btw, about how many users we talk? We have NO performance problems with
30.000 active and a half a million inactive users…

From my original message: CachedGroupMembers (254475 rows), GroupMembers
(87456 rows), Groups (95138 rows), Principals (110714 rows), and Users
(15574 rows) tables.

It doesn’t seem like much, but it is mighty slow. How do your tables
compare?

The rt-users Archives

SAVE THOUSANDS OF DOLLARS ON RT SUPPORT:

If you sign up for a new RT support contract before December 31, we’ll take
up to 20 percent off the price. This sale won’t last long, so get in touch
today. Email us at sales@bestpractical.com or call us at +1 617 812
0745.

Community help: http://wiki.bestpractical.com
Commercial support: sales@bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media. Buy a
copy at http://rtbook.bestpractical.com

Gordon_Messmer2 · November 27, 2007, 4:36pm

Kenneth Marshall wrote:

Here is our instance data, not slow at all:

I’m willing to consider that I’m looking in the wrong place… When you
open a ticket, what’s the “time to display” on your instance? Which DB
are you using, and what tuning did you do to it?

Kenneth_Marshall · November 27, 2007, 4:57pm

Kenneth Marshall wrote:

Here is our instance data, not slow at all:

I’m willing to consider that I’m looking in the wrong place… When you
open a ticket, what’s the “time to display” on your instance? Which DB are
you using, and what tuning did you do to it?

Gordon,

The time to open and display a ticket varies based on the number
of transactions involved, but typically takes on the order of 1-3s.
A new ticket with a few updates will take about 1s and an older
ticket with many updates takes 3-4s. We are using PostgreSQL 8.2
for the backend DB with the index patches from the mailing list
posting. I did have to watch the logs for slow queries, to figure
out where indexes were needed. That and keeping the session table
cleaned is pretty much all that we need to do. From your description,
it sounds like you are just missing an index or two. Once you find
out what you need and add them, you will be fine. I can help with
PostgreSQL but I do not have much experience with tuning MySQL.

Regards,
Ken

Gordon_Messmer2 · November 27, 2007, 5:49pm

Kenneth Marshall wrote:

The time to open and display a ticket varies based on the number
of transactions involved, but typically takes on the order of 1-3s.

With RT3 3.6.5 and PG 8.1.9, I’m seeing newish tickets open in a little
more than two seconds. I don’t have numbers for MySQL 5, but I recall
it being less than half a second. It was much longer than that with our
slightly older rt3 on MySQL 5, and also much longer before we’d done
vacuum analyse on postgres.

Yes, having seen the system behave much faster, I do think that >2
seconds is slow.

A new ticket with a few updates will take about 1s and an older
ticket with many updates takes 3-4s. We are using PostgreSQL 8.2
for the backend DB with the index patches from the mailing list
posting. I did have to watch the logs for slow queries, to figure
out where indexes were needed. That and keeping the session table
cleaned is pretty much all that we need to do. From your description,
it sounds like you are just missing an index or two. Once you find
out what you need and add them, you will be fine. I can help with
PostgreSQL but I do not have much experience with tuning MySQL.

After enabling the log, this is the single slowest query that I saw.

2007-11-27 09:39:53 PST rt3 - LOG: duration: 450.262 ms statement:
SELECT main.* FROM ( SELECT main.id FROM GroupMembers main JOIN Groups
Groups_1 ON ( Groups_1.id = main.GroupId ) WHERE (Groups_1.Domain =
‘SystemInternal’ OR Groups_1.Domain = ‘UserDefined’) AND (main.MemberId
= ‘169133’) GROUP BY main.id ORDER BY min(Groups_1.Domain) ASC,
min(Groups_1.Name) ASC ) distinctquery, GroupMembers main WHERE
(main.id = distinctquery.id)

Where would I look for “index patches”?

Kenneth_Marshall · November 27, 2007, 6:25pm

Kenneth Marshall wrote:

The time to open and display a ticket varies based on the number
of transactions involved, but typically takes on the order of 1-3s.

With RT3 3.6.5 and PG 8.1.9, I’m seeing newish tickets open in a little
more than two seconds. I don’t have numbers for MySQL 5, but I recall it
being less than half a second. It was much longer than that with our
slightly older rt3 on MySQL 5, and also much longer before we’d done vacuum
analyse on postgres.

Yes, having seen the system behave much faster, I do think that >2 seconds
is slow.

We are currently running 3.4.5pre1 and in most respects, the actual
browser/RT-frontend are the hold ups, not the database backend. The
RT 3.6.4 test instance was much faster in most areas. I cannot give
you any timings because the web server that I was using is in the
process of being upgraded to a newer release of the software. I do suspect
that the MySQL query cache could help you eek out a few more fractions of
a second in performance. Once we had the indexes adjusted for the slow
queries, the biggest gain was in minimizing the information that needed
to be rendered by the browser. For example, we pruned the list of
transactions that are displayed by default. This easily provided a big
speed increase for rendering, particularly as the tickets acquired
updates. As a reference point, it takes my DB 230ms to return the
query below uncached and 75-100ms once cached.

A new ticket with a few updates will take about 1s and an older
ticket with many updates takes 3-4s. We are using PostgreSQL 8.2
for the backend DB with the index patches from the mailing list
posting. I did have to watch the logs for slow queries, to figure
out where indexes were needed. That and keeping the session table
cleaned is pretty much all that we need to do. From your description,
it sounds like you are just missing an index or two. Once you find
out what you need and add them, you will be fine. I can help with
PostgreSQL but I do not have much experience with tuning MySQL.

After enabling the log, this is the single slowest query that I saw.

2007-11-27 09:39:53 PST rt3 - LOG: duration: 450.262 ms statement: SELECT
main.* FROM ( SELECT main.id FROM GroupMembers main JOIN Groups Groups_1
ON ( Groups_1.id = main.GroupId ) WHERE (Groups_1.Domain =
‘SystemInternal’ OR Groups_1.Domain = ‘UserDefined’) AND (main.MemberId =
‘169133’) GROUP BY main.id ORDER BY min(Groups_1.Domain) ASC,
min(Groups_1.Name) ASC ) distinctquery, GroupMembers main WHERE (main.id =
distinctquery.id)

Where would I look for “index patches”?

I sent you our list of indexes. You can see how your setup compares and
see if any of the missing ones help your performance. “EXPLAIN ANALYZE…”
can give you detailed information about your query plans.

Good luck with your pruning, but I suspect that that may not have
much of an effect if your indexes are correct.

Ken

Ruslan_Zakirov1 · November 30, 2007, 11:17pm

Kenneth Marshall wrote:

The time to open and display a ticket varies based on the number
of transactions involved, but typically takes on the order of 1-3s.

With RT3 3.6.5 and PG 8.1.9, I’m seeing newish tickets open in a little
more than two seconds. I don’t have numbers for MySQL 5, but I recall it
being less than half a second. It was much longer than that with our
slightly older rt3 on MySQL 5, and also much longer before we’d done vacuum
analyse on postgres.

Yes, having seen the system behave much faster, I do think that >2 seconds
is slow.

We are currently running 3.4.5pre1 and in most respects, the actual
browser/RT-frontend are the hold ups, not the database backend. The
RT 3.6.4 test instance was much faster in most areas. I cannot give
you any timings because the web server that I was using is in the
process of being upgraded to a newer release of the software. I do suspect
that the MySQL query cache could help you eek out a few more fractions of
a second in performance. Once we had the indexes adjusted for the slow
queries, the biggest gain was in minimizing the information that needed
to be rendered by the browser. For example, we pruned the list of
transactions that are displayed by default. This easily provided a big
speed increase for rendering, particularly as the tickets acquired
updates. As a reference point, it takes my DB 230ms to return the
query below uncached and 75-100ms once cached.

A new ticket with a few updates will take about 1s and an older
ticket with many updates takes 3-4s. We are using PostgreSQL 8.2
for the backend DB with the index patches from the mailing list
posting. I did have to watch the logs for slow queries, to figure
out where indexes were needed. That and keeping the session table
cleaned is pretty much all that we need to do. From your description,
it sounds like you are just missing an index or two. Once you find
out what you need and add them, you will be fine. I can help with
PostgreSQL but I do not have much experience with tuning MySQL.

After enabling the log, this is the single slowest query that I saw.

2007-11-27 09:39:53 PST rt3 - LOG: duration: 450.262 ms statement: SELECT
main.* FROM ( SELECT main.id FROM GroupMembers main JOIN Groups Groups_1
ON ( Groups_1.id = main.GroupId ) WHERE (Groups_1.Domain =
‘SystemInternal’ OR Groups_1.Domain = ‘UserDefined’) AND (main.MemberId =
‘169133’) GROUP BY main.id ORDER BY min(Groups_1.Domain) ASC,
min(Groups_1.Name) ASC ) distinctquery, GroupMembers main WHERE (main.id =
distinctquery.id)

Where would I look for “index patches”?

I sent you our list of indexes. You can see how your setup compares and
see if any of the missing ones help your performance. “EXPLAIN ANALYZE…”
can give you detailed information about your query plans.

Good luck with your pruning, but I suspect that that may not have
much of an effect if your indexes are correct.
I do believe index on GroupMembers(MemberId, GroupId) will help this query much.

Ken

Best regards, Ruslan.

Gordon_Messmer2 · December 4, 2007, 6:53pm

Kenneth Marshall wrote:

I sent you our list of indexes. You can see how your setup compares and
see if any of the missing ones help your performance. “EXPLAIN ANALYZE…”
can give you detailed information about your query plans.

Good luck with your pruning, but I suspect that that may not have
much of an effect if your indexes are correct.

That’s true, it didn’t make any significant changes after fixing the
indexes. For the archives, though, I did get rtx-shredder to work with
Ruslan’s advice:

rtx-shredder --plugin
‘Users=no_tickets,true;limit,20000;status,any;replace_relations,Nobody’

I used the command above to remove all of the users who presumably were
created by spammers, after removing all of the deleted tickets (also
using rtx-shredder).

Kenneth’s index list improved performance on Postgresql immensely.
Perhaps the attached patch could be applied to the distribution? Should
I submit it elsewhere for consideration?

Existing postgresql users should be able to:

DROP INDEX Queues1;
CREATE UNIQUE INDEX Queues1 ON Queues (lower((Name)::text)) ;
DROP INDEX Groups2;
CREATE INDEX Groups2 On Groups (lower((Type)::text),
lower((Domain)::text), Instance);
CREATE INDEX GroupMembers1 ON GroupMembers (GroupId);
DROP INDEX Users1;
CREATE UNIQUE INDEX Users1 ON Users (lower(Name)::text) ;
DROP INDEX Users2;
DROP INDEX Users4;
CREATE INDEX Users4 ON Users (lower(EmailAddress)::text);
DROP INDEX ObjectCustomFieldValues2;

Many thanks to everyone who helped. I really appreciate it.

schema.Pg.diff (1.22 KB)

Gordon_Messmer2 · December 4, 2007, 6:58pm

Ruslan Zakirov wrote:

I do believe index on GroupMembers(MemberId, GroupId) will help this
query much.

I missed this message somehow. Do you think I should have used an index
on GroupMembers(MemberId, GroupId) instead of just GroupMembers
(GroupId)? Or an additional index? The distributed schema indexes
CachedGroupMembers(MemberId) and CachedGroupMembers(GroupId)
separately. Should GroupMembers and CachedGroupMembers be indexed the
same way?