Rt-validator --check on upgrade from 4.8.14 to 4.0.7

Good day all.

I’m busy upgrading a large RT instance (63701.5MB) from version 4.8.14
to 4.0.7.

As part of the upgrading documentation is this section :

"Unique names are now enforced for user defined groups. New groups
cannot be created with a duplicate name and existing groups cannot be
renamed to an in-use name. The admin interface will warn about
existing groups with duplicate names. Although the groups will still
function, some parts of the interface (rights management, subgroup
membership) may not work as expected with duplicate names. Running

/opt/rt4/sbin/rt-validator --check

will report duplicate group names, and running it with --resolve will
fix duplicates by appending the group id to the name."

The query “SELECT s.id, s.id FROM Tickets s LEFT JOIN Groups t ON
(s.id = t.Instance) AND ( t.Domain = ‘RT::Ticket-Role’ ) WHERE t.id IS
NULL AND s.id IS NOT NULL” has now been running for 424101 seconds
(117 hours) and still going.

The tables sizes are :

Tickets : 1121969
Groups : 4558465

Since it’s a single thread it’s using only one CPU on the 2 CPU
machine. and memory is not having any impact (1150MB of 4096MB used)

The question is where to from here.

  • Is there a way to archive tickets but still have them accessible and
    searchable to speed this up? extension maybe ?
  • Is there a way to speed up the query ?

any other suggestions or things I’m overlooking ?

Regards
Henti

The query “SELECT s.id, s.id FROM Tickets s LEFT JOIN Groups t ON
(s.id = t.Instance) AND ( t.Domain = ‘RT::Ticket-Role’ ) WHERE t.id IS
NULL AND s.id IS NOT NULL” has now been running for 424101 seconds
(117 hours) and still going.

Note that this query is for other integrity checks, not the unique group
names. rt-validator is a general purpose database integrity checker and
fixer; unique group name checking is just part of it that was added in 4.0.

You can safely skip running rt-validator if you’re not concerned about
database integrity, and simply verify your group names manually in Tools
→ Configuration → Groups.

The question is where to from here.

  • Is there a way to archive tickets but still have them accessible and
    searchable to speed this up? extension maybe ?

That should be an unnecessary approach if the DB gets tuned appropriately.

  • Is there a way to speed up the query ?

You don’t mention what database you’re using. Can you run EXPLAIN on
that SELECT?

Since it’s a single thread it’s using only one CPU on the 2 CPU
machine. and memory is not having any impact (1150MB of 4096MB used)

The script is IO bound. It scans whole tables and does it several
times. If you’re on mysql you should consider increasing
innodb_buffer_pool_size. Even for normal RT operations you want to
give every bit of memory available after web server to DB server.
Which pools, buffers or caches should be increased is very individual.

Best regards, Ruslan.