On Monday 01 March 2004 12:53, Ritu Khetan wrote:

Hello All,

What is the best way to archive data from RT (running on Postgres), say on a yearly basis? The archived data should still be available for reading and searching at any time, if required.

For example: if I have five years of data in RT, most of which I do not need every day, I would like to archive it yearly so that it does not slow down the performance of my system. At the same time, if I get a request to look for some information in the archived data, I should be able to see it immediately.

Please suggest.

Regards,
Ritu
NETCORE SOLUTIONS *** Ph: +91 22 5662 8000 Fax: +91 22 5662 8134
MailServ and FlexiMail: Messaging Solutions: http://netcore.co.in
Pragatee: Integrated Server-Software Suite: http://www.pragatee.com
Emergic Freedom: Server-centric Computing: http://www.emergic.com
BlogStreet: Blog Profiles and RSS Ecosystem: http://blogstreet.com
Deeshaa: Rural Development: http://www.deeshaa.com
Rajesh Jain’s Weblog on Technology: http://www.emergic.org
Cerion Armour-Brown replied:
Do the database archiving by hand or with a script: set up an identical rt3_archive database, dump the databases, diff them, and add the new records to rt3_archive, or something to that effect.

As for access, it depends how 'immediate' you need it to be. If you can handle changing $DatabaseName in RT_SiteConfig.pm and then reloading RT in the browser, that should work.

If that's not immediate enough, you could set up a second RT instance with a different $DatabaseName, plus a virtual host, so you can access this rt_archive instance 'immediately' on a different web address.
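The second-instance suggestion amounts to a second site config. A minimal sketch, assuming RT3's Set(...) configuration style; the database name, site name, and URL below are illustrative, not RT defaults:

```perl
# RT_SiteConfig.pm for the archive instance -- a sketch, not a drop-in file.
Set($rtname, 'rt-archive');
Set($WebBaseURL, 'http://rt-archive.example.com');
Set($DatabaseType, 'Pg');          # the thread assumes Postgres
Set($DatabaseName, 'rt_archive');  # point this instance at the archived data
1;
```

With a virtual host serving this instance, the archive is browsable at its own address without touching the production config.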
Cerion
On Monday 01 March 2004 14:08, Ruslan U. Zakirov wrote:
Your solution doesn't solve the main problem, the performance of the main server, because all the tickets are still there.

Ritu, there is no simple way to do what you want. Yes, you should use another RT instance for the archive, but you should also write scripts that move old tickets to the archive without breaking anything.

Best regards. Ruslan.
Cerion Armour-Brown replied:
I am pretty new to all this, but tell me: how does taking the records from rt3 and inserting them into an rt3_archive database not solve the performance problem? What do you mean, the tickets are still there? They'd be in a different database that would only be loaded if accessed.

I guess if you're worried about performance while both are being accessed at the same time, that would have to be solved by putting the two instances on different machines. Or are you talking about something else?
Cerion
Ruslan U. Zakirov replied:
I think I said move, not copy.

The main performance problem is DB size; even indexes don't help a lot.

Right now, RT doesn't allow any deletes via its API, but the API is the only thing that gives you backward compatibility across release updates (upgrades). So if you are going to use direct DB manipulation to move data from the main server to an archive, you're headed in the wrong direction: the DB schema could change in the future, almost unpredictably.

Another problem with naive scripts is the relationships between the DB tables and the RT instances. For example: a ticket that should be moved has a link to another one that stays in the main RT instance. What do you do with that link?

Best regards. Ruslan.
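The move-and-links problem can be sketched in miniature. This is a toy model, not RT's real schema: sqlite3 stands in for the two Postgres databases, and the tickets/links tables and their columns are invented for illustration. It moves pre-cutoff tickets into an attached archive database and reports the links that would end up crossing the main/archive boundary:

```python
import sqlite3

def move_old_tickets(conn, cutoff):
    """Move tickets created before `cutoff` into the attached 'archive' DB,
    returning links that would span the main/archive boundary."""
    cur = conn.cursor()
    cur.execute("SELECT id FROM tickets WHERE created < ?", (cutoff,))
    old = {row[0] for row in cur.fetchall()}
    # A link with exactly one endpoint moved would dangle after the move.
    cur.execute("SELECT base, target FROM links")
    dangling = [(b, t) for b, t in cur.fetchall() if (b in old) != (t in old)]
    # Move = copy into the archive, then delete from the main DB.
    cur.execute("INSERT INTO archive.tickets SELECT * FROM tickets WHERE created < ?", (cutoff,))
    cur.execute("DELETE FROM tickets WHERE created < ?", (cutoff,))
    conn.commit()
    return dangling

# Toy data: two in-memory databases, three tickets, one cross-boundary link.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS archive")
for db in ("main", "archive"):
    conn.execute(f"CREATE TABLE {db}.tickets (id INTEGER PRIMARY KEY, created TEXT)")
conn.execute("CREATE TABLE links (base INTEGER, target INTEGER)")
conn.executemany("INSERT INTO tickets VALUES (?, ?)",
                 [(1, "2001-05-01"), (2, "2002-07-01"), (3, "2003-11-01")])
conn.executemany("INSERT INTO links VALUES (?, ?)", [(1, 2), (2, 3)])

dangling = move_old_tickets(conn, "2003-01-01")
print(dangling)  # the link from ticket 2 to ticket 3 crosses the boundary
print(conn.execute("SELECT COUNT(*) FROM tickets").fetchone()[0])          # left in main
print(conn.execute("SELECT COUNT(*) FROM archive.tickets").fetchone()[0])  # archived
```

On a real RT database the point above still stands: the schema is far more complex than this and may change between releases, so anything beyond a sketch should go through RT's API rather than raw SQL.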
Les Mikesell replied:

> Main performance problem is DB size; even indexes don't help a lot.

I don't think it is so much the size as the number of items it must attempt to join to construct the views.

> For example: a ticket that should be moved has a link to another one that stays in the main RT instance. What do you do with that link?
This is a big problem for me. I'm still using RT2 because of the speed issues in RT3 (which may already be fixed; I haven't tried again for a while). However, at about 20,000 tickets, and nearly as many users because these come in from public email, MySQL is starting to need temp files for the joins in searches, and it frequently locks up. Does anyone have tuning hints for my.cnf?
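For what it's worth, the knobs that usually matter for join-heavy workloads like this are the key buffer, the per-connection sort/join buffers, and the in-memory temp-table ceiling. A sketch only: the names below are standard [mysqld] settings, but the values are illustrative guesses that depend entirely on available RAM and should be measured, not copied:

```ini
[mysqld]
# Index block cache (MyISAM); usually the single most important knob.
key_buffer_size = 64M
# Per-connection buffers used by sorts and by joins without usable indexes.
sort_buffer_size = 4M
join_buffer_size = 4M
# Let larger implicit temp tables stay in memory instead of spilling to disk.
tmp_table_size = 32M
# Keep frequently used tables open.
table_cache = 256
```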
I think the only thing I can do is start over with a new RT3, perhaps on a different machine, and import only a small number of tickets. However, I think the real problem is the size of the user table, and I'd like to drop all the auto-created users that don't have tickets after the move.
Les Mikesell
les@futuresource.com
Les Mikesell wrote:

> I don't think it is so much the size as the number of items it must attempt to join to construct the views.

Yes, you're right.
Since the 3.0.4 release, performance has been a big focus of improvement, and that work is still going on. You could give 3.0.9 a try.

Yes, many user records decrease performance a lot; it is a real problem for public RT servers. But there is still no script that wipes out 'dummy' users who are no longer requestors (the ticket was deleted, spammers, and so on).
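The cleanup Les wants could start from a query shaped like the one below. Again a toy model: RT's real schema tracks requestors through group-membership tables, so this sqlite3 sketch with invented users/tickets tables only illustrates the shape of the query, not something to run against a live RT database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users   (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE tickets (id INTEGER PRIMARY KEY, requestor INTEGER REFERENCES users(id));
    INSERT INTO users   VALUES (1, 'a@example.com'), (2, 'b@example.com'), (3, 'c@example.com');
    INSERT INTO tickets VALUES (10, 1), (11, 1), (12, 3);
""")

# Users that no ticket references: candidates for deletion.
orphans = [row[0] for row in conn.execute("""
    SELECT u.id FROM users u
    WHERE NOT EXISTS (SELECT 1 FROM tickets t WHERE t.requestor = u.id)
""")]
print(orphans)  # only the user with no tickets
```

Per the earlier point in this thread, any actual deletion should go through RT's API or supporting tools rather than raw SQL, since the schema can change between releases.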

Good luck. Ruslan.