Migrating from Postgres to MySQL

Well, basically, Postgres is seemingly a bucket of fail specifically
for the RT system I maintain. We run RT/Postgres on a Ubuntu Hardy
LTS virtual machine running 2 virtual CPUs and the KVM maximum of 2GB
RAM, but Postgres ends up using so much in resources that it actually
kills the entire KVM system on the virtual host. I know tuning of
Postgres can fix this, but we’ve experienced issues with Postgres on
many other virtual machines, leading us to believe that it may simply
be an issue with Postgres not getting along well with KVM (which is a
somewhat known issue). Ideally, I want to migrate my RT database from
Postgres to MySQL, and preserve ALL data (tickets, attachments, links,
everything). Is there an “guaranteed to work” method to do this that
anyone knows of? Thanks!

Specs:
Postgres: 8.3
RT: 3.6.5
MySQL candidate on Hardy: 5.0.51a-3ubuntu5.4

Any help would be greatly appreciated. Thanks!

~ Kage
http://vitund.com
http://hackthissite.org

Well, basically, Postgres is seemingly a bucket of fail specifically
for the RT system I maintain. We run RT/Postgres on a Ubuntu Hardy
LTS virtual machine running 2 virtual CPUs and the KVM maximum of 2GB
RAM, but Postgres ends up using so much in resources that it actually
kills the entire KVM system on the virtual host. I know tuning of
Postgres can fix this, but we’ve experienced issues with Postgres on
many other virtual machines, leading us to believe that it may simply
be an issue with Postgres not getting along well with KVM (which is a
somewhat known issue). Ideally, I want to migrate my RT database from
Postgres to MySQL, and preserve ALL data (tickets, attachments, links,
everything). Is there an “guaranteed to work” method to do this that
anyone knows of? Thanks!

Specs:
Postgres: 8.3
RT: 3.6.5
MySQL candidate on Hardy: 5.0.51a-3ubuntu5.4

Any help would be greatly appreciated. Thanks!


~ Kage
http://vitund.com
http://hackthissite.org

Hi Kage,

I have had trouble with VMs with any I/O bound system. Have you
tested your system with MySQL? You may just be swapping one I/O
problem for another. That being said, for anything but the most
trivial of DBs, tuning your backend DB be it PostgreSQL, MySQL,
Oracle,… is critical for getting good performance.

Another data point, I have had a simple repeated file read of a
DNS zone file for an XFER tank a VM. VMs work well for CPU intensive
tasks with small amount of I/O. For any larger amount of I/O, using
real storage is a must. Good luck with whichever backend you choose.
A big plus for both the Oracle and PostgreSQL backend is the
availability of full-text indexing which allows content and attachment
searches to use an index and not take your DB with a full table scan.

Regards,
Ken

(didn’t do a reply-all, sorry if you get this twice, Ken)

What, if any, performance gains exist by using an NFS mountpoint for
the Postgres database within a virtual machine? In this case, I/O is
not directly placed on the virtual disk (unless the actual “act” of
I/O itself, regardless of what is being acted upon (virtual disk or
not), is also an issue). Could this potentially yield any performance
increase?

Also, no, I have not tried MySQL yet with our RT setup. I’d like to
move our database to MySQL to do some testing with that and gather
realistic metrics of “here’s our data in Postgres vs. MySQL”.On Tue, Jul 28, 2009 at 3:51 PM, Kenneth Marshallktm@rice.edu wrote:

On Tue, Jul 28, 2009 at 03:35:12PM -0400, Kage wrote:

Well, basically, Postgres is seemingly a bucket of fail specifically
for the RT system I maintain. We run RT/Postgres on a Ubuntu Hardy
LTS virtual machine running 2 virtual CPUs and the KVM maximum of 2GB
RAM, but Postgres ends up using so much in resources that it actually
kills the entire KVM system on the virtual host. I know tuning of
Postgres can fix this, but we’ve experienced issues with Postgres on
many other virtual machines, leading us to believe that it may simply
be an issue with Postgres not getting along well with KVM (which is a
somewhat known issue). Ideally, I want to migrate my RT database from
Postgres to MySQL, and preserve ALL data (tickets, attachments, links,
everything). Is there an “guaranteed to work” method to do this that
anyone knows of? Thanks!

Specs:
Postgres: 8.3
RT: 3.6.5
MySQL candidate on Hardy: 5.0.51a-3ubuntu5.4

Any help would be greatly appreciated. Thanks!


~ Kage
http://vitund.com
http://hackthissite.org

Hi Kage,

I have had trouble with VMs with any I/O bound system. Have you
tested your system with MySQL? You may just be swapping one I/O
problem for another. That being said, for anything but the most
trivial of DBs, tuning your backend DB be it PostgreSQL, MySQL,
Oracle,… is critical for getting good performance.

Another data point, I have had a simple repeated file read of a
DNS zone file for an XFER tank a VM. VMs work well for CPU intensive
tasks with small amount of I/O. For any larger amount of I/O, using
real storage is a must. Good luck with whichever backend you choose.
A big plus for both the Oracle and PostgreSQL backend is the
availability of full-text indexing which allows content and attachment
searches to use an index and not take your DB with a full table scan.

Regards,
Ken

~ Kage
http://vitund.com

Kage,

The main advantage is gained by avoiding I/O through the virtual
disk. The layout of the virtual disk tends to turn most I/O into
random I/O, even I/O that starts as sequential. The factor of
10 performance difference between random/sequential I/O causes
the majority of the performance problem. I have not had personal
experience with using an NFS mount point to run a database so I
cannot really comment on that. Good luck with your evaluation.

Cheers,
KenOn Tue, Jul 28, 2009 at 04:12:55PM -0400, Kage wrote:

(didn’t do a reply-all, sorry if you get this twice, Ken)

What, if any, performance gains exist by using an NFS mountpoint for
the Postgres database within a virtual machine? ?In this case, I/O is
not directly placed on the virtual disk (unless the actual “act” of
I/O itself, regardless of what is being acted upon (virtual disk or
not), is also an issue). ?Could this potentially yield any performance
increase?

Also, no, I have not tried MySQL yet with our RT setup. ?I’d like to
move our database to MySQL to do some testing with that and gather
realistic metrics of “here’s our data in Postgres vs. MySQL”.

On Tue, Jul 28, 2009 at 3:51 PM, Kenneth Marshallktm@rice.edu wrote:

On Tue, Jul 28, 2009 at 03:35:12PM -0400, Kage wrote:

Well, basically, Postgres is seemingly a bucket of fail specifically
for the RT system I maintain. ?We run RT/Postgres on a Ubuntu Hardy
LTS virtual machine running 2 virtual CPUs and the KVM maximum of 2GB
RAM, but Postgres ends up using so much in resources that it actually
kills the entire KVM system on the virtual host. ?I know tuning of
Postgres can fix this, but we’ve experienced issues with Postgres on
many other virtual machines, leading us to believe that it may simply
be an issue with Postgres not getting along well with KVM (which is a
somewhat known issue). ?Ideally, I want to migrate my RT database from
Postgres to MySQL, and preserve ALL data (tickets, attachments, links,
everything). ?Is there an “guaranteed to work” method to do this that
anyone knows of? ?Thanks!

Specs:
Postgres: 8.3
RT: 3.6.5
MySQL candidate on Hardy: 5.0.51a-3ubuntu5.4

Any help would be greatly appreciated. ?Thanks!


~ Kage
http://vitund.com
http://hackthissite.org

Hi Kage,

I have had trouble with VMs with any I/O bound system. Have you
tested your system with MySQL? You may just be swapping one I/O
problem for another. That being said, for anything but the most
trivial of DBs, tuning your backend DB be it PostgreSQL, MySQL,
Oracle,… is critical for getting good performance.

Another data point, I have had a simple repeated file read of a
DNS zone file for an XFER tank a VM. VMs work well for CPU intensive
tasks with small amount of I/O. For any larger amount of I/O, using
real storage is a must. Good luck with whichever backend you choose.
A big plus for both the Oracle and PostgreSQL backend is the
availability of full-text indexing which allows content and attachment
searches to use an index and not take your DB with a full table scan.

Regards,
Ken


~ Kage
http://vitund.com
http://hackthissite.org

Kage,

The main advantage is gained by avoiding I/O through the virtual
disk. The layout of the virtual disk tends to turn most I/O into
random I/O, even I/O that starts as sequential. The factor of
10 performance difference between random/sequential I/O causes
the majority of the performance problem. I have not had personal
experience with using an NFS mount point to run a database so I
cannot really comment on that. Good luck with your evaluation.

You’re trading head-seeking latencies for network latencies, and those are
almost certainly higher. Hosting your database server binaries and such
forth in NFS is possible, though again, not optimal both from a performance
and risk standpoint (NFS server drops, your DB binaries vanish, your DB
server drops even though the machine hosting it was fine).

I think hosting databases in NFS can cause serious problems - I seem to
remember older versions of mysql wouldn’t support that. I don’t know if
newer ones do…but I do know in the very large IT environment I worked
in, all database servers hosted the DBs on their local disks or in
filesystems hosted on disks (SANS?) attached via fibre-channel.

Could solid-state drives side-step the random-access issue with
virtualization, or at least make it suck less? Based on how many people I
know who have said “Wow, my SSD died. I thought those were supposed to be
more reliable?” … I wouldn’t bet my service uptime on it. :wink:

-Rob

Kage,

The main advantage is gained by avoiding I/O through the virtual
disk. The layout of the virtual disk tends to turn most I/O into
random I/O, even I/O that starts as sequential. The factor of
10 performance difference between random/sequential I/O causes
the majority of the performance problem. I have not had personal
experience with using an NFS mount point to run a database so I
cannot really comment on that. Good luck with your evaluation.

You’re trading head-seeking latencies for network latencies,

No.

If this were a standard host environment, that would be true. But in a
virtual environment, there is the overhead of the disk create/
maintenance/update processes of the virtualization engine which
multiply the overhead of the disk. Just run a disk benchmarking
utility inside a VE running under any platform that uses disk images
(vmware, xen, parallels, etc…) and then run those same tests on the
host node. The difference in performance is often an order of
magnitude slower for the virtual disks.

Contrast that with NFS performance, which has a small fixed overhead
imposed by the network (even smaller if you use jumbo frames). If you
were using a platform with a robust NFS implementation (Solaris,
FreeBSD), I’d put money on the database performing better on NFS than
inside most virtual machines. If you’re using NFS with Linux, you will
certainly have performance issues that you won’t be able to get past.

If the virtualization environment provides raw disk access to the VE,
my bet is off. Examples of virtualization platforms that [can] do this
are FreeBSD jails and Linux OpenVZ. On several occasions, I have built
VEs for MySQL and mounted a dedicated partition in the VE. Assuming
you’ve given adequate resources to the DB VE, that works as well as a
dedicated machine.

When I arrived at my current position, the SA team had put the
databases into the VEs that needed them, along with the apps that
accessed them. Despite having 6 servers to spread the load across,
they had recurring database performance issues (a few times a week),
particularly with RT. I resolved all the DB issues by building a
dedicated machine with 4 disks (two battery backed RAID-1 mirrors) all
the databases to it. The databases have dedicated spindles as does the
OS & logging. Despite the resistance to the “all our DB eggs in one
basket approach,” the wisdom of that choice is now plainly evident.
All the performance problems went away and haven’t returned.

and those are almost certainly higher. Hosting your database server
binaries and such forth in NFS is possible, though again, not
optimal both from a performance and risk standpoint (NFS server
drops, your DB binaries vanish, your DB server drops even though the
machine hosting it was fine).

That’s not how NFS works. If the NFS server vanishes, the NFS client
hangs and waits for it to return. That is a design feature of NFS. The
consistency of the databases is entirely dependency on the disk
subsystem of the file server.

I think hosting databases in NFS can cause serious problems - I seem
to remember older versions of mysql wouldn’t support that. I don’t
know if newer ones do…but I do know in the very large IT
environment I worked in, all database servers hosted the DBs on
their local disks or in filesystems hosted on disks (SANS?) attached
via fibre-channel.

I would never host a database server on anything but RAID protected
disks with block level access (ie, local disks, iSCSI, etc). Database
engines have been explicitly designed and optimized for this type of
disk backend. That is starting to change, as a few new DB engines
that are designed for network storage (like SimpleDB). But none I know
of are production-ready.

Could solid-state drives side-step the random-access issue with
virtualization, or at least make it suck less?

Haven’t tried it yet, but my guess is no. However, I have put
databases on SSD disks with excellent results.

Based on how many people I know who have said “Wow, my SSD died. I
thought those were supposed to be more reliable?” … I wouldn’t bet
my service uptime on it. :wink:

There’s this thing called RAID, that protects against disk
failures… It works quite well with SSD disks and delivers
performance numbers for a couple thousand bucks that would otherwise
take a $150,000+ SAN.

Matt

Hi Matt,

Raid is not the end-all be-all for disk safety, especially when you step
into terabyte class computing, sorry I am taking this a bit off topic. While
RAID has it’s bonuses, there are drawbacks as well, take your standard RAID
5 setup, 4 Disks, 3 active, 1 Hot Spare. Now lets say that Disk number 2
decided it was going to release it’s smoke to the world (never a good
thing), now your array is still alive and it is starting to rebuild onto
disk 4 to make up for the death of disk 2. During the rebuild process Disk
1 comes across a bad sector, poof, your data is gone. Just a word of
warning, don’t put all your data safety eggs into the RAID basket.

Otherwise I agree that running via NFS from a virtualized server would
probably have perfromance gains over running in the virtual invironment.

Thanks,
BillOn Wed, Jul 29, 2009 at 10:26 AM, Matt Simerson matt@corp.spry.com wrote:

On Jul 29, 2009, at 7:10 AM, Robert Nesius wrote:

On Tue, Jul 28, 2009 at 3:56 PM, Kenneth Marshall ktm@rice.edu wrote:

Kage,

The main advantage is gained by avoiding I/O through the virtual
disk. The layout of the virtual disk tends to turn most I/O into
random I/O, even I/O that starts as sequential. The factor of
10 performance difference between random/sequential I/O causes
the majority of the performance problem. I have not had personal
experience with using an NFS mount point to run a database so I
cannot really comment on that. Good luck with your evaluation.

You’re trading head-seeking latencies for network latencies,

No.

If this were a standard host environment, that would be true. But in a
virtual environment, there is the overhead of the disk
create/maintenance/update processes of the virtualization engine which
multiply the overhead of the disk. Just run a disk benchmarking utility
inside a VE running under any platform that uses disk images (vmware, xen,
parallels, etc…) and then run those same tests on the host node. The
difference in performance is often an order of magnitude slower for the
virtual disks.

Contrast that with NFS performance, which has a small fixed overhead
imposed by the network (even smaller if you use jumbo frames). If you were
using a platform with a robust NFS implementation (Solaris, FreeBSD), I’d
put money on the database performing better on NFS than inside most virtual
machines. If you’re using NFS with Linux, you will certainly have
performance issues that you won’t be able to get past.

If the virtualization environment provides raw disk access to the VE, my
bet is off. Examples of virtualization platforms that [can] do this are
FreeBSD jails and Linux OpenVZ. On several occasions, I have built VEs for
MySQL and mounted a dedicated partition in the VE. Assuming you’ve given
adequate resources to the DB VE, that works as well as a dedicated machine.

When I arrived at my current position, the SA team had put the databases
into the VEs that needed them, along with the apps that accessed them.
Despite having 6 servers to spread the load across, they had recurring
database performance issues (a few times a week), particularly with RT. I
resolved all the DB issues by building a dedicated machine with 4 disks (two
battery backed RAID-1 mirrors) all the databases to it. The databases have
dedicated spindles as does the OS & logging. Despite the resistance to the
"all our DB eggs in one basket approach," the wisdom of that choice is now
plainly evident. All the performance problems went away and haven’t
returned.

and those are almost certainly higher. Hosting your database server
binaries and such forth in NFS is possible, though again, not optimal both
from a performance and risk standpoint (NFS server drops, your DB binaries
vanish, your DB server drops even though the machine hosting it was fine).

That’s not how NFS works. If the NFS server vanishes, the NFS client hangs
and waits for it to return. That is a design feature of NFS. The consistency
of the databases is entirely dependency on the disk subsystem of the file
server.

I think hosting databases in NFS can cause serious problems - I seem to
remember older versions of mysql wouldn’t support that. I don’t know if
newer ones do…but I do know in the very large IT environment I worked
in, all database servers hosted the DBs on their local disks or in
filesystems hosted on disks (SANS?) attached via fibre-channel.

I would never host a database server on anything but RAID protected disks
with block level access (ie, local disks, iSCSI, etc). Database engines have
been explicitly designed and optimized for this type of disk backend. That
is starting to change, as a few new DB engines that are designed for network
storage (like SimpleDB). But none I know of are production-ready.

Could solid-state drives side-step the random-access issue with
virtualization, or at least make it suck less?

Haven’t tried it yet, but my guess is no. However, I have put databases on
SSD disks with excellent results.

Based on how many people I know who have said “Wow, my SSD died. I thought
those were supposed to be more reliable?” … I wouldn’t bet my service
uptime on it. :wink:

There’s this thing called RAID, that protects against disk failures… It
works quite well with SSD disks and delivers performance numbers for a
couple thousand bucks that would otherwise take a $150,000+ SAN.

Matt


http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-users

Community help: http://wiki.bestpractical.com
Commercial support: sales@bestpractical.com

Discover RT’s hidden secrets with RT Essentials from O’Reilly Media.
Buy a copy at http://rtbook.bestpractical.com

Hi Matt,

Raid is not the end-all be-all for disk safety, especially when you
step into terabyte class computing, sorry I am taking this a bit off
topic. While RAID has it’s bonuses, there are drawbacks as well,
take your standard RAID 5 setup, 4 Disks, 3 active, 1 Hot Spare.
Now lets say that Disk number 2 decided it was going to release it’s
smoke to the world (never a good thing), now your array is still
alive and it is starting to rebuild onto disk 4 to make up for the
death of disk 2. During the rebuild process Disk 1 comes across a
bad sector, poof, your data is gone. Just a word of warning, don’t
put all your data safety eggs into the RAID basket.

RAID poorly implemented is a placebo, and is often less reliable than
a single disk. RAID properly implemented IS the end-all be-all of disk
safety.

Your chosen argument is only valid against RAID level 5, which would
be a very poor choice for a database application. RAID-5 is a poor
choice in any environment where the data set is volatile and valuable.
Especially when you consider the number of hours (or days) it takes to
rebuild a hot spare into the RAID-5 set. (HINT: test that before
deployment!) During that rebuild window, your system performance is
heavily degraded and extremely vulnerable. If the performance of your
RAID system is halved, is that sufficient for your application, or is
your system effectively down during the rebuild period? (HINT: test
before deployment!).

But a RAID-1 or RAID 10 can be extremely robust, remaining online and
performing optimally during multiple catastrophic disk failures. You
can mirror the data to as many spindles as you need to insure data
integrity. When disks fail, rebuilding a mirror disk usually takes
less than an hour.

The systems engineer has to choose between data integrity,
performance, and storage efficiency.

One of my systems is about 2/3 full with 24.64 TB of data. Does that
count as “terabyte class computing?” :slight_smile: I’m using ZFS to manage
it. :slight_smile: :slight_smile:

Matt