Migration from MySQL to PostreSQL - corrupt characters (german umlaut)

Hi RT Users,

I’m testing the migration from MySQL to PostgreSQL and I’m experiencing
problems with LATIN1 characters (particularly German umlauts) after the
migration. They look like ü instead ü

Hexcode from ü → c3 bc
is encoded again to → c3 83 (Ã) and c2 bc (¼)
(http://www.utf8-zeichentabelle.de/)

First I’ve upgraded RT from 3.8 to 4.2 with mysql db (utf8) and
everything went smoothly.

New server with fresh installed RT, MySQL, Postgresql from package manager

System (Debian Jessie)
request-tracker: 4.2.8-3+deb8u1
mysql-server: 5.5.44-0+deb8u1
postresql: 9.4.4-0+deb8u1
apache: 2.4.10-10+deb8u3
php: 5.6.13+dfsg-0+deb8u1

Now I followed this tutorial to migrate the DB
http://requesttracker.wikia.com/wiki/MigrateMysql2PostgresqlV4

The binary files, generated by rt-serializer --clone, include characters
like öäü so I suppose that there is something going wrong while
importing to Pg.

‘rt-setup-database’ creates the Pg DB as UTF8.

postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
postgres | postgres | SQL_ASCII | C | C |
rt4 | rt_user | UTF8 | C | C |
template0 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | |
postgres=CTc/postgres
template1 | postgres | SQL_ASCII | C | C | =c/postgres +
| | | | |
postgres=CTc/postgres

postgres=# show server_encoding;
server_encoding
SQL_ASCII
(1 row)

postgres=# show client_encoding;
client_encoding
UTF8
(1 row)

rt4=# select subject from tickets where id=82527;
subject
Hardware prüfen
(1 row)

Newly created tickets with umlauts are displayed correctly only the
imported are corrupt!

Switching the client_encoding via “set client_encoding=‘LATIN1’;” or
directly in /etc/postgresql/9.4/main/postgresql.conf
client_encoding = latin1 #(default would be the database encoding utf8)

rt4=# set client_encoding=‘LATIN1’;
SET
rt4=# select subject from tickets where id=82527;
subject
Hardware prüfen
(1 row)

Can someone help me to migrate the db with a full utf8 setup?

Best,
Maik

smime.p7s (5.29 KB)

Hexcode from ü → c3 bc
is encoded again to → c3 83 (Ã) and c2 bc (¼)
(http://www.utf8-zeichentabelle.de/)

First I’ve upgraded RT from 3.8 to 4.2 with mysql db (utf8) and
everything went smoothly.

Can you confirm that the umlaut can be sucessfully added to the database
after the upgrade but before the postgres migration?

I’m trying to determine if the issue happened during the upgrade or
because of rt-serializer.

RT 4.2.8 and MySQL: old and new tickets looks perfectly, also the
serialized data seems to be ok.

With locale LANG=en_US.UTF-8

^^test ticket with umlaut ü^@^@^@^GSubject

locale LANG=POSIX

^^test ticket with umlaut ^@^@^@^GSubject

Maybe rt-importer or perl opens the binary files with wrong locales and
while writing to the utf8 Pg DB all characters () are being
re-encoded?

I’m now testing to serialize mysql and import to mysql again.

smime.p7s (5.29 KB)

I’m now testing to serialize mysql and import to mysql again.

This is working, so it’s a problem with Pg

smime.p7s (5.29 KB)

I have the same problem. I want to migrate my RT 4.2.12 mysql-database to
postgres. Did you find an solution for this problem?

I noticed that the problem with the german umlauts is not in all fields. In
my history for example the german umlauts displayed correctly but in the
ticket subject they didn’t.On Wed, Sep 23, 2015 at 5:34 PM, Frank Wissink frank.wissink@gmail.com wrote:

Did find any solution for this problem? I have the same problem.

Am 21.09.2015 um 16:22 schrieb Maik Nergert maik.nergert@uni-hamburg.de:

I’m now testing to serialize mysql and import to mysql again.

This is working, so it’s a problem with Pg

Hi,

you are right.
History is not affected but Subject, User Real Name, Signature stored in
User Preferences, Description of Scrips, Templates, Groups and Queues,
Template Content, Article Name

smime.p7s (5.29 KB)

I found an older post on the PostgreSQL mailinglist with same problems

And I was able to fix the issue by dumping and re-importing the db!

Create empty PG-DB

rt-setup-database --action create,schema,acl --dba rt_user
–dba-password DBPASS

Run importer

rt-importer PATH

Run Update-Pg-Sequences Script

(http://requesttracker.wikia.com/wiki/MigrateMysql2PostgresqlV4#Import_your_RT_database)

Dump DB as latin1

pg_dump -E LATIN1 rt4 > rt4latin.sql

Replace client_encoding

sed -i ‘s/LATIN1/UTF8/’ rt4latin.sql

Recreate empty DB

rt-setup-database --action drop --dba rt_user --dba-password DBPASS
rt-setup-database --action create,schema,acl --dba rt_user
–dba-password DBPASS

Import DB

psql rt4 < rt4latin.sql

smime.p7s (5.29 KB)

1 Like

I can confirm that this is still an issue on rt4.4.2 when moving from mysql to postgres.

But the solution above posted by Maik_Nergert works eave though it is a little bit crude where the dump and reimport consumes some 8 hours on our system. It would be nice if those hours could be cut by doing the right ting from the start!