Migration from MySQL to PostgreSQL - Pg datatypes for blobs?

Dear Users and Developers,
I’m facing a migration four instances RT 3.2.2 to RT 3.8.8 with a number
of customizations.
Back-end database is MySQL and I have prepared a script for loading
database dumps from old MySQL 3.23.58 to 5.0.51 (Debian Lenny)
and then upgrading RT schema (rt-setup-database).
A next script will migrate from MySQL to PostgreSQL.
The PostgreSQL database cluster is initialized with cs_CZ.UTF-8 locale.
A problem arises with a column Attachments.Content with a pg type text.

DBD::Pg::st execute failed: ERROR: invalid byte sequence for encoding “UTF8”: 0xed2066
HINT: This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by “client_encoding”…

I think the correct data-type should by bytea, but this data type has
a bit strange behavior. Nevertheless a data copy ends with success after
data-type change:

alter table attachments drop column content;
alter table attachments add column content bytea;

The problem with bytea is (at least in my script), that values needs
extra handling. Binding values to parameters of type bytea must by done
according manual page of DBD::Pg… (RT already uses bytea in the tables
session).

Other fields: ObjectCustomFieldValues.LargeContent, Attributes.Content?
Any suggestions?

Regards
Zito

rt-mysql2pg (6.6 KB)

Dear Users and Developers,
I’m facing a migration four instances RT 3.2.2 to RT 3.8.8 with a number
of customizations.
Back-end database is MySQL and I have prepared a script for loading
database dumps from old MySQL 3.23.58 to 5.0.51 (Debian Lenny)
and then upgrading RT schema (rt-setup-database).
A next script will migrate from MySQL to PostgreSQL.
The PostgreSQL database cluster is initialized with cs_CZ.UTF-8 locale.
A problem arises with a column Attachments.Content with a pg type text.

With Pg, this content is handled specifically, see _EncodeLOB in
lib/RT/Record.pm.

(see also mysql2Pg page on RTwiki).

also, are-you sure that you’re mysql DB is using utf-8. If not, you
should first convert it or convert on the fly in your script.

Hi Emmanuel,
thanks for your reply.

With Pg, this content is handled specifically, see _EncodeLOB in
lib/RT/Record.pm.

(see also mysql2Pg page on RTwiki).

I saw Base64 encoding in script on wiki page and also elsewhere, but
considered that as workaround to store binary data into Pg text
type-field.

Are you sure, that _EncodeLOB is used only for Pg?
Running grep in lib/RT (git 3.8-trunk):

zito@bobek:/data/soft/rt/rt/lib/RT$ find . -name I18N -prune -o -type f -print |xargs fgrep Pg
./Handle.pm: elsif ( $db_type eq ‘Pg’ ) {
./Handle.pm: elsif ( $db_type eq ‘Pg’ ) {
./Interface/Web/Session.pm: Pg => ‘Apache::Session::Postgres’,
./Interface/Web/Handler.pm: unless ( RT->Config->Get(‘DatabaseType’) =~ /(?:mysql|Pg)/ ) {
./Report/Tickets.pm: elsif ( $db_type eq ‘Pg’ ) {
./Report/Tickets.pm: # Pg 8.3 requires explicit casting
./Report/Tickets.pm: $func .= ‘::text’ if $db_type eq ‘Pg’;
./Installer.pm: } qw/mysql Pg SQLite Oracle/
./Installer.pm: Pg => ‘PostgreSQL’, #loc
./Test.pm: # Pg doesn’t like if you issue a DROP DATABASE while still connected

I’m not sure.

I think, that Pg cluster initialized to ASCII can handle binary data in
the text data-type, but initialized to UTF-8 not. The correct solution
should be to change types for fields holding binary data to data-type
bytea. Maybe I’m simply not yet understanding the whole thing.

also, are-you sure that you’re mysql DB is using utf-8. If not, you
should first convert it or convert on the fly in your script.

I hope yes. National characters are displayed fine in the web interface
and mails. The Content field causing problem is MySQL LONGBLOB type. It
can contain binary data.

BTW: I did with old data following in bash script:

make_database()
{
local pref=“$1”; shift
local db=“$1”; shift
local user=“$1”; shift
local pass=“$1”; shift

mysql -e "CREATE DATABASE $db CHARACTER SET binary;"
ssh zito@aslan.i.cz cat $aslan_db_dir/$db.sql.gz | zcat | mysql --default-character-set=binary $db
mysql -e "GRANT ALL ON $db.* TO '$user'@'$RT_DB_HOST' IDENTIFIED BY '$pass';" $db
export RT_DB_NAME="$db"
export RT_DB_USER="$user"
export RT_DB_PASSWORD="$pass"

mysql $db < unaccent.sql
mysql $db < update-duplic-emailaddress.sql
mysql $db < update-duplic-name.sql

{ echo 3.2.2; echo 3.7.87; echo y; } \
    | $RT_SETUP_DATABASE --datadir $SCHEMA_PATH --action=upgrade

$UPGRADE_MYSQL_SCHEMA $db >/tmp/queries
mysql $db </tmp/queries

{ echo 3.7.87; echo; echo y; } \
    | $RT_SETUP_DATABASE --datadir $SCHEMA_PATH --action=upgrade

mysql $db < emailaddress-testing.sql

mig_$db $pref

}

Regards
Zito

Hi Emmanuel,
thanks for your reply.

With Pg, this content is handled specifically, see _EncodeLOB in
lib/RT/Record.pm.

(see also mysql2Pg page on RTwiki).

I saw Base64 encoding in script on wiki page and also elsewhere, but
considered that as workaround to store binary data into Pg text
type-field.

Are you sure, that _EncodeLOB is used only for Pg?

Oh yes, I was blind that RT uses DBIx::SearchBuilder. There is
a subroutine BinarySafeBLOBs defined in
DBIx::SearchBuilder::Handle::Pg.pm and returns undef. That is - Pg is
considered not capable handling binary in safe manner.

Every value must be converted using base64 :(.

Regards
Zito

Oh yes, I was blind that RT uses DBIx::SearchBuilder. There is
a subroutine BinarySafeBLOBs defined in
DBIx::SearchBuilder::Handle::Pg.pm and returns undef. That is - Pg is
considered not capable handling binary in safe manner.

Every value must be converted using base64 :(.

yes :frowning:

Maybe there is a better way as of recent postgres?

AW, I did a migration from mysql to postgres 8.3 with base64 conversion
and everything is ok.

Oh yes, I was blind that RT uses DBIx::SearchBuilder. There is
a subroutine BinarySafeBLOBs defined in
DBIx::SearchBuilder::Handle::Pg.pm and returns undef. That is - Pg is
considered not capable handling binary in safe manner.

Every value must be converted using base64 :(.

yes :frowning:

Maybe there is a better way as of recent postgres?

PostgreSQL can handle blob, but special handling is needed during
bind_param. Alternatively the quoting of value may be used.
This is the output from attached test script:

zito@bobek:~/pokusy/devel/perl/dbi$ ./pg-blob

synthetized binary value:0504030201000102030405

WITHOUT bind_param type spec…
data from database:0504030201

WITH bind_param type spec…
data from database:0504030201000102030405

WITH quoted value…
quoted value: E’\005\004\003\002\001\000\001\002\003\004\005’
data from database:0504030201000102030405

AW, I did a migration from mysql to postgres 8.3 with base64 conversion
and everything is ok.

Fine. I’m now testing added base64 encoding.

Regards
Zito

pg-blob (1.25 KB)

Maybe there is a better way as of recent postgres?

FYI: fwd

Zito