Accent problem in tickets submission by mail

[using RT 2.1.75 with Postgres 7.3.2 on Debian sid]

When a ticket is submitted through the mail gateway and the subject
contains accented characters (e.g. réponse), here is what is entered in the
tickets table: ‘répons e’.

OENONE: Vous la voyez, madame, et prête à vous cacher,
        Vous haïssez le jour que vous veniez chercher ?
                                      (Phèdre, J-B Racine, acte 1, scène 3)

RT3 works natively in UTF-8. All incoming mail (body and subject, but
not the attachments) is converted from its original encoding (yours is
probably ISO-8859-1) to UTF-8, a variable-length encoding (one byte for the
usual 128 ASCII characters, two or more for the others) that can represent
all Latin characters, including Central European ones, as well as Cyrillic,
Hebrew, and other scripts (Asian?) I am not aware of, without any further
conversion. What you see in the database is indeed the same word (réponse),
but in a different representation. If you use a UTF-8-aware text
editor, you’ll see the right symbols.
Your browser and your mailer are able to translate this back if they are
reasonably recent.
This is how RT is now able to “speak” almost any language, provided its
written representation still goes from left to right and top to bottom.
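
The effect described above can be reproduced outside RT. The following is a minimal sketch in Python (RT itself is written in Perl, so this is only an illustration, not RT's code): it encodes the word as UTF-8 and then reads those bytes as if they were ISO-8859-1, which is presumably what a Latin-1 client does when looking at the table.

```python
# Illustration only: shows the byte-level effect, not RT's actual code path.
subject = "réponse"

utf8_bytes = subject.encode("utf-8")      # what a UTF-8 application stores
latin1_bytes = subject.encode("latin-1")  # what a Latin-1 application would store

# Reading UTF-8 bytes with the wrong (Latin-1) decoder: 'é' shows up as two symbols.
misread = utf8_bytes.decode("latin-1")

# Reading them with the right decoder gives the original word back.
roundtrip = utf8_bytes.decode("utf-8")

print(len(latin1_bytes), len(utf8_bytes))  # byte counts differ by one
print(misread)
print(roundtrip)
```

Note that this misreading alone does not add a space between the s and the e.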

Blaise

-----Original Message-----
From: Louis-David Mitterrand [mailto:vindex@apartia.org]
Sent: Friday, 21 February 2003 15:39
To: rt-devel@lists.fsck.com
Subject: [rt-devel] accent problem in tickets submission by mail

[using RT 2.1.75 with Postgres 7.3.2 on Debian sid]

When a ticket is submitted through the mail gateway and the subject
contains accented characters (e.g. réponse), here is what is entered in the
tickets table: ‘répons e’.

OENONE: Vous la voyez, madame, et prête à vous cacher,
        Vous haïssez le jour que vous veniez chercher ?
                                      (Phèdre, J-B Racine, acte 1, scène

rt-devel mailing list
rt-devel@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-devel

> RT3 works natively in UTF-8. All incoming mail (body and
> subject, but not the attachments) is converted from its original encoding
> (yours is probably ISO-8859-1) to UTF-8, a variable-length encoding
> (one byte for the usual 128 ASCII characters, two or more for the others)
> that can represent all Latin characters, including Central European
> ones, as well as Cyrillic, Hebrew, and other scripts (Asian?) I am not
> aware of, without any further conversion. What you see in the
> database is indeed the same word (réponse), but in a different
> representation. If you use a UTF-8-aware text editor, you’ll see
> the right symbols.

What I see in the database is ‘répons e’ (note the spurious space
between s and e) where the actual subject of the e-mail is “réponse”.
However, if I send a subject of “reponse” (sans accent), the database
entry does not have the extra space. This happens every time and is
easily reproducible.
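
One way to narrow this down is to decode the Subject header independently of RT and check whether the space is already there after decoding. The sketch below uses Python's standard `email.header` module; the encoded header value is a hypothetical example of how a mailer might send the subject, not the header from the actual message.

```python
from email.header import decode_header, make_header

# Hypothetical raw Subject header value (RFC 2047 encoded-word, ISO-8859-1).
raw_subject = "=?iso-8859-1?Q?r=E9ponse?="

# decode_header() returns a list of (bytes, charset) pairs.
parts = decode_header(raw_subject)

# make_header() converts those pairs into a Unicode string.
decoded = str(make_header(parts))

# True would mean the space is introduced during header decoding.
has_space = " " in decoded

print(parts)
print(repr(decoded), has_space)
```

If the decoded value has no space here, the space is presumably introduced later, on the way into the database.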

> Your browser and your mailer are able to translate this back if they
> are reasonably recent.

The browser can see the accent fine, but displays the spurious space as
well.

Thanks for your explanation and help, cheers,

ldm@apartia.org

Incidentally, what did you tell Postgres your “standard” encoding was
when installing it?


http://www.bestpractical.com/rt – Trouble Ticketing. Free.

> Incidentally, what did you tell Postgres your “standard” encoding was
> when installing it?

I selected SQL_ASCII but I now realize that I probably should have
selected UNICODE. It seemed to me that SQL_ASCII would provide better
performance in sorting and index management and that the overhead of
Unicode was not called for when storing only Latin-1 characters.

Does RT require Unicode?


THERAMENE: Elle vous cherche.
HIPPOLYTE: Moi ?
(Phèdre, J-B Racine, acte 2, scène 3)

As you’ve seen in your database, the French accented characters are not
stored “as is” in the database. The Latin-1 extensions are not part of standard
ASCII and are all two-byte characters in UTF-8. In terms of sorting, for
example, it may be faster with your setting, but the resulting order will be
rather strange when accented characters are involved. Also, when using
database functions to count the characters in a string, you’ll get different
results (with UTF-8, the number of bytes in such a string is between
one and two times the number of symbols). This is a problem when setting
the size of each field (by the way, is that OK in RT? Or can we overrun
database fields by filling an input field in RT’s GUI with only double-byte
characters?).
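
The counting and sorting effects can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a statement about how Postgres or RT behaves: the byte-wise sort stands in for a database that compares raw bytes, and `fits_in_bytes` is a hypothetical helper for a field whose limit is counted in bytes.

```python
words = ["eau", "zebre", "école"]

# Character count versus UTF-8 byte count for each word.
for w in words:
    print(w, len(w), len(w.encode("utf-8")))

# Sorting on raw UTF-8 bytes puts the accented word after 'z',
# because the first byte of 'é' (0xC3) is greater than 'z' (0x7A).
byte_order = sorted(words, key=lambda w: w.encode("utf-8"))
print(byte_order)


def fits_in_bytes(text: str, limit: int) -> bool:
    """Hypothetical check for a field whose size limit is counted in bytes."""
    return len(text.encode("utf-8")) <= limit


# Four accented characters need eight bytes in UTF-8.
print(fits_in_bytes("éééé", 4), fits_in_bytes("eeee", 4))
```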

I can’t explain why you got the spurious space, but it seems to me it is
always better to use the right encoding in the database when you can.

Blaise

-----Original Message-----
From: Louis-David Mitterrand [mailto:vindex@apartia.org]
Sent: Monday, 24 February 2003 09:45
To: rt-devel@lists.fsck.com
Subject: Re: [rt-devel] Re: accent problem in tickets submission by mail

Incidentally, what did you tell postgres your “standard” encoding was
when installing it?

I selected SQL_ASCII but I now realize that I probably should have
selected UNICODE. It seemed to me that SQL_ASCII would provide better
performance in sorting and index management and that the overhead of
unicode was not called for when storing only latin1 characters.

Does RT require Unicode?

> As you’ve seen in your database, the French accented characters are not
> stored “as is” in the database. The Latin-1 extensions are not part of standard

Our main SQL_ASCII database is full of accented characters from the
Latin-1 subset, without any problems. AFAIK they fit into the 1 byte
address space (255 chars). What does unicode bring to the table if we
don’t plan to expand beyond western european languages? (sorry for the
off-topic question)

> ASCII and are all two-byte characters in UTF-8. In terms of sorting, for
> example, it may be faster with your setting, but the resulting order will be
> rather strange when accented characters are involved. Also, when using
> database functions to count the characters in a string, you’ll get different
> results (with UTF-8, the number of bytes in such a string is between
> one and two times the number of symbols). This is a problem when setting
> the size of each field (by the way, is that OK in RT? Or can we overrun
> database fields by filling an input field in RT’s GUI with only double-byte
> characters?).
>
> I can’t explain why you got the spurious space, but it seems to me it is
> always better to use the right encoding in the database when you can.

Agreed. I just want to make sure RT requires UNICODE in its Postgres
database to properly operate and better understand the issues. It’s
possible to maintain different encodings in each database on a single
Postgres installation, so it’s not a problem for me to convert.

In any case thanks for your insight, cheers,

OENONE: Quoi ! vous ne perdrez point cette cruelle envie ?
        Vous verrai-je toujours, renonçant à la vie,
        Faire de votre mort les funestes apprêts ?
                                      (Phèdre, J-B Racine, acte 1, scène 3)

There are two answers:

  • Using Unicode makes it possible for RT to use AND MIX IN THE SAME INSTANCE
    most of the characters used around the globe with exactly the same encoding.
    The web UI serves pages encoded in Unicode too, so no extra translation is
    done between the database and RT’s code. The internals of RT3 are all
    Unicode. Translation is done once and for all when mail is received. This
    makes it possible to have French and Czech characters simultaneously, for
    example. It makes RT much easier to use in an international organisation.
    RT2 would display strange characters when viewed with a foreign
    browser that defaulted to another code page.

  • In your case, if all of your mail and all of your browsers default to
    Latin-1, I guess all this brings nothing new, except that it corrects some
    funny behaviour in rare cases (my RT2 instance and the software around it
    (MTA, Apache…) did not appreciate accented characters in subjects and
    people’s names. I never tried to track this down, but it is now solved with
    RT3).
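
The French and Czech case can be made concrete with a short Python sketch (again only an illustration, not RT code). It assumes ISO-8859-1 for a Western European client and ISO-8859-2 for a Central European one; the sample string is made up.

```python
# A made-up subject mixing a French and a Czech accented letter.
mixed = "réponse česky"

# UTF-8 can encode both letters in the same string.
utf8_bytes = mixed.encode("utf-8")

# ISO-8859-1 has no 'č', so the same string cannot be stored as Latin-1.
try:
    mixed.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# The same single byte means different letters in different code pages,
# which is how a browser defaulting to another code page shows odd characters.
byte = b"\xe8"
as_latin1 = byte.decode("iso-8859-1")
as_latin2 = byte.decode("iso-8859-2")

print(latin1_ok, as_latin1, as_latin2)
print(utf8_bytes.decode("utf-8") == mixed)
```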

Blaise

-----Original Message-----
From: Louis-David Mitterrand [mailto:vindex@apartia.org]
Sent: Monday, 24 February 2003 15:12
To: THAUVIN Blaise (Dir. Informatique)
Cc: rt-devel@lists.fsck.com
Subject: Re: [rt-devel] Re: accent problem in tickets submission by mail

Our main SQL_ASCII database is full of accented characters from the
Latin-1 subset, without any problems. AFAIK they fit into the 1 byte
address space (255 chars). What does unicode bring to the table if we
don’t plan to expand beyond western european languages? (sorry for the
off-topic question)