Different charsets problem

Hi,

Has anybody dealt with multiple charsets in the RT2 series?

I’ve only found an e-mail interface hack (by Petr Rehor).

We have serious problems with the web UI every day. Some characters
are displayed correctly (thanks to an ugly hacked component with an HTML
header), but when a user replies to a message containing accented
characters, the system either sends/says nothing or reports an incorrect
transaction type, etc. There are also problems displaying Subjects/Bodies
in different charsets.

As a model solution, I’ve only found a Japanese patch for RT1
(by Tadashi G. Takaoka).

Any help appreciated.

Jan Okrouhly
-----------------------------------------+-----okrouhly@civ.zcu.cz--
Laboratory for Computer Science | phone: (420 19) 7491588
University of West Bohemia | location: Univerzitni 22
Americka 42, 306 14 Pilsen, Czech Republic | room: UI404
------------------------------------------73!-de-OK1INC@OK0PPL.#BOH.CZE.EU-

> We have serious problems with the web UI every day. Some characters
> are displayed correctly (thanks to an ugly hacked component with an HTML
> header), but when a user replies to a message containing accented
> characters, the system either sends/says nothing or reports an incorrect
> transaction type, etc. There are also problems displaying Subjects/Bodies
> in different charsets.

Which particular charsets do your browser and Apache server purport to
know about? You might find that the AddDefaultCharset directive (Apache
1.3.12+) helps with your normal character set, as RT does not add a
charset by default (although you could put one in
WebRT/html/Elements/Header).
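To illustrate why the declared charset matters, here is a minimal sketch of a Python WSGI app (PEP 3333) in which the charset named in the Content-Type header is the same one used to encode the body. It is not RT code and not a Mason component. PAGE_CHARSET and app are hypothetical names, and iso-8859-2 stands in for whatever value AddDefaultCharset or the Header component would declare.

```python
# Hypothetical WSGI app (PEP 3333), not RT code: the charset declared in the
# Content-Type header must be the one actually used to encode the body.
PAGE_CHARSET = "iso-8859-2"  # assumed site default, like AddDefaultCharset


def app(environ, start_response):
    text = "<html><body>Příliš žluťoučký kůň</body></html>"
    body = text.encode(PAGE_CHARSET)  # encode with the charset we declare
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=" + PAGE_CHARSET),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Calling app with a stub start_response yields Latin-2 bytes along with a matching charset=iso-8859-2 header, so the browser does not have to guess.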

Handling multiple charsets is… currently outside RT’s abilities
(see the charset comment in lib/RT/Interface/Email.pm), but Jesse has
planned for it (see the SQL Users.{Lang,EmailEncoding,WebEncoding}
columns).

Regards,

                         Bruce Campbell                            RIPE
               Systems/Network Engineer                             NCC
             www.ripe.net - PGP562C8B1B                      Operations

>> We have serious problems with the web UI every day. Some characters
>> are displayed correctly (thanks to an ugly hacked component with an HTML
>> header), but when a user replies to a message containing accented
>> characters, the system either sends/says nothing or reports an incorrect
>> transaction type, etc. There are also problems displaying Subjects/Bodies
>> in different charsets.

> Which particular charsets do your browser and Apache server purport to
> know about? You might find that the AddDefaultCharset directive (Apache
> 1.3.12+) helps with your normal character set, as RT does not add a
> charset by default (although you could put one in
> WebRT/html/Elements/Header).

AddDefaultCharset iso-8859-2 (I had it set to On until today ;-) partly
helps Netscape browsers (I think MSIE ignores it and uses the
Content-Type/charset from my local/WebRT/html/Elements/Header). But some
problems will remain, because people here use (or may use) at least
iso-8859-1 (the M$ default mistake for Central Europe), iso-8859-2 (the
best bet here), Win-1250, and UTF-8 too…
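A single default charset cannot cover all of these because the same letter maps to different bytes in each one, and a mislabelled byte usually decodes silently into a different letter rather than failing. A small Python sketch (the helper name misread is hypothetical):

```python
def misread(text, written_as, read_as):
    """Encode text in one charset, then decode the bytes as another.

    errors="replace" mimics a lenient reader: bytes with no meaning in
    read_as become U+FFFD instead of raising an exception.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


word = "žluťoučký"
for charset in ("iso-8859-2", "cp1250", "utf-8"):
    print(charset, word.encode(charset))  # three different byte strings

# Latin-2 text read as Latin-1: 'ř' (byte 0xF8) silently becomes 'ø'
print(misread("ř", "iso-8859-2", "iso-8859-1"))
```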

Over time I’ll test the behavior of more browsers on more platforms with
different encodings.
There was one interesting issue with HTML-encoded ‘accented content’:
Ticket History seems to be OK, but when replying (etc.) people see &#xxx
instead of the ‘accented content’. So I think part of the problem is also
on the text area input side.
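One plausible explanation for the &#xxx symptom, offered as an assumption rather than a diagnosis of RT, goes like this. A browser that cannot represent a character in the page's charset may submit it as a numeric character reference. If that stored text is HTML-escaped again when it is placed into the reply textarea, "&" becomes "&amp;" and the reference shows up literally. A Python sketch using the standard html module:

```python
import html

# Assumed scenario: the browser submitted characters it could not represent
# in the page charset as numeric character references (&#253; = ý, etc.).
stored = "Dobr&#253; den, p&#345;&#237;loha"

# Escaping that text again for the <textarea> turns "&" into "&amp;",
# so the user literally sees "&#253;" in the reply box.
double_escaped = html.escape(stored)

# Resolving the references once before escaping restores the letters.
repaired = html.escape(html.unescape(stored))

print(double_escaped)
print(repaired)
```

Unescaping first would also convert a literal "&#253;" that a user typed on purpose, so this is only a workaround. The cleaner fix is to serve the form in a charset, such as UTF-8, that can represent the characters in the first place.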

> Handling multiple charsets is… currently outside RT’s abilities
> (see the charset comment in lib/RT/Interface/Email.pm), but Jesse has

I’ve looked at this (man MIME::Head, /decode). This is a separate (maybe
also important) problem with To, From, Subject, etc. The current behavior
is fine for me. The main problem is that the charset information from
Content-Type is not stored/used.
Example:
Content-Type: text/plain;
    charset="iso-8859-2"
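For comparison, Python's standard email package exposes both pieces mentioned here: RFC 2047 decoding of header fields such as Subject, and the charset parameter of Content-Type. The message below is made up for illustration, and header_info is a hypothetical helper, not part of MIME::Head or RT.

```python
from email import message_from_string
from email.header import decode_header, make_header

# A made-up message: an RFC 2047 encoded Subject plus a folded
# Content-Type header carrying a charset parameter.
RAW = (
    "Subject: =?iso-8859-2?Q?P=F8=EDli=B9_=BElu=BBou=E8k=FD?=\n"
    "Content-Type: text/plain;\n"
    ' charset="iso-8859-2"\n'
    "\n"
    "body\n"
)


def header_info(raw):
    msg = message_from_string(raw)
    # decode_header splits encoded words; make_header rejoins them as text
    subject = str(make_header(decode_header(msg["Subject"])))
    # get_content_charset() returns the lower-cased charset parameter, or None
    charset = msg.get_content_charset()
    return subject, charset


print(header_info(RAW))
```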

I suppose the right behavior would be to re-encode all incoming plain text
into one internal encoding (UTF-8 would be best). Attachments.ContentEncoding
could fit for that, but it would need a BIG workaround ;-( (I think).
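Below is a sketch of that re-encoding step, written in Python rather than RT's Perl. It reads the charset parameter, undoes the transfer encoding, decodes the text, and re-encodes it as UTF-8. body_as_utf8 is a hypothetical name. Returning the source charset only shows that it could be kept alongside the converted text; where RT would store it is left open.

```python
from email import message_from_bytes


def body_as_utf8(raw_bytes, fallback="us-ascii"):
    """Re-encode a single-part text message body as UTF-8.

    Returns (utf8_bytes, source_charset). fallback is used when the
    Content-Type header carries no charset parameter, or names a charset
    Python does not recognise.
    """
    msg = message_from_bytes(raw_bytes)
    charset = msg.get_content_charset() or fallback
    payload = msg.get_payload(decode=True)  # undoes base64/quoted-printable
    try:
        text = payload.decode(charset, errors="replace")
    except LookupError:  # unknown charset name
        charset = fallback
        text = payload.decode(charset, errors="replace")
    return text.encode("utf-8"), charset


sample = (
    b'Content-Type: text/plain; charset="iso-8859-2"\n'
    b"Content-Transfer-Encoding: quoted-printable\n"
    b"\n"
    b"k=F9=F2\n"
)
print(body_as_utf8(sample))
```

The us-ascii fallback follows the MIME default for text parts without a charset parameter (RFC 2045). A site like this one might prefer iso-8859-2 as a more lenient guess.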

> planned for it (see the SQL Users.{Lang,EmailEncoding,WebEncoding}
> columns).

Yes, I know that schema, but not Jesse’s detailed plans (are they somewhere on
the web?). In my opinion .Lang will be usable, but one user often has
several different email and/or web encodings (in a heterogeneous/open
environment).

Thanks for the response

rt-users mailing list
rt-users@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-users

Jan Okrouhly
-----------------------------------------+-----okrouhly@civ.zcu.cz--
Laboratory for Computer Science | phone: (420 19) 7491588
University of West Bohemia | location: Univerzitni 22
Americka 42, 306 14 Pilsen, Czech Republic | room: UI404
------------------------------------------73!-de-OK1INC@OK0PPL.#BOH.CZE.EU-

> I’ve looked at this (man MIME::Head, /decode). This is a separate (maybe
> also important) problem with To, From, Subject, etc. The current behavior
> is fine for me. The main problem is that the charset information from
> Content-Type is not stored/used.
> Example:
> Content-Type: text/plain;
>     charset="iso-8859-2"

If you put extra stuff in Attachments.Content{Type,Encoding} at the
present time, you will cause random breakage down the track, as various
regexes on those fields will suddenly stop working.
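The breakage described here is easy to reproduce with an anchored pattern. The regex below is hypothetical. It was written to illustrate the failure mode and is not taken from RT's source. It is shown next to a more tolerant check that compares only the MIME type.

```python
import re

# Hypothetical anchored check, written only to illustrate the failure mode.
PLAIN_TEXT = re.compile(r"^text/plain$")


def is_plain_strict(content_type):
    return PLAIN_TEXT.match(content_type) is not None


def is_plain_tolerant(content_type):
    # Compare only the MIME type and ignore parameters such as charset
    mime_type = content_type.split(";", 1)[0].strip().lower()
    return mime_type == "text/plain"


for value in ("text/plain", 'text/plain; charset="iso-8859-2"'):
    print(repr(value), is_plain_strict(value), is_plain_tolerant(value))
```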

> I suppose the right behavior would be to re-encode all incoming plain text
> into one internal encoding (UTF-8 would be best). Attachments.ContentEncoding
> could fit for that, but it would need a BIG workaround ;-( (I think).

yup.

>> planned for it (see the SQL Users.{Lang,EmailEncoding,WebEncoding}
>> columns).

> Yes, I know that schema, but not Jesse’s detailed plans (are they somewhere on

Telepathy seems to work for some.

> the web?). In my opinion .Lang will be usable, but one user often has
> several different email and/or web encodings (in a heterogeneous/open
> environment).

We had a discussion today about Google’s default behaviour if you don’t have
a cookie saying ‘I want this language’. It ignores what your browser
supplies and uses the dominant language of the region that it thinks your
IP address is in (at a rough guess, they’re using the country of
registration of the ASN that originates the route to that IP). This is
good because natives of the country get their own language. This is bad
because non-natives of the country have to look for ‘Ik wil dat Google het
Engels’ (roughly, ‘I want Google in English’).

With RT storing language (and eventually using it) on a per-user basis,
that’s cool irrespective of the browser the user happens to be using at the
time. The encoding sent to the user’s browser should always be one that the
browser says it can handle, and the WebEncoding in the Users table should be
the preferred one of that set. If the user’s browser doesn’t support that
encoding, fall back to us-ascii. Email encoding is something RT shouldn’t
guess at. If the user has said they want encoding foo, then foo they will
get, irrespective of their (unknown to RT) email client.
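The rule for web output can be sketched as follows. The sketch assumes the browser's capabilities come from the HTTP Accept-Charset header and the user's preference comes from a WebEncoding-style column. Both function names are hypothetical. Treating a missing header as "any charset is fine" follows HTTP's semantics, not anything RT is known to do.

```python
def parse_accept_charset(value):
    """Parse an Accept-Charset header into {charset: q-value}."""
    weights = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            if param.lower().startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed weight as "not acceptable"
        weights[name.lower()] = q
    return weights


def pick_web_encoding(accept_charset, preferred, fallback="us-ascii"):
    """Send the user's preferred charset if the browser accepts it.

    accept_charset: raw header value, or None if the browser sent none.
    preferred: the user's stored web encoding (hypothetical WebEncoding value).
    """
    if accept_charset is None:
        return preferred  # no header: HTTP treats any charset as acceptable
    weights = parse_accept_charset(accept_charset)
    # An explicit entry wins over the "*" wildcard; q=0 means "refuse".
    q = weights.get(preferred.lower(), weights.get("*", 0.0))
    return preferred if q > 0 else fallback


print(pick_web_encoding("iso-8859-2,utf-8;q=0.7,*;q=0.7", "iso-8859-2"))
print(pick_web_encoding("utf-8", "iso-8859-2"))
```

Email output needs no negotiation under this rule: RT would simply use the encoding the user configured.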

Regards,

                         Bruce Campbell                            RIPE
               Systems/Network Engineer                             NCC
             www.ripe.net - PGP562C8B1B                      Operations