RT 3.4.5: UTF-8 problems in the web interface

Hi rt-users,

I’m trying to get non-ASCII (mostly latin1) characters to work with RT
3.4.5, and I have problems with UTF-8 encoding in the web interface. It
looks like the characters come out in ISO-8859-1 encoding, while the
HTTP headers call it UTF-8.

I’m using PostgreSQL as the database, and its encoding is set to ‘UNICODE’
(or ‘UTF8’, as it’s called in PostgreSQL 8.1) by rt-setup-database. When
I look at the database contents with the ‘psql’ command-line tool,
they look UTF8-encoded, as expected. However, in the web interface the
non-ASCII characters don’t show properly. A dump with ‘curl’ shows that
while the HTTP headers claim that the encoding is utf-8, the characters
are actually in ISO-8859-1.
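
To make the mismatch concrete, here is a minimal Perl sketch of the kind of check that tells the two encodings apart: it tests whether a byte string decodes cleanly as strict UTF-8. The helper name is_valid_utf8() and the sample strings are made up for illustration; they are not part of RT.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: true if the byte string decodes cleanly as strict UTF-8.
sub is_valid_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK value may modify its argument
    return eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# Illustrative sample data, not taken from a real RT response.
my $latin1_body = "J\xE4rvinen";        # "Järvinen" encoded as ISO-8859-1
my $utf8_body   = "J\xC3\xA4rvinen";    # the same name encoded as UTF-8

printf "latin1 body valid UTF-8? %s\n", is_valid_utf8($latin1_body) ? 'yes' : 'no';
printf "utf8 body valid UTF-8?   %s\n", is_valid_utf8($utf8_body)   ? 'yes' : 'no';
```

A lone 0xE4 byte followed by an ASCII letter is not a legal UTF-8 sequence, so a body like the first one cannot be what the "charset=utf-8" header promises.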

This is RT 3.4.5, Perl 5.8.8 and PostgreSQL 8.1.4, on Debian. I can also
reproduce it with MySQL 5.0.22 and PostgreSQL 7.4.7, and with Perl 5.8.4.

The encoding settings are untouched defaults; from RT_Config.pm:

@LexiconLanguages = qw(*) unless (@LexiconLanguages);
@EmailInputEncodings = qw(utf-8 iso-8859-1 us-ascii) unless (@EmailInputEncodings);
Set($EmailOutputEncoding , 'utf-8');

The non-ASCII characters get into the database from ISO-8859-1-encoded
emails. They are correctly UTF-8-encoded in outgoing emails, such as
the AutoReply sent at ticket creation time. Only the web interface seems
to handle them incorrectly.

After much fiddling, I found that this patch modifying
RT::Interface::Web::EscapeUTF8() fixes the behaviour completely for me:

--- lib/RT/Interface/Web.pm 2006/06/27 10:55:43 1.1
+++ lib/RT/Interface/Web.pm 2006/06/27 10:55:52
@@ -88,7 +88,7 @@
     $val =~ s/"/&#34;/g;
     $val =~ s/'/&#39;/g;
     $$ref = $val;
-    Encode::_utf8_on($$ref);
+    Encode::_utf8_off($$ref);
 
 
 }

This doesn’t feel like the right solution, however, as there’s probably
a reason for the _utf8_on() call. Or is there?
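
For illustration, here is a minimal sketch of how the internal UTF-8 flag changes what Perl writes to a filehandle that has no encoding layer. It assumes the string holds UTF-8 bytes (as the database contents appear to) and that the output handle is a plain byte stream; whether RT's web output path actually behaves this way is an assumption, not something verified here. The helper raw_output() is hypothetical.

```perl
use strict;
use warnings;
use Encode ();

# UTF-8 bytes for "ä" (U+00E4), standing in for data read from a UTF-8 database.
my $bytes = "\xC3\xA4";

# Variant 1: flag switched on, as in the unpatched EscapeUTF8().
my $flag_on = $bytes;
Encode::_utf8_on($flag_on);

# Variant 2: flag switched off, as in the patch above.
my $flag_off = $bytes;
Encode::_utf8_off($flag_off);

# Hypothetical helper: print to an in-memory handle with no encoding layer
# and return the bytes that were actually written, as hex.
sub raw_output {
    my ($string) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "cannot open in-memory handle: $!";
    print {$fh} $string;
    close($fh);
    return join ' ', map { sprintf '%02X', ord } split //, $buffer;
}

printf "flag on:  %s\n", raw_output($flag_on);     # E4    (one ISO-8859-1 byte)
printf "flag off: %s\n", raw_output($flag_off);    # C3 A4 (UTF-8 bytes unchanged)
```

With the flag on, Perl sees a single character below 0x100 and writes it as one byte on a handle without a ":utf8" or ":encoding" layer; with the flag off, the original two bytes pass through untouched. That would be consistent with the symptom, and it suggests the alternative to the patch might be an encoding layer (or an explicit Encode::encode) on the output side rather than clearing the flag.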

It looks like the charset info in the HTTP headers comes from
‘html/autohandler’, so the Apache configuration shouldn’t be involved, as
far as I understand. Indeed, changing ‘AddDefaultCharset’ in the Apache
config doesn’t seem to have any effect.

Can anybody tell me what I’m doing wrong, please? I haven’t found anything
in the wiki or the mailing list archives, which is a bit surprising
because I’d expect this to hit other people too.

Thanks,
Niko Tyni ntyni@iki.fi