RT 3.0.1 I18N patch proposal

Thanks very much for this. I expect to have real net again on Wednesday
or Thursday and will test it out then. In the meantime, I’d love to hear
others’ reports on how this patch fixes things up.

Thanks again,
jesse

On Tue, May 06, 2003 at 02:39:44PM +0200, Remy Chibois wrote:

Hello everybody,

Here’s a small patch for RT 3.0.1 which tries to correct some issues
people (including me) are having with international characters.

My setup is:

  • RedHat 7.3
  • Perl v5.8.0 built for i686-linux-thread-multi
  • Apache/1.3.27 with mod_perl/1.27
  • RT 3.0.1

“Encodings” setup in RT_SiteConfig.pm is:

    @EmailInputEncodings = qw(iso-8859-1) unless (@EmailInputEncodings);
    Set($EmailOutputEncoding , 'iso-8859-1');

To install it, cd into your RT 3.0.1 installation and type:

    cat rt-3.0.1-I18N.patch.gz | gunzip | patch -p1

or

    zcat rt-3.0.1-I18N.patch.gz | patch -p1

depending on your setup.

The patch modifies the following files:

→ lib/RT/EmailParser.pm

When a mail “Subject” contains international characters, it will normally
be encoded in “MIME words” format. RT already addresses this but not for
the “To” and “CC” fields. The patch corrects this.
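
To illustrate what decoding “MIME words” means for several header fields at once, here is a minimal sketch. It uses the MIME-Header encoding from Perl’s core Encode module rather than RT’s own RT::I18N::DecodeMIMEWordsToUTF8, and the %raw hash and its header values are made up for the example.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical raw header values, as they might arrive in a message.
my %raw = (
    Subject => '=?ISO-8859-1?Q?Probl=E8me?=',
    From    => '=?ISO-8859-1?Q?R=E9my?= <remy@example.com>',
);

# Decode every field from RFC 2047 "MIME words" into Perl characters,
# instead of treating Subject as a special case.
my %decoded = map { $_ => decode('MIME-Header', $raw{$_}) } keys %raw;

binmode STDOUT, ':encoding(UTF-8)';
print "$_: $decoded{$_}\n" for sort keys %decoded;
```

The point of the loop is the same as in the patch: any header that can carry a display name or free text may be encoded, so each one is decoded the same way.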

→ lib/RT/I18N.pm, lib/RT/Action/SendEmail.pm

Attachments whose filenames contain international characters are also encoded
in “MIME words” format. These MIME attributes were not decoded in RT and
attachments appeared “MIME words” encoded in the web interface.

→ lib/RT/I18N.pm

It was observed that M$ Outlook oddly encodes some MIME fields and attributes.
RT already handles the substitution for a blank space encoded as an underscore,
but in some cases (especially when the field is more than 40 characters),
multiple “MIME words” lines appear, separated by a newline followed by a tab.
The patch will add some more substitutions for this special case.
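
The two substitutions described here (underscore for space, and a newline plus tab between folded words) can be sketched in plain Perl. This is only an illustration under simplifying assumptions, not RT’s DecodeMIMEWordsToUTF8: the function name decode_q_words and the sample string are hypothetical, and only the “Q” encoding is handled.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode Q-encoded MIME words, tolerating folding.
sub decode_q_words {
    my ($raw) = @_;

    # Drop whitespace (such as "\n\t") between adjacent encoded words.
    $raw =~ s/(\?=)\s+(=\?)/$1$2/g;

    # Decode each =?charset?Q?text?= word in turn.
    $raw =~ s{=\?([^?]+)\?[Qq]\?([^?]*)\?=}{
        my ($charset, $text) = ($1, $2);
        $text =~ tr/_/ /;                               # underscore stands for a space
        $text =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # =XX is a hex-encoded byte
        decode($charset, $text);
    }ge;

    return $raw;
}

# A header folded the way the message describes: newline, then tab.
my $folded  = "=?ISO-8859-1?Q?caf=E9_?=\n\t=?ISO-8859-1?Q?cr=E8me?=";
my $decoded = decode_q_words($folded);
```

Without the first substitution, the newline and tab would survive between the two decoded words and show up in the web interface.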

→ lib/RT/Record.pm

Perl’s Encode::decode_utf8 function completely mangles text if it’s
not in utf8 and contains international characters. In my case, data entered
in web forms was not sent in utf8 format and the converted string was set to
‘0’.
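
One way to guard against this kind of problem is to check that the input really is UTF-8 before decoding it. The sketch below is an assumption-laden illustration, not what this patch or any RT release does: the helper name to_characters and the iso-8859-1 fallback are hypothetical choices.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode bytes as UTF-8 if they are valid UTF-8,
# otherwise fall back to a single-byte encoding.
sub to_characters {
    my ($bytes, $fallback) = @_;
    $fallback ||= 'iso-8859-1';

    # decode() with a CHECK argument may modify its input, so use a copy.
    my $copy = $bytes;
    my $text = eval { Encode::decode('UTF-8', $copy, Encode::FB_CROAK) };

    return defined $text ? $text : Encode::decode($fallback, $bytes);
}

my $from_utf8   = to_characters("caf\xC3\xA9");   # valid UTF-8 bytes
my $from_latin1 = to_characters("caf\xE9");       # not valid UTF-8
```

Both calls yield the same four-character string, so form data arrives intact whichever of the two encodings the browser used.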

As usual, please make a backup before applying this patch.

In the hope this will be useful,


Remy Chibois

Except for the Record.pm bit, which I’d need to take a closer look at,
the rest seems wonderful. I’ll apply the changes to my local branch
provisionally and test-run them in the next few days.

Thanks,
/Autrijus/

The Record.pm patch has already been superseded by a better implementation
available in 3.0.2-pre5. I did some stylistic cleanup on the other
chunks and applied them to my branch here as @5663, as attached and inlined
below.

Also, re the I18N.pm bit: if, as Remy said, OE separates multiline
MIME words with “a newline and a tab”, wouldn’t the correct regex be:

    $trailing =~ s/\n\t//g;

Instead of the original version:

    $trailing =~ s/\s?\t?$//g;

Or did I miss something?

Thanks,
/Autrijus/

diff -dur rt/lib/RT/Action/SendEmail.pm rt/lib/RT/Action/SendEmail.pm
--- rt/lib/RT/Action/SendEmail.pm Tue Apr 15 17:54:15 2003
+++ rt/lib/RT/Action/SendEmail.pm Wed Apr 16 17:03:45 2003
@@ -158,7 +158,8 @@
                   if ( $transaction_content_obj
                     && $transaction_content_obj->Id == $attach->Id );
                 $MIMEObj->attach( Type => $attach->ContentType,
-                                  Data => $attach->Content );
+                                  Data => $attach->Content,
+                                  Filename => $attach->Filename );
             }
 
         }
diff -dur rt/lib/RT/EmailParser.pm rt/lib/RT/EmailParser.pm
--- rt/lib/RT/EmailParser.pm Tue Apr 15 17:54:15 2003
+++ rt/lib/RT/EmailParser.pm Tue May 6 11:36:10 2003
@@ -235,8 +235,9 @@
 
     # ... and subject too
     {
         my $head = $self->Head;
-        $head->replace('Subject',
-                 RT::I18N::DecodeMIMEWordsToUTF8( $head->get('Subject') ) );
+        foreach my $field (qw(Subject From CC)) {
+          $head->replace($field, RT::I18N::DecodeMIMEWordsToUTF8( $head->get($field) ) );
+        }
     }
 
diff -dur rt/lib/RT/I18N.pm rt/lib/RT/I18N.pm
--- rt/lib/RT/I18N.pm Tue Apr 15 17:54:15 2003
+++ rt/lib/RT/I18N.pm Tue May 6 11:25:39 2003
@@ -166,9 +166,16 @@
     my ( $mime_type, $charset ) =
       ( $head->mime_type, $head->mime_attr("content-type.charset") || "" );
 
-    # the entity is not text, nothing to do with it.
+    # the entity is not text; convert at least MIME word encoded attachment filename
     # TODO: should we be converting ANY text/ type? autrijus?
-    return unless ( $mime_type =~ /^text\/plain$/ );
+    if ($mime_type !~ /^text\/plain$/) {
+        foreach my $attr (qw(content-type.name content-disposition.filename)) {
+            if (my $name = $head->mime_attr($attr)) {
+                $head->mime_attr($attr => DecodeMIMEWordsToUTF8($name));
+            }
+        }
+        return;
+    }
 
     # the entity is text and has charset setting, try convert
     # message body into $enc
@@ -280,6 +287,8 @@
     my ($prefix, $charset, $encoding, $enc_str, $trailing) =
       (shift, shift, shift, shift, shift);
 
+    $trailing =~ s/\n\t//g;                 # Observed from Outlook Express
+
     if ($encoding eq 'Q' or $encoding eq 'q') {
         use MIME::QuotedPrint;
         $enc_str =~ tr/_/ /; # Observed from Outlook Express
5663.diff (2.36 KB)

Quoting Autrijus Tang <autrijus@autrijus.org>:

> Thanks very much for this. I expect to have real net again on Wednesday
> or Thursday and will test it out then. In the meantime, I’d love to hear
> others’ reports on how this patch fixes things up.
>
> Except for the Record.pm bit that I’d need to take a closer look at,
> the rest seems wonderful. I’ll apply them to my local branch
> provisionally and test-run it in the next few days.
>
> The Record.pm patch has already been superseded by a better implementation
> available in 3.0.2-pre5. I did some stylistic cleanup to the other
> chunks and applied to my branch here as @5663, as attached and inlined
> below.

Sorry, I failed to check Record.pm.

> Also, re the I18N.pm bit: if as Remy said that OE separates multiline
> MIME words with “a newline and a tab”, wouldn’t the correct regex be:
>
>     $trailing =~ s/\n\t//g;
>
> Instead of the original version:
>
>     $trailing =~ s/\s?\t?$//g;
>
> Or did I miss something?

In fact, after numerous tries, I found that the first regexp in the function
“transforms” \n to \s (I haven’t yet figured out why). If you put some debug
statements in that area of RT, you’ll see that $trailing is “\s\t” in the case
of a multiline MIME words string and “\s” at the end of that string. Thus, the
“original” regexp catches both cases, and only at the end of the string.
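
The difference between the two patterns can be checked on the values reported above. This is a small sketch, assuming $trailing holds either a space followed by a tab or a lone space, as the debug output is said to show; the variable names are made up for the example.

```perl
use strict;
use warnings;

# Values of $trailing as reported: space + tab between folded words,
# and a lone space at the end of the string.
my @observed = (" \t", " ");

my (@literal, @anchored);
for my $trailing (@observed) {
    (my $l = $trailing) =~ s/\n\t//g;      # needs a real newline followed by a tab
    (my $a = $trailing) =~ s/\s?\t?$//g;   # optional whitespace and tab at the end
    push @literal,  $l;
    push @anchored, $a;
}

printf "literal pattern leaves %d and %d characters\n",  map { length } @literal;
printf "anchored pattern leaves %d and %d characters\n", map { length } @anchored;
```

Since neither observed value contains a newline, the literal pattern leaves both untouched, while the anchored one empties both.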

Remy Chibois

But in the case of multiline strings, shouldn’t it be

    $trailing =~ s/\s?\t?$//mg;

instead of

    $trailing =~ s/\s?\t?$//g;

so that “$” matches at every end of line? Or did I miss something?

/Autrijus/

Quoting Autrijus Tang <autrijus@autrijus.org>:

> On Tue, May 06, 2003 at 05:41:02PM +0200, Remy Chibois wrote:
>
> > In fact, after numerous tries, the first regexp in the function
> > “transforms” \n to \s (haven’t yet figured out why). If you put some
> > debug statements in that area of RT, you’ll see that $trailing is
> > “\s\t” in case of a multiline MIME words string and “\s” when at the
> > end of that string. Thus, the “original” regexp catches both cases,
> > and only at the end of the string.
>
> But in case of multiline strings, shouldn’t it be
>
>     $trailing =~ s/\s?\t?$//mg;
>
> instead of
>
>     $trailing =~ s/\s?\t?$//g;
>
> so “$” matches multiple end of lines? Or did I miss something?

This regexp is executed for each line of a multiline string since
the very first regexp produces an array, thus it is not necessary to
specify ‘m’. In fact it should also work without ‘g’, given that it
appears only once per line.
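
The effect of the ‘m’ and ‘g’ modifiers can be seen on a made-up string. The sketch below assumes a hypothetical two-line sample in which each line ends in a space and a tab; it compares treating the text as one string with applying the pattern once per piece.

```perl
use strict;
use warnings;

# Hypothetical sample: two lines, each ending in a space and a tab.
my $whole = "one \t\ntwo \t";

# Without /m, "$" matches only at the end of the string
# (or just before a final newline).
(my $plain = $whole) =~ s/\s?\t?$//g;

# With /m, "$" also matches before every embedded newline.
(my $multi = $whole) =~ s/\s?\t?$//mg;

# Applied once to each piece, the pattern needs neither /m nor /g.
my @pieces = split /\n/, $whole;
s/\s?\t?$// for @pieces;
```

Here $plain keeps the space and tab after “one”, $multi strips both line endings, and @pieces ends up as ("one", "two"), which matches the reasoning that a per-piece substitution needs no modifiers.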

Remy Chibois

I see – the //g indeed led me off the track. I’ve dropped the
trailing //g as you describe, so it now reads:

    $trailing =~ s/\s?\t?$//;

I’ll do some field tests with this set of changes today and tomorrow.

Thanks,
/Autrijus/