Decoding of Subjects broken in 3.0.3pre3 (3.0.3pre2 was OK)

Hello!

In 3.0.3pre3 you committed the following patch. It broke charset recognition
of Subjects like =?koi8-r?B?1MXT1A==?=
This is a perfectly valid subject, but after RT processing it becomes ‘???’.
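
For reference, the sample Subject is an RFC 2047 encoded word: base64 data labelled as koi8-r. A minimal sketch of decoding it with Perl's core Encode module follows; it is only an illustration and not necessarily the code path RT itself uses.

```perl
use strict;
use warnings;
use Encode ();

# The encoded word from the report: charset koi8-r, "B" (base64) encoding.
my $raw_subject = '=?koi8-r?B?1MXT1A==?=';

# Encode's 'MIME-Header' pseudo-encoding decodes RFC 2047 encoded words
# into a Perl character string.
my $subject = Encode::decode( 'MIME-Header', $raw_subject );

binmode STDOUT, ':encoding(UTF-8)';
print "$subject\n";    # four Cyrillic letters, the Russian word for "test"
```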

When I roll back the following patch, everything becomes OK.

Please fix!

Thanks!!!

--- rt-3-0-3pre2/lib/RT/Attachment_Overlay.pm Thu Jun 5 01:22:36 2003
+++ rt-3-0-3pre3/lib/RT/Attachment_Overlay.pm Wed Jun 11 00:17:58 2003
@@ -146,8 +146,11 @@
             TransactionId => $args{'TransactionId'},
             Parent        => 0,
             ContentType   => $Attachment->mime_type,
-            Headers       => $Attachment->head->as_string,
-            Subject       => $Subject,
+            # We need to untaint the data here, because there's a high likelyhood that some incompetent MUA
+            # Author will try to put bogus non-ascii data in a message header.
+            Headers => Encode::encode( utf8 => $Attachment->head->as_string, Encode::FB_PERLQQ ),
+            Subject => Encode::encode( utf8 => $Subject, Encode::FB_PERLQQ )
         );
         foreach my $part ( $Attachment->parts ) {

@@ -219,14 +222,15 @@
         $Body = MIME::Base64::encode_base64($Body);

     }
     my $id = $self->SUPER::Create( TransactionId => $args{'TransactionId'},
                                    ContentType   => $Attachment->mime_type,
                                    ContentEncoding => $ContentEncoding,
                                    Parent          => $args{'Parent'},
+                                   # We need to untaint the data here, because there's a high likelyhood that some incompetent MUA
+                                   # Author will try to put bogus non-ascii data in a message header.
+                                   Headers       => Encode::encode(utf8 => $Attachment->head->as_string, Encode::FB_PERLQQ),
+                                   Subject       => Encode::encode(utf8 => $Subject, Encode::FB_PERLQQ),
                                    Content => $Body,
-                                   Headers  => $Attachment->head->as_string,
-                                   Subject  => $Subject,
                                    Filename => $Filename, );
     return ($id);

 }

Interesting. What if you back it out just for the Subject, but leave it
in for the Headers? The problem we’re trying to solve is that some
mail readers send headers containing unencoded text in non-ASCII
character sets, which makes Postgres go down in flames. Which is clearly
not what we want ;)

On Wed, Jun 11, 2003 at 04:47:33PM +0400, Dmitry Sivachenko wrote:

Hello!

In 3.0.3pre3 you committed the following patch. It broke charset recognition
of Subjects like =?koi8-r?B?1MXT1A==?=
This is a perfectly valid subject, but after RT processing it becomes ‘???’.

When I roll back the following patch, everything becomes OK.

Please fix!

Thanks!!!

rt-devel mailing list
rt-devel@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-devel

http://www.bestpractical.com/rt – Trouble Ticketing. Free.

When I back this patch out only for Subject (see diff between 3.0.3pre3 and
my current version attached), I get the following error:

perl -c Attachment_Overlay.pm

Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 145.
Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 222.
Attachment_Overlay.pm had compilation errors.

Could you please send me a correct patch for testing?

I understand that you intend to handle e-mail containing unencoded non-ASCII
characters; we have faced the same problem.

I propose passing such headers to Encode::Guess, as is already done for the
e-mail body.
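
A minimal sketch of that idea, using only core modules. The sample bytes and the koi8-r suspect are assumptions chosen for illustration. The first call has the same shape as the one in the patch and shows that raw bytes come out as valid UTF-8 but with the wrong characters; the second lets Encode::Guess pick the charset before decoding.

```perl
use strict;
use warnings;
use Encode ();
use Encode::Guess;    # exports guess_encoding()

# Undeclared KOI8-R bytes, as a broken MUA might put them in a Subject header.
my $raw_header = "\xD4\xC5\xD3\xD4";

# Encoding the raw bytes straight to UTF-8 treats each byte as a Latin-1
# character: the result is valid UTF-8, but not the text the sender typed.
my $naive = Encode::encode( utf8 => $raw_header, Encode::FB_PERLQQ() );

# Encode::Guess always tries ascii and utf8; koi8-r is an extra suspect
# chosen here purely for illustration.
my $guess = guess_encoding( $raw_header, 'koi8-r' );

# On success the result is an encoding object, on failure an error string.
my $text = ref($guess) ? $guess->decode($raw_header) : undef;
print ref($guess) ? "guessed " . $guess->name : "guess failed: $guess", "\n";
```

Note that the guess only succeeds here because a single 8-bit suspect was supplied.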

Thanks!

rt.diff (1.84 KB)

Writing Encode::FB_PERLQQ as Encode::FB_PERLQQ() will fix the bareword errors.
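
A small sketch of why the parentheses help. It assumes the constant is not visible while the file is being compiled (for example because Encode is only loaded at run time); whether that is exactly the situation in Attachment_Overlay.pm is an assumption.

```perl
use strict;
use warnings;

# Assumption for this sketch: Encode is only loaded at run time, so its
# constant subs are unknown while this file is being compiled.
require Encode;

# my $check = Encode::FB_PERLQQ;    # bareword error under "strict subs"
my $check = Encode::FB_PERLQQ();    # parentheses make it an explicit sub call

# With FB_PERLQQ, characters the target encoding cannot represent are
# written as Perl-style \x{....} escapes instead of aborting the encode.
my $ascii = Encode::encode( 'ascii', "caf\x{e9}", $check );
print "$ascii\n";
```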

/Autrijus/

Thanks, that did the trick.

So when I back out this patch only for Subject (and leave it in for Headers),
properly MIME-encoded Subjects still become ‘???’ after RT processing.

The only way to fix this is to back out that patch completely.

What do you think about my suggestion to pass every header containing
non-ascii characters to Encode::Guess, as you do with message body?

I think it’s the only sane way out. Jesse?

/Autrijus/

Another sane way (which is more complex to implement) would be
to check whether the message body contains a text part and assume that
the headers use the same encoding as the body.
Otherwise, fall back to Encode::Guess.
This is probably a better solution, because Encode::Guess may fail to
determine the encoding: headers typically contain too few characters
to base a decision on, and the headers are almost certainly in the same
encoding as the text part of the body.
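
A rough sketch of this, assuming the body's charset is already known. The variable names and the cp1251 suspect are made up for illustration. With two single-byte Cyrillic suspects the guess on a short header comes back as an error string, so the body's charset is tried first and the guess is only a fallback.

```perl
use strict;
use warnings;
use Encode ();
use Encode::Guess;

my $raw_subject = "\xD4\xC5\xD3\xD4";    # undeclared 8-bit header bytes

# With two single-byte Cyrillic suspects both decodings succeed, so
# guess_encoding() returns an error string rather than an encoding object.
my $guess = guess_encoding( $raw_subject, qw(koi8-r cp1251) );
print "guess: ", ( ref($guess) ? $guess->name : $guess ), "\n";

# Hypothetical: charset taken from the text part's Content-Type header.
my $body_charset = 'koi8-r';

# Prefer the body's declared charset and use the guess only as a fallback.
my $enc = Encode::find_encoding($body_charset)
    || ( ref($guess) ? $guess : undef );
my $subject = $enc ? $enc->decode($raw_subject) : undef;
```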

Just another thought.

Well, I’d actually just use the Content-Type’s encoding if it’s there.
Failing that, use Encode::Guess, and then fall back to assuming Latin-1.

On Sun, Jun 15, 2003 at 03:17:06AM +0800, Autrijus Tang wrote:

On Sat, Jun 14, 2003 at 03:21:45PM +0400, Dmitry Sivachenko wrote:

What do you think about my suggestion to pass every header containing
non-ascii characters to Encode::Guess, as you do with message body?

I think it’s the only sane way out. Jesse?

/Autrijus/


PS: Assuming that the headers share the body's encoding will improve the
probability of correct charset detection in the following (rather common)
situation:

A (broken) MUA sends a plain text message with non-ascii characters in
both body and subject (and probably other header fields like From).

The best approach in such a case would be to run the body through Encode::Guess
and then treat the header fields as being in the same encoding.

Well, I’d actually just use the Content-Type’s encoding if it’s there.
Failing that, use Encode::Guess, and then fall back to assuming Latin-1.

That’s certainly fine with me. So ->OriginalEncoding will apply
to both the headers and the body. Do we want something
like ->OriginalHeader too?
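
The fallback order discussed above (declared Content-Type charset, then Encode::Guess, then Latin-1) might look roughly like this. decode_header_bytes is a hypothetical helper written for this sketch, not an existing RT method, and how it would tie into ->OriginalEncoding is left open.

```perl
use strict;
use warnings;
use Encode ();
use Encode::Guess;

# Hypothetical helper: decode raw header bytes using, in order,
# the declared charset, Encode::Guess, and finally Latin-1.
sub decode_header_bytes {
    my ( $bytes, $declared_charset ) = @_;

    # 1. Charset from the Content-Type header, if present and known to Encode.
    if ( defined $declared_charset ) {
        my $enc = Encode::find_encoding($declared_charset);
        return $enc->decode($bytes) if $enc;
    }

    # 2. Encode::Guess returns an encoding object only on an unambiguous guess.
    my $guess = guess_encoding($bytes);
    return $guess->decode($bytes) if ref $guess;

    # 3. Last resort: assume Latin-1, which maps every byte to a character.
    return Encode::decode( 'iso-8859-1', $bytes );
}

my $from_declared = decode_header_bytes( "\xD4\xC5\xD3\xD4", 'koi8-r' );
my $from_fallback = decode_header_bytes("caf\xE9");
```

Latin-1 works as the last step because it never fails to decode, although the result may still be the wrong text.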

Thanks,
/Autrijus/