Decoding of Subjects broken in 3.0.3pre3 (3.0.3pre2 was OK)

Dmitry_Sivachenko · June 11, 2003, 12:47pm

Hello!

In 3.0.3pre3 you committed the following patch. It broke charset recognition
of Subjects like =?koi8-r?B?1MXT1A==?=
This is a perfectly valid subject, but after RT processing it becomes ‘???’.

When I roll back the following patch, everything becomes OK.

Please fix!

Thanks!!!

```
--- rt-3-0-3pre2/lib/RT/Attachment_Overlay.pm Thu Jun 5 01:22:36 2003
+++ rt-3-0-3pre3/lib/RT/Attachment_Overlay.pm Wed Jun 11 00:17:58 2003
@@ -146,8 +146,11 @@
        TransactionId => $args{'TransactionId'},
        Parent        => 0,
        ContentType   => $Attachment->mime_type,
-       Headers       => $Attachment->head->as_string,
-       Subject       => $Subject,
+       # We need to untaint the data here, because there's a high likelyhood that some incompetent MUA
+       # Author will try to put bogus non-ascii data in a message header.
+       Headers => Encode::encode( utf8 => $Attachment->head->as_string, Encode::FB_PERLQQ ),
+       Subject => Encode::encode( utf8 => $Subject, Encode::FB_PERLQQ )
+
    );
    foreach my $part ( $Attachment->parts ) {
@@ -219,14 +222,15 @@
        $Body = MIME::Base64::encode_base64($Body);
    }
    my $id = $self->SUPER::Create( TransactionId => $args{'TransactionId'},
                                   ContentType   => $Attachment->mime_type,
                                   ContentEncoding => $ContentEncoding,
                                   Parent          => $args{'Parent'},
+                                  # We need to untaint the data here, because there's a high likelyhood that some incompetent MUA
+                                  # Author will try to put bogus non-ascii data in a message header.
+                                  Headers       => Encode::encode(utf8 => $Attachment->head->as_string, Encode::FB_PERLQQ),
+                                  Subject       => Encode::encode(utf8 => $Subject, Encode::FB_PERLQQ),
                                   Content => $Body,
-                                  Headers  => $Attachment->head->as_string,
-                                  Subject  => $Subject,
                                   Filename => $Filename, );
    return ($id);
 }
```
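For reference, the Subject from the report is an RFC 2047 encoded-word (charset koi8-r, base64 payload). The following is a minimal sketch of decoding it with the `MIME-Header` encoding that ships with Perl's Encode module. It only illustrates what a correct decode should produce and is not necessarily how RT itself processes headers.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The encoded-word from the report: charset koi8-r, "B" (base64) encoding.
my $raw = '=?koi8-r?B?1MXT1A==?=';

# Decoding yields a Perl character string (four Cyrillic letters here).
my $subject = decode( 'MIME-Header', $raw );

# Convert to UTF-8 bytes only at the output boundary.
binmode STDOUT, ':raw';
print encode( 'UTF-8', $subject ), "\n";
```

If the stored Subject shows up as ‘???’ instead, the characters were lost somewhere between this decode step and the database.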

Jesse_Vincent · June 11, 2003, 5:43pm

Interesting. What if you back it out just for the Subject, but leave it
in for the Headers? The problem we’re trying to solve is that some
mail readers send headers containing unencoded non-ascii character
sets, which makes postgres go down in flames. Which is clearly not what
we want ;)
rt-devel mailing list
rt-devel@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-devel

Request Tracker... So much more than a help desk — Best Practical Solutions – Trouble Ticketing. Free.

Dmitry_Sivachenko · June 13, 2003, 9:57am

> Interesting. What if you back it out just for the Subject, but leave it
> in for the Headers? The problem we’re trying to solve is that some
> mail readers send headers containing unencoded non-ascii character
> sets, which makes postgres go down in flames. Which is clearly not what
> we want

When I back this patch out only for Subject (see diff between 3.0.3pre3 and
my current version attached), I get the following error:

perl -c Attachment_Overlay.pm

Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 145.
Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 222.
Attachment_Overlay.pm had compilation errors.

Could you please send me a correct patch for testing?

I see that your intention is to fix the case where an e-mail contains unencoded
non-ascii characters. We have faced the same problem.

I propose that we pass such headers to Encode::Guess, as we do for the
e-mail body.

Thanks!

rt.diff (1.84 KB)

Autrijus_Tang · June 13, 2003, 5:43pm

> When I back this patch out only for Subject (see diff between 3.0.3pre3 and
> my current version attached), I get the following error:
>
> perl -c Attachment_Overlay.pm
>
> Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 145.
> Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 222.
> Attachment_Overlay.pm had compilation errors.

Writing Encode::FB_PERLQQ as Encode::FB_PERLQQ() will fix this.

/Autrijus/
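A minimal sketch of why the parentheses help: under `use strict`, a bareword such as `Encode::FB_PERLQQ` only compiles if that constant is already defined when the line is parsed. Writing it as a call defers the lookup to run time. The run-time `require` below is an assumption used to reproduce the situation, not a statement about how Attachment_Overlay.pm loads Encode.

```perl
use strict;
use warnings;

# Assumption for illustration: Encode is loaded at run time, so its
# constants are unknown to the compiler while this file is being parsed.
require Encode;

# my $check = Encode::FB_PERLQQ;   # compile error under "strict subs"
my $check = Encode::FB_PERLQQ();   # explicit call, resolved at run time

# With FB_PERLQQ, characters the target encoding cannot represent are
# replaced by a \x{....} escape instead of being dropped.
my $text  = "caf\x{e9}";
my $bytes = Encode::encode( 'ascii', $text, $check );
print "$bytes\n";
```

Loading the module at compile time with `use Encode;` near the top of the file would also make the bareword form legal.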

Dmitry_Sivachenko · June 14, 2003, 11:21am

> > When I back this patch out only for Subject (see diff between 3.0.3pre3 and
> > my current version attached), I get the following error:
> >
> > perl -c Attachment_Overlay.pm
> >
> > Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 145.
> > Bareword "Encode::FB_PERLQQ" not allowed while "strict subs" in use at Attachment_Overlay.pm line 222.
> > Attachment_Overlay.pm had compilation errors.
>
> Writing Encode::FB_PERLQQ as Encode::FB_PERLQQ() will fix this.

Thanks, that did the trick.

So when I back out this patch only for Subject (and leave it in for Headers),
properly MIME-encoded Subjects still become ‘???’ after RT.

The only way to fix this is to back out that patch completely.

What do you think about my suggestion to pass every header containing
non-ascii characters to Encode::Guess, as you do with the message body?

Autrijus_Tang · June 14, 2003, 7:17pm

> What do you think about my suggestion to pass every header containing
> non-ascii characters to Encode::Guess, as you do with the message body?

I think it’s the only sane way out. Jesse?

/Autrijus/

Dmitry_Sivachenko · June 14, 2003, 7:42pm

> > What do you think about my suggestion to pass every header containing
> > non-ascii characters to Encode::Guess, as you do with the message body?
>
> I think it’s the only sane way out. Jesse?

Another sane way (which is more complex to implement) would be
to check whether the message body contains a text part and assume that the
headers use the same encoding as the body.
Otherwise, fall back to Encode::Guess.
This is probably a better solution: Encode::Guess may fail to
determine the encoding because headers typically contain too few characters
to base a decision on, and the headers almost certainly have the same
encoding as the text part of the body.

Just another thought.
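The concern about short headers can be illustrated with a small sketch. The four bytes below are the raw KOI8-R form of the Subject from the original report. The choice of `cp1251` as a second suspect is an assumption made only to show how an ambiguous guess looks.

```perl
use strict;
use warnings;
use Encode::Guess;    # exports guess_encoding()

# Raw, unencoded KOI8-R bytes as a broken MUA might put them in a header.
my $bytes = "\xD4\xC5\xD3\xD4";

# One 8-bit suspect: only koi8-r can decode the data, so the guess succeeds
# and an encoding object is returned.
my $one = guess_encoding( $bytes, 'koi8-r' );
print ref $one ? "guessed: " . $one->name . "\n" : "failed: $one\n";

# Two Cyrillic suspects: both decode these bytes without error, so
# guess_encoding() returns an error string instead of an object.
my $two = guess_encoding( $bytes, 'koi8-r', 'cp1251' );
print ref $two ? "guessed: " . $two->name . "\n" : "failed: $two\n";
```

With so little data there is nothing to separate the candidates, which is why taking the charset from the body (or its Content-Type) first is attractive.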

Jesse_Vincent · June 14, 2003, 7:46pm

Well, I’d actually just use the Content-Type’s encoding if it’s there.
Failing that, use Encode::Guess, and then fall back to assuming latin1.

On Sun, Jun 15, 2003 at 03:17:06AM +0800, Autrijus Tang wrote:

> On Sat, Jun 14, 2003 at 03:21:45PM +0400, Dmitry Sivachenko wrote:
>
> > What do you think about my suggestion to pass every header containing
> > non-ascii characters to Encode::Guess, as you do with the message body?
>
> I think it’s the only sane way out. Jesse?
>
> /Autrijus/

Dmitry_Sivachenko · June 14, 2003, 7:55pm

> > > What do you think about my suggestion to pass every header containing
> > > non-ascii characters to Encode::Guess, as you do with the message body?
> >
> > I think it’s the only sane way out. Jesse?
>
> Another sane way (which is more complex to implement) would be
> to check whether the message body contains a text part and assume that the
> headers use the same encoding as the body.
> Otherwise, fall back to Encode::Guess.
> This is probably a better solution: Encode::Guess may fail to
> determine the encoding because headers typically contain too few characters
> to base a decision on, and the headers almost certainly have the same
> encoding as the text part of the body.
>
> Just another thought.

PS: this would improve the probability of correct charset detection in the
following (rather common) situation:

A (broken) MUA sends a plain text message with non-ascii characters in
both the body and the subject (and probably other header fields like From).

The best approach in such a case would be to run the body through Encode::Guess
and then treat the header fields as if they were in the same encoding.

Autrijus_Tang · June 15, 2003, 3:43am

> Well, I’d actually just use the Content-Type’s encoding if it’s there.
> Failing that, use Encode::Guess, and then fall back to assuming latin1.

That’s certainly fine with me. So ->OriginalEncoding will have meaning
for both the headers and the body. Do we want something
like ->OriginalHeader too?

Thanks,
/Autrijus/
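The fallback order discussed in this thread (Content-Type charset first, then Encode::Guess, then latin1) could be sketched roughly as follows. `decode_header_bytes` is a hypothetical helper written for illustration. It is not part of RT and does not use RT's ->OriginalEncoding.

```perl
use strict;
use warnings;
use Encode ();
use Encode::Guess;    # exports guess_encoding()

# Hypothetical helper: returns (decoded text, name of the encoding used).
sub decode_header_bytes {
    my ( $bytes, $charset ) = @_;

    # 1. Trust the charset from the message's Content-Type, if it is known
    #    to Encode and the bytes are actually valid in it.
    if ( defined $charset ) {
        my $enc = Encode::find_encoding($charset);
        if ($enc) {
            my $copy = $bytes;    # decode() with a CHECK may modify its input
            my $text = eval { $enc->decode( $copy, Encode::FB_CROAK() ) };
            return ( $text, $enc->name ) if defined $text;
        }
    }

    # 2. Otherwise ask Encode::Guess; it returns an object only on success.
    my $guess = guess_encoding($bytes);
    return ( $guess->decode($bytes), $guess->name ) if ref $guess;

    # 3. Last resort: latin1 maps every byte to a character, so this
    #    cannot fail, although the result may be wrong.
    return ( Encode::decode( 'iso-8859-1', $bytes ), 'iso-8859-1' );
}

my ( $text, $used ) = decode_header_bytes( "\xD4\xC5\xD3\xD4", 'koi8-r' );
print "decoded with $used\n";
```

Returning the encoding name alongside the text leaves room for recording which encoding was applied to the headers, as suggested above.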