R19578 - in rt/3.8/trunk: share/html/Ticket/Attachment/WithHeaders

Emmanuel, there is no such thing as an original encoding for headers.
Anything that is not ASCII is illegal in the header. Even if the message
was in some encoding, that doesn’t mean the header was in the same
encoding, or even in a single encoding. I never saw “download with
headers” as useful for anything except debugging. I don’t mind making
it consistent and readable, but let’s do it in a branch with tests,
targeting 3.8.4.

Are you OK with that?

On Thu, May 7, 2009 at 2:13 PM, elacour@bestpractical.com wrote:

Author: elacour
Date: Thu May 7 06:13:05 2009
New Revision: 19578

Modified:
rt/3.8/trunk/lib/RT/Attachment_Overlay.pm
rt/3.8/trunk/share/html/Ticket/Attachment/WithHeaders/dhandler

Log:
Fix encoding differences between attachment content and headers when
downloading attachments with headers.

  • $AttachmentsObj->Headers is always UTF-8 as it’s converted when saved to DB,
    and it has utf8 flag on
  • $AttachmentsObj->OriginalContent is in the original encoding and has utf8
    flag off

so this patch adds an Attachment::OriginalHeaders method.

Modified: rt/3.8/trunk/lib/RT/Attachment_Overlay.pm

--- rt/3.8/trunk/lib/RT/Attachment_Overlay.pm (original)
+++ rt/3.8/trunk/lib/RT/Attachment_Overlay.pm Thu May 7 06:13:05 2009
@@ -320,6 +320,46 @@
     return $content;
 }
 
+=head2 OriginalHeaders
+
+Returns the attachment's headers as octets before RT's mangling. Currently,
+this just means restoring text content back to its original encoding.
+
+=cut
+
+sub OriginalHeaders {
+    my $self = shift;
+
+    return $self->Headers unless RT::I18N::IsTextualContentType($self->ContentType);
+    my $enc = $self->OriginalEncoding;
+
+    my $headers;
+    if ( !$self->ContentEncoding || $self->ContentEncoding eq 'none' ) {
+        $headers = $self->_Value('Headers', decode_utf8 => 0);
+    } elsif ( $self->ContentEncoding eq 'base64' ) {
+        $headers = MIME::Base64::decode_base64($self->_Value('Headers', decode_utf8 => 0));
+    } elsif ( $self->ContentEncoding eq 'quoted-printable' ) {
+        $headers = MIME::QuotedPrint::decode($self->_Value('Headers', decode_utf8 => 0));
+    } else {
+        return( $self->loc("Unknown ContentEncoding [_1]", $self->ContentEncoding));
+    }
+
+    # Turn off the SvUTF8 bits here so decode_utf8 and from_to below can work.
+    local $@;
+    Encode::_utf8_off($headers);
+
+    if (!$enc || $enc eq '' || $enc eq 'utf8' || $enc eq 'utf-8') {
+        # If we somehow fail to do the decode, at least push out the raw bits
+        eval { return( Encode::decode_utf8($headers)) } || return ($headers);
+    }
+
+    eval { Encode::from_to($headers, 'utf8' => $enc) } if $enc;
+    if ($@) {
+        $RT::Logger->error("Could not convert attachment headers from assumed utf8 to '$enc' :".$@);
+    }
+    return $headers;
+}
+
 =head2 OriginalEncoding
 
 Returns the attachment's original encoding.

Modified: rt/3.8/trunk/share/html/Ticket/Attachment/WithHeaders/dhandler

--- rt/3.8/trunk/share/html/Ticket/Attachment/WithHeaders/dhandler (original)
+++ rt/3.8/trunk/share/html/Ticket/Attachment/WithHeaders/dhandler Thu May 7 06:13:05 2009
@@ -70,7 +70,7 @@
     # XXX: should we check handle html here and integrate headers into html?
     $r->content_type( $content_type );
     $m->clear_buffer;
-    $m->out( $AttachmentObj->Headers );
+    $m->out( $AttachmentObj->OriginalHeaders );
     $m->out( "\n\n" );
     $m->out( $AttachmentObj->OriginalContent );
     $m->abort;

Rt-commit mailing list
Rt-commit@lists.bestpractical.com
http://lists.bestpractical.com/cgi-bin/mailman/listinfo/rt-commit

Best regards, Ruslan.

> Emmanuel, there is no such thing as an original encoding for headers.
> Anything that is not ASCII is illegal in the header. Even if the message
> was in some encoding, that doesn’t mean the header was in the same
> encoding, or even in a single encoding.

For me, “download with headers” is a way to get the mail in a form as
close as possible to the one it had before entering RT.

When mail enters RT, the headers are RFC 2047 decoded, the Content-Type
charset is changed to utf-8 (since the content of the email is stored as
utf-8 in the DB), and then the headers are saved in the DB.

Unless I’m missing something, we don’t record the original RFC 2047
encoding of those headers anywhere, so I tried to assume it is roughly
consistent with the text content. I agree with you that this is not
useful and can often be wrong.

So I think the OriginalHeaders method can still exist, but it should only
return RFC 2047 encoded headers, with the Content-Type charset reverted
to the value of X-RT-OriginalEncoding.

Do you agree?
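
To make the proposal above concrete, here is a minimal sketch, not RT code. The helper names rfc2047_header_line and revert_charset are hypothetical, and it assumes the stored header is already a decoded Perl string and that the original charset is known (for example from X-RT-OriginalEncoding). It relies only on the MIME-Header encoding that ships with the core Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper (not part of RT): re-encode one decoded header line
# so that only ASCII remains, as RFC 2047 requires. Only the value is
# encoded; the field name is left alone.
sub rfc2047_header_line {
    my ($line) = @_;
    my ($name, $value) = split /:\s*/, $line, 2;
    return $line unless defined $value && $value =~ /[^\x00-\x7f]/;
    return "$name: " . encode('MIME-Header', $value);
}

# Hypothetical helper: put the original charset back into Content-Type.
sub revert_charset {
    my ($line, $original_charset) = @_;
    $line =~ s/(charset=)"?utf-8"?/$1"$original_charset"/i
        if $line =~ /^Content-Type:/i;
    return $line;
}

my $subject = rfc2047_header_line("Subject: \x{e9}ssai");
my $ctype   = revert_charset('Content-Type: text/plain; charset="utf-8"',
                             'iso-8859-1');
print "$subject\n$ctype\n";
```

Note that this cannot restore the charset or the Q/B choice the sender originally used for each header, since that information is not stored; it only guarantees an ASCII-only header block.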

> I never saw “download with headers” as useful for anything except
> debugging. I don’t mind making it consistent and readable, but let’s
> do it in a branch with tests, targeting 3.8.4.

OK for a branch, but even if it’s a debugging function, we should try to
return accurate data :wink:

Hello Emmanuel,

I’m working on the OriginalHeaders sub right now. I see a special case
there for when OriginalEncoding is base64 or quoted-printable, but as far
as I know those are impossible encodings for content.

b64 and QP are content transfer encodings. Yes, headers can be protected
using Q or B encoding (variants of QP and b64). Do we really need to make
“Download with headers” unreadable for a human? I cannot read Q/B
encodings :slight_smile:
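
For readers unfamiliar with the two RFC 2047 forms mentioned above, the following small sketch shows the same header text protected with B and with Q encoding. It uses the MIME-B and MIME-Q encodings from the core Encode module; the sample word is only an illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text = "\x{e9}ssai";    # "essai" with an acute accent on the first letter

my $b_word = encode('MIME-B', $text);    # base64-style encoded-word
my $q_word = encode('MIME-Q', $text);    # quoted-printable-style encoded-word

print "B: $b_word\n";
print "Q: $q_word\n";

# Both forms decode back to the same character string.
my $same = decode('MIME-Header', $b_word) eq decode('MIME-Header', $q_word);
print "decode to the same text: ", ($same ? "yes" : "no"), "\n";
```

Q encoding leaves the ASCII letters visible, while B encoding hides the whole word, which is the readability concern raised here.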

Do you have a real-life example where OriginalEncoding is b64 or QP?
And an example where we should protect headers using Q or B encodings?

Best regards, Ruslan.

> Hello Emmanuel,

Privet Ruslan, and sorry for answering so late :confused:

> I’m working on the OriginalHeaders sub right now. I see a special case
> there for when OriginalEncoding is base64 or quoted-printable, but as far
> as I know those are impossible encodings for content.
>
> b64 and QP are content transfer encodings. Yes, headers can be protected
> using Q or B encoding (variants of QP and b64). Do we really need to make
> “Download with headers” unreadable for a human? I cannot read Q/B
> encodings :slight_smile:

In my opinion, the “download with headers” link should let us get the
original mail as received by RT, but the way RT records email makes this
impossible.

So I think there are two solutions:

  • encode the headers
  • make the headers and content use the same encoding (utf-8)

> Do you have a real-life example where OriginalEncoding is b64 or QP?
> And an example where we should protect headers using Q or B encodings?

Here are examples of what works and what doesn’t :wink:

  • if I send a mail that is entirely iso-8859-1 (see attached “�ssai.eml.gz”),
    “download with headers” displays correctly (see “�ssai.png”), using
    iso-8859-1

  • if I send a mail that is entirely utf-8 (see attached “�ssai utf-8.eml.gz”),
    “download with headers” returns a mixed file: the headers are iso-8859-1
    and display correctly (the HTTP Content-Type is iso-8859-1), but the body
    is utf-8 and so does not display correctly (see “�ssai utf-8.png”)
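
The second case can be reproduced outside RT. The sketch below is only an illustration with a made-up subject line: it builds a file whose headers are iso-8859-1 bytes and whose body is utf-8 bytes, then shows that no single charset decodes the whole file correctly.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $word    = "\x{e9}ssai";
my $headers = encode('iso-8859-1', "Subject: $word");   # accent is 1 byte
my $body    = encode('UTF-8', $word);                    # accent is 2 bytes
my $file    = $headers . "\n\n" . $body;

# Read as iso-8859-1: the header is right, the body shows two characters
# (U+00C3 U+00A9) where the accented letter should be.
my $as_latin1 = decode('iso-8859-1', $file);

# Read as UTF-8: the body is right, but the lone 0xE9 byte in the header
# is malformed and is replaced with U+FFFD.
my $as_utf8 = decode('UTF-8', $file);

print "latin1 view, header ok: ",
    ($as_latin1 =~ /^Subject: \x{e9}ssai/ ? "yes" : "no"), "\n";
print "latin1 view, body ok: ",
    ($as_latin1 =~ /\n\n\x{e9}ssai\z/ ? "yes" : "no"), "\n";
print "utf-8 view, header ok: ",
    ($as_utf8 =~ /^Subject: \x{e9}ssai/ ? "yes" : "no"), "\n";
print "utf-8 view, body ok: ",
    ($as_utf8 =~ /\n\n\x{e9}ssai\z/ ? "yes" : "no"), "\n";
```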

So the problems are:

  • we do not specify the Content-Encoding in the HTTP headers (see
    23a1c9753add55910e5826f691dddb08c8c4b241)
  • the headers and body can have different encodings (see
    ed08485bde8cd7d0d63f384157fff35004871e79)

The same header vs. body encoding problem can also occur when using the
forward function (see 5548cde4fbd0b835eb9232fcad6323ea6f1c2ed5).