More conservative charsets

The patch below simplifies charsets of text/plain parts in outgoing
email.

Believe it or not, we have received numerous reports that umlauts are
broken. In our constituency, MUAs without UTF-8 support are still far
too common. Therefore, our RT instance tries to recode each text part
into the first charset from a preferred list that can represent it
without loss; if none can, the part is left unchanged.
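Here is the charset-selection step on its own, as a minimal sketch that uses only the core Encode module. It assumes the text has already been decoded into a Perl character string. The helper name first_fitting_charset is hypothetical and is not an RT or MIME-tools API. The candidate list simply mirrors @Charset_List from the patch.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper (not part of RT or MIME-tools): return the first
# charset in @$candidates that can represent $text (a Perl character
# string) without loss, or undef if none can.
sub first_fitting_charset {
    my ($text, $candidates) = @_;
    for my $charset (@$candidates) {
        # FB_CROAK makes encode() die on unmappable characters;
        # LEAVE_SRC stops it from consuming $text when it fails.
        my $ok = eval {
            encode($charset, $text, Encode::FB_CROAK() | Encode::LEAVE_SRC());
            1;
        };
        return $charset if $ok;
    }
    return;
}

my @prefs = ("us-ascii", "iso-8859-1", "iso-8859-15", "windows-1252");
print first_fitting_charset("plain text", \@prefs), "\n";            # us-ascii
print first_fitting_charset("Gr\x{fc}\x{df}e", \@prefs), "\n";        # iso-8859-1
print first_fitting_charset("5 \x{20ac}", \@prefs), "\n";             # iso-8859-15
print first_fitting_charset("\x{201c}quoted\x{201d}", \@prefs), "\n"; # windows-1252
```

The order of the list matters. Listing us-ascii first means pure ASCII text keeps the narrowest label. windows-1252 comes last because it covers every printable ISO-8859-1 character, so placing it earlier would shadow iso-8859-1.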

What do you think about this approach? (The actual patch should
probably go into MIME::Tools, but that’s just a detail.)

* modified files

--- orig/lib/RT/Action/SendEmail.pm
+++ mod/lib/RT/Action/SendEmail.pm
@@ -33,6 +33,7 @@
 use MIME::Words qw(encode_mimeword);
 
 use RT::EmailParser;
+use Encode;
 
 =head1 NAME
 
@@ -257,6 +258,9 @@
         $RT::Logger->info($msgid. " No recipients found. Not sending.\n");
         return (1);
     }
+
+    # Canonify charsets.
+    &reencode ($MIMEObj);
 
     # PseudoTo (fake to headers) shouldn't get matched for message recipients.
     # If we don't have any 'To' header, drop in the pseudo-to header.
@@ -676,6 +680,53 @@
 
 # }}}
 
+my @Charset_List = ("us-ascii", "iso-8859-1", "iso-8859-15", "windows-1252");
+sub reencode ($) {
+    my $entity = shift;
+    if ($entity->is_multipart) {
+        for my $part ($entity->parts) {
+            &reencode ($part);
+        }
+    } else {
+        my $body = $entity->bodyhandle;
+        my $content_type = Mail::Field->new
+            ('content-type', $entity->head->get ('content-type'));
+        unless ($content_type->type =~ m,^text/,i) {
+            return;
+        }
+        my $decoded;
+        my $source_charset = $entity->head->mime_attr('content-type.charset');
+        unless ($source_charset) {
+            $source_charset = 'utf-8';
+        }
+        eval {
+            $decoded = decode($source_charset, $body->as_string, Encode::FB_CROAK);
+        };
+        if ($@) {
+            # Can't decode, don't touch it.
+            return;
+        }
+      CHARSET:
+        for my $charset (@Charset_List) {
+            my $octets;
+            eval {
+                # LEAVE_SRC stops encode() from consuming $decoded when this
+                # charset fails, so the next candidate sees the full text.
+                $octets = encode($charset, $decoded,
+                                 Encode::FB_CROAK() | Encode::LEAVE_SRC());
+            };
+            if ($@) {
+                next CHARSET;
+            }
+            my $IO = $body->open ("w");
+            $IO->print ($octets);
+            $IO->close;
+            $content_type->param('charset', $charset);
+            $entity->head->replace('content-type', $content_type->stringify);
+            last CHARSET;
+        }
+    }
+}
 eval "require RT::Action::SendEmail_Vendor";
 die $@ if ($@ && $@ !~ qr{^Can't locate RT/Action/SendEmail_Vendor.pm});
 eval "require RT::Action::SendEmail_Local";

Florian Weimer wrote:

> The patch below simplifies charsets of text/plain parts in outgoing
> email.

There already appears to be similar functionality (maybe that’s why no
one responded 8-). However, it does not let you list several preferred
character sets. Shall I update this patch to provide the same
functionality within the existing framework?