RT 2.1.56

This release includes:
* Lots of UI work from Linda. More on the way.
* The new client-server mail gateway from Simon. Do not deploy
this on a publicly accessible server just yet. It needs a
bit of locking down.
* A raft of bug fixes.

The only thing stopping alpha-2 from coming out is the lockdown
of the mail gateway. Expect that soon.

»|« http://www.bestpractical.com/rt – Trouble Ticketing. Free.

Oh, and it’s important to note that MailTools has a Unicode bug on Perl
5.8.0, which renders the mailgate temporarily inoperable.

On Fri, Jan 03, 2003 at 07:47:09PM -0500, Jesse Vincent wrote:

> Oh, and it’s important to note that MailTools has a Unicode bug on Perl
> 5.8.0, which renders the mailgate temporarily inoperable.

Just for the record, the bug is not MailTools’ but Perl’s;
MailTools merely exposes it.

For people who want to test before Overmeer applies my patch,
grab MailTools 1.53 and apply the following patch. The bugs
should magically go away.

Thanks,
/Autrijus/

--- Mail/Field.pm.orig  Sat Jan 4 08:22:58 2003
+++ Mail/Field.pm  Sat Jan 4 08:25:03 2003
@@ -226,6 +226,9 @@
  my $self = shift;
  my $tag = ref($self) || $self;

+ # Bug in unicode \U, perl 5.8.0 breaks when casting utf8 in regex
+ utf8::downgrade($tag) if $] eq 5.008;
+
  $tag =~ s/.*:://o;
  $tag =~ s/_/-/og;
  $tag =~ s/\b([a-z]+)/\L\u$1/gio;
@@ -294,6 +297,9 @@
    unless(eval "require " . $pkg)
     {
      my $tag = $method;

+     # Bug in unicode \U, perl 5.8.0 breaks when casting utf8 in regex
+     utf8::downgrade($tag) if $] eq 5.008;
+
      $tag =~ s/_/-/og;
      $tag =~ s/\b([a-z]+)/\L\u$1/gio;
--- Mail/Address.pm.orig  Sat Jan 4 08:34:33 2003
+++ Mail/Address.pm  Sat Jan 4 08:34:35 2003
@@ -21,6 +21,9 @@
 sub extract_name
 {
  local $_ = shift || '';

+ # Bug in unicode \U, perl 5.8.0 breaks when casing utf8 in regex
+ utf8::downgrade($_) if $] eq 5.008;
+
  # trim whitespace
  s/^\s+//;
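
For anyone who wants to try the workaround outside MailTools, here is a minimal sketch. It is not the MailTools code itself: header_tag is a hypothetical helper that mirrors the three substitutions from Mail/Field.pm. It also calls utf8::downgrade unconditionally, with the FAIL_OK flag, rather than only on 5.8.0 as the patch does.

```perl
use strict;
use warnings;

# Hypothetical helper modelled on the substitutions in Mail/Field.pm.
sub header_tag {
    my ($tag) = @_;

    # Drop the internal UTF-8 flag where possible. The second argument
    # (FAIL_OK) makes this a no-op instead of an error for wide characters.
    utf8::downgrade($tag, 1);

    $tag =~ s/.*:://;                  # strip the package prefix
    $tag =~ s/_/-/g;                   # underscores become hyphens
    $tag =~ s/\b([a-z]+)/\L\u$1/gi;    # capitalise each word
    return $tag;
}

print header_tag('Mail::Field::content_type'), "\n";    # Content-Type
```

The downgrade only changes how Perl stores the string internally; for plain ASCII header names the visible result is the same with or without it.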

All the past development snapshots don’t like German umlauts. I get this:
" Über die Zeit steigt die PrioritÀt bis: " (Over time, priority
moves toward) and it should be “Über die Zeit steigt die Priorität bis:”.
Where am I going wrong here?

Thanks and greetings!


rt-devel mailing list
rt-devel@lists.fsck.com
http://lists.fsck.com/mailman/listinfo/rt-devel

Mit freundlichen Grüßen / Kind Regards
Stefan Fischer

Hi,

That means that RT sends incorrect character set information
(if it sends it at all).

In your browser, choose the “Unicode” encoding and it should display correctly.

To the developers: this involves several things.

First, the charset information should be sent in the HTTP header;
second, the charset depends on the Perl version.

Due to default_escape_flags => 'h',
Mason uses HTML::Entities, which behaves differently depending on the
Perl version:
http://search.cpan.org/author/GAAS/HTML-Parser-3.26/lib/HTML/Entities.pm

Thus, there must be a line like this:

$charset = $] > 5.007 ? 'UTF-8' : 'ISO-8859-1';

and a line like this:

$r->header_out( 'Charset', $charset );

I suppose the proper place for these is in html/Elements/Header.

And the final thing is that default_escape_flags => 'h'
is perhaps not the correct way of handling non-ASCII characters, especially
for languages such as Russian.
I suppose it’s better to suppress Mason’s escaping and
manage it internally.

Regards,
Stan

--- Stefan Fischer <info@debian.homeunix.net> wrote:

> All the past development snapshots don’t like German umlauts. I get this:
> " Über die Zeit steigt die Priorität bis: " (Over time, priority
> moves toward) and it should be “Über die Zeit steigt die Priorität bis:”.
> Where am I going wrong here?

> Hi,
>
> That means that RT sends incorrect character set information
> (if it sends it at all).
> First is that the charset information should be
> sent in HTTP header, and the second that the charset depends
> on Perl version.
>
> Due to default_escape_flags => 'h',
> Mason uses HTML::Entities, which acts differently depending on the
> perl version:
> http://search.cpan.org/author/GAAS/HTML-Parser-3.26/lib/HTML/Entities.pm
>
> Thus, there must be a line like this:
>
> $charset = $] > 5.007 ? 'UTF-8' : 'ISO-8859-1';
>
> and a line like this:
>
> $r->header_out( 'Charset', $charset );

So, I was fairly sure that Autrijus had set this up. Autrijus?

> I suppose the proper place for these is in html/Elements/Header.
>
> And the final thing is, that perhaps default_escape_flags => 'h'
> is not the correct way of handling non-ascii characters, especially
> for such languages as Russian.
> I suppose it’s better to suppress Mason’s escaping, and
> manage it internally.

No. Mason provides a callback for HTML entity escaping. We need to put
together a UTF-8 callback.
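
A callback of that kind might look roughly like the sketch below. It escapes only the characters that are significant in HTML markup and leaves everything else, including non-ASCII text, untouched. The name escape_markup_only is made up for illustration. The sketch also assumes that the callback is handed a reference to the string; check that against the Mason documentation before wiring it in.

```perl
use strict;
use warnings;

my %ENTITY = (
    '&' => '&amp;',
    '<' => '&lt;',
    '>' => '&gt;',
    '"' => '&quot;',
);

# Hypothetical escape callback: modifies the referenced string in place.
sub escape_markup_only {
    my ($text_ref) = @_;
    return unless defined $$text_ref;
    $$text_ref =~ s/([&<>"])/$ENTITY{$1}/g;
    return;
}

my $subject = qq{<script>alert("x")</script> Priorit\x{e4}t};
escape_markup_only(\$subject);

binmode STDOUT, ':encoding(UTF-8)';
print "$subject\n";
```

The script tag is neutralised while the umlaut passes through as a plain character, so it is up to the output encoding and the declared charset to get it to the browser intact.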


»|« http://www.bestpractical.com/rt – Trouble Ticketing. Free.

> Hi,
>
> That means that RT sends incorrect character set information
> (if it sends it at all).
> First is that the charset information should be
> sent in HTTP header,

And, actually, it is:

Content-Type: text/html; charset=utf-8

»|« http://www.bestpractical.com/rt – Trouble Ticketing. Free.

>> First is that the charset information should be
>> sent in HTTP header,
>
> And, actually, it is:
>
> Content-Type: text/html; charset=utf-8

Aha, the situation is more complicated: the strings that
Stefan Fischer sent are neither Unicode nor ISO Latin-1.

Unfortunately, I have no server to check this quickly,
but I suspect it went through these steps:

lib/RT/I18N/de.po is encoded in Latin-1.

Then it goes through lib/RT/I18N.pm and is presented as would-be Unicode.
I’m not sure whether it really produces Unicode at this stage.

Then it goes through HTML::Entities (as told by default_escape_flags => 'h'),
and all non-ASCII characters are replaced with entities:
&Auml; for A-umlaut, etc.
At this stage, HTML::Entities depends on the Perl version (Stefan, what’s yours?).

If it’s 5.6, it treats each non-ASCII byte as a separate non-ASCII
character (remember, in UTF-8 these characters take two bytes) and
produces two HTML entities for each Unicode character.

In 5.8, each non-ASCII Unicode character (two bytes) is
replaced with a single HTML entity. HTML::Entities defines named entities
for Latin-1 characters only, which means that Cyrillic (Russian) characters would
be replaced with (one or two?) numeric entities.
Some browsers will survive that (if it’s still one entity),
but it’s definitely the wrong way.
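
The difference between the two cases can be reproduced without HTML::Entities. The sketch below uses a hypothetical numeric_entities helper, not the module’s real encoder, that turns every non-ASCII element of a string into a numeric entity. It is applied once to a character string and once to the UTF-8 bytes of the same text.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for an entity encoder: every element above
# 0x7F becomes a numeric character reference.
sub numeric_entities {
    my ($string) = @_;
    $string =~ s/([^\x00-\x7F])/'&#' . ord($1) . ';'/ge;
    return $string;
}

my $chars = "\x{DC}ber";                 # "Über" as characters
my $bytes = encode('UTF-8', $chars);     # the same text as UTF-8 bytes

print numeric_entities($chars), "\n";    # &#220;ber
print numeric_entities($bytes), "\n";    # &#195;&#156;ber
```

A browser renders the first line as the intended umlaut and the second as two unrelated characters, which is the kind of damage described for the byte-oriented case.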

The right way would be to avoid entity encoding altogether and
send out the plain text, with the correct charset in the HTTP header.

With regards,

Stan

In addition, in Stefan’s message each Latin-1 letter
was replaced with 4 characters, not 2. This means that
HTML::Entities already received a 4-byte sequence for
each character, which points to either a double Latin-1 -> UTF-8
conversion or an unexpected appearance of UTF-16.

--- Stanislav Sinyagin <ssinyagin@yahoo.com> wrote:

> lib/RT/I18N/de.po is encoded Latin1.
>
> Then it goes through lib/RT/I18N.pm and is presented as wanna-be Unicode.
> I’m not sure at this stage if it really produces unicode.
>
> Then it goes through HTML::Entities (as told by default_escape_flags => 'h'),
> and all non-ascii characters are replaced with entities:
> &Auml; for a-umlaut etc.
> At this stage, HTML::Entities depends on Perl version (Stefan, what’s yours?).
>
> If it’s 5.6, it treats each non-ascii byte (remember, Unicode
> symbols come as two-byte symbols?) as non-ascii character, and
> produces two HTML entities per each Unicode symbol.
>
> In 5.8, each non-ascii Unicode symbol (two bytes) is
> replaced with a HTML entity. In HTML::Entities, they are defined
> for Latin1 symbols only. It means, Cyrillic (Russian) symbols would
> be replaced with (one or two?) numeric entities.
> Some browsers will survive that (in case if it’s still one entity),
> but it’s definitely wrong way.
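
The observation that each Latin-1 letter came out as four characters fits the double-conversion theory, and that is easy to check with the core Encode module. The sketch below only illustrates the arithmetic; it does not claim to show where in RT the second conversion happens.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $char = "\x{DC}";                    # U+00DC, "Ü"
my $once = encode('UTF-8', $char);      # correct UTF-8: two bytes

# Misreading those two bytes as Latin-1 characters and encoding
# them again doubles the length.
my $twice = encode('UTF-8', decode('ISO-8859-1', $once));

printf "encoded once:  %d bytes (%s)\n", length($once),  unpack('H*', $once);
printf "encoded twice: %d bytes (%s)\n", length($twice), unpack('H*', $twice);
```

One umlaut becomes two bytes after the first encoding and four after the second, so a byte-oriented entity encoder downstream would emit four entities per letter.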

>> Oh, and it’s important to note that MailTools has a Unicode bug on Perl
>> 5.8.0, which renders the mailgate temporarily inoperable.
>
> Just for the record, the bug is not MailTools’ but Perl’s;
> MailTools merely exposes it.

MailTools-1.55 just hit the shelf of the CPAN library; it integrates
the needed patch. Just FYI.

/Autrijus/

Hi,

I can confirm the same strange behaviour with the fr.po file. A few
pieces of information that may help in understanding the problem:

  1. The file I submit and use for French is encoded as UTF-8, not Latin-1.
    Converting it to Latin-1 does not make it any better.
  2. The problem did not exist around 2.1.39, when I did my previous test.
  3. The first login page is OK; it goes wrong when reaching the “home page”.

Blaise

--- Stanislav Sinyagin <ssinyagin@yahoo.com> wrote:

> lib/RT/I18N/de.po is encoded Latin1.
>
> Then it goes through lib/RT/I18N.pm and is presented as wanna-be Unicode.
> I’m not sure at this stage if it really produces unicode.
>
> Then it goes through HTML::Entities (as told by default_escape_flags => 'h'),
> and all non-ascii characters are replaced with entities:
> &Auml; for a-umlaut etc.
> At this stage, HTML::Entities depends on Perl version (Stefan, what’s yours?).
>
> If it’s 5.6, it treats each non-ascii byte (remember, Unicode
> symbols come as two-byte symbols?) as non-ascii character, and
> produces two HTML entities per each Unicode symbol.
>
> In 5.8, each non-ascii Unicode symbol (two bytes) is
> replaced with a HTML entity. In HTML::Entities, they are defined
> for Latin1 symbols only. It means, Cyrillic (Russian) symbols would
> be replaced with (one or two?) numeric entities.
> Some browsers will survive that (in case if it’s still one entity),
> but it’s definitely wrong way.

Blaise Thauvin
Groupe France Rail Publicité

bthauvin@clearchannel.fr
+33 1 40 64 24 56

> Hi,
>
> I confirm I face the same strange behaviour with the fr.po file. Just a few
> pieces of information if it may help understanding the problem :
>
>   1. The file I submit and use for French is coded as UTF8 and not Latin1.
>      Converting it to Latin 1 does not make it any better.
>   2. The problem did not exist around 2.1.39 when I did my previous test.
>   3. The first login page is OK, it gets wrong when reaching the “home page”

I’ve spoken with Autrijus Tang and we’ve confirmed that this does work
with 5.8, which uses newer Unicode libraries, but not with 5.6. Still
not sure why it’s busted on 5.6.

> Blaise

»|« http://www.bestpractical.com/rt – Trouble Ticketing. Free.

> I’ve spoken with Autrijus Tang and we’ve confirmed that this does work
> with 5.8, which uses newer Unicode libraries, but not with 5.6. Still
> not sure why it’s busted on 5.6.

I’ll try to find some time in mid-February to track down the problem.
I hope my German is good enough for that.
We’re still running 5.6.1.

Still, I see a big problem with non-Latin charsets, e.g. Russian,
though I haven’t yet found the time to test it.
I maintain that HTML::Entities is doing the wrong thing: it’s
not needed for Latin charsets and breaks everything for non-Latin ones.

regards,
Stan

>> I’ve spoken with Autrijus Tang and we’ve confirmed that this does work
>> with 5.8, which uses newer Unicode libraries, but not with 5.6. Still
>> not sure why it’s busted on 5.6.
>
> I’ll try and find some time in mid-February and track the problem.
> I hope my level of German language is enough for that.
> We’re still running 5.6.1.

Can you test it on a 5.8 box?

> Still, I see a big problem with non-Latin charsets, e.g. Russian.
> Though I didn’t yet find the time for testing it.
> I insist that HTML::Entities is doing the wrong things, it’s
> not needed for Latin charsets and blows everything for non-Latin ones.

So, Mason now has pluggable HTML escaping rules. I’d be thrilled if you
could hand me the one-line patch that makes it do the right thing.


»|« http://www.bestpractical.com/rt – Trouble Ticketing. Free.

>> We’re still running 5.6.1.
>
> Can you test it on a 5.8 box?

This will take more time. Maybe later I will not be as busy
with other high-priority projects as I am now.

>> I insist that HTML::Entities is doing the wrong things, it’s
>> not needed for Latin charsets and blows everything for non-Latin ones.
>
> So. Mason now has pluggable html escaping rules. I’d be thrilled if you
> could hand me the one-line patch that makes it do the right thing.

I will look into that. But I think disabling
default_escape_flags => 'h'
in lib/RT/Interface/Web.pm would be enough for now.

All major browsers understand raw Unicode in HTML, so you don’t need to
change UTF-8 characters into entities.
Raw Latin-1 or Cyrillic text (or text in Asian scripts) doesn’t need to be
translated either; you just need
to specify the correct charset in the Content-Type header.
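
As a rough illustration of sending raw text with a matching charset, the sketch below pairs a Content-Type line with a body encoded to the same charset. utf8_response is a hypothetical helper, not part of RT or Mason. It covers only the encoding; markup characters coming from users would still have to be escaped separately.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: pairs the charset declaration with matching bytes.
sub utf8_response {
    my ($html) = @_;
    my $header = 'Content-Type: text/html; charset=utf-8';
    my $body   = encode('UTF-8', $html);    # characters -> UTF-8 bytes
    return ($header, $body);
}

my ($header, $body) = utf8_response("<p>Priorit\x{e4}t</p>");
print "$header\r\n\r\n$body\n";
```

The umlaut goes out as its two UTF-8 bytes rather than as an entity, and the header tells the browser how to read them.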

Sorry, that’s only weekend theory. I’ll be able to play
with real systems in 2-3 weeks.

Cheers,
Stan

>> So. Mason now has pluggable html escaping rules. I’d be thrilled if you
>> could hand me the one-line patch that makes it do the right thing.
>
> will look into that. But I think disabling
> default_escape_flags => 'h'
> in lib/RT/Interface/Web.pm would be enough for now.

DO NOT do that on a production system. It will open you up
to a wide variety of cross-site scripting attacks. Anyone who sends mail
to RT will be able to compromise the account of any RT user who even has
a ticket listed on their home page.

»|« http://www.bestpractical.com/rt – Trouble Ticketing. Free.

> DO NOT do that on a production system. It will open you up
> to a wide variety of cross-site scripting attacks. Anyone who sends mail
> to RT will be able to compromise the account of any RT user who even has
> a ticket listed on their home page.

Wow. No, I won’t run it on a production system. I’ve got a FreeBSD hard disk
in my desk; I only need to find a chassis to plug it into…

Hello Stan & Jesse,

thank you for your investigations! Stan asked me for the perl version on
my box…

hostname:~/rt-2-1-66# perl -v
This is perl, v5.6.1 built for i386-linux
hostname:~/rt-2-1-66# uname -a
Linux hostname 2.4.18 #5 Fri Jan 31 16:52:02 CET 2003 i686 unknown
Debian woody stable

The problem described earlier in this thread is still the same in 2.1.66.

Greetings,
Stefan

Mit freundlichen Grüßen / Kind Regards
Stefan Fischer

Hi Stefan and all,

I still haven’t had time to dig deeper into the problem, but I promise I’ll try
to find a moment.

Cheers,
Stan

--- Stefan Fischer <info@debian.homeunix.net> wrote:

Hello Stan & Jesse,

thank you for your investigations! Stan asked me for the perl version on
my box…

hostname:~/rt-2-1-66# perl -v
This is perl, v5.6.1 built for i386-linux