I18N with Perl 5.6.1 and Solaris 8

Looks like it was blocked because of the size.
Here’s the message again, without attachment:

--- Stanislav Sinyagin <ssinyagin@yahoo.com> wrote:

Hi Autrijus and all,

I’ve run a few tests of the mail gateway (rt-2-1-78) for international
support with Perl 5.6.1 on Solaris 2.8, and here are my concerns:

1) When the incoming mail is UTF-8, everything’s fine.

2) When the incoming mail is Latin1 (iso-8859-1) or Russian (koi8-r),
there are compatibility problems between Encode::compat::Alias.pm
and Solaris 8 iconv(3C).

I hadn’t worked with iconv before today, so I have no idea how
this applies to other OSes.

In Solaris, iconv is sensitive to the charset name. It looks up
the filenames in /usr/lib/iconv/ for the corresponding from-to pair, and
issues an error if there’s no such file. See the whole directory listing, attached.
From the listing, you can see that:
Latin1 is recognized as “ISO8859-1” or “8859-1” instead of “iso-8859-1”,
Unicode is recognized as “UTF-8” instead of “utf-8”,
Cyrillic is recognized as “KOI8-R” or “koi8-r” (interesting (8^))

Thus, Encode::compat::Alias.pm needs to adapt its predefined aliases
to the Solaris special case…
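
To make the kind of adaptation concrete, here is a minimal sketch of a Solaris alias table. It is not the actual Encode::compat::Alias code: the hash %SOLARIS_ICONV_NAME and the function solaris_iconv_name are hypothetical names, and the table holds only the three spellings mentioned in the listing above.

```perl
use strict;
use warnings;

# Hypothetical alias table (an assumption, not the module's real data):
# lowercase charset names mapped to the spellings reported for
# /usr/lib/iconv on Solaris 8.
my %SOLARIS_ICONV_NAME = (
    'iso-8859-1' => 'ISO8859-1',
    'utf-8'      => 'UTF-8',
    'koi8-r'     => 'KOI8-R',
);

# Return the Solaris spelling for a charset name, or the name unchanged
# when the table has no entry for it.
sub solaris_iconv_name {
    my ($name) = @_;
    my $mapped = $SOLARIS_ICONV_NAME{ lc $name };
    return defined $mapped ? $mapped : $name;
}

print solaris_iconv_name('iso-8859-1'), "\n";
```

A fixed table like this is easy to audit, but every charset that Solaris spells differently would have to be added by hand.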

I’ll be glad to perform the tests if there’s any update from Autrijus.

Cheers,
Stan

(Cc’ing the knowledgeable folks at -unicode)

Hi there. I suddenly remembered why there was an Encode.pm instead
of Just Use Iconv: iconv is different on every platform,
as pointed out in the Text::Iconv manpage:

    NOTES
        The supported codesets, their names, the supported conversions,
        and the quality of the conversions are all system-dependent.

> In Solaris, iconv is sensitive to the charset name. It looks up
> the filenames in /usr/lib/iconv/ for the corresponding from-to pair, and
> issues an error if there’s no such file. See the whole directory listing, attached.
> From the listing, you can see that:
> Latin1 is recognized as “ISO8859-1” or “8859-1” instead of “iso-8859-1”,
> Unicode is recognized as “UTF-8” instead of “utf-8”,
> Cyrillic is recognized as “KOI8-R” or “koi8-r” (interesting (8^))

I think I’ll do a fuzzy match against the supported names, if there
is a way to do that. Is iconvlist(3) supported on Solaris? Or must
I fall back to iconv -l?
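
One way such a fuzzy match might look, as a rough sketch: reduce every name to an uppercase, alphanumeric-only key and compare keys. The functions charset_key and fuzzy_charset_match are hypothetical, and the list of supported names is hard-coded here from the Solaris listing quoted above; how to obtain that list on a real system is exactly the open question.

```perl
use strict;
use warnings;

# Reduce a charset name to a comparison key: uppercase, alphanumerics only.
sub charset_key {
    my ($name) = @_;
    my $key = uc $name;
    $key =~ s/[^A-Z0-9]//g;
    return $key;
}

# Return the first supported name whose key equals that of the wanted
# name, or undef (in scalar context) if none matches.
sub fuzzy_charset_match {
    my ($wanted, @supported) = @_;
    my $wanted_key = charset_key($wanted);
    for my $candidate (@supported) {
        return $candidate if charset_key($candidate) eq $wanted_key;
    }
    return;
}

# Assumption: the supported names are supplied by the caller.
my @supported = ('ISO8859-1', '8859-1', 'UTF-8', 'KOI8-R');
my $match = fuzzy_charset_match('iso-8859-1', @supported);
print defined $match ? $match : '(no match)', "\n";
```

Key comparison handles differences in case and punctuation only; a name such as “8859-1”, which drops the “ISO” prefix entirely, would still need an explicit alias.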

Another thought is to use GNU Recode, which should offer a unified
set of charset names on every machine. I don’t know if recode simply
forces you to use GNU libiconv, though…

> I’ll be glad to perform the tests if there’s any update from Autrijus.

Thanks for your feedback!

Thanks,
/Autrijus/

Hi all,

--- Autrijus Tang <autrijus@autrijus.org> wrote:

> On Tue, Mar 04, 2003 at 09:58:06AM -0800, Stanislav Sinyagin wrote:
>
> > In Solaris, iconv is sensitive to the charset name. It looks up
> > the filenames in /usr/lib/iconv/ for the corresponding from-to pair, and
> > issues an error if there’s no such file. See the whole directory listing, attached.
> > From the listing, you can see that:
> > Latin1 is recognized as “ISO8859-1” or “8859-1” instead of “iso-8859-1”,
> > Unicode is recognized as “UTF-8” instead of “utf-8”,
> > Cyrillic is recognized as “KOI8-R” or “koi8-r” (interesting (8^))
>
> I think I’ll do a fuzzy match against the supported names, if there
> is a way to do that. Is iconvlist(3) supported on Solaris? Or must
> I fall back to iconv -l?

No, there’s no such thing as iconvlist or iconv -l on Solaris.
And, as far as the documentation says, the whole iconv package in Solaris
has not changed since version 2.6, and it’s going to be the same in version 9.
Thus, we can rely on the ($^O eq 'solaris') condition.

Actually, I suspect that the way Solaris does it is, in some respects, the right way.
For instance, the XML specification says that the encoding string is case-sensitive,
and UTF-8 is the right name, not utf-8.
See section 4.3.3 of Extensible Markup Language (XML) 1.0 (Second Edition).
See also Character Sets.
Of course, omitting the dash between “ISO” and “8859” is not the right way,
and that’s where the Solaris specifics should be taken into account.
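
Put together, the two points above (uppercase names everywhere, the dropped dash only on Solaris) could be sketched as follows. The function iconv_charset_name is hypothetical, and uppercasing on every platform is an assumption taken from the argument above rather than something tested against other iconv implementations.

```perl
use strict;
use warnings;

# Hypothetical helper: spell a charset name the way the local iconv is
# assumed to expect it. $os defaults to $^O; passing it explicitly lets
# the Solaris branch be exercised on any machine.
sub iconv_charset_name {
    my ($name, $os) = @_;
    $os = $^O unless defined $os;

    # Assumption: the uppercase spelling is used on every platform.
    my $canonical = uc $name;

    if ($os eq 'solaris') {
        # Solaris omits the dash between "ISO" and "8859".
        $canonical =~ s/^ISO-8859-/ISO8859-/;
    }
    return $canonical;
}

print iconv_charset_name('iso-8859-1', 'solaris'), "\n";
print iconv_charset_name('iso-8859-1', 'linux'), "\n";
```

Keeping the platform test in one function means the rest of the code never has to check $^O itself.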

Regards,
Stan