Bad characters in names loaded from LDAP (AD)

Hi all,

we have RT 4.4.0 on CentOS 7 with Perl v5.22.1, and we are starting to
use RT in production.

We configured RT to authenticate users via LDAP
(RT::Authen::ExternalAuth::LDAP). Our LDAP server is MS AD (Win 2008 R2).

Our LDAP ExternalAuth configuration in RT:

Set($ExternalSettings, {
    'My_LDAP' => {
        'type'            => 'ldap',
        'server'          => 'ldaps://ADserver:636',
        'user'            => 'ldap-user',
        'pass'            => 'password',
        'base'            => 'dc=domain,dc=com',
        'filter'          => '(objectClass=person)',
        'd_filter'        => '(userAccountControl:1.2.840.113556.1.4.803:=2)',
        'tls'             => { verify => "require", capath => "/etc/openldap/certs/cacert.pem" },
        'net_ldap_args'   => [ version => 3, debug => 8 ],
        'attr_match_list' => [
            'Name',
            'EmailAddress',
        ],
        'attr_map' => {
            'Name'         => 'sAMAccountName',
            'EmailAddress' => 'mail',
            'RealName'     => 'displayName',
            'WorkPhone'    => 'telephoneNumber',
        },
    },
} );

Authentication works fine: users can log in, and if a user doesn’t
exist in RT, the account is autocreated. All the configured attributes
are transferred.
But we have a problem with the encoding of RealName, which is mapped
from the displayName attribute in MS AD.
For example:
displayName in MS AD: Matouš Novák
is loaded and saved in RT as:
RealName: MatouÅ¡ Novák

Log file:

[6937] [Tue Sep 27 15:59:25 2016] [info]:
RT::User::CanonicalizeUserInfoFromExternalAuth returning Disabled: ,
EmailAddress: novak@domain.com, Gecos: novak, Name: novak, Privileged:
1, RealName: MatouÅ¡ Novák, WorkPhone:
(/opt/rt4/sbin/…/lib/RT/User.pm:811)

We had a similar problem with Moodle. When we configured Moodle against
Active Directory with cp1250 encoding, it did exactly the same thing.
After we changed the encoding of the LDAP connector to UTF-8, the names
were corrected.

If you know how we can specify the encoding in the LDAP configuration,
that would be great. I didn’t find any description of an encoding
option for the LDAP configuration in RT.

I searched:

  • RT documentation
  • RT community wiki
  • RT mailing list archives
  • Google

I found only one related question on the mailing list, but it had no answer:

I also read that MS AD, with LDAP protocol version 3, returns every
string to the LDAP client in UTF-8 encoding.
I really don’t know where the problem could be.

Any help will be appreciated.
Thanks in advance for any hint.

Best regards
Jan Burian

Hi all,

we have RT 4.4.0 on CentOS 7 with Perl v5.22.1, and we are starting to
use RT in production.

We configured RT to authenticate users via LDAP
(RT::Authen::ExternalAuth::LDAP). Our LDAP server is MS AD (Win 2008 R2).
[…]
Authentication works fine: users can log in, and if a user doesn’t
exist in RT, the account is autocreated. All the configured attributes
are transferred.

This is a strong sign that the LDAP part is working correctly. If the
LDAP server (AD) and client (Perl’s Net::LDAP module) are using
mismatched encodings, it is likely to show up in authentication failures
due to incompatible encodings of the same (logical) characters that
8-bit encodings assign to byte values 0x80-0xff.
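To make the 8-bit point concrete, here is a small Perl sketch (an illustration added here, not from the thread) using the core Encode module. The same logical character, the "š" in "Matouš", becomes different bytes in UTF-8, in cp1250 (the encoding from the Moodle case) and in ISO-8859-2, so a client and a server that disagreed on the encoding would not be comparing the same bytes. `hex_bytes` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Return the bytes that a character string becomes in the given encoding,
# formatted as space-separated hex pairs.
sub hex_bytes {
    my ($encoding, $text) = @_;
    my $bytes = encode($encoding, $text);
    return join ' ', map { sprintf('%02X', ord($_)) } split //, $bytes;
}

my $s_caron = "\x{161}";    # U+0161 LATIN SMALL LETTER S WITH CARON ("š")

for my $encoding ('UTF-8', 'cp1250', 'iso-8859-2') {
    printf "%-10s %s\n", $encoding, hex_bytes($encoding, $s_caron);
}
```

Run as-is, it prints one line per encoding: C5 A1 for UTF-8, 9A for cp1250 and B9 for ISO-8859-2.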

Fortunately, it is somewhere between arcane and impossible to make
Net::LDAP use anything other than UTF-8. There’s probably some way to
make it do T.61 for ancient-history compatibility, but that’s mostly
pointless.

[…]

We had a similar problem with Moodle. When we configured Moodle against
Active Directory with cp1250 encoding, it did exactly the same thing.
After we changed the encoding of the LDAP connector to UTF-8, the names
were corrected.

Which makes sense: LDAP v3 by default uses UTF-8 and you have a modern
system with a mature LDAP client. I know of no way to configure a CentOS
7/Perl 5.22 system such that the LDAP interaction with an AD LDAP server
talking UTF-8 would be the source of this sort of encoding conflict. I’m
mildly surprised that anything talking LDAPv3 can be made to use cp1250
encoding, but I suppose Microsoft makes their own rules to go along with
their own unique code pages.

[…]

I also read that MS AD, with LDAP protocol version 3, returns every
string to the LDAP client in UTF-8 encoding.
I really don’t know where the problem could be.

The most likely place is in your database. I’m guessing that you are
using MySQL, which defaults to latin1 encoding. When you store a UTF-8
string into a latin1 table, it breaks any multi-byte characters into 2
or 3 characters, but the right bits are still there. This issue has come
up a few times on this list over the past decade and I think Best
Practical has documented how to safely convert an RT database with that
sort of problem from latin1 to utf8. It is probably worth looking
through their docs (possibly one of the UPGRADING* files?) and the RT
Wiki for a solution. I expect it could be done with a binary dump of the
database, altering of any latin1 tables to use utf8, and a re-import of
the binary dump. I’m not enough of a MySQL expert to detail that process
(I generally use Postgres where possible.)
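The "right bits are still there" point can be shown with a small Perl sketch (added for illustration; it does not touch a database and uses only the core Encode module). Encoding text to UTF-8 and then wrongly reading each byte as one ISO-8859-1 character turns every two-byte letter into two characters, and reversing the two steps recovers the original. `as_mojibake` and `repair_mojibake` are hypothetical helper names.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

binmode STDOUT, ':encoding(UTF-8)';

# Simulate the damage: encode to UTF-8 bytes, then wrongly treat each byte
# as one ISO-8859-1 (Latin-1) character.
sub as_mojibake {
    my ($text) = @_;
    return decode('ISO-8859-1', encode('UTF-8', $text));
}

# Undo it: every garbled character is below U+0100, so it maps back to the
# original byte, and those bytes are valid UTF-8 again.
sub repair_mojibake {
    my ($garbled) = @_;
    return decode('UTF-8', encode('ISO-8859-1', $garbled));
}

my $name    = "Matou\x{161} Nov\x{e1}k";    # "Matouš Novák"
my $garbled = as_mojibake($name);
print "$garbled\n";                       # prints MatouÅ¡ NovÃ¡k
print repair_mojibake($garbled), "\n";    # prints Matouš Novák
```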

Hi Bill,

thank you for your response. Sorry for not mentioning our database: we
use PostgreSQL.
After I wrote my first email, I also checked the encoding of the database.

The database had the following parameters:
Name | Encoding | Collate     | Ctype
-----+----------+-------------+------------
rt4  | UTF8     | en_US.UTF-8 | en_US.UTF-8

  1. I dumped the database with the UTF-8 encoding parameter.

  2. Then I dropped the database.

  3. I created a new database with the following parameters:

    Name | Encoding | Collate     | Ctype
    -----+----------+-------------+------------
    rt4  | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8

  4. Finally, I imported the database from the dump.

But after that change, names loaded from LDAP still have bad
characters :-/.

When a user writes their first email to a queue, they are also
autocreated, as unprivileged. If the From header contains their name, it
is used as the RealName RT attribute, and in this case the name is saved
correctly.

Example from the log - autocreated from LDAP:
[6937] [Tue Sep 27 15:59:25 2016] [info]:
RT::User::CanonicalizeUserInfoFromExternalAuth returning Disabled: ,
EmailAddress: novak@vsup.cz, Gecos: novak, Name: novak, Privileged: 1,
RealName: MatouÅ¡ Novák, WorkPhone: (/opt/rt4/sbin/…/lib/RT/User.pm:811)
[6937] [Tue Sep 27 15:59:25 2016] [info]: Autocreated external user
novak ( 61 ) (/opt/rt4/sbin/…/lib/RT/Authen/ExternalAuth.pm:356)
[6937] [Tue Sep 27 15:59:25 2016] [info]:
RT::Authen::ExternalAuth::LDAP::GetAuth External Auth OK ( My_LDAP ):
novak (/opt/rt4/sbin/…/lib/RT/Authen/ExternalAuth/LDAP.pm:348)
[6937] [Tue Sep 27 15:59:26 2016] [info]:
RT::User::CanonicalizeUserInfoFromExternalAuth returning EmailAddress:
novak@vsup.cz, Name: novak, RealName: Matouš Novák, WorkPhone:
(/opt/rt4/sbin/…/lib/RT/User.pm:811)
Example from the log - autocreated from email:
[6026] [Mon Oct 10 06:26:02 2016] [info]:
RT::User::CanonicalizeUserInfoFromExternalAuth returning Comments:
Autocreated on ticket submission, Disabled: , EmailAddress:
Tereza.Skvarova@seznam.cz, Name: Tereza.Skvarova@seznam.cz, Privileged:
, RealName: Tereza Škvárová (/opt/rt4/sbin/…/lib/RT/User.pm:811)

Any other ideas?

Best regards
Jan Burian

On 11.10.2016 05:41, Bill Cole wrote:

On 10 Oct 2016, at 16:26, Jan Burian wrote:

[…]

RT 4.4 and RTIR training sessions, and a new workshop day!
https://bestpractical.com/training

  • Boston - October 24-26
  • Los Angeles - Q1 2017

Hi all,

I finally resolved the issue with support from the RT engineers, so big
thanks to them.
I’m posting the fix in case someone is interested in the future, so it
can be found in the list archive.

Here is the answer from the RT engineers:

We use Net::LDAP and there is an option called ‘raw’ that might properly
convert the incoming content to utf8. That’s the first thing to try
since we pass parameters through to Net::LDAP and you can put it right
in the config file.
https://metacpan.org/pod/distribution/perl-ldap/lib/Net/LDAP.pod
However, there is likely another bit of code we need to add to RT to be
explicit about the incoming text and treat it as utf8 when told to do
so. We can file it as a bug, or provide some commercial assistance if
you are interested.

So I added

raw => qr/(?i:^jpegPhoto|;binary)/

as a net_ldap_args parameter in RT_SiteConfig.pm.
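Merged into the configuration from the first message, the net_ldap_args entry would look roughly like this. Only this entry changes; `debug => 8` is carried over from the original, and the other keys are elided in comments rather than repeated.

```perl
Set($ExternalSettings, {
    'My_LDAP' => {
        # ... type, server, user, pass, base, filter, d_filter, tls unchanged ...
        'net_ldap_args' => [
            version => 3,
            debug   => 8,
            # Return every attribute value as a UTF-8 decoded Perl string,
            # except binary attributes (jpegPhoto, anything with ";binary").
            raw     => qr/(?i:^jpegPhoto|;binary)/,
        ],
        # ... attr_match_list and attr_map unchanged ...
    },
} );
```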

Now everything is working fine: the names are imported correctly from
LDAP (MS AD, LDAP protocol version 3).
I also suggested adding information about the raw option, with an
example, to the RT docs.

Best regards
Jan Burian

On 11.10.2016 11:51, Jan Burian wrote:

[…]

Hi Bill,

thank you for your response. Sorry for not mentioning our database: we
use PostgreSQL.
After I wrote my first email, I also checked the encoding of the database.

The database had the following parameters:
Name | Encoding | Collate     | Ctype
-----+----------+-------------+------------
rt4  | UTF8     | en_US.UTF-8 | en_US.UTF-8

And so my beautiful theory is destroyed by your brutal facts. :-)

  1. I dumped the database with the UTF-8 encoding parameter.

  2. Then I dropped the database.

  3. I created a new database with the following parameters:

    Name | Encoding | Collate     | Ctype
    -----+----------+-------------+------------
    rt4  | UTF8     | cs_CZ.UTF-8 | cs_CZ.UTF-8

  4. Finally, I imported the database from the dump.

But after that change, names loaded from LDAP still have bad
characters :-/.

Indeed: the Collate and Ctype parameters are encoding-specific rulesets
for how characters are related to each other, not variations on
encoding.

[…]

Any other ideas?

Yes: At least one of your FCGI handlers (PID 6937) is using an 8-bit
encoding and at least one (PID 6026) is using UTF-8.

Note that both of those cases are being logged by the
RT::User::CanonicalizeUserInfoFromExternalAuth method, which uses LDAP
to retrieve the attribute it uses for the “RealName” field in RT. The
first was logged by process 6937, the second by process 6026.

The reason for that is a bit of a mystery. It’s clear that the 2
processes were not started near the same time (unless that server is
VERY busy spawning processes) so if you can determine what was different
about how they were launched (likely involving a locale environment
variable, most likely LANG or LC_ALL) you can probably make sure that
the improper launch doesn’t happen.
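One way to follow this suggestion on Linux is to compare the locale variables each FCGI process was started with. A minimal sketch, assuming the PIDs are still running and the script runs as root or as the web server user: /proc/&lt;pid&gt;/environ holds the environment the process was started with, as NUL-separated NAME=value entries. `locale_vars` and `locale_vars_for_pid` are hypothetical helper names.

```perl
use strict;
use warnings;

# Extract LANG and LC_* entries from a NUL-separated environment block,
# the format Linux exposes in /proc/<pid>/environ.
sub locale_vars {
    my ($environ) = @_;
    my %vars;
    for my $entry (split /\0/, $environ) {
        my ($name, $value) = split /=/, $entry, 2;
        next unless defined $value;
        $vars{$name} = $value if $name eq 'LANG' || $name =~ /^LC_/;
    }
    return \%vars;
}

# Read the start-up environment of a running process (Linux only).
sub locale_vars_for_pid {
    my ($pid) = @_;
    open my $fh, '<', "/proc/$pid/environ"
        or die "Cannot read /proc/$pid/environ: $!\n";
    my $environ = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    return locale_vars($environ // '');
}

# Usage: perl locale_check.pl 6937 6026
for my $pid (@ARGV) {
    my $vars  = locale_vars_for_pid($pid);
    my @pairs = map { "$_=$vars->{$_}" } sort keys %$vars;
    print "$pid: ", (@pairs ? join(' ', @pairs) : '(no locale variables set)'), "\n";
}
```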