RT::Authen::ExternalAuth::LDAP, Net::LDAP, Net::SSLeay, SEGV

I’m currently hunting an irritating and elusive bug.

I’m trying to use an external info provider that uses LDAP to supply user
details. Unfortunately, the trace I’m seeing looks like this…

_GetBoundLdapObj calls Net::LDAP (/data/rt/local/plugins/RT-Authen-ExternalAuth/lib/RT/Authen/ExternalAuth/LDAP.pm:434)

… at which point there’s a segfault deep inside SSLeay. This only
happens if I try to use SSL - either start_tls or scheme => 'ldaps' when
creating the Net::LDAP object.

I can do a brute-force extraction of a test-case that goes via RT’s
libraries to reproduce this reliably, and the issue doesn’t seem to be
directly inside RT. However, and incredibly irritatingly, when I take the
RT libraries out-of-the-loop and just create a test-case that calls
Net::LDAP directly, it works perfectly.

In other words: this bug appears to only get tickled inside RT for some
reason.
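For anyone wanting to try the same comparison, a standalone test case of the kind described above might look roughly like this. It is only a sketch: the script name, the helper names (connect_args, try_mode) and the example host are made up for illustration, and verify => 'none' is there purely to keep the test short, not as a recommendation.

```perl
#!/usr/bin/perl
# Hypothetical standalone check: exercise both TLS modes through Net::LDAP
# directly, with none of RT's libraries loaded.
use strict;
use warnings;

# Build the Net::LDAP constructor arguments for each TLS mode.
sub connect_args {
    my ($host, $mode) = @_;
    return ($host, scheme => 'ldaps') if $mode eq 'ldaps';
    return ($host)                    if $mode eq 'start_tls';  # upgraded after connecting
    die "unknown mode '$mode'\n";
}

# Connect in the given mode and report whether the handshake completed.
sub try_mode {
    my ($host, $mode) = @_;
    require Net::LDAP;
    my $ldap = Net::LDAP->new(connect_args($host, $mode))
        or die "$mode: connect failed: $@\n";
    if ($mode eq 'start_tls') {
        my $mesg = $ldap->start_tls(verify => 'none');
        die "start_tls failed: " . $mesg->error . "\n" if $mesg->code;
    }
    print "$mode: TLS handshake completed\n";
    $ldap->unbind;
}

# Usage: perl try-ldap-direct.pl ldap.example.org
if (my $host = shift @ARGV) {
    try_mode($host, $_) for qw(ldaps start_tls);
}
```

If this completes for both modes while the version that goes through RT segfaults, the difference is presumably something else loaded into the RT process rather than Net::LDAP itself.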

I take it that RT (or RT::Authen::ExternalAuth) don’t muck around with
internal SSLeay settings that might cause this, in a way that I’ve missed?

Has anyone else seen this?

This is on a shiny new Debian lenny (although I’ve been seeing the
problem for a while now) [2.6.32-trunk-686 #1 SMP] - i.e., 32-bit.

Cheers,
jan

jan grant, ISYS, University of Bristol. http://www.bris.ac.uk/
Tel +44 (0)117 3317661 http://ioctl.org/jan/
Generalisation is never appropriate.

[on Net::LDAP segfaulting…]

In other words: this bug appears to only get tickled inside RT for some
reason.

I take it that RT (or RT::Authen::ExternalAuth) don’t muck around with
internal SSLeay settings that might cause this, in a way that I’ve missed?

Looks like my guess was almost right. We’re (for better or worse) an
Oracle shop. The backtrace looks like this:

[[[
% sudo -u www-data gdb --args perl ./try-ldap-canonicalize.pl
(gdb) run
Starting program: /usr/bin/perl ./try-ldap-canonicalize.pl
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0xb6cb51bb in ?? () from /usr/lib/oracle/10.2.0.4/client/lib/libnnz10.so
(gdb) bt
#0 0xb6cb51bb in ?? () from /usr/lib/oracle/10.2.0.4/client/lib/libnnz10.so
#1 0xb6cb48be in ?? () from /usr/lib/oracle/10.2.0.4/client/lib/libnnz10.so
#2 0xb2255bfc in BN_MONT_CTX_set () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#3 0xb2255ee9 in BN_MONT_CTX_set_locked () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#4 0xb226c8f4 in ?? () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#5 0xb226d2ae in RSA_public_encrypt () from /usr/lib/i686/cmov/libcrypto.so.0.9.8
#6 0xb2349140 in ssl3_send_client_key_exchange () from /usr/lib/i686/cmov/libssl.so.0.9.8
#7 0xb234cb4b in ssl3_connect () from /usr/lib/i686/cmov/libssl.so.0.9.8
#8 0xb23626ea in SSL_connect () from /usr/lib/i686/cmov/libssl.so.0.9.8
#9 0xb23540e3 in ssl23_connect () from /usr/lib/i686/cmov/libssl.so.0.9.8
#10 0xb23626ea in SSL_connect () from /usr/lib/i686/cmov/libssl.so.0.9.8
#11 0xb23c4605 in XS_Net__SSLeay_connect () from /usr/lib/perl5/auto/Net/SSLeay/SSLeay.so
#12 0x080d5d7b in Perl_pp_entersub ()
#13 0x080d4358 in Perl_runops_standard ()
#14 0x08079355 in perl_run ()
#15 0x080642fd in main ()
]]]
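The backtrace has libcrypto's BN_MONT_CTX_set ending up in libnnz10.so, so it can help to see which of these libraries are actually mapped into a given Perl process. A rough, Linux-only sketch follows; it reads /proc/self/maps, and the helper names (mapped_libs, current_maps) are invented for this example. Load whatever modules are under suspicion before the final block runs.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Extract the distinct shared-object paths matching $pattern
# from the text of a /proc/<pid>/maps file.
sub mapped_libs {
    my ($maps_text, $pattern) = @_;
    my %seen;
    for my $line (split /\n/, $maps_text) {
        # Fields: address perms offset dev inode pathname
        my @f = split ' ', $line, 6;
        next unless defined $f[5] && $f[5] =~ m{^/};
        next unless $f[5] =~ $pattern;
        $seen{$f[5]} = 1;
    }
    return sort keys %seen;
}

# Slurp the memory map of the current process.
sub current_maps {
    open my $fh, '<', '/proc/self/maps'
        or die "cannot read /proc/self/maps: $!";
    local $/;
    return scalar <$fh>;
}

if (-r '/proc/self/maps') {
    # Load the suspects first, for example:
    #   require DBD::Oracle; require Net::SSLeay;
    print "$_\n" for mapped_libs(current_maps(), qr/libnnz|libcrypto|libssl/);
}
```

Seeing both libnnz10.so and libcrypto listed for the failing process, but only libcrypto for the standalone test, would be consistent with the backtrace.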

So the question is, has anyone seen this and do they have a workaround? (I
ask here because that’s where the original question was. If my esteemed
colleague Dr. Google comes up with an answer, I’ll follow up.)

jan grant, ISYS, University of Bristol. http://www.bris.ac.uk/
Tel +44 (0)117 3317661 http://ioctl.org/jan/
stty intr ^m

Okay. Looking at it, the suggestion is (since this is a symbol conflict in
the Oracle client library) to set LD_PRELOAD=/usr/lib/libcrypto.{version}
to force that to load first. Whether that’ll kill the Oracle client
library at the same time, I’m about to find out.
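To try the LD_PRELOAD idea from a command-line test script rather than the web server, one option is to have the script re-exec itself with the variable set. The dynamic linker reads LD_PRELOAD only at process start, so assigning to %ENV in an already-running Perl does not affect the current process. This is a sketch only: the helper names (preload_value, ensure_preloaded) are invented here, and the libcrypto path is the one mentioned elsewhere in this thread, which may differ on other systems.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Path as given in this thread; adjust to match the libcrypto that libssl uses.
my $LIBCRYPTO = '/usr/lib/libcrypto.so.0.9.8';

# Return an LD_PRELOAD value with $lib placed first and not duplicated.
sub preload_value {
    my ($current, $lib) = @_;
    my @libs = grep { length && $_ ne $lib } split /[\s:]+/, $current // '';
    return join ':', $lib, @libs;
}

# Re-exec this script once with LD_PRELOAD set, if it is not set already.
sub ensure_preloaded {
    my ($lib) = @_;
    return unless -e $lib;    # nothing to preload on this machine
    my $want = preload_value($ENV{LD_PRELOAD}, $lib);
    return if ($ENV{LD_PRELOAD} // '') eq $want;
    $ENV{LD_PRELOAD} = $want;
    exec $^X, $0, @ARGV or die "re-exec failed: $!";
}

ensure_preloaded($LIBCRYPTO);
print "LD_PRELOAD is ", $ENV{LD_PRELOAD} // '(unset)', "\n";
# ... the failing Net::LDAP test case would follow here ...
```

Running the failing test case this way, with and without the preload, should show whether load order alone makes the segfault go away.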

jan grant, ISYS, University of Bristol. http://www.bris.ac.uk/
Tel +44 (0)117 3317661 http://ioctl.org/jan/
I am now available for general use under a modified BSD licence.

I’ve added

FcgidInitialEnv LD_PRELOAD /usr/lib/libcrypto.so.0.9.8

to the virtual host configuration, restarted, and that looks happy: which
is to say, the Oracle client is still talking to the Oracle server.

I’ll try turning the LDAP integration back on shortly.

jan grant, ISYS, University of Bristol. http://www.bris.ac.uk/
Tel +44 (0)117 3317661 http://ioctl.org/jan/
…and then three milkmaids turned up
(to the delight and delactation of the crowd).