Dear all,
It was really hard to find the right place to report this
(I found no way to reach the Solaris team directly in the maze of Oracle support; please tell me how to contact them if this is not the proper place):
- 0. We have implemented our own NSS module, referred to here as 'cdc'.
- 1. On Solaris 9 x86 with the latest patch (patch ID 122301-70) installed (Solaris 10 sometimes shows this too):
bash-3.2# uname -a
SunOS sunos9 5.9 Generic_118559-11 i86pc i386 i86pc
bash-3.2# showrev -p | grep 122301-70
Patch: 122301-70 Obsoletes: 113108-01, ...
SUNWypr, SUNWypu
when the 'passwd' line in /etc/nsswitch.conf is set to 'passwd: cdc files nis' but NIS is not actually configured and running,
operations that iterate over all entries with getpwent(3), such as 'useradd user', may dump core.
bash-3.2# cat /etc/nsswitch.conf |grep passwd:
passwd: cdc files nis
bash-3.2# ./foreach_getent
Segmentation Fault (core dumped)
- 2. The simplest code that triggers the core dump is:
bash-3.2# cat foreach_getent.c
#include <pwd.h>
int main()
{
while (getpwent ());
endpwent();
return 0;
}
- 3. We believe this is a bug in the Solaris system libc(3):
when 'passwd' lists 'files nis', the libc NSS routines try to release the nss_files.so backend (source) twice and then crash.
'cdc' in the line only matters because of how free(3) behaves in that libc(3):
Solaris libc(3) keeps a list of pointers waiting to be freed (the array flist, with 32 entries), and only calls realfree() on them
when the list is full or when realloc(3) is called.
So, by default, releasing our nss_cdc.so backend adds nearly 32 pointers to that list and fills it up;
the previously queued nss_files.so backend pointer is then really freed before libc wrongly uses it a second time, hence the crash.
- 4. How to prove item 3:
- 4.1 Set the 'passwd' line in nsswitch.conf to just 'passwd: files nis', but do not use NIS (i.e. no NIS server or client running on the machine):
bash-3.2# cat /etc/nsswitch.conf |grep passwd:
passwd: files nis
bash-3.2# ps -ef|grep yp||echo NONE
NONE
- 4.2 Run the test program in gdb:
bash-3.2# gcc foreach_getent.c -g -o foreach_getent
bash-3.2# gdb ./foreach_getent
GNU gdb 6.8
...
(gdb) start
Breakpoint 1 at 0x80506fd: file foreach_getent.c, line 4.
Starting program: /foreach_getent
main () at foreach_getent.c:4
4 while (getpwent ());
(gdb) c
Continuing.
Breakpoint 2, 0xd1272531 in _nss_files_destr () from /usr/lib/nss_files.so.1
(gdb) bt
#0 0xd1272531 in _nss_files_destr () from /usr/lib/nss_files.so.1
#1 0xd12efd2f in nss_put_backend_u () from /usr/lib/libc.so.1
#2 0xd12f0a7b in end_iter_u () from /usr/lib/libc.so.1
#3 0xd12f0f83 in nss_endent_u () from /usr/lib/libc.so.1
#4 0xd12f0ef9 in nss_getent_u () from /usr/lib/libc.so.1
#5 0xd12f0967 in nss_getent () from /usr/lib/libc.so.1
#6 0xd132e794 in getpwent_r () from /usr/lib/libc.so.1
#7 0xd12d7beb in getpwent () from /usr/lib/libc.so.1
#8 0x08050702 in main () at foreach_getent.c:4
- 4.3 Now _nss_files_destr() is called to release the nss_files.so backend. Inside it, free() is called on the nss_files backend data itself (i.e. the pointer through which libc finds this very destructor):
0xd1272593 <_nss_files_destr+114>: push %eax
0xd1272594 <_nss_files_destr+115>: call 0xd1271448 <free@plt>
(gdb) p/x $eax
$2 = 0x8062770
(gdb) p/x **$eax
$3 = 0xd1272521
(gdb) info sym 0xd1272521
_nss_files_destr in section .text
- 4.4 After this, the data at 0x08062770 is either freed or queued in flist[32] to be freed later; either way, it must NOT be used anymore:
(gdb) fin
Run till exit from #0 0xd1272594 in _nss_files_destr () from /usr/lib/nss_files.so.1
0xd12efd2f in nss_put_backend_u () from /usr/lib/libc.so.1
(gdb) p *0xd135504c
$4 = 3
(gdb) x/4 0xd135506c
0xd135506c <flist>: 0x080613a0 0x08063170 0x08062770 0x00000000
- 4.5 But then we find that libc uses that same pointer to call _nss_files_destr() again:
(gdb) c
Continuing.
Breakpoint 4, 0xd12f005d in _nss_db_state_destr () from /usr/lib/libc.so.1
0xd12f00ad <_nss_db_state_destr+97>: push %eax
0xd12f00ae <_nss_db_state_destr+98>: call 0xd12eff71 <_nss_src_state_destr>
(gdb) p/x $eax
$10 = 0x8061338
(gdb) p/x *($eax+0x20)
$11 = 0x8062770
(gdb) info sym (void*)***($eax+0x20)
_nss_files_destr in section .text
(gdb) c
Continuing.
Breakpoint 2, 0xd1272531 in _nss_files_destr () from /usr/lib/nss_files.so.1
0xd1272531 <_nss_files_destr+16>: cmpl $0x0,0x8(%ebp)
(gdb) bt
#0 0xd1272531 in _nss_files_destr () from /usr/lib/nss_files.so.1
#1 0xd12f0024 in _nss_src_state_destr () from /usr/lib/libc.so.1
#2 0xd12f00b3 in _nss_db_state_destr () from /usr/lib/libc.so.1
#3 0xd12f012f in nss_delete () from /usr/lib/libc.so.1
#4 0xd12f0f04 in nss_getent_u () from /usr/lib/libc.so.1
#5 0xd12f0967 in nss_getent () from /usr/lib/libc.so.1
#6 0xd132e794 in getpwent_r () from /usr/lib/libc.so.1
#7 0xd12d7beb in getpwent () from /usr/lib/libc.so.1
#8 0x08050702 in main () at foreach_getent.c:4
- 4.6 As long as not enough pointers are added to flist and realloc(3) is not called between the two calls
to _nss_files_destr(), it will not crash. But the issue is already clear, and we can prove it by forcing a realfree()
in that interval, right after step 4.4:
(gdb) x/4 0xd135506c
0xd135506c <flist>: 0x080613a0 0x08063170 0x08062770 0x00000000
(gdb) call realfree(0x08062770)
$15 = -785035160
(gdb) p/x *0x8061358
$17 = 0x8062770
(gdb) p/x ***0x8061358
$19 = 0x20
(gdb) c
Continuing.
Program received signal SIGSEGV, Segmentation fault.
0x00000020 in ?? ()
(gdb) bt
#0 0x00000020 in ?? ()
#1 0xd12f0024 in _nss_src_state_destr () from /usr/lib/libc.so.1
#2 0xd12f00b3 in _nss_db_state_destr () from /usr/lib/libc.so.1
#3 0xd12f012f in nss_delete () from /usr/lib/libc.so.1
#4 0xd12f0f04 in nss_getent_u () from /usr/lib/libc.so.1
#5 0xd12f0967 in nss_getent () from /usr/lib/libc.so.1
#6 0xd132e794 in getpwent_r () from /usr/lib/libc.so.1
#7 0xd12d7beb in getpwent () from /usr/lib/libc.so.1
#8 0x08050702 in main () at foreach_getent.c:4
(gdb)
It crashed as expected. This proves that libc used the very same data that had already been freed
(not a new allocation that merely happens to be at the same address).
Please check this, and contact us if you need more data. Thanks!