BDB 4.2.52: Threads hang waiting for mutex

952529 Jul 27 2012
We have come across a condition where multiple application threads hang waiting for mutex locks. This is a telecom system that had been running fine for many months before we ran into this issue. The backtraces of the stuck threads (12 in total) show the following:

11 threads are stuck here:

#0 0xb7cfe1d7 in ?? () from /lib/libc.so.6
#1 0x08195b24 in __os_sleep (dbenv=0x0, secs=0, usecs=10000) at ../../mvl-cge-3.1/src/dist/../os/os_sleep.c:84
#2 0x08195c3a in __os_yield (dbenv=0x0, usecs=10000) at ../../mvl-cge-3.1/src/dist/../os/os_spin.c:112
#3 0x081a045e in __db_tas_mutex_lock (dbenv=0x84a3dc8, mutexp=0x97b3ae90) at ../../mvl-cge-3.1/src/dist/../mutex/mut_tas.c:169
#4 0x0817d156 in __kw_tas_mutex_lock (a=0x84a3dc8, b=0x97b3ae90) at ../../mvl-cge-3.1/src/dist/../dbinc/mutex.h:882
#5 0x0817ee7d in __lock_get_internal (lt=0x84dfbb8, locker=2627936712, flags=0, obj=0x88c2bc4, lock_mode=DB_LOCK_WRITE, timeout=0, lock=0x88c2c6c) at ../../mvl-cge-3.1/src/dist/../lock/lock.c:990
#6 0x0817e0da in __lock_get (dbenv=0x84a3dc8, locker=2627936712, flags=0, obj=0x88c2bc4, lock_mode=DB_LOCK_WRITE, lock=0x88c2c6c) at ../../mvl-cge-3.1/src/dist/../lock/lock.c:586
#7 0x081eec04 in __db_lget (dbc=0x88c2b58, action=0, pgno=4, mode=DB_LOCK_WRITE, lkflags=0, lockp=0x88c2c6c) at ../../mvl-cge-3.1/src/dist/../db/db_meta.c:459
#8 0x0821906d in __ham_lock_bucket (dbc=0x88c2b58, mode=DB_LOCK_WRITE) at ../../mvl-cge-3.1/src/dist/../hash/hash_page.c:1659
#9 0x08218e08 in __ham_get_cpage (dbc=0x88c2b58, mode=DB_LOCK_WRITE) at ../../mvl-cge-3.1/src/dist/../hash/hash_page.c:1572
#10 0x082148a8 in __ham_item_next (dbc=0x88c2b58, mode=DB_LOCK_WRITE, pgnop=0x914fdfe8) at ../../mvl-cge-3.1/src/dist/../hash/hash_page.c:386
#11 0x0820edf6 in __ham_lookup (dbc=0x88c2b58, key=0x914fe500, sought=0, mode=DB_LOCK_WRITE, pgnop=0x914fdfe8) at ../../mvl-cge-3.1/src/dist/../hash/hash.c:1706
#12 0x0820bbc5 in __ham_c_get (dbc=0x88c2b58, key=0x914fe500, data=0x914fe050, flags=28, pgnop=0x914fdfe8) at ../../mvl-cge-3.1/src/dist/../hash/hash.c:478
#13 0x081e1d5b in __db_c_get (dbc_arg=0x88ab2b0, key=0x914fe500, data=0x914fe050, flags=28) at ../../mvl-cge-3.1/src/dist/../db/db_cam.c:643
#14 0x081d9b98 in __db_del (dbp=0x88aabf0, txn=0x8a07f20, key=0x914fe500, flags=0) at ../../mvl-cge-3.1/src/dist/../db/db_am.c:533
#15 0x081e992f in __db_del_pp (dbp=0x88aabf0, txn=0x8a07f20, key=0x914fe500, flags=0) at ../../mvl-cge-3.1/src/dist/../db/db_iface.c:444

1 thread is stuck here:

#0 0xb7cfe1d7 in ?? () from /lib/libc.so.6
#1 0x08195b24 in __os_sleep (dbenv=0x0, secs=0, usecs=25000) at ../../mvl-cge-3.1/src/dist/../os/os_sleep.c:84
#2 0x08195c3a in __os_yield (dbenv=0x0, usecs=25000) at ../../mvl-cge-3.1/src/dist/../os/os_spin.c:112
#3 0x081a045e in __db_tas_mutex_lock (dbenv=0x84a3dc8, mutexp=0x995e6460) at ../../mvl-cge-3.1/src/dist/../mutex/mut_tas.c:169
#4 0x08192c95 in __kw_tas_mutex_lock (a=0x84a3dc8, b=0x995e6460) at ../../mvl-cge-3.1/src/dist/../dbinc/mutex.h:882
#5 0x08192f84 in __memp_sync_int (dbenv=0x84a3dc8, dbmfp=0x0, trickle_max=0, op=DB_SYNC_CACHE, wrotep=0x0) at ../../mvl-cge-3.1/src/dist/../mp/mp_sync.c:247
#6 0x08192bc8 in __memp_sync (dbenv=0x84a3dc8, lsnp=0x0) at ../../mvl-cge-3.1/src/dist/../mp/mp_sync.c:99
#7 0x0819ab25 in __txn_checkpoint (dbenv=0x84a3dc8, kbytes=0, minutes=0, flags=0) at ../../mvl-cge-3.1/src/dist/../txn/txn.c:1387
#8 0x0819a853 in __txn_checkpoint_pp (dbenv=0x84a3dc8, kbytes=0, minutes=0, flags=0) at ../../mvl-cge-3.1/src/dist/../txn/txn.c:1286


The dump of mutexp at frame #3 of all threads shows tas set to 0x1:

(gdb) p/x *mutexp
$2 = {tas = 0x1, locked = 0x0, mutex_set_wait = 0x0, mutex_set_nowait = 0x0, mutex_set_spin = 0x0, mutex_set_spins = 0x0, flags = 0xc}
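
For readers unfamiliar with test-and-set mutexes, the pattern visible in frames #1 to #3 (a failed test-and-set followed by a sleep) can be sketched roughly as below. This is only an illustration in C11 and is not Berkeley DB's mut_tas.c: the names sketch_mutex, sketch_lock_bounded and sketch_unlock, the backoff values, and the bounded retry count are all assumptions made for the example. It shows why waiters make no progress for as long as the flag stays set; whether that is what happened in this case is not established.

```c
#define _POSIX_C_SOURCE 199309L   /* for nanosleep(); assumes a POSIX system */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for a test-and-set mutex; not BDB's mutex struct. */
typedef struct {
    atomic_flag tas;              /* set = held, clear = free */
} sketch_mutex;

/* Sleep for the given number of microseconds. */
static void sketch_sleep_usecs(long usecs)
{
    struct timespec ts;
    ts.tv_sec = usecs / 1000000L;
    ts.tv_nsec = (usecs % 1000000L) * 1000L;
    nanosleep(&ts, NULL);
}

/*
 * Try to take the mutex, sleeping with a growing backoff between attempts.
 * Gives up after max_attempts failed tries so the example terminates;
 * an unbounded version would loop for as long as the flag stays set.
 * Returns true if the mutex was acquired.
 */
static bool sketch_lock_bounded(sketch_mutex *m, int max_attempts)
{
    long usecs = 1000;            /* arbitrary initial backoff */

    assert(m != NULL);
    for (int i = 0; i < max_attempts; i++) {
        /* test_and_set returns the previous value: false means it was free. */
        if (!atomic_flag_test_and_set(&m->tas))
            return true;
        sketch_sleep_usecs(usecs);
        usecs = (usecs * 2 > 25000) ? 25000 : usecs * 2;  /* arbitrary cap */
    }
    return false;
}

/* Release the mutex by clearing the flag. */
static void sketch_unlock(sketch_mutex *m)
{
    assert(m != NULL);
    atomic_flag_clear(&m->tas);
}
```

With a mutex initialised as `sketch_mutex m = { ATOMIC_FLAG_INIT };`, a first call to sketch_lock_bounded(&m, 3) succeeds, and a second call fails until sketch_unlock(&m) is called. If the holder never reaches the unlock, every later caller keeps cycling between the test-and-set and the sleep.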


I see a few discussions of similar conditions here but didn't see any solution proposed. Does this look like a BDB bug? Any helpful hints would be much appreciated.

Note: We didn't have access to the db files when this occurred. We will try to get them if it happens again.

Thanks,
Peter