Skip to Main Content

Berkeley DB Family

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

BerkeleyDB hangs, but no deadlock

user6435361Aug 21 2009 — edited Sep 8 2009
Hi,

I'm using BerkeleyDB 4.7.25 on Windows. I'm using the Java bindings (but not the pure Java version, just the JNI bindings that come with the distribution).

My application is multi-threaded and relies heavily on deadlock detection to resolve read/write conflicts - all access to the database environment is within a BDB transaction. I have a nasty intermittent problem that happens about a day or two after startup: the application hangs and a Java thread dump shows that the only threads that are actually doing something (besides waiting on IO or on Java mutexes) are accessing the database - one thread is reading and the other is writing and both get hung on the same spot after each restart.

The environment is configured to randomly abort transactions in case of a deadlock, but there's no deadlock. The command line utility doesn't find one and manual inspection of the 'db_stat -Co' output doesn't reveal one either. The process is eating up 50% of the machine's CPU so it seems that those threads are in some infinite loop. Here are the relevant stack trace portions:

"Reading" thread:

java.lang.Thread.State: RUNNABLE
at com.sleepycat.db.internal.db_javaJNI.Dbc_get(Native Method)
at com.sleepycat.db.internal.Dbc.get(Dbc.java:57)
at com.sleepycat.db.Cursor.getSearchKey(Cursor.java:614)
at org.hypergraphdb.storage.DefaultIndexImpl.count(DefaultIndexImpl.java:492)

From db_stat -Co I was able to associate the following locks with this thread:

801aeb83 READ 1 WAIT hgstore_idx_HGATOMTYPE page 4034
801aeb83 READ 1 HELD hgstore_idx_HGATOMTYPE page 1
801aeb83 READ 1 HELD hgstore_idx_hg_typesystem_java2hg_types page 3
801aeb83 READ 336 HELD hgstore_idx_HGATOMTYPE page 1306

"Writing" thread:

java.lang.Thread.State: RUNNABLE
at com.sleepycat.db.internal.db_javaJNI.DbTxn_commit0(Native Method)
at com.sleepycat.db.internal.DbTxn.commit0(DbTxn.java:88)
at com.sleepycat.db.internal.DbTxn.commit(DbTxn.java:45)
at com.sleepycat.db.Transaction.commit(Transaction.java:113)
at org.hypergraphdb.transaction.TransactionBDBImpl.commit(TransactionBDBImpl.java:48)

and the locks associated with this transaction are:

801aeb5a WRITE 1 HELD datadb page 21062
801aeb5a WRITE 1 HELD datadb page 21046
801aeb5a WRITE 1 HELD datadb page 22507
801aeb5a WRITE 1 HELD datadb page 19262
801aeb5a WRITE 1 HELD datadb page 18105
801aeb5a WRITE 1 HELD datadb page 29347
801aeb5a WRITE 1 HELD datadb page 29445
801aeb5a WRITE 1 HELD datadb page 6216
801aeb5a WRITE 1 HELD datadb page 7070
801aeb5a WRITE 1 HELD datadb page 7215
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 4324
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 4741
801aeb5a WRITE 1 HELD incidencedb page 6127
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 6028
801aeb5a WRITE 1 HELD incidencedb page 2682
801aeb5a WRITE 1 HELD incidencedb page 2946
801aeb5a WRITE 1 HELD incidencedb page 3155
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 3219
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 3401
801aeb5a WRITE 7 HELD hgstore_idx_HGATOMTYPE page 4034
801aeb5a WRITE 1 HELD incidencedb page 396
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 1057
801aeb5a WRITE 1 HELD datadb page 1528
801aeb5a WRITE 1 HELD datadb page 1806
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 1850
801aeb5a WRITE 1 HELD datadb page 15023
801aeb5a WRITE 1 HELD datadb page 16053
801aeb5a WRITE 1 HELD incidencedb page 13148
801aeb5a WRITE 1 HELD incidencedb page 11310
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 709
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 907
801aeb5a WRITE 1 HELD datadb page 8674
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 782
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 416
801aeb5a WRITE 8 HELD incidencedb page 9014
801aeb5a WRITE 1 HELD incidencedb page 9434
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 1942
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 1943
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 1957
801aeb5a WRITE 1 HELD hgstore_idx_HGATOMVALUE page 9695
801aeb5a WRITE 1 HELD datadb page 9799
801aeb5a WRITE 1 HELD hgstore_idx_5e9c4a7c-2945-4533-9c71-09ba486419cb_7b431657-ea98-4039-b278-255d94c6281c page 1150

I've re-arranged the output of db_stat to make it more readable. I can send you the exact output. The heading part is:

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Lock REGINFO information:
Lock Region type
5 Region ID
__db.005 Region name
0x530000 Original region address
0x530000 Region address
0x5300c8 Region primary address
0 Region maximum allocation
0 Region allocated
Region allocations: 42007 allocations, 0 failures, 0 frees, 1 longest
REGION_JOIN_OK Region flags
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

and I've removed all the database level locks (the ones with page=0) and grouped them by locker id.

Any help in tracking this issue would be greatly appreciated! The problem is happening on a regular basis and fortunately it is not a critical application, the server can be restarted and all is well for a while. It would be hard for me to write a test case that reproduces the problem. However I can follow instructions to generate more helpful traces, or experiment in anyway you think may help track down the problem.

Perhaps a silly question: given that I see only those two locker IDs (801aeb5a and 801aeb83) and I have those two threads that are hung into a BDB transaction, can I safely assume that I don't have any cursor remaining open from other transactions? It was the first think I checked, but it doesn't look like I have any.

Thanks very much in advance!

Regards,
Boris
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Oct 6 2009
Added on Aug 21 2009
1 comment
2,092 views