Berkeley DB Family

BDB 4.2.52 Windows : Hanging

612208Dec 23 2007 — edited Feb 10 2008

We are using BDB 4.2.52 on a Windows platform in a multithreaded application server. The application server has been in several production deployments for a few years and is generally stable. At one deployment, however, we have been troubled by a 'hang' that occurs once every few days. This deployment has about 100 users, but there are typically only 3 or 4 concurrent requests. The server is adequately sized (4 processors with 16 GB RAM) and typical response times are about 400 to 600 msecs per request.

Our analysis of the request logs and archive logs indicates that the hang seems to arise when two threads are attempting to insert records into a database that has about 1 million records. From the application's perspective the two requests are the same, i.e. they invoke the same application API function.

The particular database has 3 secondary dbs. The primary db has a 4-byte integer key and the data is typically around 60 bytes. The primary key is generated by a separate 'id generation' table. This particular database is like a log table and there are no updates. Delete operations (purging) are typically performed in maintenance mode. Because of the id generator table, the ids generated for the two threads differ by 1, so the two records will be adjacent to each other in the primary db's btree.
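Whether consecutive ids really end up next to each other depends on how the 4-byte key is encoded and compared. The sketch below assumes two things the post does not state: that the primary db uses Berkeley DB's default byte-by-byte key comparison (no custom comparison function installed), and that the integer is stored in the x86 little-endian layout. The helper names `encode_le`, `encode_be` and `key_cmp` are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Encode id least significant byte first (the x86 in-memory layout). */
void encode_le(uint32_t id, unsigned char out[4])
{
    out[0] = (unsigned char)(id & 0xffu);
    out[1] = (unsigned char)((id >> 8) & 0xffu);
    out[2] = (unsigned char)((id >> 16) & 0xffu);
    out[3] = (unsigned char)((id >> 24) & 0xffu);
}

/* Encode id most significant byte first (big-endian). */
void encode_be(uint32_t id, unsigned char out[4])
{
    out[0] = (unsigned char)((id >> 24) & 0xffu);
    out[1] = (unsigned char)((id >> 16) & 0xffu);
    out[2] = (unsigned char)((id >> 8) & 0xffu);
    out[3] = (unsigned char)(id & 0xffu);
}

/* Byte-by-byte comparison of two 4-byte keys. This stands in for the
 * ASSUMED default btree ordering; it is not a Berkeley DB call. */
int key_cmp(const unsigned char a[4], const unsigned char b[4])
{
    return memcmp(a, b, 4);
}
```

Under these assumptions, ids 255 and 256 encode as `FF 00 00 00` and `00 01 00 00` in little-endian form, so 256 sorts before 255 and the two keys are not neighbours. With the big-endian encoding (or a numeric comparison function), consecutive ids do sort next to each other, which is the situation the paragraph above describes.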

On one occasion, the insert operation returned a deadlock, and this continued over several iterations; we infer this from the archive logs. On the other occasions no such deadlock is indicated. When a cursor operation returns a deadlock, our application logic forces the thread to sleep for an interval and then retry the operation. The sleep interval is a random number between 20 and 50 msecs, the intent being to have one thread sleep a little longer and thereby perhaps help prevent the deadlock from recurring. [We have a separate thread that calls the deadlock detection function; this thread runs every 1 second.]
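The sleep-and-retry logic described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's actual code: the status codes, the `op_fn`/`nap_fn` callback types, `retry_on_deadlock` and the demo helpers are all hypothetical names, and the cap on attempts is an addition that the post does not mention. In a real application the callback would wrap the Berkeley DB insert and map its `DB_LOCK_DEADLOCK` return onto `OP_DEADLOCK`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch. A real caller would map
 * Berkeley DB's DB_LOCK_DEADLOCK return value onto OP_DEADLOCK. */
enum op_status { OP_OK, OP_DEADLOCK, OP_ERROR };

/* One attempt at the insert, and an injectable sleep (in production
 * this could be a thin wrapper around the Win32 Sleep() call). */
typedef enum op_status (*op_fn)(void *ctx);
typedef void (*nap_fn)(unsigned msecs, void *ctx);

/* Random delay in [min_ms, max_ms], mirroring the 20-50 msec interval. */
unsigned backoff_msecs(unsigned min_ms, unsigned max_ms)
{
    assert(min_ms <= max_ms);
    return min_ms + (unsigned)rand() % (max_ms - min_ms + 1u);
}

/* Retry op while it reports a deadlock, sleeping between attempts.
 * Gives up after max_attempts (an assumption of this sketch) so that a
 * persistent deadlock is reported instead of looping indefinitely. */
enum op_status retry_on_deadlock(op_fn op, nap_fn nap, void *ctx,
                                 int max_attempts, int *attempts_used)
{
    enum op_status st = OP_ERROR;
    int attempt = 0;

    while (attempt < max_attempts) {
        attempt++;
        st = op(ctx);
        if (st != OP_DEADLOCK)
            break;
        if (attempt < max_attempts)
            nap(backoff_msecs(20, 50), ctx);
    }
    if (attempts_used != NULL)
        *attempts_used = attempt;
    return st;
}

/* Demo stand-ins: an operation that deadlocks a fixed number of times
 * before succeeding, and a sleep that only records the delay. */
struct demo_ctx { int deadlocks_left; int naps; unsigned last_nap; };

enum op_status demo_op(void *p)
{
    struct demo_ctx *c = p;
    if (c->deadlocks_left > 0) {
        c->deadlocks_left--;
        return OP_DEADLOCK;
    }
    return OP_OK;
}

void demo_nap(unsigned msecs, void *p)
{
    struct demo_ctx *c = p;
    c->naps++;
    c->last_nap = msecs;
}
```

Returning the attempt count makes it possible to log how often the retry path is taken, which may help tell a repeating deadlock apart from a true hang in which the operation never returns at all.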

An analysis of the BDB code indicates considerable changes in the implementation of mutexes between the version we are using and more recent versions. During development we have noticed, although very rarely, that the for loop in db_win32_mutex_lock sometimes does not break out. This was always put down to 'one of those things' since it never repeated itself consistently.

Is there any known issue with ver 4.2.52 that causes the kind of hang we are experiencing?

We have obtained the stats of the environment - logging, locking and transactions. We are compiling those into a spreadsheet to facilitate analysis. Is there any particular statistic that could reveal whether we have not configured the sub-systems correctly?

Thanks in advance for any help

Kimman

Locked on Mar 9 2008

Added on Dec 23 2007

#berkeley-db
