Log sequence error - possible causes?

537031, Oct 2 2006 (edited Oct 5 2006)
We're using C++ and DBXML 2.1.7 on top of Berkeley DB 4.3.28, on Core 5 Linux (kernel 2.6.16.28). We use transactional writes, with no nested transactions. We've been running this version of DBXML for some time, and this is the first time we've seen any data corruption in the database.

In this case, the database server was shut down and the system restarted. On restart, the database server dumped core. Repeated attempts to restart the database gave the same failure. We enabled error output for Berkeley DB and got the following errors:

Finding last valid log LSN: file: 1 offset 8234100
Recovery starting from [1][7965842]
Log sequence error: page LSN 1 1664073; previous LSN 1 5236280
Recovery function for LSN 1 8228918 failed on forward pass
PANIC: Invalid argument
PANIC: fatal region error detected; run recovery
(The last message is repeated several times, followed by a segfault in libdb_cxx-4.3.so.)
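
For readers following the numbers in the error output above: an LSN is a (log file number, offset) pair, ordered by file first and then by offset. The sketch below applies that ordering to the two LSNs in the "Log sequence error" line. The `Lsn` struct and both function names are hypothetical stand-ins written for this illustration, not Berkeley DB's own `DB_LSN` type or `log_compare` function. With these values the page LSN sorts before the "previous" LSN, which would be consistent with the page on disk being older than the log record expects, though that reading is an assumption and has not been checked against the 4.3.28 source.

```cpp
#include <cstdint>

// Hypothetical stand-in for an LSN: a log file number plus a byte offset
// within that file. Not Berkeley DB's DB_LSN type.
struct Lsn {
    std::uint32_t file;
    std::uint32_t offset;
};

// Orders two LSNs by file number first, then by offset.
// Returns -1 if a < b, 0 if they are equal, 1 if a > b.
inline int compare_lsn(const Lsn& a, const Lsn& b) {
    if (a.file != b.file) return a.file < b.file ? -1 : 1;
    if (a.offset != b.offset) return a.offset < b.offset ? -1 : 1;
    return 0;
}

// True when the LSN stamped on the page sorts strictly before the LSN
// that the log record names as the page's previous state.
inline bool page_behind_log(const Lsn& page_lsn, const Lsn& previous_lsn) {
    return compare_lsn(page_lsn, previous_lsn) < 0;
}
```

For the error above, `page_behind_log(Lsn{1, 1664073}, Lsn{1, 5236280})` is true: both LSNs are in log file 1, and the page's offset is the smaller of the two.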

So I have two questions. The first (and most important): how can the log file get corrupted? Is this an OS/file system problem, or could we have a problem in our database server? The server is relatively simple: there is a single thread for reads/writes, and a separate "checkpoint" thread that periodically calls the txn_checkpoint function. Something related to mirroring was recently changed on the system, specifically on the partition that holds our database, but I don't know the details (I can get the info, though).

The second question: why is Berkeley DB choking on the error path instead of causing a database panic? Granted, in this situation it would appear that we're hosed either way, but a panic is at least a little more user-friendly than a core dump. Looking at the core file, it appears that we've entered the error-handling portion of dbenv_open, and the mp_handle of the environment object is NULL; we fail in the call to __dbenv_refresh because of that. (If you're interested, we rebuilt Berkeley DB with debug symbols, so I can give you a stack trace with details for the segfault.)

Oh, the startup flags for the database server are: DB_CREATE|DB_INIT_LOCK|DB_INIT_LOG|DB_INIT_MPOOL|DB_INIT_TXN|DB_RECOVER|DB_THREAD

Thanks!

Wendy