Frequent but unpredictable DB_PAGE_NOTFOUND corruption
673547, Dec 2 2008 (edited Dec 3 2008)
Hi,
We have developed a multi-process data processing engine that uses BDB as state storage to store queues of pointers to datums in on-disk flat files. The engine is written in Perl, using the standard BerkeleyDB CPAN module as its interface to BDB.
Platform: Red Hat Enterprise Linux 5.1 x86-64
Perl: 5.8.8 (with 64-bit support)
BDB: 4.3.29 (the default for this version of RHEL)
After running in production for some time without any errors, one of the data queues (a Btree database) has started to become corrupted occasionally, after a few hours of record creation/deletion by forked children. The error, which surfaces on subsequent db_put() calls, is "DB_PAGE_NOTFOUND: Requested page not found", and running db_verify on the database returns:
"db_verify: Page 1: internal page is empty and should not be
db_verify: queue.db: DB_VERIFY_BAD: Database verification failed"
Worse, the error cannot be reproduced in any of our development or staging environments - it only occurs intermittently in production, currently about every 3 to 8 hours.
Some background:
Roughly - the child processes that seem to be causing the corruption read a bunch of key/values via a cursor, and then delete the keys from the DB.
The environment is created with: DB_CREATE | DB_INIT_LOCK | DB_INIT_LOG | DB_INIT_MPOOL | DB_THREAD | DB_INIT_TXN
The database is created with: DB_CREATE|DB_THREAD
The parent process closes all Env & DB handles before forking children, then re-opens upon returning from fork().
The child processes all open their own Env & DB handles after fork().
There are usually around 5-8 children running in parallel, and they execute their deletes on the DB in parallel.
Before exiting, the child processes always explicitly call db_sync() before calling db_close() - probably overkill.
Here's where my understanding of deadlocking in BDB gets shaky:
DB_INIT_LOCK should implement multiple-writer locking semantics, and because of the way the parent process distributes the work to the child processes, children are never competing to delete the same keys.
I suspect the reason for the corruption is that BDB's locking may be page-based, not key (record) based, and if (say) child A deleting a key causes an underlying page split (?) whilst child B is also deleting a key stored on that same page, corruption occurs. Am I on the right track here? The app is not yet doing any deadlock detection or resolution - we haven't yet gone down that route because nowhere are any errors regarding deadlocks being surfaced in the statuses of any DB calls, or the output of db_stat().
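To make that suspicion concrete, here is a toy Python model (not BDB code) of how two children with completely disjoint key sets could still need the same pages. The fixed capacity of 4 keys per page and the helper names `page_of` and `pages_touched` are made up for illustration; real Btree pages hold a variable number of records.

```python
# Toy model only: a sorted key space chopped into fixed-size "pages".
KEYS_PER_PAGE = 4  # hypothetical; real page fill depends on record size

def page_of(key, sorted_keys):
    """Return the index of the toy page that holds `key`."""
    return sorted_keys.index(key) // KEYS_PER_PAGE

def pages_touched(keys, sorted_keys):
    """Set of toy pages a child would touch to delete `keys`."""
    return {page_of(k, sorted_keys) for k in keys}

all_keys = sorted(f"item{n:03d}" for n in range(12))  # fills 3 toy pages
child_a = all_keys[0::2]  # keys at even positions
child_b = all_keys[1::2]  # keys at odd positions

shared_keys = set(child_a) & set(child_b)
shared_pages = pages_touched(child_a, all_keys) & pages_touched(child_b, all_keys)
print("keys in common:", sorted(shared_keys))
print("pages in common:", sorted(shared_pages))
```

In this model the children share no keys but overlap on every page, so if locks are taken per page they would still be contending with each other.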
Interestingly, none of the db_del() calls in any of the children fail, with deadlock errors or otherwise - the corruption is only noticed by calls to db_put() into the same database during a subsequent processing run, obviously after the in-memory cache has been synced to disk.
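For reference, the kind of deadlock handling we have not yet implemented would look roughly like the Python sketch below: run a unit of work, and if it is picked as a deadlock victim, back off and retry it. `DeadlockError`, `run_with_retry` and `max_retries` are hypothetical names, not part of BDB or the Perl BerkeleyDB module. In real code the trigger would be a DB_LOCK_DEADLOCK status, and the work would presumably run inside a transaction that is aborted before each retry.

```python
import time

class DeadlockError(Exception):
    """Stand-in for a DB_LOCK_DEADLOCK status (hypothetical)."""

def run_with_retry(work, max_retries=5, base_delay=0.0):
    """Call work(); on DeadlockError, retry up to max_retries more times."""
    for attempt in range(max_retries + 1):
        try:
            return work()
        except DeadlockError:
            if attempt == max_retries:
                raise  # give up and let the caller see the failure
            time.sleep(base_delay * (attempt + 1))  # simple linear back-off

# Fake unit of work: a deadlock victim twice, then it succeeds.
calls = {"n": 0}

def delete_batch():
    calls["n"] += 1
    if calls["n"] <= 2:
        raise DeadlockError()
    return "deleted"

result = run_with_retry(delete_batch)
```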
We haven't yet upgraded BDB to 4.7 (or even 4.4), but will attempt to do this if no other fix is forthcoming.
An alternative, quicker fix we're trying out is to use DB_INIT_CDB to enforce single-writer semantics on the children, or to move the responsibility of writing back up to the parent process, and have no multiple-writers at all.
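The second option (all writes done by the parent) can be sketched as follows. This is only a model of the idea under loud assumptions: threads stand in for the forked children, a dict stands in for the Btree database, and `child`, `parent_writer` and the `("del", key)` message format are invented for the example. In the real engine the requests would have to travel over a pipe or socket between processes.

```python
import queue
import threading

def child(worker_id, keys, requests):
    """Stand-in for a forked child: asks the parent to delete its keys."""
    for key in keys:
        requests.put(("del", key))
    requests.put(("done", worker_id))

def parent_writer(store, requests, n_children):
    """Only this function mutates `store`, so there is a single writer."""
    finished = 0
    while finished < n_children:
        op, arg = requests.get()
        if op == "del":
            store.pop(arg, None)
        else:
            finished += 1

store = {f"k{n}": n for n in range(6)}  # dict stands in for the Btree
requests = queue.Queue()
work = [["k0", "k1"], ["k2", "k3"]]     # disjoint keys per child
threads = [threading.Thread(target=child, args=(i, keys, requests))
           for i, keys in enumerate(work)]
for t in threads:
    t.start()
parent_writer(store, requests, len(threads))
for t in threads:
    t.join()
print(sorted(store))
```

The trade-off is that deletes are serialised through one process, which would cost some parallelism but removes writer-versus-writer contention entirely.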
I know my understanding of the pitfalls of deadlocking, and how they relate to the underlying Btree store, isn't great, and I suspect the real problem lies there. Many thanks in advance to anyone with advice or recommendations here.