Skip to Main Content

[BDB JE 7.4.5] BDB storage utilization increases continuously due to unexpected data in utility data

User_BZQZ5Jul 29 2020 — edited Aug 3 2020

Hi,

We have an application which uses BDB JE 7.4.5 for storage purposes. The application has a spiky load pattern. Periodically, on peak loads the amount of business data stored in BDB JE reaches 12GB (or even more), and, than it shrinks back to zero (though the store is constantly updated with a new data which quickly gets removed ). After, a number of pick load iterations (usually 10-20 during the day), the store continues to grow in size even when business data size is around zero. Eventually, it consume the disk space and the application crashes. On restart, the BDB,JE  store cleans all the files and shrink back to nearly zero size.

The BDB JE is configured with CacheMode.EVICT_LN and a relatively small cache size (under 100MB). We cache business data  internally on application level. Thus, the extra caching on BDB JE level looks as being unnecessary for our use cases.

Here are the BDB JE settings we use

EnvironmentConfig.LOCK_N_LOCK_TABLES: "7"

EnvironmentConfig.STATS_COLLECT: "false"

EnvironmentConfig.FILE_LOGGING_LEVEL: "OFF"

EnvironmentConfig.CONSOLE_LOGGING_LEVEL: "OFF"

EnvironmentConfig.CLEANER_UPGRADE_TO_LOG_VERSION: "-1"

The rest of BDB JE settings are left unchanged including the number of cleaner thread being set to 1.

After several full disk consumption events caused by BDB JE, we changed the number of cleaner threads to 5. It looks like that the increase the number of  cleaner threads mitigated the issue with the disk utilization, but, it caused the performance degradation on BDB JE level. The business threads got blocked by the locks hold inside of BDB JE.

Here are some examples of such threads

"Checkpointer" #212 daemon prio=5 os_prio=0 tid=0x00007fb430362000 nid=0x3d52 runnable [0x00007fb25dede000]

   java.lang.Thread.State: RUNNABLE

at com.sleepycat.je.log.LogUtils.writePackedLong(LogUtils.java:273)

at com.sleepycat.je.dbi.DatabaseImpl.writeToLog(DatabaseImpl.java:2477)

at com.sleepycat.je.tree.MapLN.writeToLog(MapLN.java:315)

at com.sleepycat.je.log.entry.LNLogEntry.writeBaseLNEntry(LNLogEntry.java:865)

at com.sleepycat.je.log.entry.LNLogEntry.writeEntry(LNLogEntry.java:757)

at com.sleepycat.je.log.entry.BaseReplicableEntry.writeEntry(BaseReplicableEntry.java:54)

at com.sleepycat.je.log.entry.LNLogEntry.writeEntry(LNLogEntry.java:123)

at com.sleepycat.je.log.LogManager.marshallIntoBuffer(LogManager.java:740)

at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:632)

at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:476)

- locked <0x00000004a8f9b018> (a java.lang.Object)

at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)

at com.sleepycat.je.log.LogManager.log(LogManager.java:338)

at com.sleepycat.je.tree.LN.logInternal(LN.java:716)

at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)

at com.sleepycat.je.dbi.CursorImpl.updateRecordInternal(CursorImpl.java:1677)

at com.sleepycat.je.dbi.CursorImpl.updateCurrentRecord(CursorImpl.java:1085)

at com.sleepycat.je.dbi.DbTree$RewriteMapLN.doWork(DbTree.java:847)

at com.sleepycat.je.tree.Tree.withRootLatchedExclusive(Tree.java:513)

at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:815)

at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:737)

at com.sleepycat.je.cleaner.UtilizationProfile.removePerDbMetadata(UtilizationProfile.java:355)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:893)

- locked <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.Cleaner.updateFilesAtCheckpointEnd(Cleaner.java:1022)

at com.sleepycat.je.recovery.Checkpointer.doCheckpoint(Checkpointer.java:910)

- locked <0x00000004a8c6a108> (a com.sleepycat.je.recovery.Checkpointer)

at com.sleepycat.je.recovery.Checkpointer.onWakeup(Checkpointer.java:559)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- <0x00000004a92077f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

- <0x00000004b067b988> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)

"Cleaner-5" #211 daemon prio=5 os_prio=0 tid=0x00007fb430360000 nid=0x3d51 waiting for monitor entry [0x00007fb25dfe0000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8ceb978> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8ceb978> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-4" #210 daemon prio=5 os_prio=0 tid=0x00007fb43035e800 nid=0x3d50 waiting for monitor entry [0x00007fb25e0e1000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8cdc4b8> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8cdc4b8> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-3" #209 daemon prio=5 os_prio=0 tid=0x00007fb43035c800 nid=0x3d4f waiting on condition [0x00007fb25e1e2000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x00000004a92077f8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.search(Tree.java:2275)

at com.sleepycat.je.tree.Tree.search(Tree.java:2172)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1914)

at com.sleepycat.je.dbi.DbTree.getDb(DbTree.java:1628)

at com.sleepycat.je.dbi.DbTree.getDb(DbTree.java:1578)

at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:1022)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:478)

- locked <0x00000004a8d15a38> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8d15a38> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-2" #208 daemon prio=5 os_prio=0 tid=0x00007fb430371000 nid=0x3d4e waiting for monitor entry [0x00007fb25e2e3000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8d16fc0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8d16fc0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"Cleaner-1" #207 daemon prio=5 os_prio=0 tid=0x00007fb43036f000 nid=0x3d4d waiting for monitor entry [0x00007fb25e3e4000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:669)

- waiting to lock <0x00000004a8f29258> (a com.sleepycat.je.cleaner.Cleaner)

at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)

- locked <0x00000004a8d19be0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)

- locked <0x00000004a8d19be0> (a com.sleepycat.je.cleaner.FileProcessor)

at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)

at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:

- None

"IO-1" #203 prio=5 os_prio=0 tid=0x00007fb430355800 nid=0x3d14 waiting on condition [0x00007fb25e9e6000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x000000049a4293e8> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.search(Tree.java:2275)

at com.sleepycat.je.tree.Tree.search(Tree.java:2172)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)

at com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)

at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)

at com.sleepycat.je.Cursor.search(Cursor.java:3858)

- locked <0x000000047f04cf48> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Database.deleteInternal(Database.java:1161)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

    ...

   

"IO-2" #200 prio=5 os_prio=0 tid=0x00007fb43034f800 nid=0x3d11 waiting on condition [0x00007fb25ece8000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x000000049a427418> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1817)

at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1748)

at com.sleepycat.je.tree.Tree.searchSplitsAllowed(Tree.java:1666)

at com.sleepycat.je.tree.Tree.findBinForInsert(Tree.java:1637)

at com.sleepycat.je.dbi.CursorImpl.insertRecordInternal(CursorImpl.java:1337)

at com.sleepycat.je.dbi.CursorImpl.insertOrUpdateRecord(CursorImpl.java:1221)

at com.sleepycat.je.Cursor.putNoNotify(Cursor.java:2962)

at com.sleepycat.je.Cursor.putNotify(Cursor.java:2800)

at com.sleepycat.je.Cursor.putNoDups(Cursor.java:2647)

at com.sleepycat.je.Cursor.putInternal(Cursor.java:2478)

- locked <0x000000047fe3c400> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Cursor.putInternal(Cursor.java:830)

at com.sleepycat.je.Database.put(Database.java:1574)

at com.sleepycat.je.Database.put(Database.java:1627)

    ...

"IO-3" #197 prio=5 os_prio=0 tid=0x00007fb430349800 nid=0x3d0e waiting for monitor entry [0x00007fb25efec000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:470)

- waiting to lock <0x00000004a8f9b018> (a java.lang.Object)

at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)

at com.sleepycat.je.log.LogManager.log(LogManager.java:338)

at com.sleepycat.je.tree.LN.logInternal(LN.java:713)

- locked <0x000000046d6e2388> (a com.sleepycat.je.txn.Txn)

at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)

at com.sleepycat.je.dbi.CursorImpl.deleteCurrentRecord(CursorImpl.java:962)

at com.sleepycat.je.Cursor.deleteNoNotify(Cursor.java:2412)

- locked <0x000000046d6e24e8> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Cursor.deleteInternal(Cursor.java:2353)

at com.sleepycat.je.Database.deleteInternal(Database.java:1169)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

    ...

"IO-4" #190 prio=5 os_prio=0 tid=0x00007fb43033b800 nid=0x3d07 waiting for monitor entry [0x00007fb25f6f3000]

   java.lang.Thread.State: BLOCKED (on object monitor)

at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:470)

- waiting to lock <0x00000004a8f9b018> (a java.lang.Object)

at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)

at com.sleepycat.je.log.LogManager.log(LogManager.java:338)

at com.sleepycat.je.tree.LN.logInternal(LN.java:713)

- locked <0x000000047f748c90> (a com.sleepycat.je.txn.Txn)

at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)

at com.sleepycat.je.dbi.CursorImpl.deleteCurrentRecord(CursorImpl.java:962)

at com.sleepycat.je.Cursor.deleteNoNotify(Cursor.java:2412)

- locked <0x000000047f748d80> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Cursor.deleteInternal(Cursor.java:2353)

at com.sleepycat.je.Database.deleteInternal(Database.java:1169)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

    ...

"IO-5" #187 prio=5 os_prio=0 tid=0x00007fb430335800 nid=0x3d04 waiting on condition [0x00007fb25f9f6000]

   java.lang.Thread.State: TIMED_WAITING (parking)

at sun.misc.Unsafe.park(Native Method)

- parking to wait for  <0x000000049a427418> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)

at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireNanos(AbstractQueuedSynchronizer.java:934)

at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireNanos(AbstractQueuedSynchronizer.java:1247)

at java.util.concurrent.locks.ReentrantLock.tryLock(ReentrantLock.java:442)

at com.sleepycat.je.latch.LatchImpl.acquireExclusive(LatchImpl.java:61)

at com.sleepycat.je.latch.LatchImpl.acquireShared(LatchImpl.java:107)

at com.sleepycat.je.tree.IN.latchShared(IN.java:560)

at com.sleepycat.je.tree.Tree.latchChildShared(Tree.java:372)

at com.sleepycat.je.tree.Tree.search(Tree.java:2275)

at com.sleepycat.je.tree.Tree.search(Tree.java:2172)

at com.sleepycat.je.dbi.CursorImpl.searchExact(CursorImpl.java:1951)

at com.sleepycat.je.Cursor.searchExact(Cursor.java:4195)

at com.sleepycat.je.Cursor.searchNoDups(Cursor.java:4056)

at com.sleepycat.je.Cursor.search(Cursor.java:3858)

- locked <0x000000047793d958> (a com.sleepycat.je.Transaction)

at com.sleepycat.je.Database.deleteInternal(Database.java:1161)

at com.sleepycat.je.Database.delete(Database.java:1128)

at com.sleepycat.je.Database.delete(Database.java:1230)

...

The heap dumps collected for the issues were dominated by instances of com.sleepycat.je.cleaner.DbFileSummary referred from com.sleepycat.je.cleaner.DbFileSummaryMap in DatabaseImpl for _jeUtilization, _jeExpiration and our sequence database.

What we noticed, that regardless a number of cleaner threads, the amount of instances of com.sleepycat.je.cleaner.DbFileSummary is constantly growing (for  _jeUtilization, _jeExpiration and our sequence databases). The databases for storing business data do not exhibit such behavior.

For example, on some point we had 188,574 instances of DbFileSummary referred from _jeUtilization, 173585  instances created for _jeExpiration and 202091 instances created for sequence database. In a couple of hours, the number of DbFileSummary instances has instanced for databases _jeUtilization, _jeExpiration and sequence db to 188734, 173752 and 202104 accordingly. On other hand, the number of com.sleepycat.je.cleaner.FileSummary instances was 33 and 21 for the above points of time.

We are currently using BDB JE version 7.4.5. We have not tested the latest version 7.5.11 due to other issues, and, I am not sure whether upgrade to version 7.5.11 could solve our problem with disk usage and performance degradation.

At the moment it is not completely clear for me why storage consumption grows in size even when there is no much data stored in the business databases.

How can I evaluate an optimal number of cleaner threads? Perhaps, using more cleaner threads can solve our problem with disk utilization and performance degradation.

We can collect and provide more details about the reported issue, as it manifests quite often.

Kind Regards,

Alex

Comments
Post Details
Added on Jul 29 2020
1 comment
112 views