Hi
We have run into another occurrence of the issue reported as part of thread https://community.oracle.com/thread/4195598 . As that discussion thread is already archived, I decided to open a new discussion thread.
We have 3-nodes BDB JE HA cluster running with BDB JE 7.4.5.
Yesterday, the master node reported EnvironmentFailureException from Feeder Output thread. The exception was reported 3 times from different classes. The stack traces are provided below:
2019-11-06 19:33:54,031 ERROR [Feeder Output for node_1] (c.s.j.r.i.RepImpl) - node_2 Halted log file reading at file 0x334f63 offset 0x8ce9e offset(decimal)=577182 prev=0x8ce6c:
entry=DEL_LN_TXtype=31,version=14)
prev=0x8ce6c
size=27
Next entry should be at 0x8cecf
com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects.
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)
at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)
at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)
at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)
at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)
2019-11-06 19:33:54,032 ERROR [Feeder Output for node_1] (c.s.j.r.i.n.Feeder) - node_2 Unexpected exception: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)
at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)
at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)
at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)
at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)
2019-11-06 19:33:54,034 ERROR [Feeder Output for node_1] (c.s.j.r.i.n.Feeder) - node_2 Uncaught exception in feeder thread Thread[Feeder Output for node_1,5,main](JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)
at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)
at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)
at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)
at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)
After the above, a warning below was reported:
2019-11-06 19:33:54,034 WARN [MASTER node_2(3)] (c.s.j.r.i.n.FeederManager) - node_2 Feeder manager unexpected interrupt
After that, the node was detached with the following exception
2019-11-06 19:33:54,055 ERROR [DETACHED node_2(3)] (c.s.j.r.i.n.RepNode) - node_2 node_2(3):/path/to/node_2/data:DETACHED node_2(3) exited unexpectedly with exception com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node3_2(3)
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)
at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)
at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)
at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)
at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)
The exception was also reported via the exception listener as below
2019-11-06 19:33:54,061 ERROR [DETACHED node_2(3)] (o.a.q.s.s.b.r.ReplicatedEnvironmentFacade) - Asynchronous exception thrown by BDB thread 'DETACHED node_2(3)'
com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)
at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)
at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)
at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)
at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)
at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)
at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)
The same exception was reported again via exception listener from a cleaner thread
2019-11-06 19:33:54,094 ERROR [Cleaner-1] (o.a.q.s.s.b.r.ReplicatedEnvironmentFacade) - Asynchronous exception thrown by BDB thread 'Cleaner-1'
com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:459)
at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:419)
at com.sleepycat.je.log.LogBufferPool.bumpAndWriteDirty(LogBufferPool.java:373)
at com.sleepycat.je.log.LogBufferPool.getWriteBuffer(LogBufferPool.java:320)
at com.sleepycat.je.log.LogManager.serialLogWork(LogManager.java:651)
at com.sleepycat.je.log.LogManager.serialLog(LogManager.java:476)
at com.sleepycat.je.log.LogManager.logItem(LogManager.java:423)
at com.sleepycat.je.log.LogManager.log(LogManager.java:338)
at com.sleepycat.je.tree.LN.logInternal(LN.java:716)
at com.sleepycat.je.tree.LN.optionalLog(LN.java:419)
at com.sleepycat.je.dbi.CursorImpl.updateRecordInternal(CursorImpl.java:1677)
at com.sleepycat.je.dbi.CursorImpl.updateCurrentRecord(CursorImpl.java:1085)
at com.sleepycat.je.dbi.DbTree$RewriteMapLN.doWork(DbTree.java:847)
at com.sleepycat.je.tree.Tree.withRootLatchedExclusive(Tree.java:513)
at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:815)
at com.sleepycat.je.dbi.DbTree.modifyDbRoot(DbTree.java:737)
at com.sleepycat.je.cleaner.UtilizationProfile.removePerDbMetadata(UtilizationProfile.java:355)
at com.sleepycat.je.cleaner.Cleaner.deleteSafeToDeleteFiles(Cleaner.java:893)
at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:361)
at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:286)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:154)
at java.lang.Thread.run(Thread.java:748)
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)
at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)
at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)
at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)
at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)
at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)
at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)
The application commits BDB JE transactions using com.sleepycat.je.Transaction#commit(com.sleepycat.je.Durability) (where durability is configured as new Durability( SyncPolicy.NO_SYNC, SyncPolicy.NO_SYNC, ReplicaAckPolicy.SIMPLE_MAJORITY))
The transaction syncing to disk is performed in a separate thread by invoking com.sleepycat.je.Environment#flushLog(true) (Essentially, after transaction commit the application is awaiting for #flushLog(true) to complete for multiple parallel transactions committed with local policy SyncPolicy.NO_SYNC ). The syncing policy SyncPolicy.NO_SYNC is used intentionally as local syncing policy, in order to aggregate the syncing to disk of multiple parallel transactions using Environment#flushLog for performance optimization.
Whilst invoking com.sleepycat.je.Environment#flushLog(true), the EnvironmentFailureException was re-thrown again:
2019-11-06 19:33:54,094 ERROR [Commit-Thread-nodes] (o.a.q.s.u.ServerScopedRuntimeException) - Exception whilst syncing data to disk
com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) Environment must be closed, caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.EnvironmentFailureException.wrapSelf(EnvironmentFailureException.java:228)
at com.sleepycat.je.dbi.EnvironmentImpl.checkIfInvalid(EnvironmentImpl.java:1766)
at com.sleepycat.je.log.LogBufferPool.writeBufferToFile(LogBufferPool.java:459)
at com.sleepycat.je.log.LogBufferPool.writeDirty(LogBufferPool.java:419)
at com.sleepycat.je.log.LogManager.flushInternal(LogManager.java:1270)
at com.sleepycat.je.log.LogManager.flushSync(LogManager.java:1210)
at com.sleepycat.je.dbi.EnvironmentImpl.flushLog(EnvironmentImpl.java:2265)
at com.sleepycat.je.Environment.flushLog(Environment.java:1638)
at org.apache.qpid.server.store.berkeleydb.replication.ReplicatedEnvironmentFacade.flushLog(ReplicatedEnvironmentFacade.java:746)
at org.apache.qpid.server.store.berkeleydb.CoalescingCommiter$CommitThread.processJobs(CoalescingCommiter.java:201)
at org.apache.qpid.server.store.berkeleydb.CoalescingCommiter$CommitThread.run(CoalescingCommiter.java:180)
Caused by: com.sleepycat.je.EnvironmentFailureException: Environment invalid because of previous exception: (JE 7.4.5) node_2(3):/path/to/node_2/data com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3) UNCAUGHT_EXCEPTION: Uncaught Exception in internal thread, unable to continue. Environment is invalid and must be closed.
at com.sleepycat.je.utilint.StoppableThread.handleUncaughtException(StoppableThread.java:243)
at com.sleepycat.je.utilint.StoppableThread$UncaughtHandler.uncaughtException(StoppableThread.java:274)
at java.lang.Thread.dispatchUncaughtException(Thread.java:1959)
Caused by: com.sleepycat.je.EnvironmentFailureException: (JE 7.4.5) want to read 17,087,569,685 but reader at 17,087,569,693 UNEXPECTED_STATE: Unexpected internal state, may have side effects. MasterFeederSource fetching vlsn=17,087,569,685 waitTime=1000 Uncaught exception in feeder thread:Thread[Feeder Output for node_1,5,main] Originally thrown by HA thread: MASTER node_2(3)
at com.sleepycat.je.EnvironmentFailureException.unexpectedState(EnvironmentFailureException.java:428)
at com.sleepycat.je.rep.stream.FeederReader.checkForPassingTarget(FeederReader.java:297)
at com.sleepycat.je.rep.stream.FeederReader.isTargetEntry(FeederReader.java:317)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:332)
at com.sleepycat.je.log.FileReader.readNextEntry(FileReader.java:245)
at com.sleepycat.je.rep.stream.FeederReader.scanForwards(FeederReader.java:280)
at com.sleepycat.je.rep.stream.MasterFeederSource.getWireRecord(MasterFeederSource.java:70)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.writeAvailableEntries(Feeder.java:1266)
at com.sleepycat.je.rep.impl.node.Feeder$OutputThread.run(Feeder.java:1144)
Usually, on running into BDB JE issues reported with EnvironmentFailureException the application tries to recover from the problem by recreating the Environment instance and re-joining the new environment instance to the group, though, the exception thrown from EnvironmentImpl.flushLog are treated as terminal and cause JVM halt.
Potentially, the code can be changed to try to recreate the Environment even on exceptions thrown from EnvironmentImpl.flushLog, though that might results in not exactly right reaction on critical system issues like running out of dusk space or disk failures. I am not sure that Environment recreation would be a feasible approach to work-around the issue.
What could be the best way to work around the reported problem?
Can upgrade to BDB JE version 7.5.11 help to mitigate the issue?
If more details about the issue is required, I can try to provide them. Though, the BDB JE files have gone after manual restart of the application. The restart recovers BDB JE environment.
Kind Regards,
Alex