I'm doing a health check for EPM 11.1.2.4 (HFM) deployed on WebLogic 10.3.6.
Everything seems to be working fine (i.e. the application can be accessed, all managed servers are in the RUNNING state, and their health is OK), but if I navigate to the WebLogic console under:
I can see the health shows "Warning" with the reason "ThreadPool has stuck threads".
So far I have tried the following:
1. Increased the database connection pool
Services > Data Sources > EPMSystemRegistry > Configuration > Connection Pool
2. Increased the Stuck Thread Max Time (StuckThreadMaxTime, now 3000 seconds)
3. Increased the JVM heap size (Java heap)
Navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Hyperion Solutions\EPMServer0\HyS9EPMServer_epmsystem1
After all these changes and a server reboot, I still get the stuck thread warning once a thread has been busy for more than 3000 seconds.
<Jul 10, 2018 9:01:01 AM> <Error> <Diagnostics> <BEA-320142> <An error was encountered while performing size based data retirement on archive EventsDataArchive
weblogic.diagnostics.accessor.DiagnosticDataAccessException: weblogic.store.PersistentStoreException: weblogic.store.PersistentStoreException: [Store:280029]The persistent store record 674 could not be found
at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.deleteDataRecords(PersistentStoreDataArchive.java:1368)
at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.retireOldestRecords(PersistentStoreDataArchive.java:1211)
at weblogic.diagnostics.archive.DataRetirementByQuotaTaskImpl.performDataRetirement(DataRetirementByQuotaTaskImpl.java:92)
at weblogic.diagnostics.archive.DataRetirementByQuotaTaskImpl.run(DataRetirementByQuotaTaskImpl.java:49)
at weblogic.diagnostics.archive.DataRetirementTaskImpl.run(DataRetirementTaskImpl.java:261)
Truncated. see log file for complete stacktrace
Caused By: weblogic.store.PersistentStoreException: weblogic.store.PersistentStoreException: [Store:280029]The persistent store record 674 could not be found
at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.readRecord(PersistentStoreDataArchive.java:698)
at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.readRecord(PersistentStoreDataArchive.java:668)
at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.getWrapper(PersistentStoreDataArchive.java:1767)
at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.removeGarbageInPage(PersistentStoreDataArchive.java:1813)
at weblogic.diagnostics.archive.wlstore.PersistentStoreDataArchive.cleanupPages(PersistentStoreDataArchive.java:1697)
Truncated. see log file for complete stacktrace
Caused By: weblogic.store.PersistentStoreException: [Store:280029]The persistent store record 674 could not be found
at weblogic.store.io.file.FileStoreIO$TypeRecord.getSlot(FileStoreIO.java:1097)
at weblogic.store.io.file.FileStoreIO.readInternal(FileStoreIO.java:262)
at weblogic.store.io.file.FileStoreIO.read(FileStoreIO.java:253)
at weblogic.store.internal.ReadRequest.run(ReadRequest.java:34)
at weblogic.store.internal.StoreRequest.doTheIO(StoreRequest.java:64)
Truncated. see log file for complete stacktrace
>
<Jul 10, 2018 9:22:15 AM> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "3,039" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 3039944 ms
", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
Thread-206 "[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
sun.misc.Unsafe.park(Unsafe.java:???)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
}
>
<Jul 10, 2018 9:22:15 AM> <Notice> <Diagnostics> <BEA-320068> <Watch 'StuckThread' with severity 'Notice' on server 'EPMServer0' has triggered at Jul 10, 2018 9:22:15 AM. Notification details:
WatchRuleType: Log
WatchRule: (SEVERITY = 'Error') AND ((MSGID = 'WL-000337') OR (MSGID = 'BEA-000337'))
WatchData: DATE = Jul 10, 2018 9:22:15 AM SERVER = EPMServer0 MESSAGE = [STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "3,039" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 3039944 ms
", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
Thread-206 "[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)'" <alive, suspended, parked, priority=1, DAEMON> {
sun.misc.Unsafe.park(Unsafe.java:???)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1981)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
com.hyperion.calcmgr.thread.WorkDelegate.run(WorkDelegate.java:30)
weblogic.work.j2ee.J2EEWorkManager$WorkWithListener.run(J2EEWorkManager.java:170)
weblogic.work.ExecuteThread.execute(ExecuteThread.java:250)
weblogic.work.ExecuteThread.run(ExecuteThread.java:213)
}
SUBSYSTEM = WebLogicServer USERID = <WLS Kernel> SEVERITY = Error THREAD = [ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)' MSGID = BEA-000337 MACHINE = APP01 TXID = CONTEXTID = TIMESTAMP = 1531228935954
WatchAlarmType: AutomaticReset
WatchAlarmResetPeriod: 600000
>
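To make the entry above easier to read, here is a minimal Python sketch that pulls the key fields out of a BEA-000337 message: the thread number, the queue, the busy time, the configured StuckThreadMaxTime, and the first com.hyperion frame in the stack. The function names (parse_stuck_thread, top_application_frame) and the regular expression are my own assumptions based only on the log text pasted here, not on any WebLogic tool, and the sample is a shortened copy of the entry above.

```python
import re

# Pattern written against the BEA-000337 text pasted above (an assumption,
# not an official format description).
STUCK_RE = re.compile(
    r"\[STUCK\] ExecuteThread: '(?P<thread>\d+)' for queue: '(?P<queue>[^']+)' "
    r"has been busy for \"(?P<busy>[\d,]+)\" seconds"
    r".*?\(StuckThreadMaxTime\) of \"(?P<limit>[\d,]+)\" seconds",
    re.DOTALL,
)


def parse_stuck_thread(entry):
    """Return thread id, queue, busy seconds and configured limit, or None."""
    m = STUCK_RE.search(entry)
    if m is None:
        return None
    return {
        "thread": int(m.group("thread")),
        "queue": m.group("queue"),
        # The log prints numbers with thousands separators ("3,039").
        "busy_seconds": int(m.group("busy").replace(",", "")),
        "limit_seconds": int(m.group("limit").replace(",", "")),
    }


def top_application_frame(entry, prefix="com.hyperion."):
    """Return the first stack frame line starting with `prefix`, or None."""
    for line in entry.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            return line
    return None


# Shortened copy of the log entry from this post.
sample = '''<Jul 10, 2018 9:22:15 AM> <Error> <WebLogicServer> <BEA-000337> <[STUCK] ExecuteThread: '6' for queue: 'weblogic.kernel.Default (self-tuning)' has been busy for "3,039" seconds working on the request "Workmanager: default, Version: 0, Scheduled=true, Started=true, Started time: 3039944 ms
", which is more than the configured time (StuckThreadMaxTime) of "3,000" seconds. Stack trace:
sun.misc.Unsafe.park(Unsafe.java:???)
java.util.concurrent.locks.LockSupport.park(LockSupport.java:154)
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:393)
com.hyperion.calcmgr.database.cache.CacheEventThread.run(CacheEventThread.java:37)
'''

info = parse_stuck_thread(sample)
frame = top_application_frame(sample)
print(info)
print(frame)
```

Run against the entry above, this reports thread 6 as busy for 3039 seconds against a limit of 3000, with the first Hyperion frame being com.hyperion.calcmgr.database.cache.CacheEventThread.run, called while the thread is parked in LinkedBlockingQueue.take.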
Any hint will be much appreciated.