Clear cache operation blows up the cluster
754801 Dec 7 2011 (edited Jan 5 2012)
Hi,
We are running a 9-node Coherence 3.6 cluster. Every night a Java process starts up, issues a cache.clear() command on each cache, and terminates. This appears to be harming cluster health: right after the process joins and leaves, several nodes run out of memory and quit. See the error below:
2011-12-07 03:05:18.786/57714.576 Oracle Coherence GE 3.6.1.0 <D5> (thread=Cluster, member=1): Service guardian is 23184ms late, indicating that this JVM may be running slowly or experienced a long GC
2011-12-07 03:05:20.518/57716.308 Oracle Coherence GE 3.6.1.0 <Error> (thread=DistributedCache:OA-DistributedCache, member=1): Terminating PartitionedCache due to unhandled exception: java.lang.OutOfMemoryError
2011-12-07 03:05:20.518/57716.308 Oracle Coherence GE 3.6.1.0 <Error> (thread=DistributedCache:OA-DistributedCache, member=1):
java.lang.OutOfMemoryError: Java heap space
    at com.tangosol.coherence.component.net.memberSet.ActualMemberSet.setMember(ActualMemberSet.CDB:11)
    at com.tangosol.coherence.component.net.memberSet.ActualMemberSet.add(ActualMemberSet.CDB:6)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.getOwnershipMemberSet(PartitionedService.CDB:13)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.getOwnershipSenior(PartitionedService.CDB:10)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.checkDistribution(PartitionedService.CDB:71)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onNotify(PartitionedService.CDB:15)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onNotify(PartitionedCache.CDB:3)
    at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:42)
    at java.lang.Thread.run(Unknown Source)
This creates a ripple effect across all members, and the cluster goes down shortly afterwards. Can you please let me know if you have seen this before, or what might be causing it?
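For illustration, the nightly job described above could be sketched roughly as follows. This is a hypothetical sketch, not the actual job: the class name, method name, and cache names are made up, and plain java.util.Map instances stand in for the Coherence caches (NamedCache implements java.util.Map, so clear() is the same call) so that the example runs without a cluster. Recording the size before clearing is an addition for logging purposes only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class NightlyClearSketch {

    // Clears every cache in turn and returns how many entries each one
    // held just before it was cleared (hypothetical helper, for logging).
    static Map<String, Integer> clearAll(Map<String, ? extends Map<?, ?>> caches) {
        Map<String, Integer> sizesBefore = new LinkedHashMap<>();
        for (String name : caches.keySet()) {
            Map<?, ?> cache = caches.get(name);
            sizesBefore.put(name, cache.size());
            cache.clear(); // same call the nightly job issues on each cache
        }
        return sizesBefore;
    }

    public static void main(String[] args) {
        // Stand-ins for the real caches; the names are assumptions.
        Map<String, String> orders = new ConcurrentHashMap<>();
        orders.put("k1", "v1");
        orders.put("k2", "v2");

        Map<String, Map<String, String>> caches = new LinkedHashMap<>();
        caches.put("orders", orders);
        caches.put("quotes", new ConcurrentHashMap<>());

        Map<String, Integer> sizes = clearAll(caches);
        for (Map.Entry<String, Integer> e : sizes.entrySet()) {
            System.out.println(e.getKey() + ": cleared " + e.getValue() + " entries");
        }
    }
}
```

In the real job each map would presumably be obtained from the cluster rather than built locally, and the process would leave the cluster once the loop finishes.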
Sairam
Edited by: SKR on Dec 7, 2011 9:47 AM