One coherence application instance hangs while the other is active

3483854, Jun 20 2017 (edited Jun 27 2017)

PRODUCT NAME: Coherence
JAVA LIBRARIES: coherence-12.1.2.0.jar and coherence-common-12.3.1.jar

Hello,

I have several different applications deployed to Tomcat whose instances communicate with each other through Coherence, using a clustering address, without any problems. One specific application, also deployed on Tomcat and configured in a very similar fashion to the others, fails to communicate.

The issue is that one instance always seems to hang on a Coherence call and never returns a response. The load balancer detects that this instance is no longer functioning and redirects traffic to the working instance.

If I stop the instance that is not hung, the hung instance begins to function. However, when the other instance is restarted, it in turn hangs on Coherence requests.

On the instance that hangs, threads like the following build up and never return with a response:

"http-bio-9011-exec-33" #137 daemon prio=5 os_prio=0 tid=0x00002b04a8089000 nid=0x578 in Object.wait() [0x00002b046a623000]
   java.lang.Thread.State: WAITING (on object monitor)
	at java.lang.Object.wait(Native Method)
	at com.tangosol.coherence.component.net.Poll.waitCompletion(Poll.CDB:7)
	- locked <0x00000000e8bda788> (a com.tangosol.coherence.component.net.Poll)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:24)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:11)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ReplicatedCache.requestIssue(ReplicatedCache.CDB:8)
	at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.ReplicatedCache.updateResource(ReplicatedCache.CDB:38)
	at com.tangosol.coherence.component.util.CacheHandler.put(CacheHandler.CDB:11)
	at com.tangosol.coherence.component.util.CacheHandler.put(CacheHandler.CDB:1)
	at com.tangosol.coherence.component.util.SafeNamedCache.put(SafeNamedCache.CDB:1)
	...
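
To track how many request threads are parked in that same frame over time, the check can be done from inside the JVM instead of reading repeated thread dumps by hand. The following is only a minimal sketch using the standard JDK `ThreadMXBean`; it assumes the code runs inside the affected Tomcat JVM (for example from a diagnostic servlet), and the class name `StuckPollFinder` and its methods are hypothetical names introduced for illustration. The only Coherence-specific detail is the class and method name copied from the stack trace above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists threads whose stack contains Poll.waitCompletion.
public class StuckPollFinder {
    // Frame copied from the thread dump in this post.
    static final String POLL_CLASS = "com.tangosol.coherence.component.net.Poll";
    static final String POLL_METHOD = "waitCompletion";

    /** Returns true if any frame in the given stack is Poll.waitCompletion. */
    static boolean isWaitingOnPoll(StackTraceElement[] stack) {
        for (StackTraceElement frame : stack) {
            if (POLL_CLASS.equals(frame.getClassName())
                    && POLL_METHOD.equals(frame.getMethodName())) {
                return true;
            }
        }
        return false;
    }

    /** Names of live threads in this JVM that are currently inside Poll.waitCompletion. */
    static List<String> findStuckThreads() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        List<String> names = new ArrayList<>();
        // false, false: monitor and synchronizer details are not needed here.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && isWaitingOnPoll(info.getStackTrace())) {
                names.add(info.getThreadName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> stuck = findStuckThreads();
        System.out.println(stuck.size() + " thread(s) waiting in Poll.waitCompletion");
        for (String name : stuck) {
            System.out.println("  " + name);
        }
    }
}
```

Run standalone, `main` only inspects its own JVM and will report zero threads; the count becomes meaningful when `findStuckThreads()` is called periodically from within the hanging application. A count that only grows suggests the poll is never answered, as opposed to the instance merely being slow.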

Because one instance hangs and the other does not, it seems a reasonable assumption that they are communicating to some extent. Is there a good way to determine what is happening, and whether the cause is environmental or something else?

Locked due to inactivity on Jul 25 2017