I'm tailing my gc logs and coherence logs for my 4 storage enabled cluster nodes (they're also extend proxies). They're all running on the same physical machine. The machine has 48 cores and 128GB ram. Check it out:
http://www.nmedia.net/~andrew/pix/morthra.jpg
I get these communication delay messages between nodes on this machine frequently. By just eyeballing the logs it appears that when GC times are ~0.01 sec everything is fine but when they're 0.02 to 0.04 I'm more likely to see those messages.
Are my GC times out of acceptable range?
I'm starting up the JVM wtih
-Xms1324m
-Xmx1324m
-XX:+UseConcMarkSweepGC
and I'm running 64 bit sun hotspot. All cluster nodes are on the same gigabit switch.
Thanks!
Andrew
MasterMemberSet
(
ThisMember=Member(Id=15, Timestamp=2011-01-20 10:21:49.988, Address=192.168.3.8:8088, MachineId=27400, Location=machine:box2,process:4148,member:andrew, Role=CoherenceConsole)
OldestMember=Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168.3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,member:Administrator, Role=CoherenceServer)
ActualMemberSet=MemberSet(Size=14, BitSetCount=2
Member(Id=1, Timestamp=2011-01-20 02:33:33.194, Address=192.168.3.5:8088, MachineId=40086, Location=machine:blackjack,process:11720,member:Administrator, Role=DJclientStartMe)
Member(Id=2, Timestamp=2011-01-20 02:56:42.73, Address=192.168.3.5:8090, MachineId=40086, Location=machine:blackjack,process:3820,member:Administrator, Role=DJclientBboClient)
Member(Id=3, Timestamp=2011-01-20 05:22:35.746, Address=192.168.3.5:8092, MachineId=40086, Location=machine:blackjack,process:5000,member:Administrator, Role=CoupGsecGsecCoup)
Member(Id=4, Timestamp=2011-01-20 05:24:51.81, Address=192.168.3.5:8094, MachineId=40086, Location=machine:blackjack,process:9432,member:Administrator, Role=DJclientRestarter)
Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168.3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,member:Administrator, Role=CoherenceServer)
Member(Id=6, Timestamp=2011-01-18 21:55:52.685, Address=192.168.3.6:8090, MachineId=59924, Location=machine:mothra,process:3368,member:Administrator, Role=CoherenceServer)
Member(Id=7, Timestamp=2011-01-18 21:56:00.007, Address=192.168.3.6:8092, MachineId=59924, Location=machine:mothra,process:10596,member:Administrator, Role=CoherenceServer)
Member(Id=8, Timestamp=2011-01-18 21:57:23.994, Address=192.168.3.6:8094, MachineId=59924, Location=machine:mothra,process:12668,member:Administrator, Role=CoherenceServer)
Member(Id=9, Timestamp=2011-01-20 05:48:03.175, Address=192.168.3.25:8088, MachineId=27417, Location=machine:amd6,process:4596,member:Administrator, Role=OmsOms)
Member(Id=10, Timestamp=2011-01-20 06:00:06.078, Address=192.168.3.6:8096, MachineId=59924, Location=machine:mothra,process:10072,member:Administrator, Role=PositioneditorQuoteProcessor)
Member(Id=12, Timestamp=2011-01-20 08:36:19.055, Address=192.168.3.5:8096, MachineId=40086, Location=machine:blackjack,process:8252,member:CL, Role=MainTest)
Member(Id=13, Timestamp=2011-01-20 06:39:25.972, Address=192.168.3.6:8098, MachineId=59924, Location=machine:mothra,process:6992,member:Administrator, Role=Execution_viewer)
Member(Id=14, Timestamp=2011-01-20 08:50:47.26, Address=192.168.3.6:8100, MachineId=59924, Location=machine:mothra,process:11680,member:Administrator, Role=Execution_viewer)
Member(Id=15, Timestamp=2011-01-20 10:21:49.988, Address=192.168.3.8:8088, MachineId=27400, Location=machine:box2,process:4148,member:andrew, Role=CoherenceConsole)
)
RecycleMillis=1200000
RecycleSet=MemberSet(Size=0, BitSetCount=0
)
)
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132210.882: [GC 318472K->183743K(1338880K), 0.0241680 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132301.821: [GC 682215K->548866K(1338880K), 0.0151272 secs]
132307.731: [GC 684546K->550291K(1338880K), 0.0155802 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132297.843: [GC 840561K->705750K(1338880K), 0.0327622 secs]
132303.444: [GC 841430K->705130K(1338880K), 0.0248750 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132295.258: [GC 226846K->91827K(1338880K), 0.0234439 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.log <==
2011-01-20 10:40:55.521/132308.423 Oracle Coherence GE 3.6.0.0 <D6> (thread=PacketPublisher, member=6): Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168.3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,me
mber:Administrator, Role=CoherenceServer) has failed to respond to 17 packets; declaring this member as paused.
2011-01-20 10:40:55.848/132308.750 Oracle Coherence GE 3.6.0.0 <D6> (thread=PacketPublisher, member=6): Experienced a 327 ms communication delay (probable remote GC) with Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168
.3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,member:Administrator, Role=CoherenceServer); 69 packets rescheduled, PauseRate=1.0E-4, Threshold=367
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.log <==
2011-01-20 10:40:55.530/132217.203 Oracle Coherence GE 3.6.0.0 <D6> (thread=PacketPublisher, member=8): Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168.3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,me
mber:Administrator, Role=CoherenceServer) has failed to respond to 17 packets; declaring this member as paused.
2011-01-20 10:40:55.848/132217.521 Oracle Coherence GE 3.6.0.0 <D6> (thread=PacketPublisher, member=8): Experienced a 317 ms communication delay (probable remote GC) with Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168
.3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,member:Administrator, Role=CoherenceServer); 42 packets rescheduled, PauseRate=0.0, Threshold=426
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132220.836: [GC 319423K->183177K(1338880K), 0.0096245 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132313.019: [GC 685971K->547749K(1338880K), 0.0314242 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132308.913: [GC 840810K->707133K(1338880K), 0.0276408 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132304.729: [GC 227507K->91929K(1338880K), 0.0241285 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132231.087: [GC 318857K->183729K(1338880K), 0.0161122 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132320.347: [GC 683429K->550760K(1338880K), 0.0261682 secs]
132327.143: [GC 686440K->550474K(1338880K), 0.0124790 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132315.204: [GC 842813K->707091K(1338880K), 0.0308590 secs]
132321.668: [GC 842771K->709678K(1338880K), 0.0293830 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132314.371: [GC 227609K->93093K(1338880K), 0.0144011 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132240.704: [GC 319409K->183510K(1338880K), 0.0098917 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132331.818: [GC 686154K->550354K(1338880K), 0.0147556 secs]
132338.050: [GC 686034K->550616K(1338880K), 0.0167612 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132326.089: [GC 845358K->708502K(1338880K), 0.0307610 secs]
132332.257: [GC 844182K->709745K(1338880K), 0.0267290 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132322.808: [GC 228773K->92228K(1338880K), 0.0115491 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.log <==
2011-01-20 10:41:25.110/132338.012 Oracle Coherence GE 3.6.0.0 <D6> (thread=PacketPublisher, member=6): Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168.3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,me
mber:Administrator, Role=CoherenceServer) has failed to respond to 17 packets; declaring this member as paused.
2011-01-20 10:41:25.124/132338.026 Oracle Coherence GE 3.6.0.0 <D8> (thread=PacketPublisher, member=6): Experienced a 14 ms communication delay (probable remote GC) with Member(Id=5, Timestamp=2011-01-18 21:55:47.288, Address=192.168.
3.6:8088, MachineId=59924, Location=machine:mothra,process:3408,member:Administrator, Role=CoherenceServer); 66 packets rescheduled, PauseRate=1.0E-4, Threshold=435
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132249.945: [GC 319190K->182693K(1338880K), 0.0342840 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132343.603: [GC 686296K->550138K(1338880K), 0.0249380 secs]
132349.395: [GC 685818K->551474K(1338880K), 0.0166090 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132337.252: [GC 845425K->709330K(1338880K), 0.0220137 secs]
132342.579: [GC 845010K->709136K(1338880K), 0.0287769 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132331.062: [GC 227908K->92625K(1338880K), 0.0208366 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132259.926: [GC 318373K->184208K(1338880K), 0.0158035 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132355.216: [GC 687154K->551166K(1338880K), 0.0136097 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132347.860: [GC 844816K->708928K(1338880K), 0.0276086 secs]
132353.742: [GC 844608K->709988K(1338880K), 0.0250959 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132340.000: [GC 228305K->93025K(1338880K), 0.0193733 secs]
132348.642: [GC 228705K->93703K(1338880K), 0.0150384 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132269.516: [GC 319888K->183008K(1338880K), 0.0200467 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132360.769: [GC 686846K->552805K(1338880K), 0.0158615 secs]
132367.688: [GC 688485K->551065K(1338880K), 0.0151886 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132359.008: [GC 845668K->709433K(1338880K), 0.0332839 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132358.636: [GC 229383K->94422K(1338880K), 0.0128452 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132280.113: [GC 318688K->185573K(1338880K), 0.0155607 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132374.375: [GC 686666K->555011K(1338880K), 0.0135171 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132365.756: [GC 845113K->710648K(1338880K), 0.0311965 secs]
132371.419: [GC 846328K->712085K(1338880K), 0.0196361 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132368.031: [GC 230102K->92812K(1338880K), 0.0180031 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132290.733: [GC 321253K->186199K(1338880K), 0.0098142 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132380.689: [GC 690691K->553579K(1338880K), 0.0168928 secs]
132386.818: [GC 689259K->553010K(1338880K), 0.0165164 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132377.292: [GC 847765K->709891K(1338880K), 0.0280660 secs]
132383.521: [GC 845571K->711339K(1338880K), 0.0233020 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132377.636: [GC 228492K->94649K(1338880K), 0.0152827 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132302.684: [GC 321879K->184985K(1338880K), 0.0300545 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132394.020: [GC 688690K->553424K(1338880K), 0.0607984 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132390.366: [GC 847019K->711928K(1338880K), 0.0254368 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132388.236: [GC 230329K->95891K(1338880K), 0.0164623 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.log <==
2011-01-20 10:42:24.003/132389.635 Oracle Coherence GE 3.6.0.0 <D6> (thread=PacketPublisher, member=7): Member(Id=6, Timestamp=2011-01-18 21:55:52.685, Address=192.168.3.6:8090, MachineId=59924, Location=machine:mothra,process:3368,me
mber:Administrator, Role=CoherenceServer) has failed to respond to 17 packets; declaring this member as paused.
2011-01-20 10:42:24.015/132389.647 Oracle Coherence GE 3.6.0.0 <D8> (thread=PacketPublisher, member=7): Experienced a 13 ms communication delay (probable remote GC) with Member(Id=6, Timestamp=2011-01-18 21:55:52.685, Address=192.168.
3.6:8090, MachineId=59924, Location=machine:mothra,process:3368,member:Administrator, Role=CoherenceServer); 278 packets rescheduled, PauseRate=0.0, Threshold=619
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.log <==
2011-01-20 10:42:24.010/132305.683 Oracle Coherence GE 3.6.0.0 <D6> (thread=PacketPublisher, member=8): Member(Id=6, Timestamp=2011-01-18 21:55:52.685, Address=192.168.3.6:8090, MachineId=59924, Location=machine:mothra,process:3368,me
mber:Administrator, Role=CoherenceServer) has failed to respond to 17 packets; declaring this member as paused.
2011-01-20 10:42:24.024/132305.697 Oracle Coherence GE 3.6.0.0 <D8> (thread=PacketPublisher, member=8): Experienced a 14 ms communication delay (probable remote GC) with Member(Id=6, Timestamp=2011-01-18 21:55:52.685, Address=192.168.
3.6:8090, MachineId=59924, Location=machine:mothra,process:3368,member:Administrator, Role=CoherenceServer); 352 packets rescheduled, PauseRate=0.0, Threshold=651
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132311.160: [GC 320665K->187167K(1338880K), 0.0134196 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132400.942: [GC 689104K->555673K(1338880K), 0.0131572 secs]
132406.374: [GC 691353K->556143K(1338880K), 0.0178320 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132396.137: [GC 847608K->714214K(1338880K), 0.0359755 secs]
132401.322: [GC 849894K->714334K(1338880K), 0.0241271 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132395.980: [GC 231571K->94653K(1338880K), 0.0173678 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132322.507: [GC 322847K->188115K(1338880K), 0.0314827 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132411.851: [GC 691823K->555581K(1338880K), 0.0145664 secs]
132419.441: [GC 691261K->556816K(1338880K), 0.0148239 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132407.135: [GC 850014K->713397K(1338880K), 0.0958260 secs]
132413.914: [GC 849077K->713390K(1338880K), 0.0367655 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132406.540: [GC 230333K->95392K(1338880K), 0.0195952 secs]
==> coherence_server_4.Administrator.192.168.3.6.20110118_0957PM.gc <==
132332.398: [GC 323795K->187022K(1338880K), 0.0255665 secs]
==> coherence_server_1.Administrator.192.168.3.6.20110118_0955PM.gc <==
132425.229: [GC 692496K->556178K(1338880K), 0.0160146 secs]
==> coherence_server_2.Administrator.192.168.3.6.20110118_0955PM.gc <==
132419.255: [GC 849070K->714208K(1338880K), 0.0215988 secs]
==> coherence_server_3.Administrator.192.168.3.6.20110118_0955PM.gc <==
132415.334: [GC 231072K->97237K(1338880K), 0.0254270 secs]