Gaps in GC Logging!
816504Nov 19 2010 — edited Nov 22 2010Hi Folks,
[It is possible there might be a better forum for this. If so, can somebody please point me.]
We're having real problems with a high performance, high duration (12hrs+) simulation written in Java. Essentially we are seeing mysterious 'gaps' of seconds during which nothing seems to happen. No log messages are being written by the application, and GC logging apparently has coincident gaps. This causes us misery because it causes timeouts on a server which effectively breaks the simulation. We have run with the assumption that large GC runs have become problem and we monitor GC activity via gc logging. However, we are seeing these gaps when memory consumption and GC seem to be 'normal'. We have also tried various GC parameters, and although we have improved the memory 'behaviour', we still see these gaps.
It has now dawned on us that GC itself is stalling - in other words,perhaps the whole JVM stalls. After the 'stall' everything resumes as per normal - except that timeouts generate failures.
Unfortuantely these log files are too large to post, but here is a small snippet from gc.log which shows a gap...
38620.641: [GC 38620.641: [ParNew: 113847K->11489K(118016K), 0.0271140 secs] 1684447K->1583182K(2084096K) icms_dc=8 , 0.0272330 secs] [Times: user=0.21 sys=0.00, real=0.03 secs]
38620.933: [GC 38620.933: [ParNew: 116449K->9040K(118016K), 0.0235770 secs] 1688142K->1582408K(2084096K) icms_dc=8 , 0.0236980 secs] [Times: user=0.14 sys=0.00, real=0.03 secs]
*38621.218: [GC 38621.218: [ParNew (promotion failed): 114000K->112000K(118016K), 0.4032980 secs]38621.621: [CMS38621.811: [CMS-concurrent-mark: 7.517/80.724 secs] [Times: user=428.08 sys=199.24, real=80.71 secs]*
*(concurrent mode failure): 1576932K->1029050K(1966080K), 6.0958110 secs] 1687368K->1029050K(2084096K), [CMS Perm : 47736K->46200K(65536K)] icms_dc=8 , 6.4993050 secs] [Times: user=6.81 sys=0.25, real=6.50 secs]*
*38627.824: [GC [1 CMS-initial-mark: 1029050K(1966080K)] 1079971K(2084096K), 0.1177220 secs] [Times: user=0.12 sys=0.00, real=0.12 secs]*
*38627.942: [CMS-concurrent-mark-start]*
38628.089: [GC 38628.089: [ParNew: 104960K->2656K(118016K), 0.0072050 secs] 1134010K->1031706K(2084096K) icms_dc=8 , 0.0073010 secs] [Times: user=0.04 sys=0.01, real=0.00 secs]
Can anybody please shed any light on what might be going on?
Why would GC apparent stop like this?
Has anybody had similar experiences?
Many Thanks for any help/pointers
Ian Hunter
====
A bit more info. The GC related options we are currently running with are...
-server -Xmx2048M -Xms2048M -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -XX:+CMSIncrementalPacing -XX:CMSIncrementalDutyCycle=10 -XX:CMSIncrementalDutyCycleMin=0 -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails