Garbage collection issues on heavy load jvm
905310Dec 14 2011 — edited Jan 17 2012Hello,
A JVM is started with the following Memory/GC related options:
-server
-Xms2048m
-Xmx2048m
-XX:MaxPermSize=512m
-XX:-HeapDumpOnOutOfMemoryError
-verbose:gc
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:GCTimeRatio=9
Java version:
Java HotSpot(TM) 64-Bit Server VM (build 16.0-b13, mixed mode)
The server has 4GB of RAM, running GNU/Llinux (kernel 2.6.25.20-0.7 x86_64). The JVM is running Apache Tomcat, which hosts a heavy loaded site (OpenCMS).
The java process is not making use of swap space.
Initially everything is fine (the following lines are extracted from catalina.out):
11.622: [GC [PSYoungGen: 524288K->30711K(611648K)] 524288K->30711K(2009792K), 0.0512050 secs] [Times: user=0.07 sys=0.03, real=0.05 secs]
20.734: [GC [PSYoungGen: 554999K->40084K(611648K)] 554999K->40084K(2009792K), 0.0316040 secs] [Times: user=0.09 sys=0.03, real=0.04 secs]
78.522: [GC [PSYoungGen: 564372K->62217K(611648K)] 564372K->62217K(2009792K), 0.9700730 secs] [Times: user=0.24 sys=2.27, real=0.96 secs]
123.802: [GC [PSYoungGen: 586505K->87342K(611648K)] 586505K->89654K(2009792K), 0.3679460 secs] [Times: user=0.24 sys=0.71, real=0.38 secs]
129.581: [GC [PSYoungGen: 611630K->87334K(598592K)] 613942K->91423K(1996736K), 0.1302340 secs] [Times: user=0.23 sys=0.21, real=0.14 secs]
...
About 5 hours later it is still running fine:
...
18393.534: [GC [PSYoungGen: 662307K->6839K(677760K)] 2036256K->1384199K(2075904K), 0.0264830 secs] [Times: user=0.04 sys=0.00, real=0.03 secs]
18401.523: [GC [PSYoungGen: 660176K->7693K(680704K)] 2037535K->1390565K(2078848K), 0.0232960 secs] [Times: user=0.06 sys=0.01, real=0.02 secs]
18405.616: [GC [PSYoungGen: 661248K->18287K(679552K)] 1018894K->379118K(2077696K), 0.0313450 secs] [Times: user=0.04 sys=0.02, real=0.03 secs]
18409.499: [GC [PSYoungGen: 679535K->6371K(675456K)] 1040366K->381293K(2073600K), 0.0181840 secs] [Times: user=0.04 sys=0.00, real=0.02 secs]
...
More than 7 hours after it started:
...
27815.893: [GC [PSYoungGen: 626577K->40528K(656128K)] 1908471K->1339451K(2054272K), 0.0331920 secs] [Times: user=0.10 sys=0.00, real=0.03 secs]
27820.568: [GC [PSYoungGen: 655409K->42878K(586240K)] 1954331K->1406759K(1984384K), 0.0537190 secs] [Times: user=0.17 sys=0.00, real=0.06 secs]
27826.168: [GC [PSYoungGen: 542340K->60300K(603712K)] 1076041K->594001K(2001856K), 0.0284320 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
27834.604: [GC [PSYoungGen: 603660K->28766K(619136K)] 1137361K->619097K(2017280K), 0.0358350 secs] [Times: user=0.12 sys=0.01, real=0.04 secs]
27850.055: [GC [PSYoungGen: 563166K->28980K(614272K)] 1153497K->647408K(2012416K), 0.0359200 secs] [Times: user=0.14 sys=0.00, real=0.04 secs]
...
A little later a full GC is performed for the first time:
...
28310.088: [GC [PSYoungGen: 233088K->144771K(466048K)] 1313924K->1225608K(1864192K), 0.1431430 secs] [Times: user=0.34 sys=0.01, real=0.15 secs]
28310.521: [GC [PSYoungGen: 377859K->80784K(466048K)] 1458696K->1306772K(1864192K), 0.1971250 secs] [Times: user=0.64 sys=0.00, real=0.20 secs]
28317.621: [Full GC [PSYoungGen: 233088K->0K(466048K)] [PSOldGen: 1361843K->1020104K(1398144K)] 1594931K->1020104K(1864192K) [PSPermGen: 170495K->170201K(182016K)], 2.8967150 secs] [Times: user=2.90 sys=0.01, real=2.90 secs]
28321.253: [GC [PSYoungGen: 233088K->9362K(466048K)] 1253192K->1029467K(1864192K), 0.0598610 secs] [Times: user=0.16 sys=0.01, real=0.06 secs]
...
Then it starts to full GC forever:
...
28333.072: [Full GC [PSYoungGen: 233088K->89524K(466048K)] [PSOldGen: 1383371K->1398143K(1398144K)] 1616459K->1487668K(1864192K) [PSPermGen: 170315K->170315K(179776K)], 2.9065990 secs] [Times: user=2.90 sys=0.00, real=2.91 secs]
28336.190: [Full GC [PSYoungGen: 233088K->142894K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1541038K(1864192K) [PSPermGen: 170322K->170322K(178688K)], 2.9158660 secs] [Times: user=2.91 sys=0.01, real=2.92 secs]
28339.233: [Full GC [PSYoungGen: 233088K->121673K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1519817K(1864192K) [PSPermGen: 170332K->170247K(177600K)], 3.7541460 secs] [Times: user=3.75 sys=0.00, real=3.76 secs]
28343.126: [Full GC [PSYoungGen: 233088K->162479K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1560623K(1864192K) [PSPermGen: 170247K->170247K(176896K)], 2.9491090 secs] [Times: user=2.95 sys=0.00, real=2.95 secs]
...
Then full GC does not cut it anymore:
...
28446.398: [Full GC [PSYoungGen: 233088K->231117K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1629261K(1864192K) [PSPermGen: 168786K->168786K(169088K)] GC time would exceed GCTimeLimit of 98%
28449.498: [Full GC [PSYoungGen: 233088K->233088K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1631231K(1864192K) [PSPermGen: 168786K->168786K(169088K)] GC time would exceed GCTimeLimit of 98%
28452.531: [Full GC [PSYoungGen: 233088K->231690K(466048K)] [PSOldGen: 1398143K->1398143K(1398144K)] 1631231K->1629834K(1864192K) [PSPermGen: 168786K->168786K(169088K)] GC time would exceed GCTimeLimit of 98%
...
And finally the jvm blows up:
...
29152.196: [Full GC [PSYoungGen: 233088K->220887K(466048K)] [PSOldGen: 1398143K->1398144K(1398144K)] 1631231K->1619031K(1864192K) [PSPermGen: 168553K->168551K(168576K)] GC time would exceed GCTimeLimit of 98%
Exception java.lang.OutOfMemoryError: GC overhead limit exceeded
The heap dump does not reveal any clear memory leak or top consumer. In fact, most of the objects are marked "unreachable": of the 1.8GB heap dump, 1.3GB is unreachable, according to MAT (Memory Analyser Tool). I suspect the GC is not beeing aggressive enough for the load on the JVM, but I am not sure what is the best option to fix this. Initially the GCTimeRatio option was absent, then I tried 19, then 9. The result is always the same: out of memory exception after some time.
Thanks in advance,
Marcio