We've noticed some odd behavior with CMS during some application stress testing. The application tier in question accepts large numbers of persistant clients connections, multiplexes down to a few streams and passes on to the next application tier.
-Xmx2g -Xms2g -Xmn1g -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
-XX:SurvivorRatio=256 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=1 \
-XX:+DisableExplicitGC
Most of the object are temporary, hence the relatively large young gen, and low tenuring threshold.
When CMS kicks in for the first time @ 50% Old Gen capacity, it successfullly reduces it down to ~22% - pause times look good. The problem is, after it runs the first time, it continues to run over and over at very short intervals for reasons I can't figure out. I've searched through the bug database, and I can't find anything that resembles this behavior.
This first log is the output from jstat -gcutil around the time of the first CMS run:
13:13:12: S0 S1 E O P YGC YGCT FGC FGCT GCT
...
15:27:29: 0.00 13.18 59.86 49.58 78.34 1535 15.431 0 0.000 15.431
15:27:39: 0.00 12.81 86.85 49.64 78.34 1537 15.452 0 0.000 15.452
15:27:49: 10.77 0.00 12.35 49.71 78.34 1540 15.485 0 0.000 15.485
15:27:59: 11.02 0.00 36.06 49.76 78.34 1542 15.508 0 0.000 15.508
15:28:09: 14.38 0.00 73.79 49.81 78.34 1544 15.530 0 0.000 15.530
15:28:19: 0.00 13.36 2.51 49.89 78.34 1547 15.563 0 0.000 15.563
15:28:29: 0.00 10.20 33.73 49.94 78.34 1549 15.586 0 0.000 15.586
15:28:39: 0.00 11.37 65.14 49.99 78.34 1551 15.607 0 0.000 15.607
15:28:49: 0.00 9.61 86.89 36.46 78.36 1553 15.629 2 0.564 16.193
15:28:59: 13.96 0.00 8.23 21.89 78.36 1556 15.681 5 1.236 16.917
15:29:09: 9.78 0.00 53.03 21.89 78.36 1558 15.709 8 1.638 17.347
15:29:19: 20.34 0.00 78.44 21.86 78.36 1560 15.740 10 2.364 18.104
15:29:29: 0.00 9.38 15.40 21.89 78.36 1563 15.781 13 2.740 18.521
15:29:39: 0.00 11.17 55.50 21.90 78.36 1565 15.809 15 3.119 18.929
15:29:49: 0.00 10.80 82.75 21.90 78.36 1567 15.838 17 3.477 19.315
15:29:59: 12.05 0.00 10.76 21.94 78.36 1570 15.882 19 3.841 19.723
15:30:09: 43.54 0.00 43.29 21.93 78.36 1572 15.923 21 4.208 20.131
15:30:19: 11.97 0.00 69.29 21.89 78.36 1574 15.953 24 4.948 20.901
15:30:29: 0.00 10.64 12.60 21.92 78.36 1577 15.995 27 5.321 21.316
15:30:39: 0.00 12.09 53.40 21.92 78.36 1579 16.023 29 5.690 21.713
...
And, here is a portion of the gc.log from an identical run, beginning shortly before first CMS starts:
...
7634.050: [GC 7634.051: [ParNew: 1041089K->444K(1044544K), 0.0125843 secs] 1562456K->522105K(2093120K), 0.0127342 secs]
7637.897: [GC 7637.897: [ParNew: 1040956K->1796K(1044544K), 0.0243873 secs] 1562617K->523723K(2093120K), 0.0245495 secs]
7642.007: [GC 7642.007: [ParNew: 1042308K->428K(1044544K), 0.0120285 secs] 1564235K->522621K(2093120K), 0.0121912 secs]
7646.197: [GC 7646.197: [ParNew: 1040940K->485K(1044544K), 0.0127692 secs] 1563133K->522960K(2093120K), 0.0129331 secs]
7649.921: [GC 7649.921: [ParNew: 1040997K->412K(1044544K), 0.0123867 secs] 1563472K->523160K(2093120K), 0.0125375 secs]
7654.347: [GC 7654.348: [ParNew: 1040924K->2013K(1044544K), 0.0179963 secs] 1563672K->525003K(2093120K), 0.0181659 secs]
7658.307: [GC 7658.308: [ParNew: 1042525K->446K(1044544K), 0.0125344 secs] 1565515K->523721K(2093120K), 0.0127004 secs]
7662.498: [GC 7662.498: [ParNew: 1040958K->452K(1044544K), 0.0127244 secs] 1564233K->524006K(2093120K), 0.0129010 secs]
7666.586: [GC 7666.586: [ParNew: 1040964K->428K(1044544K), 0.0121975 secs] 1564518K->524265K(2093120K), 0.0123686 secs]
7670.777: [GC 7670.778: [ParNew: 1040940K->1519K(1044544K), 0.0209896 secs] 1564777K->525612K(2093120K), 0.0211376 secs]
7674.852: [GC 7674.852: [ParNew: 1042031K->460K(1044544K), 0.0119284 secs] 1566124K->524800K(2093120K), 0.0121251 secs]
7674.865: [GC [1 CMS-initial-mark: 524340K(1048576K)] 524857K(2093120K), 0.0045928 secs]
7674.876: [CMS-concurrent-mark-start]
7677.195: [CMS-concurrent-mark: 2.319/2.319 secs]
7677.195: [CMS-concurrent-preclean-start]
7677.250: [CMS-concurrent-preclean: 0.052/0.055 secs]
7677.250: [CMS-concurrent-abortable-preclean-start]
7679.157: [GC 7679.158: [ParNew: 1040972K->437K(1044544K), 0.0129911 secs] 1565312K->525067K(2093120K), 0.0131378 secs]
7681.729: [CMS-concurrent-abortable-preclean: 2.347/4.480 secs]
7681.737: [GC[YG occupancy: 669523 K (1044544 K)]7681.737: [Rescan (parallel) , 0.5274024 secs]7682.265: [weak refs processing, 0.1709008 secs] [1 CMS-remark: 524630K(1048576K)] 1194154K(2093120K), 0.7040315 secs]
7682.754: [CMS-concurrent-sweep-start]
7683.244: [GC 7683.244: [ParNew: 1040949K->428K(1044544K), 0.0229211 secs] 1493370K->453125K(2093120K), 0.0230680 secs]
7684.384: [CMS-concurrent-sweep: 1.606/1.631 secs]
7684.384: [CMS-concurrent-reset-start]
7684.403: [CMS-concurrent-reset: 0.018/0.018 secs]
7686.409: [GC [1 CMS-initial-mark: 302653K(1048576K)] 1059834K(2093120K), 0.3308640 secs]
7686.740: [CMS-concurrent-mark-start]
7688.150: [CMS-concurrent-mark: 1.411/1.411 secs]
7688.150: [CMS-concurrent-preclean-start]
7688.186: [CMS-concurrent-preclean: 0.034/0.035 secs]
7688.186: [CMS-concurrent-abortable-preclean-start]
7688.465: [GC 7688.465: [ParNew: 1040940K->1464K(1044544K), 0.0165840 secs] 1343593K->304365K(2093120K), 0.0167509 secs]
7690.093: [CMS-concurrent-abortable-preclean: 1.012/1.907 secs]
7690.095: [GC[YG occupancy: 522484 K (1044544 K)]7690.095: [Rescan (parallel) , 0.3665541 secs]7690.462: [weak refs processing, 0.0003850 secs] [1 CMS-remark: 302901K(1048576K)] 825385K(2093120K), 0.3670690 secs]
7690.463: [CMS-concurrent-sweep-start]
7691.771: [GC 7691.771: [ParNew: 1041976K->428K(1044544K), 0.0123657 secs] 1319021K->277787K(2093120K), 0.0125519 secs]
7694.077: [CMS-concurrent-sweep: 3.593/3.614 secs]
7694.077: [CMS-concurrent-reset-start]
7694.106: [CMS-concurrent-reset: 0.029/0.029 secs]
7695.917: [GC 7695.918: [ParNew: 1040940K->565K(1044544K), 0.0132790 secs] 1278564K->238400K(2093120K), 0.0134459 secs]
7695.945: [GC [1 CMS-initial-mark: 237835K(1048576K)] 257823K(2093120K), 0.0048516 secs]
7695.950: [CMS-concurrent-mark-start]
7697.487: [CMS-concurrent-mark: 1.537/1.537 secs]
7697.487: [CMS-concurrent-preclean-start]
7697.524: [CMS-concurrent-preclean: 0.033/0.036 secs]
7697.524: [CMS-concurrent-abortable-preclean-start]
7700.188: [GC 7700.188: [ParNew: 1041077K->580K(1044544K), 0.0140529 secs] 1278912K->238712K(2093120K), 0.0142205 secs]
7702.044: [CMS-concurrent-abortable-preclean: 1.499/4.520 secs]
7702.047: [GC[YG occupancy: 523697 K (1044544 K)]7702.047: [Rescan (parallel) , 0.3507112 secs]7702.398: [weak refs processing, 0.0007348 secs] [1 CMS-remark: 238132K(1048576K)] 761829K(2093120K), 0.3515838 secs]
7702.443: [CMS-concurrent-sweep-start]
7702.994: [CMS-concurrent-sweep: 0.552/0.552 secs]
7702.994: [CMS-concurrent-reset-start]
7703.051: [CMS-concurrent-reset: 0.057/0.057 secs]
7704.137: [GC 7704.137: [ParNew: 1041092K->436K(1044544K), 0.0134015 secs] 1278394K->238028K(2093120K), 0.0135497 secs]
...
I'm just not sure why CMS continues to run once the old gen is reduced to ~22%. Any thoughts would be appreciated.
Thanks,
__Jason