Skip to Main Content

Java HotSpot Virtual Machine

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Uncorrelated GC STW pauses in ParNew + sys cpu time spikes from jvm calls

843829Dec 7 2007 — edited Aug 14 2008
Hi, JVM experts.
GC is not an easy subject to grab all its subtleties, so I'm looking for some advice with my situation where to dig further.
The overall picture:
1) Linux smp (2.6.5) x86_64 host with 8G memory, 2x2Ghz xeon (HT) CPUs
Java HotSpot(TM) 64-Bit Server VM (build 1.6.0_02-b05, mixed mode)
2) Heap: -Xms4500M -Xmx4500M -XX:MaxNewSize=128m -XX:NewSize=128m -XX:MaxPermSize=128m -XX:PermSize=128m
3) GC: -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:SurvivorRatio=128 -XX:CMSInitiatingOccupancyFraction=60 -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled -XX:+DisableExplicitGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintClassHistogram

4) Application is running under consistent traffic for two weeks.
5) I monitor the system with "sar"

6) Once or twice a week I observe the kernel-space cpu usage spike to ~90% for a duration <10 seconds.
7) I detect the cpu spike and dump a snapshot of the system situation. It's the JVM which consumes the syscpu time. JVM is sent SIGQUIT at the same instant.

There are two points, I'm rather vague about.
1) All cpu load situations have the following similar pattern in GC log file:

195098.019: [GC 195098.019: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129152K->0K(130112K), 0.0754030 secs] 1549718K->1448253K(4607040K), 0.0755550 secs]
Total time for which application threads were stopped: 0.0758180 seconds
Application time: 0.5316910 seconds
Total time for which application threads were stopped: 0.0003550 seconds
Application time: 0.0695990 seconds
Total time for which application threads were stopped: 0.0001620 seconds
Application time: 0.1657730 seconds
Total time for which application threads were stopped: 0.0001780 seconds
Application time: 0.0000350 seconds
Total time for which application threads were stopped: 0.0000440 seconds
Application time: 0.2738210 seconds
Total time for which application threads were stopped: 0.0003530 seconds
Application time: 0.3108570 seconds
195099.448: [GC 195099.448: [ParNew
Desired survivor size 491520 bytes, new threshold 0 (max 0)
: 129151K->0K(130112K), 0.0712390 secs] 1577405K->1476947K(4607040K), 0.0713980 secs]
Total time for which application threads were stopped: 0.0716750 seconds
Application time: 0.0364560 seconds
Total time for which application threads were stopped: 8.2666520 seconds
Application time: 0.0000700 seconds
Total time for which application threads were stopped: 0.0055730 seconds
Application time: 0.0069140 seconds
Total time for which application threads were stopped: 0.0017350 seconds
Application time: 0.0011930 seconds
Total time for which application threads were stopped: 0.0064760 seconds
Application time: 0.0000720 seconds
Total time for which application threads were stopped: 0.0001120 seconds
Application time: 0.0001010 seconds
Total time for which application threads were stopped: 0.0000650 seconds
Application time: 0.0001570 seconds
195107.840: [Full GC 195107.840: [CMS: 1476947K->1092516K(4476928K), 7.6193100 secs] 1488921K->1092516K(4607040K), [CMS Perm : 46862K->46641K(131072K)], 7.6194800 secs]

num #instances #bytes class name
--------------------------------------
1: 1962568 306032400 [Ljava.util.HashMap$Entry;
2: 1962446 125596544 java.util.HashMap
...


The "delay" of the SIGQUIT/Histogram dump can be ~1-2 seconds from the CPU overload detection. I'm curious about the "suspeciously spurious" SWT preceding the cpu spike. What could be the possible reasons?

2) Per thread analysis of the JVM threads hinted me to one thread which sometimes monopolized ~60% of the cpu time in kernel space. This is the native posix thread which "calls back into" JVM on certain events. Could it somehow cause the aforementioned STW?

My attempts to get a gdb stack dump at cpu load event failed (gdb can't resolve the stack in some of the jvm threads and enters infinite loop)

Any advice is greatly appreciated
regards,
- andrey
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Sep 11 2008
Added on Dec 7 2007
1 comment
270 views