Java G1 GC and the off-heap cache

Greybird-Oracle, Aug 31 2015 (edited Aug 31 2015)

This is my second follow-up post on the new off-heap cache feature. We've seen very good results when using G1 GC.

--mark

Tuning G1 GC Performance for JE In-Memory Workloads

This note assumes that the reader understands the basics of Java GC and is familiar with JE cache sizing and the DbCacheSize documentation: http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/util/DbCacheSize.html

As part of our recent off-heap cache testing, we have been testing JE performance using large data sets with high-throughput, in-memory workloads, where GC overhead can be high and long pauses can occur. With the help of Yu Zhang (see her G1 GC blog: https://blogs.oracle.com/g1gc/), we have arrived at a set of G1 GC parameters that work well in our tests. In these tests we used Oracle JDK version 8u40.

There are two variants of these parameters, and which one works best depends on whether young generation or old generation pauses are the biggest issue. For example, in our off-heap cache testing we ran two different workloads:

  1. All BINs (Btree bottom internal nodes) fit in the main cache, and all LNs (leaf nodes) fit in the off-heap cache. CacheMode.EVICT_LN is used, so LNs and BINs are strictly divided between the two caches. In this situation, old generation GC is not a big issue because BINs (which do become old over time) are never evicted from the main cache. However, young generation GC can be an issue, because LNs are short-lived and are being moved from the off-heap cache into the Java heap at a very high rate.
  2. A small data size is used, so the data is embedded in the BINs, and LNs are immediately discarded and never used again. (For more on embedded LNs see: http://docs.oracle.com/cd/E17277_02/html/java/com/sleepycat/je/EnvironmentConfig.html#TREE_MAX_EMBEDDED_LN) In this situation, the in-memory data set is composed almost exclusively of BINs. All BINs fit in the two caches (main and off-heap), but BINs are frequently moved between the two caches as records are accessed by random keys, and then evicted from the main cache according to LRU. This is a worst-case scenario for old generation GC, and long pauses can occur.

These tests represent the two extremes, and of course there are workloads that fall in between. If you are experiencing long GC pauses in your application, the important thing is to identify whether young generation or old generation GC is the biggest issue. The best resource for this is the GC log itself, along with Yu Zhang's blog. Another thing that can help, if you are tuning JE performance and trying to correlate GC pauses with other factors, is to look at the 'Jvm:G1 Old Generation.time' and 'Jvm:G1 Young Generation.time' columns in the je.stat.csv file.
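If you want a quick look at the same split from inside a running JVM, the standard GarbageCollectorMXBean API reports cumulative collection time per collector. The sketch below is only a rough aid and not a substitute for the GC log. It assumes the JVM is running with -XX:+UseG1GC, so that the collector beans are named "G1 Young Generation" and "G1 Old Generation" (the je.stat.csv column names suggest JE reports the same beans, but that is my assumption). The class name GcTimeSummary and the dominant() helper are made up for illustration.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeSummary {

    /** Labels which generation has accumulated more GC time (hypothetical helper). */
    public static String dominant(long youngMillis, long oldMillis) {
        if (youngMillis == oldMillis) {
            return "equal";
        }
        return youngMillis > oldMillis ? "young" : "old";
    }

    public static void main(String[] args) {
        long young = 0;
        long old = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionTime() is cumulative milliseconds, or -1 if undefined.
            long millis = Math.max(0, gc.getCollectionTime());
            System.out.printf("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), millis);
            // Bean names below are the ones G1 registers; other collectors use other names.
            if (gc.getName().equals("G1 Young Generation")) {
                young += millis;
            } else if (gc.getName().equals("G1 Old Generation")) {
                old += millis;
            }
        }
        System.out.println("More accumulated GC time in: " + dominant(young, old));
    }
}
```

These totals are cumulative since JVM start and say nothing about individual pause lengths, and which pause types each bean counts can vary by JDK version, so treat the result as a hint about where to look in the GC log.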

Parameters to Minimize Old Generation Pauses

In addition to using the following parameters to minimize old generation pauses, we found it very important to leave enough free space in the Java heap as GC working space. In our tests we used a 31GiB heap and an 18GiB JE main cache, leaving around 12GiB free.

Also, -XX:ConcGCThreads should probably be adjusted based on the number of cores and the number of active app threads. I used -XX:ConcGCThreads=12 in my tests on a 32-core machine with 24 active app threads.

-XX:+UseG1GC
-XX:+UnlockDiagnosticVMOptions
-XX:+UnlockExperimentalVMOptions
-XX:G1NewSizePercent=1
-XX:MaxGCPauseMillis=100
-XX:InitiatingHeapOccupancyPercent=85
-XX:G1HeapRegionSize=32m
-XX:G1MixedGCCountTarget=32
-XX:G1RSetRegionEntries=2560
-XX:G1HeapWastePercent=5
-XX:ConcGCThreads=12
-XX:-ResizePLAB
-XX:+DisableExplicitGC
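
With this many flags it is easy for one to get dropped in a launch script. As a minimal sketch, the following uses the standard RuntimeMXBean.getInputArguments() call to check at startup that the flags listed above were actually passed to the JVM. The class name G1FlagCheck and the missing() helper are hypothetical, and the check is an exact string match, so a different ConcGCThreads value would be reported as missing.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class G1FlagCheck {

    // Flags copied from the list in this post.
    public static final List<String> EXPECTED = Arrays.asList(
            "-XX:+UseG1GC",
            "-XX:+UnlockDiagnosticVMOptions",
            "-XX:+UnlockExperimentalVMOptions",
            "-XX:G1NewSizePercent=1",
            "-XX:MaxGCPauseMillis=100",
            "-XX:InitiatingHeapOccupancyPercent=85",
            "-XX:G1HeapRegionSize=32m",
            "-XX:G1MixedGCCountTarget=32",
            "-XX:G1RSetRegionEntries=2560",
            "-XX:G1HeapWastePercent=5",
            "-XX:ConcGCThreads=12",
            "-XX:-ResizePLAB",
            "-XX:+DisableExplicitGC");

    /** Returns the expected flags that do not appear verbatim in the actual arguments. */
    public static List<String> missing(List<String> actual, List<String> expected) {
        List<String> result = new ArrayList<>();
        for (String flag : expected) {
            if (!actual.contains(flag)) {
                result.add(flag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to the application's main method.
        List<String> actual = ManagementFactory.getRuntimeMXBean().getInputArguments();
        List<String> missing = missing(actual, EXPECTED);
        if (missing.isEmpty()) {
            System.out.println("All expected G1 flags are present.");
        } else {
            System.out.println("Missing flags: " + String.join(" ", missing));
        }
    }
}
```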

In addition we used the following parameters to collect information for GC log analysis.

-XX:+PrintGC
-XX:+PrintAdaptiveSizePolicy
-XX:+PrintTenuringDistribution
-XX:+PrintGCDetails
-XX:+PrintGCTimeStamps
-Xloggc:<file name>

We also used the following parameters. In my tests I configured a Linux huge page pool to reduce TLB misses (I posted about that separately).

-XX:+UseLargePages
-XX:+AlwaysPreTouch
-XX:+UseCompressedOops

I recommend trying the above parameters if old generation pauses are an issue, or simply as a starting point for a new JE-based application.

Parameters to Minimize Young Generation Pauses

The parameters we found worked best to minimize young generation pauses are the same as above (for old generation pauses) with one addition:

-Xmn2g 

However, this added parameter made old generation pauses worse for the workloads where old generation pauses were the main issue, so we could not use a single set of GC parameters for both types of workloads. Therefore, I recommend adding this parameter only if young generation pauses are an issue.
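
If one launcher has to serve both kinds of workload, the choice between the two variants comes down to appending a single flag. Here is a minimal sketch of that, assuming you have already decided from the GC log which pause type dominates. The class name G1VariantFlags and the flagsFor() helper are hypothetical, and the base list in main is abbreviated for space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class G1VariantFlags {

    /**
     * Returns the base (old generation) flags, plus -Xmn2g only when
     * young generation pauses are the main problem.
     */
    public static List<String> flagsFor(List<String> baseFlags, boolean youngPausesDominate) {
        List<String> flags = new ArrayList<>(baseFlags);
        if (youngPausesDominate) {
            flags.add("-Xmn2g");
        }
        return flags;
    }

    public static void main(String[] args) {
        // Abbreviated here; in practice pass the full old generation list from this post.
        List<String> base = Arrays.asList(
                "-XX:+UseG1GC",
                "-XX:MaxGCPauseMillis=100",
                "-XX:InitiatingHeapOccupancyPercent=85");
        System.out.println("Old gen variant:   " + String.join(" ", flagsFor(base, false)));
        System.out.println("Young gen variant: " + String.join(" ", flagsFor(base, true)));
    }
}
```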
