We run a Hadoop namenode process on JDK 1.8.0.92 with Xms and Xmx both set to 75 GB. We observed a spike in system memory within x interval, and eventually an OOM occurred.
During the debugging process we found the following:
RSS grew beyond Xmx.
Is it heap? No. Heap usage as seen in JMX is under control.
Is it non-heap or a leak? The heap dump and jmap -histo show no leak.
Then what? We enabled NMT (Native Memory Tracking) in the namenode process environment.
What did NMT show? The growth is in the Internal section of native memory. The pattern is as follows: when the thread count gets high, committed memory goes up, and it keeps growing incrementally.
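To make the Internal-versus-thread-count pattern easier to track over time, the text printed by `jcmd <pid> VM.native_memory summary` (available once the JVM runs with `-XX:NativeMemoryTracking=summary`) can be parsed and logged at intervals. The following is a minimal sketch, assuming the JDK 8 summary layout and the default KB scale; the class name `NmtSummaryParser` and the sample text are illustrative and are not taken from the namenode in question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts committed sizes per NMT section from "VM.native_memory summary" text. */
final class NmtSummaryParser {

    // Assumed header layout (JDK 8, default scale):
    // "-                  Internal (reserved=2048KB, committed=2048KB)"
    private static final Pattern SECTION = Pattern.compile(
            "^-\\s+(.+?)\\s+\\(reserved=(\\d+)KB, committed=(\\d+)KB\\)\\s*$");

    // Thread count line inside the Thread section: "(thread #1234)"
    private static final Pattern THREADS = Pattern.compile("\\(thread #(\\d+)\\)");

    /** Maps section name to committed KB, in the order the sections appear. */
    static Map<String, Long> committedKbBySection(String summary) {
        Map<String, Long> result = new LinkedHashMap<>();
        for (String line : summary.split("\\r?\\n")) {
            Matcher m = SECTION.matcher(line);
            if (m.matches()) {
                result.put(m.group(1), Long.parseLong(m.group(3)));
            }
        }
        return result;
    }

    /** Returns the thread count reported by NMT, or -1 if the line is missing. */
    static long threadCount(String summary) {
        Matcher m = THREADS.matcher(summary);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    public static void main(String[] args) {
        // Illustrative input only, not real output from the affected process.
        String sample = String.join("\n",
                "-                 Java Heap (reserved=1024KB, committed=1024KB)",
                "-                    Thread (reserved=4096KB, committed=4096KB)",
                "                            (thread #4)",
                "-                  Internal (reserved=2048KB, committed=2048KB)");
        System.out.println(committedKbBySection(sample));
        System.out.println("threads=" + threadCount(sample));
    }
}
```

Feeding it successive snapshots and logging the `Internal` value next to the thread count gives a simple time series. `jcmd <pid> VM.native_memory baseline` followed later by `jcmd <pid> VM.native_memory summary.diff` reports the same growth directly.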
Fixes tried:
-XX:MaxDirectMemorySize=3g: no change, memory still breaches the limit.
-Djdk.nio.maxCachedBufferSize: tried on jdk1.8.0.192 with this option enabled, leak still observed.
MALLOC_ARENA_MAX: this defaults to 4 in hadoop-config.sh; tried 1 and 2, leak still happens.
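Since -XX:MaxDirectMemorySize only caps NIO direct buffers, one way to check whether direct buffers explain the growth at all is to read the JVM's buffer pool MXBeans. The sketch below runs in-process for illustration; the class name `DirectBufferReport` is hypothetical. On a live namenode the same values should be readable over JMX under `java.nio:type=BufferPool,name=direct`, assuming JMX is exposed.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

/** Reports usage of the JVM's NIO buffer pools ("direct" and "mapped"). */
final class DirectBufferReport {

    /** Returns bytes used by the named buffer pool, or -1 if no such pool exists. */
    static long memoryUsed(String poolName) {
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if (pool.getName().equals(poolName)) {
                return pool.getMemoryUsed();
            }
        }
        return -1L;
    }

    public static void main(String[] args) {
        // Hold one direct buffer so the "direct" pool is visibly non-empty.
        ByteBuffer keepAlive = ByteBuffer.allocateDirect(1 << 20);
        for (BufferPoolMXBean pool :
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            System.out.printf("%s: count=%d used=%d capacity=%d%n",
                    pool.getName(), pool.getCount(),
                    pool.getMemoryUsed(), pool.getTotalCapacity());
        }
        System.out.println("held buffer capacity=" + keepAlive.capacity());
    }
}
```

If the direct pool stays small while the Internal section keeps growing, the growth is probably coming from other native allocations, which would be consistent with the direct memory cap having no effect.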
Questions
How can the Internal section of native memory be controlled?
How can we find the cause of this issue and see what is using the Internal section?
How can we find out the malloc value in the JVM?
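If the last question is about which MALLOC_ARENA_MAX value the running JVM actually received, one option on Linux is to read the process's start-up environment from `/proc/<pid>/environ`, which holds NUL-separated KEY=VALUE entries. This is a minimal sketch under that assumption; the class name `ProcEnviron` is hypothetical, and it must run as the same user as the namenode (or as root).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Looks up an environment variable of a running Linux process via /proc. */
final class ProcEnviron {

    /** Finds a variable in NUL-separated KEY=VALUE text; returns null if not set. */
    static String lookup(String environ, String name) {
        for (String entry : environ.split("\0")) {
            int eq = entry.indexOf('=');
            if (eq > 0 && entry.substring(0, eq).equals(name)) {
                return entry.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Pass the namenode pid as the first argument; defaults to this JVM.
        String pid = args.length > 0 ? args[0] : "self";
        byte[] raw = Files.readAllBytes(Paths.get("/proc", pid, "environ"));
        String value = lookup(new String(raw, StandardCharsets.UTF_8),
                "MALLOC_ARENA_MAX");
        System.out.println("MALLOC_ARENA_MAX="
                + (value == null ? "<unset>" : value));
    }
}
```

The file reflects the environment at process start, so it shows whether a changed setting reached the JVM after the restart.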