Hello all,
I have a 3-node production RAC system running 4 databases on 11.2.0.4.5, with 96 GB of RAM per node. Recently we had multiple node evictions and restarts.
After analysis, and after running the ORAchk utility, I added two kernel parameters, and the node evictions have stopped:
vm.min_free_kbytes=524288
vm.swappiness=100
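For reference, I applied them roughly like this (just a sketch; it assumes the settings belong in /etc/sysctl.conf rather than a drop-in file under /etc/sysctl.d):

# append to /etc/sysctl.conf, then reload without rebooting
echo "vm.min_free_kbytes=524288" >> /etc/sysctl.conf
echo "vm.swappiness=100" >> /etc/sysctl.conf
sysctl -p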
Even before adding these kernel parameters, we noticed high memory consumption by the Oracle processes: they fill all available memory on the server to the point that the cluster can no longer communicate, which eventually causes the server to freeze completely.
I have done some analysis, and again from the ORAchk report I found the following findings to be highly relevant to the current situation:
Hugepages are not being used by database
AND
PGA allocation for all databases is more than total memory available on this system
Regarding the HugePages issue: each node has 96 GB of RAM, and the shared memory and HugePages settings are as follows:
kernel.shmmni = 4096
kernel.shmmax = 50682953728
kernel.shmall = 25165824
vm.nr_hugepages = 23067
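My rough reading of those numbers (sketch arithmetic, please correct me if I'm off):

kernel.shmmax  = 50682953728 bytes            ~= 47 GB  (largest single shared memory segment, i.e. largest SGA)
kernel.shmall  = 25165824 pages x 4 KB/page    = 96 GB  (total shared memory allowed, the whole of RAM)
vm.nr_hugepages = 23067 pages x 2 MB/page     ~= 45 GB  (memory reserved as huge pages)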
Total SGA and PGA for all databases is as follows:

SGA Total for all DBs | PGA Total for all DBs
----------------------|----------------------
                      |
This is the output of 'grep Huge /proc/meminfo' on one of the nodes:
HugePages_Total: 23067
HugePages_Free: 1063
HugePages_Rsvd: 1041
Hugepagesize: 2048 kB
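In case it helps, this is how I am trying to recompute the required page count. It is only a sketch of the approach I've seen in Oracle's hugepages_settings.sh script (sum the shared memory segments of the running instances and divide by the huge page size); the way I parse the ipcs output here is my own assumption:

# run while all instances on the node are up
HPG_SZ=$(grep Hugepagesize /proc/meminfo | awk '{print $2}')   # huge page size in kB
NUM_PG=0
for SEG_BYTES in $(ipcs -m | awk '$5 ~ /^[0-9]+$/ {print $5}'); do
    # pages needed for this shared memory segment, plus one spare page
    PG=$(( SEG_BYTES / (HPG_SZ * 1024) ))
    [ "$PG" -gt 0 ] && NUM_PG=$(( NUM_PG + PG + 1 ))
done
echo "Suggested vm.nr_hugepages: $NUM_PG"

I also understand that on 11.2 the alert log prints a "Large Pages Information" section at instance startup showing whether the SGA actually went into huge pages; is that the right place to verify this?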
My concerns are:
1. Why are huge pages not being used? Should the PGA and SGA figures above be taken into account when setting the HugePages count?
2. Why are the Oracle processes consuming all server resources when I made sure they don't cross 10% of the system memory? And how do I correct this if my calculations are wrong?
3. Why does the ORAchk tool multiply PGA_AGGREGATE_TARGET by 3? And what would be the right calculation for PGA if it really is overestimated relative to the server's available resources? (See the query sketch just below this list for how I'm currently checking actual PGA usage.)
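For question 3, this is roughly how I've been comparing the configured target with actual PGA usage on each instance (again just a sketch, run as SYSDBA; the v$pgastat statistic names are from memory):

sqlplus -s / as sysdba <<'EOF'
-- compare the PGA target with what the instance has actually allocated
SELECT name, ROUND(value/1024/1024) AS mb
FROM   v$pgastat
WHERE  name IN ('aggregate PGA target parameter',
                'total PGA allocated',
                'maximum PGA allocated');
EOF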
Many thanks in advance for your help with this, as it is causing multiple node outages and restarts...