We have an Exadata system running 12.1.1.1.1.140712 and Kernel 2.6.39-400.128.17.el5uek.
The system has 256G of RAM and is using HugePages (roughly 75% of RAM is allocated to HugePages). All of the SGAs are loaded into HugePages and there are plenty of free huge pages.
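For context, this is how I'm confirming that the SGAs really are sitting in HugePages with pages to spare (nothing fancy, just /proc/meminfo; I can paste the actual counts from this node if that helps):

# HugePages_Free should stay well above zero with all instances up;
# a large HugePages_Rsvd would mean pages are reserved but not yet faulted in.
grep -i '^Huge' /proc/meminfo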
There is currently 28G of free memory:
[root@ep01dbadm04 ~]# uname -a
Linux ep01dbadm04.uhc.com 2.6.39-400.128.17.el5uek #1 SMP Tue May 27 13:20:24 PDT 2014 x86_64 x86_64 x86_64 GNU/Linux
[root@ep01dbadm04 ~]# free -g
             total       used       free     shared    buffers     cached
Mem:           251        223         28          0          0          0
-/+ buffers/cache:        223         28
Swap:           99         23         76
The system does not appear to have its memory overly fragmented:
Node 0, zone   Normal  573600 1669228  613332   83075    2155       2       0       0       0       0       1
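For completeness, that per-order breakdown is the kind of thing I'm pulling from /proc/buddyinfo, along with the Normal zone watermarks; commands roughly like these, in case anyone wants the raw output:

# Free pages per zone by order (order 0 = 4K pages; 2M HugePages need order-9 blocks)
cat /proc/buddyinfo
# How close the Normal zone is running to its min/low/high watermarks
grep -A 5 Normal /proc/zoneinfo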
The issue we are running into is the amount of swapping that is going on. With over 28G of free memory to work with, I wouldn't expect kswapd0 to be sitting near the top of the process list at 80-100% CPU. However, over the last few days the system has been stuck in this state, with heavy swapping going on and a number of processes in the blocked state (vmstat output below, followed by a sketch of how I'm trying to track down the blocked and swapped processes).
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 1 24316664 29445740 2904 703880 2 1 13 53 0 0 10 3 87 0 0
4 1 24316220 29444592 2672 724244 3845 0 21593 1169 90571 112767 8 5 83 4 0
7 3 24315500 29490760 2972 719060 5157 0 13822 1169 87476 109841 7 4 85 5 0
5 2 24313852 29472144 2944 723464 5694 0 15809 982 85435 112939 7 5 85 4 0
5 3 24311228 29428164 2756 725768 6191 0 20837 1442 85267 107464 6 4 85 4 0
5 6 24309760 29402600 2776 723408 5496 0 24201 1377 88959 106011 9 4 82 5 0
5 5 24306572 29422064 3072 738040 5342 0 28700 1693 90457 111352 9 4 81 6 0
6 12 24301608 29370704 4444 773704 5285 0 35799 2575 91987 110721 11 4 76 8 0
10 14 24297292 29356968 22068 798588 4899 0 43248 16264 115554 128268 14 4 70 12 0
8 0 24293688 29265484 3976 809824 6624 0 25754 1626 97261 115929 13 5 76 6 0
[root@ep01dbadm04 ~]# vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
r b swpd free buff cache si so bi bo in cs us sy id wa st
5 1 24217708 29346084 5760 736912 2 1 13 53 0 0 10 3 87 0 0
3 3 24217436 29292988 5364 776232 385 0 24944 1169 98311 118919 8 5 83 4 0
21 1 24216348 29307336 5832 813984 1613 4 19093 1282 108541 125906 9 5 83 3 0
8 1 24215948 29318536 38564 729608 851 7 12685 16330 111150 131518 10 6 82 3 0
20 1 24215520 29360400 6384 758028 782 16 17300 1146 97840 118925 8 4 84 3 0
7 3 24215400 29416444 5824 741388 602 25 17230 1067 100370 120148 9 5 82 4 0
7 9 24215172 29291040 5912 769784 1167 29 30114 1248 118197 132910 9 5 74 12 0
7 7 24214228 29368572 6840 784816 1791 42 18962 1036 108624 121779 11 6 74 9 0
7 5 24213968 29465484 4744 731472 839 62 14943 1179 99566 115793 9 4 83 4 0
8 5 24213744 29380736 4264 772592 782 76 34939 1687 104635 121384 13 6 75 6 0
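For what it's worth, this is roughly how I've been trying to see who is actually blocked and who owns the swapped-out pages while this is going on (just a sketch, nothing scientific; the per-process VmSwap figures should roughly add up to the swpd column above):

# Processes in uninterruptible sleep (the "b" column in vmstat) and what they're waiting on
ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /D/'

# Rough per-process swap usage in kB, largest first
for f in /proc/[0-9]*/status; do
    awk '/^Name:/ {n=$2} /^VmSwap:/ {print $2, n}' "$f"
done | sort -rn | head -20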
I know the Linux kernel is pretty smart, and I don't want to start modifying the swappiness factor blindly, since we typically don't have this problem on other systems. I want to know why it's happening.
This is affecting performance on our systems, as you would expect.
Questions:
1) What are the possible causes of this issue?
2) Could a memory leak cause the memory to appear free but really be in use? (Bug 18508710)
2.1) If so, is there a way to show that this is the issue?
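To try to answer 2/2.1 myself, the only approach I've come up with so far is to cross-check the kernel's own accounting; a rough sketch (plain /proc arithmetic, nothing Exadata-specific), in case someone can suggest something better:

# If memory is tied up in HugePages nothing can use, HugePages_Free stays high while
# the rest of the box swaps; growth in Slab or PageTables would point at kernel-side usage.
egrep 'HugePages_(Total|Free|Rsvd)|^PageTables|^Slab|Committed_AS|SwapTotal|SwapFree' /proc/meminfo

# Total swap accounted to processes (kB); a big gap vs. the swpd figure in vmstat
# could suggest something like the leak suspected in bug 18508710.
awk '/^VmSwap:/ {t += $2} END {print t, "kB of swap accounted to processes"}' /proc/[0-9]*/status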
Other thoughts or opinions are more than welcome. If you need any other information, I can certainly provide it.
-Chris