Hi,
I have high system CPU usage on my Solaris 10 server and want to know why.
=== The Stats
The machine is a M9000 domain with 24 cores and 11 zones.
It has 82GB of memory. The zones run Oracle DB (11g).
This is a sample of the vmstat I am getting:
kthr memory page disk faults cpu
r b w swap free re mf pi po fr de sr m7 m7 m7 m7 in sy cs us sy id
28 5 0 5147248 9110808 5005 7865 102594 5 5 0 0 0 0 0 0 13585 27718 15681 11 74 15
2 0 0 5080616 9030024 1652 5701 48861 11 11 0 0 0 0 0 0 7729 24492 7340 10 63 27
8 2 0 5236464 9120752 3861 7866 92762 6 6 0 0 0 0 0 0 16634 36049 18052 12 36 52
1 0 0 5219256 9082256 1948 6755 48082 3 3 0 0 0 0 0 0 7852 27991 7793 7 41 53
As you can see, sys time reaches 74%!
I do not think it is a memory problem, as the amount of memory used is less than the amount of real memory:
/usr/sbin/swap -s
total: 44016440k bytes allocated + 18565912k reserved = 62582352k used, 5215816k available
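The arithmetic behind that comparison can be checked with a small Python sketch. The helper names (parse_swap_s, kb_to_gib) are only illustrative. Note that swap -s reports virtual swap (memory plus swap devices) rather than physical RAM alone, so this is a rough check at best.

```python
import re

# Line copied from the `swap -s` output above.
SWAP_LINE = ("total: 44016440k bytes allocated + 18565912k reserved = "
             "62582352k used, 5215816k available")

def parse_swap_s(line):
    """Return the four kilobyte figures from a Solaris `swap -s` line."""
    allocated, reserved, used, available = (
        int(n) for n in re.findall(r"(\d+)k", line)
    )
    return {"allocated": allocated, "reserved": reserved,
            "used": used, "available": available}

def kb_to_gib(kb):
    """Convert kilobytes (1024 bytes each) to GiB."""
    return kb / (1024 * 1024)

figures = parse_swap_s(SWAP_LINE)
for name, kb in figures.items():
    print(f"{name:>9}: {kb_to_gib(kb):6.1f} GiB")
```

The "used" figure is the sum of allocated and reserved, and it is below the 82GB installed in the domain.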
mpstat shows every CPU spending between 41% and 63% of its time in sys:
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
0 541 91 2435 490 23 522 189 151 596 1 1155 43 49 0 8
1 480 127 3737 436 41 730 214 196 731 1 2086 22 57 0 21
2 415 152 4105 498 64 827 203 200 2462 0 1601 23 53 0 24
3 465 112 5855 3750 3496 670 191 162 2511 1 1305 20 60 0 20
4 663 112 4132 418 45 688 164 170 807 1 1781 21 53 0 26
5 537 149 3464 436 46 710 201 154 662 2 1894 26 58 0 16
6 323 185 3688 483 59 801 157 171 617 2 2350 29 44 0 28
7 344 119 4708 393 42 642 176 158 671 1 1993 20 62 0 18
40 823 149 4816 486 52 882 216 310 814 1 2189 16 54 0 30
41 745 121 3950 380 33 696 147 231 685 2 1914 16 53 0 31
42 459 165 3513 511 48 987 249 329 893 1 2234 18 51 0 30
43 450 164 5111 466 42 883 244 294 732 2 2050 23 48 0 30
44 393 135 5088 412 44 775 196 281 798 1 1769 17 50 0 34
45 610 167 4558 505 50 941 245 301 782 1 3739 21 48 0 31
46 474 175 2501 520 53 993 223 316 667 1 2392 23 41 0 37
47 494 238 3905 545 40 1135 257 340 819 0 2702 27 45 0 28
64 615 97 2426 332 25 615 217 158 765 1 1727 23 63 0 14
65 267 141 2714 979 609 634 179 131 560 0 1897 18 58 0 24
66 726 170 1773 553 159 730 208 141 600 0 4284 23 61 0 16
67 343 123 2166 396 39 687 192 150 659 1 1767 21 46 0 32
68 309 117 2223 353 32 604 207 131 530 2 1478 35 41 0 25
69 323 183 1494 396 35 732 190 128 571 1 1367 23 52 0 25
70 534 130 1031 356 36 655 203 129 604 1 1823 23 53 0 25
71 516 77 2216 258 28 437 97 84 491 0 6410 32 43 0 25
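To summarise that mpstat sample rather than eyeball it, here is a minimal Python sketch that averages the usr, sys and idl columns and lists the CPUs with the highest smtx (spins on mutexes) counts. The rows are copied from the output above. The function names are made up for illustration, and one sample is not enough to draw firm conclusions.

```python
# Column names follow the mpstat header shown above.
HEADER = ("CPU minf mjf xcal intr ithr csw icsw migr smtx srw "
          "syscl usr sys wt idl").split()

# Rows copied from the mpstat sample above.
MPSTAT_ROWS = """\
0 541 91 2435 490 23 522 189 151 596 1 1155 43 49 0 8
1 480 127 3737 436 41 730 214 196 731 1 2086 22 57 0 21
2 415 152 4105 498 64 827 203 200 2462 0 1601 23 53 0 24
3 465 112 5855 3750 3496 670 191 162 2511 1 1305 20 60 0 20
4 663 112 4132 418 45 688 164 170 807 1 1781 21 53 0 26
5 537 149 3464 436 46 710 201 154 662 2 1894 26 58 0 16
6 323 185 3688 483 59 801 157 171 617 2 2350 29 44 0 28
7 344 119 4708 393 42 642 176 158 671 1 1993 20 62 0 18
40 823 149 4816 486 52 882 216 310 814 1 2189 16 54 0 30
41 745 121 3950 380 33 696 147 231 685 2 1914 16 53 0 31
42 459 165 3513 511 48 987 249 329 893 1 2234 18 51 0 30
43 450 164 5111 466 42 883 244 294 732 2 2050 23 48 0 30
44 393 135 5088 412 44 775 196 281 798 1 1769 17 50 0 34
45 610 167 4558 505 50 941 245 301 782 1 3739 21 48 0 31
46 474 175 2501 520 53 993 223 316 667 1 2392 23 41 0 37
47 494 238 3905 545 40 1135 257 340 819 0 2702 27 45 0 28
64 615 97 2426 332 25 615 217 158 765 1 1727 23 63 0 14
65 267 141 2714 979 609 634 179 131 560 0 1897 18 58 0 24
66 726 170 1773 553 159 730 208 141 600 0 4284 23 61 0 16
67 343 123 2166 396 39 687 192 150 659 1 1767 21 46 0 32
68 309 117 2223 353 32 604 207 131 530 2 1478 35 41 0 25
69 323 183 1494 396 35 732 190 128 571 1 1367 23 52 0 25
70 534 130 1031 356 36 655 203 129 604 1 1823 23 53 0 25
71 516 77 2216 258 28 437 97 84 491 0 6410 32 43 0 25
"""

def parse_mpstat(text):
    """Turn each non-empty row into a dict keyed by column name."""
    return [dict(zip(HEADER, map(int, line.split())))
            for line in text.splitlines() if line.strip()]

def mean(rows, column):
    """Arithmetic mean of one column over all CPUs."""
    return sum(r[column] for r in rows) / len(rows)

rows = parse_mpstat(MPSTAT_ROWS)
for column in ("usr", "sys", "idl"):
    print(f"mean {column}: {mean(rows, column):.1f}%")

# The three CPUs with the most mutex spins in this sample.
for r in sorted(rows, key=lambda r: r["smtx"], reverse=True)[:3]:
    print(f"cpu{r['CPU']}: smtx={r['smtx']} xcal={r['xcal']} sys={r['sys']}%")
```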
Output from intrstat:
device | cpu46 %tim cpu47 %tim cpu64 %tim cpu65 %tim cpu66 %tim cpu67 %tim cpu68 %tim
-------------+---------------------------------------------------------------------------------------------------------
bge#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#1 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
nxge#0 | 0 0.0 0 0.0 0 0.0 1381 6.5 160 0.8 0 0.0 0 0.0
nxge#4 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
qlc#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
qlc#2 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
device | cpu0 %tim cpu1 %tim cpu2 %tim cpu3 %tim cpu4 %tim cpu5 %tim cpu6 %tim
-------------+---------------------------------------------------------------------------------------------------------
bge#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#1 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
nxge#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
nxge#4 | 0 0.0 0 0.0 1 0.0 0 0.0 0 0.0 0 0.0 0 0.0
qlc#0 | 0 0.0 0 0.0 0 0.0 2072 6.4 0 0.0 0 0.0 0 0.0
qlc#2 | 0 0.0 0 0.0 0 0.0 2072 6.3 0 0.0 0 0.0 0 0.0
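To put the %tim figures next to the sys column from mpstat, here is a rough Python sketch that sums %tim per device across the CPUs listed. The rows are copied from the two tables above, which cover only 14 of the 24 CPUs, so the totals are a lower bound for this sample. The function name tim_by_device is just an illustrative choice.

```python
# Rows copied from the two intrstat tables above.
# Each row is: device | count %tim count %tim ... (one pair per CPU).
INTRSTAT_ROWS = """\
bge#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#1 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
nxge#0 | 0 0.0 0 0.0 0 0.0 1381 6.5 160 0.8 0 0.0 0 0.0
nxge#4 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
qlc#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
qlc#2 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
bge#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
mpt#1 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
nxge#0 | 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0 0 0.0
nxge#4 | 0 0.0 0 0.0 1 0.0 0 0.0 0 0.0 0 0.0 0 0.0
qlc#0 | 0 0.0 0 0.0 0 0.0 2072 6.4 0 0.0 0 0.0 0 0.0
qlc#2 | 0 0.0 0 0.0 0 0.0 2072 6.3 0 0.0 0 0.0 0 0.0
"""

def tim_by_device(text):
    """Sum the %tim values for each device over all listed CPUs."""
    totals = {}
    for line in text.splitlines():
        if "|" not in line:
            continue
        device, values = line.split("|", 1)
        fields = values.split()
        # Fields alternate: interrupt count, %tim, count, %tim, ...
        pct = sum(float(x) for x in fields[1::2])
        name = device.strip()
        totals[name] = totals.get(name, 0.0) + pct
    return totals

totals = tim_by_device(INTRSTAT_ROWS)
for device, pct in sorted(totals.items()):
    print(f"{device:8s} {pct:5.1f}% of one CPU")
print(f"all devices: {sum(totals.values()):.1f}% of one CPU")
```

Each %tim value is a percentage of a single CPU, so the totals should be read against the per-CPU sys figures from mpstat, not against the whole domain.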
=== The Question
The question is: does the nxge (Ethernet card) or the qlc (FC card) account for the high sys CPU usage?
The qlc load could come from DB accesses, but should the nxge be that high, or is the card faulty?
Thanks for any help.