Weird CPU overload spikes on Solaris 8 (UDP messaging)
I am facing a strange problem: CPU spikes every 30 minutes. This is
a Sun Fire V210 machine (2 processors) running Solaris 8 (5.8 Generic_117350-26 sun4u sparc SUNW,Sun-Fire-V210).
I am running simple client-server (C++) applications that exchange messages over UDP (around 1000 messages per second). Most of the time the application takes barely 0.5-2% CPU according to top/prstat, but during a spike the CPU usage of the process climbs to 78%, and it takes 3-4 minutes to recover.
I tried monitoring with prstat -Lm 1, but it shows every thread of
the application consuming a fairly consistent amount of CPU (user and
system), even during the spike.
I am unable to catch the culprit. I do see in prstat -m that ICX
(involuntary context switches) increases for the application threads
during the spike, and so does the latency (it goes from 0.0 to 4.5).
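A rough sketch of how the prstat -Lm output could be filtered so that only the threads with raised ICX or latency are kept, which may make the spike window easier to spot in a long capture. It assumes the usual prstat -m column order (PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID) and that the latency mentioned above is the LAT column. The function names, the thresholds, and the sample rows are made up for illustration and are not taken from my capture.

```python
# Assumed column order of a `prstat -m` / `prstat -Lm` data row.
COLUMNS = ["PID", "USERNAME", "USR", "SYS", "TRP", "TFL", "DFL", "LCK",
           "SLP", "LAT", "VCX", "ICX", "SCL", "SIG", "PROCESS"]


def to_number(text):
    """Convert a prstat field such as '4.5', '12', '2K' or '-' to a float."""
    if text == "-":
        return 0.0
    scale = {"K": 1e3, "M": 1e6}.get(text[-1].upper())
    if scale is not None:
        return float(text[:-1]) * scale
    return float(text)


def parse_row(line):
    """Return a dict for a data row, or None for headers and summary lines."""
    fields = line.split()
    if len(fields) != len(COLUMNS) or not fields[0].isdigit():
        return None
    row = dict(zip(COLUMNS, fields))
    try:
        for name in ("USR", "SYS", "LAT", "VCX", "ICX"):
            row[name] = to_number(row[name])
    except ValueError:
        return None
    return row


def flag_spikes(lines, lat_threshold=1.0, icx_threshold=100):
    """Keep rows whose LAT or ICX reaches the (hypothetical) thresholds."""
    flagged = []
    for line in lines:
        row = parse_row(line)
        if row and (row["LAT"] >= lat_threshold or row["ICX"] >= icx_threshold):
            flagged.append(row)
    return flagged


if __name__ == "__main__":
    # Made-up sample rows in the assumed layout.
    sample = [
        "   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID",
        "  1234 rp       0.4 0.9 0.0 0.0 0.0 0.0  99 0.0 210   3  2K   0 server/2",
        "  1234 rp       0.4 0.9 0.0 0.0 0.0 0.0  94 4.5 210  95  2K   0 server/3",
    ]
    for row in flag_spikes(sample):
        print(row["PROCESS"], "LAT", row["LAT"], "ICX", row["ICX"])
```

In practice the lines would come from a saved prstat -Lm 1 capture rather than a hard-coded list, and the thresholds would need tuning against the quiet periods.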
I wonder if this has something to do with UDP; I read a similar post on this forum.
Here is the lockstat kernel profiling output captured during the spike:
==
Profiling interrupt: 11639 events in 59.996 seconds (194 events/sec)
Count genr cuml rcnt nsec Hottest CPU+PIL Caller
-------------------------------------------------------------------------------
11643 100% ---- 1.00 874 cpu0 current_thread
11639 100% ---- 1.00 875 cpu0 lockstat_intr
11639 100% ---- 1.00 875 cpu0 cyclic_fire
11639 100% ---- 1.00 875 cpu0 cbe_level14
10957 94% ---- 1.00 878 cpu0 idle
10451 90% ---- 1.00 899 cpu[0] disp_getwork
3452 30% ---- 1.00 903 cpu0 splx
2249 19% ---- 1.00 893 cpu0 splhigh
1078 9% ---- 1.00 821 cpu[0] (usermode)
408 4% ---- 1.00 957 cpu0 putnext
178 2% ---- 1.00 727 cpu[0]+6 bge_gld_intr
177 2% ---- 1.00 730 cpu[0]+6 pci_intr_wrapper
174 1% ---- 1.00 745 cpu[0]+6 gld_intr
163 1% ---- 1.00 966 cpu0 sendto32
157 1% ---- 1.00 965 cpu0 sendto
157 1% ---- 1.00 967 cpu0 sendit
148 1% ---- 1.00 990 cpu0 sosendmsg
142 1% ---- 1.00 985 cpu0 sosend_dgram
131 1% ---- 1.00 970 cpu0 kstrputmsg
130 1% ---- 1.00 777 cpu[0]+6 bge_receive
121 1% ---- 1.00 963 cpu0 strput
115 1% ---- 1.00 804 cpu0 lwp_sema_wait
106 1% ---- 1.00 1015 cpu0 recvfrom
106 1% ---- 1.00 986 cpu0 udp_wput
102 1% ---- 1.00 1024 cpu0 recvfrom32
101 1% ---- 1.00 1030 cpu0 recvit
99 1% ---- 1.00 957 cpu0 ip_wput
96 1% ---- 1.00 1005 cpu0 ip_wput_ire
95 1% ---- 1.00 733 cpu[0]+6 intr_thread
86 1% ---- 1.00 997 cpu0 sorecvmsg
83 1% ---- 1.00 798 cpu[0]+6 gld_recv
81 1% ---- 1.00 1032 cpu0 kstrgetmsg
74 1% ---- 1.00 755 cpu[0]+6 pci_dma_flush
73 1% ---- 1.00 794 cpu0 mutex_enter
71 1% ---- 1.00 831 cpu[0]+6 pci_pbm_dma_sync
71 1% ---- 1.00 843 cpu[0]+6 ip_rput
71 1% ---- 1.00 780 cpu0+0xb lwp_sema_post
67 1% ---- 1.00 819 cpu[0]+6 ip_rput_local
60 1% ---- 1.00 778 cpu0 post_syscall
59 1% ---- 1.00 946 cpu0 gld_start
58 0% ---- 1.00 929 cpu0 gld_wput
58 0% ---- 1.00 722 cpu0+0xb lwp_release
58 0% ---- 1.00 786 cpu0+0xb swtch
49 0% ---- 1.00 438 cpu0+0xb resumefrom_idle
47 0% ---- 1.00 690 cpu[0]+6 bge_receive_ring
46 0% ---- 1.00 707 cpu[0]+6 bge_receive_packet
44 0% ---- 1.00 964 cpu0 bge_gld_send
43 0% ---- 1.00 850 cpu0+0xb disp
37 0% ---- 1.00 686 cpu[0]+6 bcopy
37 0% ---- 1.00 908 cpu[0]+11 setfrontdq
36 0% ---- 1.00 954 cpu0 bge_send_copy
34 0% ---- 1.00 615 cpu0+0xb resume
33 0% ---- 1.00 836 cpu[0]+6 ip_ocsum
31 0% ---- 1.00 1023 cpu0 uiomove
29 0% ---- 1.00 980 cpu[0]+6 udp_rput
29 0% ---- 1.00 1163 cpu0 struiocopyout
27 0% ---- 1.00 923 cpu0 syscall_trap32
27 0% ---- 1.00 406 cpu0+0xb fp_restore
25 0% ---- 1.00 634 cpu[0] getsonode
23 0% ---- 1.00 1100 cpu0 kmem_cache_alloc
23 0% ---- 1.00 1073 cpu0 prstop
23 0% ---- 1.00 593 cpu0 utl0
21 0% ---- 1.00 914 cpu0 pre_syscall
20 0% ---- 1.00 1039 cpu0 default_copyout
19 0% ---- 1.00 809 cpu[0]+11 lwp_mutex_lock
18 0% ---- 1.00 1422 cpu[0]+6 strrput
18 0% ---- 1.00 1006 cpu0 new_mstate
17 0% ---- 1.00 722 cpu[0] ip_cksum
17 0% ---- 1.00 968 cpu[0] default_copyin
16 0% ---- 1.00 640 cpu0 ufs_scan_inodes
16 0% ---- 1.00 1234 cpu[0]+6 allocb
16 0% ---- 1.00 900 cpu0+0xb lwp_block
16 0% ---- 1.00 474 cpu[0] trap
15 0% ---- 1.00 582 cpu0 ufs_sync
15 0% ---- 1.00 582 cpu0 ufs_update
15 0% ---- 1.00 1209 cpu0 getq_noenab
15 0% ---- 1.00 848 cpu0 get_lwpchan
15 0% ---- 1.00 737 cpu0 lwpchan_get_mapping
15 0% ---- 1.00 582 cpu0 fsflush
15 0% ---- 1.00 1250 cpu0+0xb disp_ratify
14 0% ---- 1.00 1122 cpu0 dupb
14 0% ---- 1.00 1099 cpu[0] flush_user_windows_to_stack
13 0% ---- 1.00 727 cpu[0]+6 bge_status_sync
13 0% ---- 1.00 848 cpu0 dupmsg
13 0% ---- 1.00 1161 cpu[0] getdents64
12 0% ---- 1.00 1244 cpu[0] pr_readdir_procdir
12 0% ---- 1.00 845 cpu0 gethrtime
12 0% ---- 1.00 1147 cpu[0] getf
11 0% ---- 1.00 858 cpu0 fp_prsave
11 0% ---- 1.00 599 cpu[0] as_fault
11 0% ---- 1.00 1181 cpu[0] copyb
11 0% ---- 1.00 1397 cpu0 freemsg
11 0% ---- 1.00 599 cpu[0] segvn_fault
11 0% ---- 1.00 746 cpu[0] lwp_mutex_wakeup
10 0% ---- 1.00 842 cpu[0]+11 ts_wakeup
10 0% ---- 1.00 930 cpu[0]+6 bstore_commit_c
10 0% ---- 1.00 931 cpu0 fp_fksave
9 0% ---- 1.00 1055 cpu0 gld_interpret_ether
9 0% ---- 1.00 695 cpu[0] segvn_faultpage
9 0% ---- 1.00 626 cpu0+0xb thread_lock
9 0% ---- 1.00 721 cpu[0]+11 preempt
9 0% ---- 1.00 530 cpu0+0xb atomic_add_32
9 0% ---- 1.00 668 cpu[0] pagefault
8 0% ---- 1.00 1491 cpu0 putbq
8 0% ---- 1.00 866 cpu0 write
8 0% ---- 1.00 892 cpu[0]+6 canputnext
8 0% ---- 1.00 1289 cpu[0] mutex_exit
7 0% ---- 1.00 225 cpu[0]+6 bge_cfg_set32
7 0% ---- 1.00 863 cpu0 strmakedata
7 0% ---- 1.00 1523 cpu0 strget
7 0% ---- 1.00 615 cpu0 write32
7 0% ---- 1.00 1309 cpu[0] pid_entry
7 0% ---- 1.00 231 cpu0+0xb cv_wait_sig
7 0% ---- 1.00 1065 cpu0 save_syscall_args
7 0% ---- 1.00 237 cpu0+0xb trap_rtt
6 0% ---- 1.00 722 cpu0 bge_send_claim
6 0% ---- 1.00 978 cpu0 copyin_name
6 0% ---- 1.00 1106 cpu0 soallocproto2
6 0% ---- 1.00 1106 cpu0 soallocproto1
6 0% ---- 1.00 304 cpu[0] ufs_getpage
6 0% ---- 1.00 241 cpu0+0xb str_cv_wait
6 0% ---- 1.00 241 cpu0+0xb strwaitq
6 0% ---- 1.00 1503 cpu0 putback
6 0% ---- 1.00 1697 cpu[0]+6 putq
6 0% ---- 1.00 1362 cpu[0] copymsg
6 0% ---- 1.00 1160 cpu[0]+11 sleepq_wakeall_chan
6 0% ---- 1.00 697 cpu0+0xb restore_mstate
6 0% ---- 1.00 1160 cpu[0]+11 cv_broadcast
6 0% ---- 1.00 1034 cpu0+0xb disp_getbest
6
==
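To make a profile like the one above easier to compare between a spike and a quiet period, here is a minimal sketch that parses the rows and computes a caller's share of the sampled events. It assumes each row is laid out as count, genr, cuml, rcnt, nsec, CPU+PIL, caller, and that a caller name may wrap onto the next line as it did in the paste. The function names are hypothetical, and the sample rows are copied from the output above.

```python
import re

# count, genr%, cuml, rcnt, nsec, CPU+PIL, then an optional caller
# (the caller is missing when the row wrapped onto the next line).
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)%\s+\S+\s+[\d.]+\s+(\d+)\s+(\S+)(?:\s+(\S+))?\s*$"
)
NAME = re.compile(r"^[A-Za-z_]\w*$")


def parse_profile(text):
    """Return a list of {'count', 'cpu', 'caller'} dicts from lockstat rows."""
    rows, pending = [], None
    for line in text.splitlines():
        match = ROW.match(line)
        token = line.strip()
        if match:
            count, _genr, _nsec, cpu, caller = match.groups()
            entry = {"count": int(count), "cpu": cpu, "caller": caller}
            rows.append(entry)
            # Remember rows whose caller wrapped to the following line.
            pending = entry if caller is None else None
        elif pending is not None and NAME.match(token):
            pending["caller"] = token
            pending = None
        else:
            pending = None
    return rows


def share(rows, caller, total_events):
    """Fraction of all profiling events attributed to the given caller."""
    for row in rows:
        if row["caller"] == caller:
            return row["count"] / total_events
    return 0.0


if __name__ == "__main__":
    sample = """\
11639 100% ---- 1.00 875 cpu0 lockstat_intr
10957 94% ---- 1.00 878 cpu0 idle
1078 9% ---- 1.00 821 cpu[0] (usermode)
15 0% ---- 1.00 737 cpu0
lwpchan_get_mapping
"""
    rows = parse_profile(sample)
    total = 11639  # events reported in the "Profiling interrupt" header
    print(f"idle share: {share(rows, 'idle', total):.1%}")
    print(f"usermode share: {share(rows, '(usermode)', total):.1%}")
```

If the genr column is inclusive, as I assume, the idle row would suggest the box as a whole was mostly idle over this 60-second window, so comparing against a profile taken outside a spike may be more telling than this one capture alone.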
Any help/pointers are greatly appreciated.
thanks
rp.