I'm trying to troubleshoot a performance issue and I'm having trouble interpreting the STATSPACK report. It seems like the STATSPACK report is missing information that I expect to be there. I'll explain below.
Header
STATSPACK report for
Database DB Id Instance Inst Num Startup Time Release RAC
~~~~~~~~ ----------- ------------ -------- --------------- ----------- ---
2636235846 testdb 1 30-Jan-11 16:10 11.2.0.2.0 NO
Host Name Platform CPUs Cores Sockets Memory (G)
~~~~ ---------------- ---------------------- ----- ----- ------- ------------
TEST Microsoft Windows IA ( 4 2 0 3.4
Snapshot Snap Id Snap Time Sessions Curs/Sess Comment
~~~~~~~~ ---------- ------------------ -------- --------- ------------------
Begin Snap: 3427 01-Feb-11 06:40:00 65 4.4
End Snap: 3428 01-Feb-11 07:00:00 66 4.1
Elapsed: 20.00 (mins) Av Act Sess: 7.3
DB time: 146.39 (mins) DB CPU: 8.27 (mins)
Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 192M 176M Std Block Size: 8K
Shared Pool: 396M 412M Log Buffer: 10,848K
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ ------------------ ----------------- ----------- -----------
DB time(s): 7.3 2.0 0.06 0.04
DB CPU(s): 0.4 0.1 0.00 0.00
Redo size: 6,366.0 1,722.1
Logical reads: 1,114.6 301.5
Block changes: 35.8 9.7
Physical reads: 44.9 12.1
Physical writes: 1.5 0.4
User calls: 192.2 52.0
Parses: 101.5 27.5
Hard parses: 3.6 1.0
W/A MB processed: 0.1 0.0
Logons: 0.1 0.0
Executes: 115.1 31.1
Rollbacks: 0.0 0.0
Transactions: 3.7
As you can see a significant amount of time was spent in database calls (DB Time) with relatively little time on CPU (DB CPU). Initially that made me think there were some significant wait events.
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
log file sequential read 48,166 681 14 7.9
CPU time 484 5.6
db file sequential read 35,357 205 6 2.4
control file sequential read 50,747 23 0 .3
Disk file operations I/O 16,518 18 1 .2
-------------------------------------------------------------
However, looking at the Top 5 Timed Events I don't see anything out of the ordinary given my normal operations. the log file sequential read may be a little slow but it doesn't make up a significant portion of the execution time.
Based on an Excel/VB spreadsheet I wrote, which converts STATSPACK data to graphical form, I suspected that there was a wait event not listed here. So I decided to query the data directly. Here is the query and result.
SQL> SELECT wait_class
2 , event
3 , delta/POWER(10,6) AS delta_sec
4 FROM
5 (
6 SELECT syev.snap_id
7 , evna.wait_class
8 , syev.event
9 , syev.time_waited_micro
10 , syev.time_waited_micro - LAG(syev.time_waited_micro) OVER (PARTITION BY syev.event ORDER BY syev.snap_id) AS delta
11 FROM perfstat.stats$system_event syev
12 JOIN v$event_name evna ON evna.name = syev.event
13 WHERE syev.snap_id IN (3427,3428)
14 )
15 WHERE delta > 0
16 ORDER BY delta DESC
17 ;
?WAIT_CLASS EVENT DELTA_SEC
------------------------- --------------------------------------------------------------------------- ----------
Idle SQL*Net message from client 21169.742
Idle rdbms ipc message 19708.390
Application enq: TM - contention 7199.819
Idle Space Manager: slave idle wait 3001.719
Idle DIAG idle wait 2382.943
Idle jobq slave wait 1258.829
Idle smon timer 1220.902
Idle Streams AQ: qmn coordinator idle wait 1204.648
Idle Streams AQ: qmn slave idle wait 1204.637
Idle pmon timer 1197.898
Idle Streams AQ: waiting for messages in the queue 1197.484
Idle Streams AQ: waiting for time management or cleanup tasks 791.803
System I/O log file sequential read 681.444
User I/O db file sequential read 204.721
System I/O control file sequential read 23.168
User I/O Disk file operations I/O 17.737
User I/O db file parallel read 14.536
System I/O log file parallel write 7.618
Commit log file sync 7.150
User I/O db file scattered read 3.488
Idle SGA: MMAN sleep for component shrink 2.461
User I/O direct path read 1.621
Other process diagnostic dump 1.418
... snip ...
So based on the above it looks like there was a significant amount of time spent in
enq: TM - contention
Question 1
Why does this wait event not show up in the
Top 5 Timed Events section? Note that this wait event is also not listed in any of the other wait events sections either.
Moving on, I decided to look at the Time Model Statistics
Time Model System Stats DB/Inst: testdb /testdb Snaps: 3427-3428
-> Ordered by % of DB time desc, Statistic name
Statistic Time (s) % DB time
----------------------------------- -------------------- ---------
sql execute elapsed time 8,731.0 99.4
PL/SQL execution elapsed time 1,201.1 13.7
DB CPU 496.3 5.7
parse time elapsed 26.4 .3
hard parse elapsed time 21.1 .2
PL/SQL compilation elapsed time 2.8 .0
connection management call elapsed 0.6 .0
hard parse (bind mismatch) elapsed 0.5 .0
hard parse (sharing criteria) elaps 0.5 .0
failed parse elapsed time 0.0 .0
repeated bind elapsed time 0.0 .0
sequence load elapsed time 0.0 .0
DB time 8,783.2
background elapsed time 87.1
background cpu time 2.4
Great, so it looks like I spent >99% of DB Time in SQL calls. I decided to scroll to the
SQL ordered by Elapsed time section. The header information surprised me.
SQL ordered by Elapsed time for DB: testdb Instance: testdb Snaps: 3427 -3
-> Total DB Time (s): 8,783
-> Captured SQL accounts for 4.1% of Total DB Time
-> SQL reported below exceeded 1.0% of Total DB Time
If I'm spending > 99% of my time in SQL, I would have expected the captured % to be higher.
Question 2
Am I correct in assuming that a long running SQL that started before the first snap and is still running at the end of the second snap would not display in this section?
Question 3
Would that answer my wait event question above? Ala, are wait events not reported until the action that is waiting (execution of a SQL statement for example) is complete?
So I looked a few snaps past what I have posted here. I still haven't determined why the
enq: TM - contention wait is not displayed anywhere in the STATSPACK reports. I did end up finding an interesting PL/SQL block that may have been causing the issues. Here is the
SQL ordered by Elapsed time for a snapshot that was taken an hour after the one I posted.
SQL ordered by Elapsed time for DB: testdb Instance: testdb Snaps: 3431 -3
-> Total DB Time (s): 1,088
-> Captured SQL accounts for ######% of Total DB Time
-> SQL reported below exceeded 1.0% of Total DB Time
Elapsed Elap per CPU Old
Time (s) Executions Exec (s) %Total Time (s) Physical Reads Hash Value
---------- ------------ ---------- ------ ---------- --------------- ----------
26492.65 29 913.54 ###### 1539.34 480 1013630726
Module: OEM.CacheModeWaitPool
BEGIN EMDW_LOG.set_context(MGMT_JOB_ENGINE.MODULE_NAME, :1); BEG
IN MGMT_JOB_ENGINE.process_wait_step(:2);END; EMDW_LOG.set_conte
xt; END;
I'm still not sure if this is the problem child or not.
I just wanted to post this to get your thoughts on how I correctly/incorrectly attacked this problem and to see if you can fill in any gaps in my understanding.
Thanks!