I have an Oracle 11.2.0.2, Linux 64 bit 3-node RAC system that has highly concurrent application, which inserts into a 32 way hash partitioned table with a locally partitioned primary key index and also has a securefile lob which is partitioned and stored in another tablespace with large uniform extents. The chunk size is 32K
on lob, the data seems to averaging around ~48K with ~10K standard deviation, so I expect each lob to be stored in around 1-3 chunks..
The problem is the concurrent inserts, although I see CPU is fully utilized but I see that there is serialization happening in Oracle server and concurrency issues are the root cause and it is not maximizing CPU to do the real inserts. May be someone can shed some light to improve the concurrency issue.
I see a lots of concurrency waits in top-5 events of 1 hour report.
DB Name DB Id Instance Inst Num Startup Time Release RAC
------------ ----------- ------------ -------- --------------- ----------- ---
TESTDB 1115683089 TESTDB1 1 11-Jun-12 15:52 11.2.0.2.0 YES
Host Name Platform CPUs Cores Sockets Memory(GB)
---------------- -------------------------------- ---- ----- ------- ----------
test000001 Linux x86 64-bit 24 12 2 70.73
Snap Id Snap Time Sessions Curs/Sess
--------- ------------------- -------- ---------
Begin Snap: 7739 22-Jun-12 22:00:02 658 9.3
End Snap: 7740 22-Jun-12 23:00:06 655 9.3
Elapsed: 60.06 (mins)
DB Time: 13,464.52 (mins)
Cache Sizes Begin End
~~~~~~~~~~~ ---------- ----------
Buffer Cache: 28,992M 28,992M Std Block Size: 8K
Shared Pool Size: 3,328M 3,328M Log Buffer: 44,608K
Load Profile Per Second Per Transaction Per Exec Per Call
~~~~~~~~~~~~ --------------- --------------- ---------- ----------
DB Time(s): 224.2 2.0 0.56 0.78
DB CPU(s): 12.1 0.1 0.03 0.04
Redo size: 2,020,373.1 17,651.4
Logical reads: 486,776.0 4,252.8
Block changes: 2,076.8 18.1
Physical reads: 292.6 2.6
Physical writes: 461.7 4.0
User calls: 288.5 2.5
Parses: 152.1 1.3
Hard parses: 0.3 0.0
W/A MB processed: 1.4 0.0
Logons: 0.2 0.0
Executes: 398.4 3.5
Rollbacks: 64.7 0.6
Transactions: 114.5
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Buffer Nowait %: 99.93 Redo NoWait %: 100.00
Buffer Hit %: 99.98 In-memory Sort %: 100.00
Library Hit %: 99.33 Soft Parse %: 99.79
Execute to Parse %: 61.81 Latch Hit %: 71.67
Parse CPU to Parse Elapsd %: 18.05 % Non-Parse CPU: 99.95
Shared Pool Statistics Begin End
------ ------
Memory Usage %: 78.32 78.52
% SQL with executions>1: 90.66 91.65
% Memory for SQL w/exec>1: 96.02 95.74
Top 5 Timed Foreground Events
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Avg
wait % DB
Event Waits Time(s) (ms) time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
latch: row cache objects 7,939,816 520,984 66 64.5 Concurrenc
buffer busy waits 524,647 87,735 167 10.9 Concurrenc
gc current block busy 967,786 65,631 68 8.1 Cluster
gc buffer busy release 309,371 46,632 151 5.8 Cluster
DB CPU 43,556 5.4
Host CPU (CPUs: 24 Cores: 12 Sockets: 2)
~~~~~~~~ Load Average
Begin End %User %System %WIO %Idle
--------- --------- --------- --------- --------- ---------
21.24 21.54 54.7 2.7 2.5 41.8
Instance CPU
~~~~~~~~~~~~
% of total CPU for Instance: 53.7
% of busy CPU for Instance: 92.3
%DB time waiting for CPU - Resource Mgr: 0.0
Memory Statistics
~~~~~~~~~~~~~~~~~ Begin End
Host Mem (MB): 72,429.2 72,429.2
SGA use (MB): 32,768.0 32,768.0
PGA use (MB): 6,148.7 6,143.4
% Host Mem used for SGA+PGA: 53.73 53.72
latch: row cache objects has 7,939,816 waits...
Looking further down at the foreground waits..
Foreground Wait Events DB/Inst: TESTDB/TESTDB1 Snaps: 7739-7740
-> s - second, ms - millisecond - 1000th of a second
-> Only events with Total Wait Time (s) >= .001 are shown
-> ordered by wait time desc, waits desc (idle events last)
-> %Timeouts: value of 0 indicates value was < .5%. Value of null is truly 0
Avg
%Time Total Wait wait Waits % DB
Event Waits -outs Time (s) (ms) /txn time
-------------------------- ------------ ----- ---------- ------- -------- ------
latch: row cache objects 7,939,816 0 520,984 66 19.2 64.5
buffer busy waits 524,647 0 87,735 167 1.3 10.9
gc current block busy 967,786 0 65,631 68 2.3 8.1
gc buffer busy release 309,371 0 46,632 151 0.7 5.8
gc buffer busy acquire 367,004 0 38,030 104 0.9 4.7
Looking at the SQL stats
SQL ordered by Elapsed Time DB/Inst: TESTDB/TESTDB1 Snaps: 7739-7740
-> Resources reported for PL/SQL code includes the resources used by all SQL
statements called by the code.
-> % Total DB Time is the Elapsed Time of the SQL statement divided
into the Total Database Time multiplied by 100
-> %Total - Elapsed Time as a percentage of Total DB time
-> %CPU - CPU Time as a percentage of Elapsed Time
-> %IO - User I/O Time as a percentage of Elapsed Time
-> Captured SQL account for 93.7% of Total DB Time (s): 807,871
-> Captured PL/SQL account for 98.9% of Total DB Time (s): 807,871
Elapsed Elapsed Time
Time (s) Executions per Exec (s) %Total %CPU %IO SQL Id
---------------- -------------- ------------- ------ ------ ------ -------------
795,535.5 83,733 9.50 98.5 5.2 .0 78q2d6rnnf919
Module: JDBC Thin Client
BEGIN ALERTS_R30.saveResultData(:1 ,:2 ); END;
752,602.4 63,707 11.81 93.2 5.3 .0 79uwb2a8br53h
Module: JDBC Thin Client
INSERT INTO resultsqueue (WORKPACKAGEID,ALERTTYPE,RESULTDATA,PROCESSORID,PROCESS
DATETIME,STATUS) VALUES (:1, :2, :3, :4, :5, :6)
The first SQL_id is the package and the second SQL_ID is the insert statement where the CLOB is.
Now.. looking at the latch statistics
Pct Avg Wait Pct
Get Get Slps Time NoWait NoWait
Latch Activity DB/Inst: TESTDB/TESTDB1 Snaps: 7739-7740
-> "Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
willing-to-wait latch get requests
-> "NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
-> "Pct Misses" for both should be very close to 0.0
Latch Name Requests Miss /Miss (s) Requests Miss
------------------------ -------------- ------ ------ ------ ------------ ------
row cache objects 2,276,658,477 92.9 0.0 5.E+05 316 0.0
.....
Latch Sleep Breakdown DB/Inst: TESTDB/TESTDB1 Snaps: 7739-7740
-> ordered by misses desc
Get Spin
Latch Name Requests Misses Sleeps Gets
-------------------------- --------------- ------------ ----------- -----------
row cache objects 2,276,658,477 2.114073E+09 8,264,734 2.10608E+09
cache buffers chains 3,543,575,814 7,458,551 20,390 7,440,021
undo global data 1,289,959,428 3,141,288 10,318 3,131,531
....
Latch Miss Sources DB/Inst: TESTDB/TESTDB1 Snaps: 7739-7740
-> only latches with sleeps are shown
-> ordered by name, sleeps desc
...
row cache objects kqrpre: find obj 0 3,318,282 3.02E+06
row cache objects kqreqd: reget 0 2,732,761 2.59E+06
row cache objects kqreqd 0 2,206,019 2.65E+06
....
Looking at the dictionary cache..
Dictionary Cache Stats DB/Inst: TESTDB/TESTDB1 Snaps: 7739-7740
-> "Pct Misses" should be very low (< 2% in most cases)
-> "Final Usage" is the number of cache entries being used
Get Pct Scan Pct Mod Final
Cache Requests Miss Reqs Miss Reqs Usage
------------------------- ------------ ------ ------- ----- -------- ----------
dc_awr_control 74 2.7 0 N/A 0 1
dc_global_oids 373,873 0.0 0 N/A 0 39
dc_histogram_data 23,409 1.6 0 N/A 0 2,378
dc_histogram_defs 108,585 0.3 0 N/A 0 2,590
dc_object_grants 1,254 0.0 0 N/A 0 253
dc_objects 409,792 0.1 0 N/A 4 3,451
dc_profiles 28 0.0 0 N/A 0 1
dc_rollback_segments 755,770,769 0.0 0 N/A 0 1,008
dc_segments 5,563 15.6 0 N/A 26 216
dc_sequences 584 87.0 0 N/A 584 3
dc_tablespaces 238,945 0.0 0 N/A 0 23
dc_users 618,442 0.0 0 N/A 0 54
global database name 5,504 0.0 0 N/A 0 1
outstanding_alerts 130 83.1 0 N/A 0 29
sch_lj_oids 4 0.0 0 N/A 0 11
-------------------------------------------------------------
Dictionary Cache Stats (RAC) DB/Inst: TESTDB/TESTDB1 Snaps: 7739-7740
GES GES GES
Cache Requests Conflicts Releases
------------------------- ------------ ------------ ------------
dc_awr_control 2 2 0
dc_global_oids 47 0 47
dc_histogram_defs 323 836 0
dc_objects 46 65 0
dc_segments 921 38 852
dc_sequences 1,167 506 2
dc_users 60 0 57
outstanding_alerts 216 108 0
-------------------------------------------------------------
so it appears that dc_sequences and dc_segments are high and so is outstanding_alerts
Using Tanel Podar utility with other method.. I see latch being the problem...
SQL> @latchprofx sid,name,func,hmode % "row cache objects" 100000
-- LatchProfX 2.01 by Tanel Poder ( http://www.tanelpoder.com )
SID NAME FUNC HMODE Held Gets Held % Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- ------------ ---------- ---------- ------- ----------- -----------
844 row cache objects kqrpre: find obj exclusive 262 187 .26 34.532 .185
1039 row cache objects kqrpre: find obj exclusive 234 234 .23 30.841 .132
1101 row cache objects kqrpre: find obj exclusive 223 220 .22 29.391 .134
1374 row cache objects kqrpre: find obj exclusive 219 210 .22 28.864 .137
799 row cache objects kqrpre: find obj exclusive 219 214 .22 28.864 .135
669 row cache objects kqrpre: find obj exclusive 214 214 .21 28.205 .132
599 row cache objects kqrpre: find obj exclusive 209 209 .21 27.546 .132
68 row cache objects kqrpre: find obj exclusive 206 204 .21 27.151 .133
533 row cache objects kqrpre: find obj exclusive 198 139 .20 26.096 .188
914 row cache objects kqreqd exclusive 194 115 .19 25.569 .222
610 row cache objects kqrpre: find obj exclusive 193 191 .19 25.437 .133
522 row cache objects kqrpre: find obj exclusive 189 185 .19 24.910 .135
1092 row cache objects kqrpre: find obj exclusive 189 186 .19 24.910 .134
836 row cache objects kqrpre: find obj exclusive 189 189 .19 24.910 .132
1174 row cache objects kqrpre: find obj exclusive 186 185 .19 24.515 .133
333 row cache objects kqrpre: find obj exclusive 185 184 .19 24.383 .133
79 row cache objects kqrpre: find obj exclusive 185 185 .19 24.383 .132
1040 row cache objects kqrpre: find obj exclusive 185 185 .19 24.383 .132
864 row cache objects kqrpre: find obj exclusive 185 182 .19 24.383 .134
791 row cache objects kqrpre: find obj exclusive 183 179 .18 24.119 .135
SQL> @latchprofx sid,name,sqlid,laddr % "row cache objects" 100000
-- LatchProfX 2.01 by Tanel Poder ( http://www.tanelpoder.com )
SID NAME SQLID LADDR Held Gets Held % Held ms Avg hold ms
---------- ----------------------------------- ------------- ---------------- ---------- ---------- ------- ----------- -----------
199 row cache objects 79uwb2a8br53h 000000081B85AB10 1190 1188 1.19 91.749 .077
1413 row cache objects 79uwb2a8br53h 000000081B85AB10 718 717 .72 55.358 .077
594 row cache objects 79uwb2a8br53h 000000081B85AB10 662 662 .66 51.040 .077
456 row cache objects 79uwb2a8br53h 000000081B85AB10 636 635 .64 49.036 .077
86 row cache objects 79uwb2a8br53h 000000081B85AB10 515 515 .52 39.707 .077
207 row cache objects 79uwb2a8br53h 000000081B85AB10 482 480 .48 37.162 .077
1302 row cache objects 79uwb2a8br53h 000000081B85AB10 469 469 .47 36.160 .077
991 row cache objects 79uwb2a8br53h 000000081B85AB10 414 414 .41 31.919 .077
94 row cache objects 79uwb2a8br53h 000000081B85AB10 406 405 .41 31.303 .077
224 row cache objects 79uwb2a8br53h 000000081B85AB10 405 405 .41 31.226 .077
993 row cache objects 79uwb2a8br53h 000000081B85AB10 395 395 .40 30.455 .077
1314 row cache objects 79uwb2a8br53h 000000081B85AB10 393 393 .39 30.300 .077
653 row cache objects 79uwb2a8br53h 000000081B85AB10 385 385 .39 29.684 .077
1052 row cache objects 79uwb2a8br53h 000000081B85AB10 384 384 .38 29.606 .077
526 row cache objects 79uwb2a8br53h 000000081B85AB10 381 381 .38 29.375 .077
984 row cache objects 79uwb2a8br53h 000000081B85AB10 374 374 .37 28.835 .077
522 row cache objects 79uwb2a8br53h 000000081B85AB10 371 371 .37 28.604 .077
537 row cache objects 79uwb2a8br53h 000000081B85AB10 369 369 .37 28.450 .077
138 row cache objects 79uwb2a8br53h 000000081B85AB10 362 358 .36 27.910 .078
1222 row cache objects 79uwb2a8br53h 000000081B85AB10 349 347 .35 26.908 .078
SQL> select * from v$latch_children where addr='000000081B85AB10';
ADDR LATCH# CHILD# LEVEL# NAME HASH GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES WAITERS_WOKEN WAITS_HOLDING_LATCH SPIN_GETS SLEEP1 SLEEP2 SLEEP3 SLEEP4 SLEEP5 SLEEP6 SLEEP7 SLEEP8 SLEEP9 SLEEP10 SLEEP11 WAIT_TIME
---------------- ---------- ---------- ---------- ---------------------------------------------------------------- ---------- ---------- ---------- ---------- -------------- ---------------- ------------- ------------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
000000081B85AB10 279 1 4 row cache objects 2412510220 1159017924 259092043 80873286 16 0 0 0 186219488 0 0 0 0 0 0 0 0 0 0 0 2.0725E+11
So.. my question is that it appears that there is a major latch issue, row cache objects being in particular for specific insert. There is sequence involved in that insert, and it has a very large cache size 300K and it is a 3-node RAC,the sequence was created as NOORDER. From here I'm bit lost and I do not know what the issue is.
I can see that a specific latch address is a problem, I do not not know what that latch (i.e. which specific sequence?) is protecting and where and why is the serialization occurring... dc_sequences being very high miss, I do not understand from here, how to diagnose the root-cause and what steps to improve the concurrency.
Any help with resolving this issue or help with understanding "kqrpre: find ob", "kqreqd: reget", "kqreqd" are much appreciated.