Hi,
I have tested this query on my Test DB which is an exact duplicate of PROD in terms of configuration. Only difference is 2 things.
1. Test has slightly less data
2. db_writer_processes is set to 32 on my test and on my prod its set to 4
This job is stuck on this one query and its been stuck on there for over 24 hours...whereas on my test db it completes in under 2 hours. When I look at my AWR report, I am seeing the following:
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
PX Deq Credit: send blkd 959,256 1,876,663 1956 92.4 Other
CPU time 126,325 6.2
db file scattered read 29,068 505 17 0.0 User I/O
db file sequential read 145,547 215 1 0.0 User I/O
cursor: pin S wait on X 10,084 196 19 0.0 Concurrenc
-------------------------------------------------------------
Wait Events DB/Inst: VADP/VADP Snaps: 265-282
-> s - second
-> cs - centisecond - 100th of a second
-> ms - millisecond - 1000th of a second
-> us - microsecond - 1000000th of a second
-> ordered by wait time desc, waits desc (idle events last)
Avg
%Time Total Wait wait Waits
Event Waits -outs Time (s) (ms) /txn
---------------------------- -------------- ------ ----------- ------- ---------
PX Deq Credit: send blkd 959,256 100.7 1,876,663 1956 1,668.3
db file scattered read 29,068 .0 505 17 50.6
db file sequential read 145,547 .0 215 1 253.1
cursor: pin S wait on X 10,084 100.0 196 19 17.5
enq: BF - allocation content 292 62.0 178 611 0.5
control file parallel write 20,400 .0 39 2 35.5
log file parallel write 4,027 .0 33 8 7.0
log buffer space 71 19.7 16 221 0.1
db file parallel write 5,432 .0 10 2 9.4
os thread startup 39 .0 5 122 0.1
control file sequential read 35,222 .0 3 0 61.3
SQL*Net break/reset to clien 257 .0 2 7 0.4
latch free 22 .0 1 64 0.0
rdbms ipc reply 66 .0 1 19 0.1
log file sync 43 .0 1 14 0.1
kksfbc child completion 11 100.0 1 49 0.0
SQL*Net more data to client 3,741 .0 0 0 6.5
Data file init write 36 .0 0 7 0.1
log file switch completion 2 .0 0 67 0.0
LGWR wait for redo copy 477 .0 0 0 0.8
SQL*Net more data from clien 531 .0 0 0 0.9
direct path read temp 12,767 .0 0 0 22.2
latch: session allocation 34 .0 0 3 0.1
direct path write temp 12,834 .0 0 0 22.3
log file single write 4 .0 0 14 0.0
direct path read 4,573 .0 0 0 8.0
db file single write 56 .0 0 1 0.1
library cache load lock 1 .0 0 37 0.0
log file sequential read 4 .0 0 8 0.0
SQL*Net message to client 10,991 .0 0 0 19.1
reliable message 37 .0 0 0 0.1
PX Deq: Signal ACK 6 .0 0 1 0.0
enq: RO - fast object reuse 37 .0 0 0 0.1
db file parallel read 1 .0 0 8 0.0
read by other session 1 .0 0 5 0.0
enq: JS - queue lock 1 .0 0 3 0.0
latch: shared pool 1 .0 0 1 0.0
buffer busy waits 8 .0 0 0 0.0
enq: PS - contention 1 .0 0 1 0.0
direct path write 45 .0 0 0 0.1
cursor: mutex X 37 .0 0 0 0.1
PX qref latch 26 84.6 0 0 0.0
undo segment extension 5 100.0 0 0 0.0
cursor: pin S 2 .0 0 0 0.0
PX Deq: Table Q Normal 941,382 98.9 1,828,941 1943 1,637.2
SQL*Net message from client 10,987 .0 90,810 8265 19.1
PX Deq: Execute Reply 32,878 92.2 59,707 1816 57.2
virtual circuit status 2,039 100.0 59,517 29189 3.5
Streams AQ: qmn slave idle w 2,184 .0 59,457 27224 3.8
Streams AQ: qmn coordinator 4,374 50.1 59,457 13593 7.6
Streams AQ: waiting for time 14 100.0 32,743 ####### 0.0
PX Deq: Execution Msg 12,188 43.9 10,840 889 21.2
PX Idle Wait 131 48.1 204 1555 0.2
jobq slave wait 33 93.9 93 2807 0.1
PX Deq Credit: need buffer 77 .0 4 54 0.1
PX Deq: Parse Reply 8 12.5 3 393 0.0
PL/SQL lock timer 1 100.0 1 986 0.0
PX Deq: Msg Fragment 64 .0 0 1 0.1
class slave wait 36 .0 0 0 0.1
PX Deq: Join ACK 13 .0 0 0 0.0
-------------------------------------------------------------
From my ASH report I see the following as well related to the wait event above:
ASH Report For VADP/VADP
DB Name DB Id Instance Inst Num Release RAC Host
------------ ----------- ------------ -------- ----------- --- ------------
VADP 3238746855 VADP 1 10.2.0.4.0 NO whms3207
CPUs SGA Size Buffer Cache Shared Pool ASH Buffer Size
---- ------------------ ------------------ ------------------ ------------------
32 6,368M (100%) 4,160M (65.3%) 758M (11.9%) 50.0M (0.8%)
Analysis Begin Time: 19-Oct-09 16:00:00
Analysis End Time: 20-Oct-09 10:15:59
Elapsed Time: 1,096.0 (mins)
Sample Count: 217,894
Average Active Sessions: 33.14
Avg. Active Session per CPU: 1.04
Report Target: None specified
Top User Events DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
Avg Active
Event Event Class % Activity Sessions
----------------------------------- --------------- ---------- ----------
PX Deq Credit: send blkd Other 92.78 30.74
CPU + Wait for CPU CPU 7.01 2.32
-------------------------------------------------------------
Top Background Events DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Top Event P1/P2/P3 Values DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
Event % Event P1 Value, P2 Value, P3 Value % Activity
------------------------------ ------- ----------------------------- ----------
Parameter 1 Parameter 2 Parameter 3
-------------------------- -------------------------- --------------------------
PX Deq Credit: send blkd 92.74 "268501022","146","0" 0.06
sleeptime/senderid passes qref
-------------------------------------------------------------
Top Service/Module DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
Service Module % Activity Action % Action
-------------- ------------------------ ---------- ------------------ ----------
SYS$USERS SQL*Plus 97.82 UNNAMED 97.82
toad.exe 1.96 UNNAMED 1.96
-------------------------------------------------------------
Top Client IDs DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Top SQL Command Types DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
-> 'Distinct SQLIDs' is the count of the distinct number of SQLIDs
with the given SQL Command Type found over all the ASH samples
in the analysis period
Distinct Avg Active
SQL Command Type SQLIDs % Activity Sessions
---------------------------------------- ---------- ---------- ----------
SELECT 175 99.78 33.06
-------------------------------------------------------------
Top SQL Statements DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
SQL ID Planhash % Activity Event % Event
------------- ----------- ---------- ------------------------------ ----------
813dx8yw93vrr 2672097588 61.62 PX Deq Credit: send blkd 58.93
select p.polno, p.GL_CURR, f.TO_CURR, f.eff_date, f.RATE, c.OP_MAIN_RES,
round(c.OP_MAIN_RES*f.RATE,2) as USD_OP_MAIN_RES, t.JH_ILI_RES_OPENING_METRIC_AM
T from ( select ILI_POL_EDW_ID,ILI_POL_NB from fdss_dli.EDW_DLI_ILI_P
OL where load_id=4 ) d, ( select ILI_POL_EDW_ID,ILI_COV_EDW_ID,ILI_COV_COD
CPU + Wait for CPU 2.68
f48gq65yfhrrx 2672097588 32.49 PX Deq Credit: send blkd 31.46
select p.polno, p.GL_CURR, f.TO_CURR, f.eff_date, f.RATE, c.CL_MAIN_RES,
round(c.CL_MAIN_RES*f.RATE,2) as USD_CL_MAIN_RES, t.JH_ILI_RES_CLOSING_METRIC_AM
T from ( select ILI_POL_EDW_ID,ILI_POL_NB from fdss_dli.EDW_DLI_ILI_P
OL where load_id=4 ) d, ( select ILI_POL_EDW_ID,ILI_COV_EDW_ID,ILI_COV_COD
CPU + Wait for CPU 1.02
813dx8yw93vrr 339408942 2.20 PX Deq Credit: send blkd 2.11
select p.polno, p.GL_CURR, f.TO_CURR, f.eff_date, f.RATE, c.OP_MAIN_RES,
round(c.OP_MAIN_RES*f.RATE,2) as USD_OP_MAIN_RES, t.JH_ILI_RES_OPENING_METRIC_AM
T from ( select ILI_POL_EDW_ID,ILI_POL_NB from fdss_dli.EDW_DLI_ILI_P
OL where load_id=4 ) d, ( select ILI_POL_EDW_ID,ILI_COV_EDW_ID,ILI_COV_COD
-------------------------------------------------------------
Top SQL using literals DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Top PL/SQL Procedures DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Top Sessions DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
-> '# Samples Active' shows the number of ASH samples in which the session
was found waiting for that particular event. The percentage shown
in this column is calculated with respect to wall clock time
and not total database activity.
-> 'XIDs' shows the number of distinct transaction IDs sampled in ASH
when the session was waiting for that particular event
-> For sessions running Parallel Queries, this section will NOT aggregate
the PQ slave activity into the session issuing the PQ. Refer to
the 'Top Sessions running PQs' section for such statistics.
Sid, Serial# % Activity Event % Event
--------------- ---------- ------------------------------ ----------
User Program # Samples Active XIDs
-------------------- ------------------------------ ------------------ --------
1032, 148 1.85 PX Deq Credit: send blkd 1.85
MACQUAI oracle@whms3207 (P037) 4,023/6,576 [ 61%] 0
1039, 156 1.85 PX Deq Credit: send blkd 1.85
MACQUAI oracle@whms3207 (P048) 4,021/6,576 [ 61%] 0
1082, 149 1.85 PX Deq Credit: send blkd 1.85
MACQUAI oracle@whms3207 (P050) 4,021/6,576 [ 61%] 0
1037, 151 1.84 PX Deq Credit: send blkd 1.84
MACQUAI oracle@whms3207 (P053) 4,020/6,576 [ 61%] 0
1064, 149 1.84 PX Deq Credit: send blkd 1.84
MACQUAI oracle@whms3207 (P052) 4,020/6,576 [ 61%] 0
-------------------------------------------------------------
Top Blocking Sessions DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Top Sessions running PQs DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
-> This section aggregates all the work done by the PQ slaves into
the session issuing the parallel query.
Sid,Srl# (Inst) % Activity SQL ID Event % Event
--------------- ---------- ------------- ----------------------------- --------
User Program
-------------------- ------------------------------
1022,UNKNOWN(1) 97.14 813dx8yw93vrr PX Deq Credit: send blkd 61.05
UserID:
select p.polno, p.GL_CURR, f.TO_CURR, f.eff_date, f.RATE, c.CL_MAIN_RES,
round(c.CL_MAIN_RES*f.RATE,2) as USD_CL_MAIN_RES, t.JH_ILI_RES_CLOSING_METRIC_AM
T from ( select ILI_POL_EDW_ID,ILI_POL_NB from fdss_dli.EDW_DLI_ILI_P
OL where load_id=4 ) d, ( select ILI_POL_EDW_ID,ILI_COV_EDW_ID,ILI_COV_COD
f48gq65yfhrrx PX Deq Credit: send blkd 31.34
UserID:
select p.polno, p.GL_CURR, f.TO_CURR, f.eff_date, f.RATE, c.CL_MAIN_RES,
round(c.CL_MAIN_RES*f.RATE,2) as USD_CL_MAIN_RES, t.JH_ILI_RES_CLOSING_METRIC_AM
T from ( select ILI_POL_EDW_ID,ILI_POL_NB from fdss_dli.EDW_DLI_ILI_P
OL where load_id=4 ) d, ( select ILI_POL_EDW_ID,ILI_COV_EDW_ID,ILI_COV_COD
813dx8yw93vrr CPU + Wait for CPU 2.76
UserID:
select p.polno, p.GL_CURR, f.TO_CURR, f.eff_date, f.RATE, c.CL_MAIN_RES,
round(c.CL_MAIN_RES*f.RATE,2) as USD_CL_MAIN_RES, t.JH_ILI_RES_CLOSING_METRIC_AM
T from ( select ILI_POL_EDW_ID,ILI_POL_NB from fdss_dli.EDW_DLI_ILI_P
OL where load_id=4 ) d, ( select ILI_POL_EDW_ID,ILI_COV_EDW_ID,ILI_COV_COD
-------------------------------------------------------------
Top DB Objects DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Top DB Files DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Top Latches DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
No data exists for this section of the report.
-------------------------------------------------------------
Activity Over Time DB/Inst: VADP/VADP (Oct 19 16:00 to 10:15)
-> Analysis period is divided into smaller time slots
-> Top 3 events are reported in each of those slots
-> 'Slot Count' shows the number of ASH samples in that slot
-> 'Event Count' shows the number of ASH samples waiting for
that event in that slot
-> '% Event' is 'Event Count' over all ASH samples in the analysis period
Slot Event
Slot Time (Duration) Count Event Count % Event
-------------------- -------- ------------------------------ -------- -------
16:00:00 (90.0 min) 14,686 PX Deq Credit: send blkd 12,063 5.54
CPU + Wait for CPU 2,431 1.12
enq: BF - allocation contentio 159 0.07
17:30:00 (110.0 min) 22,150 PX Deq Credit: send blkd 20,864 9.58
CPU + Wait for CPU 1,274 0.58
db file scattered read 8 0.00
19:20:00 (110.0 min) 21,603 PX Deq Credit: send blkd 20,896 9.59
CPU + Wait for CPU 705 0.32
control file parallel write 2 0.00
21:10:00 (110.0 min) 22,894 PX Deq Credit: send blkd 20,864 9.58
CPU + Wait for CPU 1,993 0.91
db file sequential read 20 0.01
23:00:00 (110.0 min) 23,073 PX Deq Credit: send blkd 20,864 9.58
CPU + Wait for CPU 2,186 1.00
db file scattered read 19 0.01
00:50:00 (110.0 min) 22,627 PX Deq Credit: send blkd 20,864 9.58
CPU + Wait for CPU 1,759 0.81
control file parallel write 3 0.00
02:40:00 (110.0 min) 22,668 PX Deq Credit: send blkd 20,154 9.25
CPU + Wait for CPU 2,497 1.15
enq: BF - allocation contentio 17 0.01
04:30:00 (110.0 min) 21,557 PX Deq Credit: send blkd 20,896 9.59
CPU + Wait for CPU 658 0.30
control file parallel write 3 0.00
06:20:00 (110.0 min) 21,523 PX Deq Credit: send blkd 20,864 9.58
CPU + Wait for CPU 655 0.30
log file parallel write 2 0.00
08:10:00 (110.0 min) 21,016 PX Deq Credit: send blkd 20,064 9.21
CPU + Wait for CPU 951 0.44
db file scattered read 1 0.00
10:00:00 (16.0 min) 3,239 PX Deq Credit: send blkd 3,040 1.40
CPU + Wait for CPU 197 0.09
SQL*Net more data from client 1 0.00
-------------------------------------------------------------
End of Report
Now from reading several threads on here and several other sites, most of them lead me to believe that my current value of:
PARALLEL_EXECUTION_MESSAGE_SIZE=8k
should be set to a higher value...let's say
PARALLEL_EXECUTION_MESSAGE_SIZE=32k
Would this really be the issue...is there more here that I am not seeing. Why does it work on one DB and not the other.