Hi, we are using version 11.2.0.3.0 of oracle version. I had initially posted the question in this thread"https://community.oracle.com/thread/4069296" and was supposed to mark that as complete as we were under impression that the issue is mitigated, but the issue reappears again, so opening this new thread with additional details.
We have one insert query which inserts record into one of our transaction table, and daily we insert ~150+million records into this table from java batch in parallel(Batch INSERT), (so 150million+ execution of the INSERT statement). Suddenly we saw the INSERT query is performing poorly and there were backlogs getting piled up at java end, and we are struggling to catch those up. As i saw from dba_hist_Sqlstat, this query has started doing high I/O(verified from IOWAIT_delta column) as compared to other days making it to run longer for each execution. and i see the obj# is coming as its index partitions. in normal days we used have Avg~150 reads/execution now its Avg ~600disk reads/execution.Initially this insert it was taking avg~.01 minutes/execution in normal days, now its consuming ~.5minutes/execution. Execution plan is exactly same as previous and this insert is also very straight forward. also i see overall data base level useri/o(DB file sequential read) spike and on top, this query is coming in picture. This INSERT query is using ARRAY binds from Java, so i am visualizing this as bulk bind(FOR ALL) sort of INSERT in oracle.Yes another thing we did, we were having AWR retention period initially 40days and we recently changed it to ~91days, but hope that would not impact the database as we have not increased the snap interval interval anyway(its 1-hour as it was before). so current days, we are kind of always under panic when we are seeing any i/o spike. So need experts suggestions to get the RCA and get that fixed rather doing guesswork.
Below is the query-which is slowing down
INSERT INTO tab1 (c1, c2, c3, c4,
c5, c6, c7, c8, c9.....) VALUES (sysdate, 'INFO',
:1 , :2 , :3 , :4 , :5 , :6 , :7,.... )
-------------------------------------------------
| Id | Operation | Name | Cost |
-------------------------------------------------
| 0 | INSERT STATEMENT | | 1 |
| 1 | LOAD TABLE CONVENTIONAL | | |
-------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - INS$1
Note
-----
- cpu costing is off (consider enabling it)
initially to get the issue fixed and to BAU, we did restart the database thinking if we had any issue with the long running monthend reports capturing the buffer cache with the yesterdays data blocks and so the restart would help to make all them start again from scratch so I/O will be in control for these INSERTS(mainly index blocks). But that didn't help, we were still processing the INSERT slowly, so then we had got direction from pERF DBA'S to increase the NEXT EXTENT size from 8K to 64M for the table and index partitions. And then it seems to us, we were kind of back to BAU and those INSERTS were processed faster and backlog caughtup. Now again we have similar slowness observed after 10days for ~2hrs. Then we just killed some long reporting queries and it was just guess work and then things became BAU, so we are not yet very sure regarding the root cause which makes these slowness or high I/O by this INSERT sql. I have published stats below from dba_hist_sqlstat and dba_hist_active_sess_history both
As Jonathan mentioned in last thread, "There's also the possibility that checkpoint problems and log transfer to standby instance are having an impact.", Not sure how we are relating it with the high I/O done by the Insert query but yes, We use gloden gate to replicate the data to another active database(which we call as Disaster Recovery, we dont have dataguard), and i think, glodengate applies exact statement to the target DB but not logs , so not sure if that would affect the performance of this source INSERT query anyway?
Also as Andrew mentioned in last thread "As you compared a snapshot to a snapshot, instead of comparing for the same number of rows inserted, we are comparing against a time interval. It would be useful to also compare how many times the insert was fired and how many rows were inserted." I am really trying to get the count of execution of that INSERT statement but the figure(sum(EXECUTIONS_DELTA)) which i am seeing in dba_hist_sqlstat is far away from the actual. Each hour we INSERT ~6million+ records using the INSERT statement, but its not matching with the figure captured in dba_hist_sqlstat. So i am not very sure, if we had got the high number of transactions(batchy insert from JAVA) for that exact minute/hour, which has the triggering impact on the INSERT slowness which lasts for days to get back to BAU/backlog cleared. Which will mean that, we are hitting threshold in terms of processing those batch INSERTS in DB side from Java? Again is it that , as these are using BATCH inserts technique from JAVA, so the query execution count which i get from sum(EXECUTIONS_DELTA) in dba_hist_sqlstat was not giving correct number, or might be it executes so fast that its not getting captured in dba_hist_sqlstat.
Below are the data from dba_hist_sqlstat for the specific sql-1231123123wer(INSERT statement in to table TAB1):
SAMPLE_END -MIN (s.end_interval_time):- Format(DD-MON-YYYY HH24)
EXECUTIONS=SUM(EXECUTIONS_DELTA)
PIO_PER_EXEC=SUM(DISK_READS_delta)/SUM (executions_delta)
CPU_TIME_EXEC=SUM(cpu_time_delta)/SUM (executions_delta)
CONCURRENCY_PER_EXEC=SUM(ccwait_delta)/SUM (executions_delta)
LIO_PER_EXEC=SUM(BUFFER_GETS_delta)/SUM (executions_delta)
IO_PER_EXEC=SUM(IOWAIT_delta)/SUM (executions_delta)
MSEC_EXEC=(SUM(ELAPSED_TIME_delta)/SUM (executions_delta))/1000000/60
as you see the slowness is observed during the period when the elapsed time per execution increased from ~.03minutes to .3minutes at 1st august 10AM, and this lasted till 2nd august 1pm. Again same issue started on 18th august 11AM just for 2hrs then things went normal. I have marked the slow executions in bold
SAMPLE_END EXECUTIONS PIO_PER_EXEC LIO_PER_EXEC CPU_TIME_EXEC CONCURRENCY_PER_EXEC IO_PER_EXEC MSEC_EXEC
01-AUG-2017 00 7523 64.1 13656.7 150576.9 28042.3 718844 0.02
01-AUG-2017 01 8500 3.9 13645.7 141735.3 7705.7 42713.2 0.00
01-AUG-2017 02 7465 67.2 15090.5 164221 12945.1 648493.5 0.02
01-AUG-2017 03 4286 234.9 17007.1 205672 15450 1630362.6 0.03
01-AUG-2017 04 5430 273.3 16416.1 201445.7 7499.9 1686160.1 0.03
01-AUG-2017 05 4676 350.2 17043.4 226296 15500.5 2993848.1 0.06
01-AUG-2017 06 7985 62.5 11653.9 140393.2 4751.4 726457.1 0.02
01-AUG-2017 07 8670 106.9 13769.3 162237.6 4227.9 1152926.5 0.02
01-AUG-2017 08 8470 100.3 16671.7 185792.2 7698.2 1250294.7 0.02
01-AUG-2017 09 10526 136.2 16678.5 194791.9 10157.2 1442744.7 0.03
01-AUG-2017 10 1365 1439.1 18772.1 425890.1 14705.6 22485888.3 0.38
01-AUG-2017 11 1073 2144.7 19450.4 523196.6 5639.7 32997275.9 0.56
01-AUG-2017 12 2375 1462.8 17899.2 377195.8 4277.6 14696319.8 0.25
01-AUG-2017 13 3490 1274.3 18510.7 353510 2484 9591990 0.17
01-AUG-2017 14 3656 1205.3 16062 321053.1 7972.1 8778938.2 0.15
01-AUG-2017 15 3643 1171.7 17552.8 332552.8 2736.7 9225229.9 0.16
01-AUG-2017 16 1674 1341.9 18515 376762.2 5905.1 20975597.4 0.36
01-AUG-2017 17 1337 1788.4 17922.7 445834 3249.1 26933183.1 0.46
01-AUG-2017 18 2632 1235.2 17012.5 335482.5 2898.9 12975642.8 0.22
01-AUG-2017 19 2700 1252.8 18003.6 349014.8 27209 13041465.4 0.22
01-AUG-2017 20 3452 930.3 18376.8 298076.5 2092.8 8089613.6 0.14
01-AUG-2017 21 3748 1268.6 18749.1 341080.6 4316.3 8797350.2 0.15
01-AUG-2017 22 3947 1240.7 18892.2 339371.7 2390.4 8358006.4 0.15
01-AUG-2017 23 4305 919.2 18638.7 308861.8 1037.8 7650412.5 0.13
02-AUG-2017 00 6788 477.2 17870.2 252807.9 887.1 4736455.1 0.08
02-AUG-2017 01 6451 602.8 18903.3 273539 1415.8 4729156.8 0.08
02-AUG-2017 02 5068 482.7 19054.2 269218.6 10117.2 6577973.9 0.11
02-AUG-2017 03 5426 496.5 18301.9 263415 653 6007050.5 0.10
02-AUG-2017 04 6269 565.6 19139.6 270827.9 950.8 5087833.1 0.09
02-AUG-2017 05 190 1552.3 19182.7 450210.5 553.7 22419829.9 0.38
02-AUG-2017 06 1745 1342.5 18951.1 408458.5 680.8 20220353.9 0.34
02-AUG-2017 07 2268 1005.3 19094.4 364841.3 691.3 15583149.1 0.27
02-AUG-2017 08 2729 1013.1 18487.8 327955.3 9614 12204144.5 0.21
02-AUG-2017 09 2967 1320 19495 368294.6 2480.8 11717933.1 0.20
02-AUG-2017 10 2850 1411.7 18165.6 410824.6 5496.7 19316054.3 0.33
02-AUG-2017 12 124 2173.3 18046.9 508790.3 6808.8 29452282.3 0.50
02-AUG-2017 13 2167 1746.2 19052.3 445288.4 1483.8 15419082.9 0.26
02-AUG-2017 14 8575 818.3 17943.9 277521.9 968.4 3250029.3 0.06
02-AUG-2017 15 12685 560.8 18489 250365.8 4597.7 1531295.4 0.03
02-AUG-2017 16 7448 13.8 13577.1 143231.7 5817.4 102794.5 0.01
02-AUG-2017 17 13855 16.3 16756.4 165159.1 11553.2 100080.1 0.01
02-AUG-2017 18 11645 95.1 17621.6 184827 8806.5 391426.9 0.01
02-AUG-2017 19 12430 114.6 15764.8 180029 4265.1 486051.9 0.01
02-AUG-2017 20 12768 110.9 17809.2 191923.6 4940.7 511255.3 0.01
02-AUG-2017 21 12852 57 19497.4 189393.9 6485.7 254535.1 0.01
02-AUG-2017 22 19171 38.4 15718 154304.9 2806.3 180435 0.01
02-AUG-2017 23 16421 173.4 15860.3 179076.8 2262.5 686276.4 0.02
18-AUG-2017 00 9585 5.9 13745.8 138247.3 25526.9 57233.2 0.01
18-AUG-2017 01 4522 37.6 12826.7 146570.1 19157.3 438420.9 0.01
18-AUG-2017 02 4254 55.5 15766.4 159351.2 11787.6 778196.2 0.02
18-AUG-2017 03 4236 172.6 16405.1 180507.6 9528.2 1733081.7 0.03
18-AUG-2017 04 4573 324.4 16671.4 200876.9 7075.8 2253632.9 0.04
18-AUG-2017 05 3773 300.2 16809.8 202960.5 4856.3 3190170.9 0.06
18-AUG-2017 06 3683 106.5 17493.6 183869.1 5904.9 1594598.1 0.03
18-AUG-2017 07 4942 116.7 16781.7 181485.2 5596.7 1615343.1 0.03
18-AUG-2017 08 8360 93 16762.9 172672.2 4528.4 1077904.8 0.02
18-AUG-2017 09 8877 140.7 15395.6 178219 4436.3 1569586 0.03
18-AUG-2017 10 10122 171.2 17561.7 196854.4 2882.3 2694655 0.05
18-AUG-2017 11 1886 1230.7 16266.6 345800.6 2554.3 18555441.3 0.31
18-AUG-2017 12 3885 994.5 18026.1 295251 1537 9058392.7 0.16
18-AUG-2017 13 6592 596.7 17844.1 242003.9 1278.9 4570744.2 0.08
18-AUG-2017 14 5806 748.4 17686.4 255101.6 1035.4 5420140 0.09
18-AUG-2017 15 5565 689.7 18380.5 264285.7 1134.4 5660885.8 0.10
18-AUG-2017 16 6259 550.4 17982.5 258170.6 1182.5 5201512.1 0.09
18-AUG-2017 17 10313 322.1 19109.2 223175.6 1025.1 2451553.9 0.04
18-AUG-2017 18 16883 134.2 18938.4 186732.2 1450.7 901143 0.02
18-AUG-2017 19 10592 290.4 19169.9 212254.5 1930.6 1611194.6 0.03
18-AUG-2017 20 7622 489.2 18744.8 228159.3 1762.3 1584379.1 0.03
18-AUG-2017 21 6800 731.7 18461.8 258958.8 2766.9 1614815.9 0.03
18-AUG-2017 22 7884 893.4 18607.7 268004.8 2068.7 1910353.7 0.04
18-AUG-2017 23 6496 260.9 15290.7 170689.7 9745.7 862330.2 0.02
Below are the data from dba_hist_active_sess_history (for 1st august 11AM to 12 i.e 1-hr duration during ISSUE):
I used below sql to fetch from dba_hist_active_sess_history
select /*+ optimizer_features_enable('11.2.0.4') all_rows*/
sql_id,sql_plan_operation, sql_plan_options, NVL(event,'CPU') event,
session_state,
(select object_name from dba_objects o where o.object_id=ash.current_obj#) object_name
,ROUND(SUM(ash.delta_read_io_bytes/1024/1024 ),2) Read_MB,
COUNT(DISTINCT sample_id) cnt,
MAX(sample_time) most_recent_time
from dba_hist_active_sess_history ash where snap_id between 106059 and 106060
group by sql_id, current_obj#,sql_plan_operation, sql_plan_options, event, session_state
order by cnt desc nulls last
sql_id,sql_plan_operation, sql_plan_options,event,session_state,object_name,Read_MB,cnt, most_recent_time
1231123123wer db file sequential read WAITING TAB1_IX3 17501.97 691 8/1/2017 12:00:11.333 PM
1231123123wer db file sequential read WAITING TAB1_IX4 17792.88 687 8/1/2017 12:00:11.333 PM
45654jhgjg78 db file sequential read WAITING TAB2_IX2 16100.56 684 8/1/2017 12:00:11.333 PM
45654jhgjg78 db file sequential read WAITING TAB2_IX1 16513.36 683 8/1/2017 12:00:11.333 PM
9879879879kio INDEX RANGE SCAN db file sequential read WAITING 34665.77 662 8/1/2017 12:00:11.333 PM
fbq52zjtgd0nt TABLE ACCESS FULL db file scattered read WAITING TAB3 156363.43 625 8/1/2017 12:00:11.333 PM
1231123123wer db file sequential read WAITING TAB1_IX8 8576.77 608 8/1/2017 12:00:11.333 PM
2hsmc282yu239 db file sequential read WAITING TAB4_IX3 25441.91 589 8/1/2017 12:00:11.333 PM
db file async I/O submit WAITING 28.05 588 8/1/2017 12:00:11.333 PM
2yzjgxvwmrjpt INDEX RANGE SCAN db file sequential read WAITING 21875.88 434 8/1/2017 12:00:11.333 PM
c8jg8va9n9pa3 TABLE ACCESS BY LOCAL INDEX ROWID db file parallel read WAITING TAB1 34499.91 347 8/1/2017 11:41:41.537 AM
c9tcka0a5sz7d TABLE ACCESS BY INDEX ROWID db file sequential read WAITING TAB10 17164.57 319 8/1/2017 11:12:21.580 AM
Below are the data from dba_hist_active_sess_history (for 2nd august 5pM to 6pM i.e 1-hr duration during BAU):
sql_id,sql_plan_operation, sql_plan_options,event,session_state,object_name,Read_MB,cnt, most_recent_time
76765464765ui TABLE ACCESS BY LOCAL INDEX ROWID db file sequential read WAITING TAB1 80692.42 683 8/2/2017 6:00:30.009 PM
76765464765ui INDEX RANGE SCAN CPU ON CPU TAB1 60822.07 672 8/2/2017 6:00:30.009 PM
c6gap9ac93nn0 INDEX RANGE SCAN db file sequential read WAITING 29845.94 644 8/2/2017 6:00:30.009 PM
log file parallel write WAITING 34.04 533 8/2/2017 6:00:30.009 PM
9twyt2720cb1y TABLE ACCESS BY LOCAL INDEX ROWID db file sequential read WAITING TAB1 27986.13 508 8/2/2017 6:00:19.869 PM
db file async I/O submit WAITING 11.42 460 8/2/2017 6:00:09.739 PM
fbq52zjtgd0nt TABLE ACCESS FULL db file scattered read WAITING TAB3 216106.79 452 8/2/2017 5:59:18.938 PM
CPU ON CPU 14605.23 423 8/2/2017 6:00:30.009 PM
log file sync WAITING 3542.33 412 8/2/2017 6:00:19.869 PM
c5qy916bph8hg SELECT STATEMENT library cache: mutex X WAITING 2353.41 369 8/2/2017 5:59:59.609 PM
dhs8nnzkcj8j7 TABLE ACCESS BY LOCAL INDEX ROWID db file sequential read WAITING TAB1 23591.57 350 8/2/2017 6:00:30.009 PM
csnhxbdpzc2g3 TABLE ACCESS FULL direct path read WAITING TAB11 723036.05 300 8/2/2017 5:50:00.115 PM
If i just fetch the data specific to the sql(INSERT) statement for both the period, below is the results:
ISSUE period:
sql_id,sql_plan_operation, sql_plan_options,event,session_state,object_name,Read_MB,cnt, most_recent_time
1231123123wer db file sequential read WAITING TAB1_IX3 17501.97 691 8/1/2017 12:00:11.333 PM
1231123123wer db file sequential read WAITING TAB1_IX4 17792.88 687 8/1/2017 12:00:11.333 PM
1231123123wer db file sequential read WAITING TAB1_IX8 8576.77 608 8/1/2017 12:00:11.333 PM
1231123123wer db file sequential read WAITING TAB1_IX1 700.63 92 8/1/2017 11:59:40.743 AM
1231123123wer CPU ON CPU TAB1_IX8 343.48 46 8/1/2017 11:57:48.722 AM
1231123123wer CPU ON CPU TAB1_IX4 328.69 41 8/1/2017 12:00:11.333 PM
1231123123wer CPU ON CPU TAB1_IX3 263.05 33 8/1/2017 11:58:49.852 AM
1231123123wer db file sequential read WAITING TAB1_IX6 47.02 7 8/1/2017 11:57:48.722 AM
1231123123wer db file sequential read WAITING TAB1_IX2 37.5 6 8/1/2017 11:51:52.200 AM
1231123123wer db file sequential read WAITING TAB1_IX9 22.99 5 8/1/2017 11:31:00.596 AM
1231123123wer db file sequential read WAITING TAB1_IX7 24.37 4 8/1/2017 11:43:43.647 AM
BAU period:
sql_id,sql_plan_operation, sql_plan_options,event,session_state,object_name,Read_MB,cnt, most_recent_time
1231123123wer db file sequential read WAITING TAB1_IX4 872.6 170 8/2/2017 6:00:19.869 PM
1231123123wer CPU ON CPU TAB1_IX4 489.68 155 8/2/2017 5:58:48.618 PM
1231123123wer db file sequential read WAITING TAB1_IX3 792.27 130 8/2/2017 5:59:59.609 PM
1231123123wer db file sequential read WAITING TAB1_IX8 503.05 115 8/2/2017 5:59:59.609 PM
1231123123wer CPU ON CPU TAB1_IX8 284.5 101 8/2/2017 6:00:19.869 PM
1231123123wer CPU ON CPU TAB1_IX3 284.93 86 8/2/2017 5:59:59.609 PM
1231123123wer db file sequential read WAITING TAB1_IX1 157.59 54 8/2/2017 5:59:59.609 PM
1231123123wer CPU ON CPU TAB1_IX1 109.8 48 8/2/2017 5:58:08.000 PM
1231123123wer CPU ON CPU 77.97 45 8/2/2017 5:59:29.058 PM