11.2.0.4
We have a java application that is inserting records at a rate of anywhere from 500k to 1.5million rows per hour. Running for 8 hours a day. table is truncated at night, every night.
The app is inserting 1 row at a time in batches of 1000, bad design, its being redesigned to insert in bulk. We'll deal with that later. Problem is, 3-4 times a day, the app is crashing as its local cache is growing too big because the inserts arent going in quick enough. restart of their app fixes. Id be confident if they implement bulk load it should fix this, but will be a while coming so hence me looking at it from the DB. I ran a SQL_ID AWR report to get specific statistics for an hour it didnt crash and an hour it did to see what the loads like for that specific insert. Nearly twice the rows inserted at 1.2m for the hour it didnt crash vs 660k for the hour it did. Hmm, something else then,
Table definition. 34 column table, numbers and varchars, no long, clobs or anything. No triggers.
Its on its own tablespace with 3 datafiles on 15k VMAX disks. No other processes hit this.
2 indexes, a PK and on a date.
7 not null constraints.
Here are the stats for the hour it did crash
Plan Hash Total Elapsed 1st Capture Last Capture
# Value Time(ms) Executions Snap ID Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1 0 53,929 66,4691 42571 42571
-------------------------------------------------------------
Stat Name Statement Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms) 53,929 0.1 0.6
CPU Time (ms) 36,158 0.1 0.6
Executions 664,691 N/A N/A
Buffer Gets 4,983,524 7.5 1.0
Disk Reads 2,513 0.0 0.1
Parse Calls 9 0.0 0.0
Rows 664,684 1.0 N/A
User I/O Wait Time (ms) 17,234 N/A N/A
Cluster Wait Time (ms) 0 N/A N/A
Application Wait Time (ms) 0 N/A N/A
Concurrency Wait Time (ms) 255 N/A N/A
Invalidations 0 N/A N/A
Version Count 13 N/A N/A
Sharable Mem(KB) 191 N/A N/A
-------------------------------------------------------------
Here are the stats for the hour it didnt crash (which was the hour before the one above)
Plan Hash Total Elapsed 1st Capture Last Capture
# Value Time(ms) Executions Snap ID Snap ID
--- ---------------- ---------------- ------------- ------------- --------------
1 0 72,610 129,4455 42570 42570
-------------------------------------------------------------
Stat Name Statement Per Execution % Snap
---------------------------------------- ---------- -------------- -------
Elapsed Time (ms) 72,610 0.1 1.1
CPU Time (ms) 66,016 0.1 1.2
Executions 1,294,455 N/A N/A
Buffer Gets 9,650,990 7.5 1.9
Disk Reads 618 0.0 0.0
Parse Calls 22 0.0 0.0
Rows 1,294,455 1.0 N/A
User I/O Wait Time (ms) 6,241 N/A N/A
Cluster Wait Time (ms) 0 N/A N/A
Application Wait Time (ms) 0 N/A N/A
Concurrency Wait Time (ms) 5 N/A N/A
Invalidations 0 N/A N/A
Version Count 13 N/A N/A
Sharable Mem(KB) 191 N/A N/A
-------------------------------------------------------------
So based on above we seem to have somethign hurting the IO when crashes (this is the only app that complains, but it is high volume). Despite half the rows over the hour, The disk reads are 4 times as much and user I/O waits are nearly 3 times then the hour where it doesnt crash. Im going to take this as an indicator of where the potential answer lies. Where next to look? No other access to the tables and that tables datafiles while this is running, some batch stuff hits it at night before it gets truncated for the next day.
Im gathering traces from the storage to see workload