We have a long-running batch SQL (running about 14 hours so far) that joins 16 tables to do a direct-path insert into another table. There is only one occurrence of this SQL running in the instance at any given time.
Here are some statistics from DBA_HIST_SQLSTAT:
| BEGIN_INTERVAL_TIME END_INTERVAL_TIME | EXECUTIONS ELAPSED_SEC | CPU_SEC | IOWAIT_SEC ROWS_PROCESSED | BUFFER_GETS PHYSICAL_READS |
------------------- ------------------- ------------ ------------- ------------- ------------- -------------- -------------- --------------
| 05/11/2015 00:00:28 05/11/2015 01:00:30 | 0 | 519.92 | 396.03 | 154.04 | 0 | 5,152,419 | 22,998 |
| 05/11/2015 01:00:30 05/11/2015 02:00:32 | 0 | 3,507.69 | 2,818.16 | 754.26 | 0 | 34,834,637 | 87,823 |
| 05/11/2015 02:00:32 05/11/2015 03:00:33 | 0 | 3,521.94 | 2,907.21 | 691.31 | 0 | 36,936,025 | 84,857 |
| 05/11/2015 03:00:33 05/11/2015 04:00:36 | 0 | 3,522.74 | 2,968.57 | 631.35 | 0 | 40,630,840 | 83,614 |
| 05/11/2015 04:00:36 05/11/2015 05:00:38 | 0 | 3,504.09 | 3,060.45 | 522.26 | 0 | 37,509,360 | 67,110 |
| 05/11/2015 05:00:38 05/11/2015 06:00:40 | 0 | 3,526.92 | 3,092.64 | 512.13 | 0 | 38,534,031 | 68,438 |
| 05/11/2015 06:00:40 05/11/2015 07:00:41 | 0 | 3,516.35 | 3,112.10 | 481.54 | 0 | 34,000,400 | 65,129 |
| 05/11/2015 07:00:41 05/11/2015 08:00:43 | 0 | 3,521.35 | 2,889.92 | 707.48 | 0 | 25,362,928 | 91,970 |
| 05/11/2015 08:00:43 05/11/2015 09:00:49 | 0 | 3,529.14 | 2,703.83 | 899.84 | 0 | 13,194,352 | 149,575 |
| 05/11/2015 09:00:49 05/11/2015 10:00:57 | 0 | 3,527.23 | 2,501.88 | 1,099.39 | 0 | 12,425,802 | 168,407 |
| 05/11/2015 10:00:57 05/11/2015 11:00:05 | 0 | 3,457.85 | 2,128.88 | 1,405.28 | 0 | 12,198,139 | 228,599 |
| 05/11/2015 11:00:05 05/11/2015 12:00:13 | 0 | 3,533.07 | 2,801.07 | 807.50 | 0 | 14,972,963 | 103,139 |
| 05/11/2015 12:00:13 05/11/2015 13:00:22 | 0 | 3,485.39 | 865.92 | 2,696.10 | 0 | 5,084,815 | 466,098 |
| 05/11/2015 13:00:22 05/11/2015 14:00:35 | 0 | 3,531.56 | 290.73 | 3,280.07 | 0 | 2,793,640 | 435,604 |
(curse this formatting)
I have seen that Buffer Gets have been fairly consistent from run to run. I use that as an indication of how the SQL is progressing.
Note that the hourly pace of Buffer Gets drops significantly around 08:00 (probably started sometime before that) and drops again at about 12:00. However, the Physical Reads increase significantly at the same time. Its pace is nowless than a tenth of what it was earlier in the run.
I queried DBA_HIST_ACTIVE_SESS_HISTORY for this SQL_ID, and it shows that, early in the run, the events that referenced a datafile/tablespace were all accessing the tablespace for the application (so far, so good). But later, the I/O activity switches to almost entirely accesses of the UNDO tablespace.
So, my suspicions are:
1. Something updated one of those 16 tables after this query started, and the UNDO activity is incurred to reconstruct a consistent version of the changed blocks.
2. Something triggered a Delayed Block Cleanout condition on one of those 16 tables (not necessarily after this started) and the UNDO activity is incurred trying to determine if the blocks are usable.
I have queried ASH for either of these possibilities and have come up empty. That doesn't mean it didn't happen, but ASH does not have it if it did.
I need to know if there is any other cause that could trigger this. What else should I be looking for?
This same issue (including the UNDO) occurred a few days ago, causing us to kill and re-run the job. Now it has occurred again. It has not occurred in the previous 6 months (our AWR retention)..
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi
PL/SQL Release 10.2.0.5.0 - Production
CORE 10.2.0.5.0 Production
TNS for Solaris: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production