I'm trying to puzzle out the details of a ORA-01555 I got a few days ago (10.2 database).
First, let's RTFM.
http://download.oracle.com/docs/cd/B28359_01/server.111/b28310/undo002.htm
It states that
* UNDO_RETENTION is a minimum value
* Undo retention is limited by the maximum size of the UNDO tablespace provided its datafiles are set to AUTOEXTEND
* The "real" value of undo retention at any given time is found in the most recent row in v$undostat (tuned_undoretention column)
OK, great. So here is my "snapshot too old error" as found in the alertlog:
Sat Jul 19 09:49:33 2008
ORA-01555 caused by SQL statement below (SQL ID: 3ghqu9m8u617j, Query Duration=1431 sec, SCN: 0x0001.160ce8bc):
I tried to find this SQL statement in v$undostat but it had already aged out, so I looked in DBA_HIST_UNDOSTAT:
select to_char(begin_time,'yyyy-mm-dd hh24:mi:ss') begin_time,
to_char(end_time,'yyyy-mm-dd hh24:mi:ss') end_time,
(end_time-begin_time)*86400 snapshot_len,
maxquerylen, tuned_undoretention, maxquerysqlid
from dba_hist_undostat
where maxquerysqlid='3ghqu9m8u617j'
and trunc(begin_time)='19-JUL-08'
order by begin_time
BEGIN_TIME END_TIME SNAPSHOT_L MAXQUERYLEN TUNED_UNDORETENTION MAXQUERYSQLID
-------------------- -------------------- ---------- ----------- ------------------- -------------
2008-07-19 09:27:05 2008-07-19 09:37:05 600 530 21600 3ghqu9m8u617j
2008-07-19 09:37:05 2008-07-19 09:47:05 600 1131 21600 3ghqu9m8u617j
2008-07-19 09:47:05 2008-07-19 09:57:05 600 1732 21600 3ghqu9m8u617j
2008-07-19 09:57:05 2008-07-19 10:07:05 600 2333 21600 3ghqu9m8u617j
2008-07-19 10:07:05 2008-07-19 10:17:05 600 2933 21600 3ghqu9m8u617j
2008-07-19 10:17:05 2008-07-19 10:27:05 600 3535 21600 3ghqu9m8u617j
2008-07-19 10:27:05 2008-07-19 10:37:05 600 4135 21600 3ghqu9m8u617j
2008-07-19 10:37:05 2008-07-19 10:47:05 600 4736 21600 3ghqu9m8u617j
2008-07-19 10:47:05 2008-07-19 10:57:05 600 5337 21600 3ghqu9m8u617j
2008-07-19 10:57:05 2008-07-19 11:07:05 600 5938 21600 3ghqu9m8u617j
2008-07-19 11:07:05 2008-07-19 11:17:05 600 6539 21600 3ghqu9m8u617j
2008-07-19 11:17:05 2008-07-19 11:27:05 600 7140 21600 3ghqu9m8u617j
2008-07-19 11:27:05 2008-07-19 11:37:05 600 7740 21600 3ghqu9m8u617j
2008-07-19 11:37:05 2008-07-19 11:47:05 600 8341 21600 3ghqu9m8u617j
2008-07-19 18:36:08 2008-07-19 18:46:08 600 500 21600 3ghqu9m8u617j
2008-07-19 18:46:08 2008-07-19 18:56:08 600 1101 21600 3ghqu9m8u617j
2008-07-19 18:56:08 2008-07-19 19:06:08 600 1702 21600 3ghqu9m8u617j
I'm reading from this that sql_id 3ghqu9m8u617j started a little after 9:27 and continued executing, adding about 600 seconds to MAXQUERYLEN every 10-minute period. Then, between 11:37 and 11:47, it stopped after executing for 8341 seconds.
Well, that doesn't correlate with the ORA-01555, which says that 3ghqu9m8u617j stopped at 9:49 after executing for 1431 seconds.
Who is right? dba_hist_undostat or the alertlog? And why the conflicting information?
Also, UNDO_RETENTION was set to 5 hours (18000), and as you can see it had been bumped up to 21600 (6 hours). The undo tablespace datafile was set to autoextend and was only a third of its maximum possible size.
Perhaps there was a RETENTION GUARANTEE set on the undo tablespace, and some other query was hogging all the undo? Nope. It's set to NOGUARANTEE.
Apparently I had plenty of undo space. So why was there an error at all?
I seem to be missing something. I'd like to understand this conundrum.
Thanks,
Fred