Env:
===
Oracle 10.2.0.4
Red Hat Enterprise Linux Server release 5.11 (Tikanga)
6 databases running on RAC with two nodes
ASM
Background:
=========
Update on 32million rows in one DB caused archiving of redo logs to fill up - users received "ORA-00257 archiver error. connect internal only until freed."
Perfomed tasks and investigations:
==========================
* My first option was to connect to RMAN and remove old archivelogs, then perform new backup. Due to other parallel tasks I could only attend to this about 12 hours after errors occurred.
RMAN> connect target /
However, this only resulted in several hours of waiting - did not get the usual "connected to target database: ...". I could not either run "sqlplus / as sysdba" - it also seemed to hang without sending any response. I could not access the DB.
* Next I accessed the asmcmd-prompt, found the archived logs and manually removed them by entering "rm ...". Plan to run crosscheck in RMAN afterwards. Result below:
ASMCMD> pwd
+fra/jbeccc/archivelog/2016_04_20
ASMCMD> ls
ASMCMD>
ASMCMD> lsdg
State Type Rebal Unbal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Name
MOUNTED EXTERN N N 512 4096 1048576 1126378 723233 0 723233 0 DATA/
MOUNTED EXTERN N N 512 4096 1048576 102398 100670 0 100670 0 FRA/
A
* After the above lots of free space on FRA where logs are archived but I could still not successfully connect with either RMAN or sqlplus. Next I shut down both instances:
[-DB- oracle@dbhcora02 bin]$ srvctl stop instance -d JBECCC -i JBECCC1
[-DB- oracle@dbhcora02 bin]$ srvctl stop instance -d JBECCC -i JBECCC2
...the above did eventually work but took unusually long (about 10-15min per command). However, processes seem to be still up judging from:
[-DB- oracle@dbhcora01 bdump]$ ps -ef|grep pmon
oracle 5504 27196 0 10:26 pts/3 00:00:00 grep pmon
oracle 13279 1 0 05:57 ? 00:00:00 ora_pmon_JBACCC1
oracle 17224 1 0 Jan21 ? 00:00:08 asm_pmon_+ASM1
oracle 17745 1 0 Jan21 ? 00:02:54 ora_pmon_BMYCCC1
oracle 18961 1 0 Jan21 ? 00:00:23 ora_pmon_BOACCC1
oracle 19778 1 0 Jan21 ? 00:00:50 ora_pmon_BOECCC1
oracle 22387 1 0 Jan21 ? 00:03:34 ora_pmon_JBECCC1 <---should not be there after instance shutdown
oracle 23184 1 0 Jan21 ? 00:01:45 ora_pmon_RAYCCC1
[-DB- oracle@dbhcora01 bdump]$
* I restart the instances, again it takes very long, but I still cannot connect through RMAN or sqlplus
Data from the alert-log:
---- first warnings about this ----
Wed Apr 20 12:15:56 2016
Beginning log switch checkpoint up to RBA [0x21cf.2.10], SCN: 123261622749
Wed Apr 20 12:15:56 2016
Thread 1 advanced to log sequence 8655 (LGWR switch)
Current log# 3 seq# 8655 mem# 0: +DATA/jbeccc/redo03.log
Wed Apr 20 12:15:56 2016
Errors in file /u01/app/oracle/admin/JBECCC/bdump/jbeccc1_arc0_22609.trc:
ORA-19815: WARNING: db_recovery_file_dest_size of 32212254720 bytes is 85.25% used, and has 4752146432 remaining bytes available.
Wed Apr 20 12:15:56 2016
************************************************************************
You have following choices to free up space from flash recovery area:
1. Consider changing RMAN RETENTION POLICY. If you are using Data Guard,
then consider changing RMAN ARCHIVELOG DELETION POLICY.
2. Back up files to tertiary device such as tape using RMAN
BACKUP RECOVERY AREA command.
3. Add disk space and increase db_recovery_file_dest_size parameter to
reflect the new space.
4. Delete unnecessary files using RMAN DELETE command. If an operating
system command was used to delete files, then use RMAN CROSSCHECK and
DELETE EXPIRED commands.
************************************************************************
---- later ----
Wed Apr 20 12:30:55 2016
ORA-19815: WARNING: db_recovery_file_dest_size of 32212254720 bytes is 100.00% used, and has 0 remaining bytes available.
...
ORA-19809: limit exceeded for recovery files
ORA-19804: cannot reclaim 103809024 bytes disk space from 32212254720 limit
...
ARC0: Error 19809 Creating archive log file to '+FRA'
ARCH: Archival stopped, error occurred. Will continue retrying
---- continuous ----
ORA-16038: log 3 sequence# 8700 cannot be archived
ORA-19809: limit exceeded for recovery files
ORA-00312: online log 3 thread 1: '+DATA/jbeccc/redo03.log'
Thu Apr 21 10:18:02 2016
ARCH: Archival stopped, error occurred. Will continue retrying
ORACLE Instance JBECCC1 - Archival Error
ORA-16014: log 3 sequence# 8700 not archived, no available destinations
ORA-00312: online log 3 thread 1: '+DATA/jbeccc/redo03.log'
What can I do?
All good suggestions are of interest.
Feel free to request additional info if I've by mistake omitted relevant data.