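RAC 10.2.0.5 with ASM on Red Hat 5 64-bit: instances keep restarting on their own
I have a RAC database (version 10.2.0.5 64-bit, OS Red Hat 5 64-bit) that went down in September. Checking the alert logs, I found "IO Failed" in alert_asm.log and "ORA-00204: error in reading (block 35, # blocks 1) of control file" in alert_orcl.log. A colleague restarted it and it came back up, but shortly afterwards the alert log again showed messages such as: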
Reconfiguration started (old inc 16, new inc 17)
List of nodes:
0
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
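and similar. From what I have observed, the instance keeps going down and restarting by itself.
I have now been handed this system and want to fix it, but I don't know where to start. Any pointers would be appreciated, thanks.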
-----
The alert_asm.log from each of the two nodes follows:
asm1:
NOTE: ASMB process exiting due to lack of ASM file activity for 12 seconds
Wed Sep 12 13:13:25 CST 2012
WARNING: IO Failed. au:43 diskname:ORCL:VOL1
rq:0x2ab86feb6f88 buffer:0x627ed000 au_offset(bytes):720896 iosz:4096 operation:1
status:2
NOTE: cache initiating offline of disk 0 group 1
WARNING: process 6933 initiating offline of disk 0.3915955288 (VOL1) with mask 0x3 in group 1
WARNING: Disk 0 in group 1 in mode: 0x7,state: 0x2 will be taken offline
NOTE: PST update: grp = 1, dsk = 0, mode = 0x6
Wed Sep 12 13:13:25 CST 2012
ERROR: too many offline disks in PST (grp 1)
Wed Sep 12 13:13:25 CST 2012
ERROR: PST-initiated MANDATORY DISMOUNT of group DATA
Wed Sep 12 13:13:25 CST 2012
WARNING: Disk 0 in group 1 in mode: 0x7,state: 0x2 was taken offline
Wed Sep 12 13:13:25 CST 2012
NOTE: halting all I/Os to diskgroup DATA
NOTE: active pin found: 0x0x65faf748
NOTE: active pin found: 0x0x65faf8a8
Wed Sep 12 13:13:26 CST 2012
NOTE: cache dismounting group 1/0xB8984CA8 (DATA)
Wed Sep 12 13:13:27 CST 2012
kjbdomdet send to node 1
detach from dom 1, sending detach message to node 1
Wed Sep 12 13:13:27 CST 2012
Dirty detach reconfiguration started (old inc 16, new inc 16)
List of nodes:
0 1
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
116 GCS resources traversed, 0 cancelled
6104 GCS resources on freelist, 6124 on array, 6124 allocated
Dirty Detach Reconfiguration complete
Wed Sep 12 13:13:27 CST 2012
WARNING: dirty detached from domain 1
Wed Sep 12 13:13:27 CST 2012
NOTE: PST enabling heartbeating (grp 1)
Wed Sep 12 13:13:27 CST 2012
SUCCESS: diskgroup DATA was dismounted
Wed Sep 12 13:13:27 CST 2012
WARNING: PST-initiated MANDATORY DISMOUNT of group DATA not performed - group not mounted
Wed Sep 12 13:13:27 CST 2012
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm1_b001_7494.trc:
ORA-15001: diskgroup "DATA" does not exist or is not mounted
Wed Sep 12 13:13:28 CST 2012
freeing rdom 1
Received dirty detach msg from node 1 for dom 1
Wed Sep 12 14:53:41 CST 2012
Reconfiguration started (old inc 16, new inc 17)
List of nodes:
0
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Sep 12 14:53:41 CST 2012
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Wed Sep 12 14:53:41 CST 2012
LMS 0: 0 GCS shadows traversed, 0 replayed
Wed Sep 12 14:53:41 CST 2012
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Wed Sep 12 14:53:51 CST 2012
Shutting down instance (abort)
License high water mark = 4
Instance terminated by USER, pid = 14969
Wed Sep 12 14:56:38 CST 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 10.185.3.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 10.185.3.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/db_1/dbs/arch
Autotune of undo retention is turned off.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
large_pool_size = 12582912
instance_type = asm
cluster_database = TRUE
instance_number = 1
remote_login_passwordfile= EXCLUSIVE
background_dump_dest = /u01/app/oracle/admin/+ASM/bdump
user_dump_dest = /u01/app/oracle/admin/+ASM/udump
core_dump_dest = /u01/app/oracle/admin/+ASM/cdump
asm_diskstring = ORCL:VOL*
asm_diskgroups = DATA
Cluster communication is configured to use the following interface(s) for this instance
10.185.3.77
Wed Sep 12 14:56:39 CST 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
PMON started with pid=2, OS id=7327
LMON started with pid=5, OS id=7333
PSP0 started with pid=4, OS id=7331
DIAG started with pid=3, OS id=7329
LMD0 started with pid=6, OS id=7335
LMS0 started with pid=7, OS id=7337
MMAN started with pid=8, OS id=7341
DBW0 started with pid=9, OS id=7343
LGWR started with pid=10, OS id=7345
CKPT started with pid=11, OS id=7347
SMON started with pid=12, OS id=7349
RBAL started with pid=13, OS id=7351
GMON started with pid=14, OS id=7353
Wed Sep 12 14:56:40 CST 2012
lmon registered with NM - instance id 1 (internal mem no 0)
Wed Sep 12 14:56:40 CST 2012
Reconfiguration started (old inc 0, new inc 2)
ASM instance
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
* allocate domain 1, invalid = TRUE
* domain 1 valid = 1 according to instance 1
Wed Sep 12 14:56:40 CST 2012
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Sep 12 14:56:40 CST 2012
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed Sep 12 14:56:40 CST 2012
LMS 0: 0 GCS shadows traversed, 0 replayed
Wed Sep 12 14:56:40 CST 2012
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=15, OS id=7360
Wed Sep 12 14:56:41 CST 2012
SQL> ALTER DISKGROUP ALL MOUNT
Wed Sep 12 14:56:41 CST 2012
NOTE: cache registered group DATA number=1 incarn=0xdd6857ac
Wed Sep 12 14:56:41 CST 2012
Loaded ASM Library - Generic Linux, version 2.0.4 (KABI_V2) library for asmlib interface
Wed Sep 12 14:56:41 CST 2012
NOTE: Hbeat: instance not first (grp 1)
NOTE: cache opening disk 0 of grp 1: VOL1 label:VOL1
Wed Sep 12 14:56:41 CST 2012
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache mounting (not first) group 1/0xDD6857AC (DATA)
Wed Sep 12 14:56:41 CST 2012
kjbdomatt send to node 1
Wed Sep 12 14:56:42 CST 2012
NOTE: attached to recovery domain 1
Wed Sep 12 14:56:42 CST 2012
NOTE: LGWR attempting to mount thread 2 for disk group 1
NOTE: LGWR mounted thread 2 for disk group 1
NOTE: opening chunk 2 at fcn 0.131392 ABA
NOTE: seq=41 blk=1148
Wed Sep 12 14:56:42 CST 2012
NOTE: cache mounting group 1/0xDD6857AC (DATA) succeeded
SUCCESS: diskgroup DATA was mounted
Wed Sep 12 14:56:43 CST 2012
NOTE: recovering COD for group 1/0xdd6857ac (DATA)
SUCCESS: completed COD recovery for group 1/0xdd6857ac (DATA)
Wed Sep 12 14:56:45 CST 2012
Starting background process ASMB
ASMB started with pid=17, OS id=7454
Wed Sep 12 14:56:55 CST 2012
NOTE: ASMB process exiting due to lack of ASM file activity for 12 seconds
asm2:
NOTE: ASMB process exiting due to lack of ASM file activity for 12 seconds
Received dirty detach msg from node 0 for dom 1
Wed Sep 12 13:13:30 CST 2012
Dirty detach reconfiguration started (old inc 16, new inc 16)
List of nodes:
0 1
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
14 GCS resources traversed, 0 cancelled
6104 GCS resources on freelist, 6124 on array, 6124 allocated
99 GCS shadows traversed, 0 replayed
Dirty Detach Reconfiguration complete
Wed Sep 12 13:13:30 CST 2012
NOTE: SMON starting instance recovery for group 1 (mounted)
Wed Sep 12 13:13:30 CST 2012
WARNING: IO Failed. au:0 diskname:ORCL:VOL1
rq:0x2b88463fb990 buffer:0x2b884670ca00 au_offset(bytes):0 iosz:4096 operation:0
status:2
WARNING: IO Failed. au:0 diskname:ORCL:VOL1
rq:0x2b88463fb990 buffer:0x2b884670ca00 au_offset(bytes):0 iosz:4096 operation:0
status:2
WARNING: IO Failed. au:4 diskname:ORCL:VOL1
rq:0xe4372e0 buffer:0x6045f000 au_offset(bytes):0 iosz:4096 operation:0
status:2
WARNING: cache failed to read gn 1 fn 3 blk 0 count 1 from disk 0
ERROR: cache failed to read fn=3 blk=0 from disk(s): 0
ORA-15081: failed to submit an I/O operation to a disk
NOTE: cache initiating offline of disk 0 group 1
WARNING: process 6999 initiating offline of disk 0.3915955111 (VOL1) with mask 0x3 in group 1
NOTE: PST update: grp = 1, dsk = 0, mode = 0x6
Wed Sep 12 13:13:30 CST 2012
ERROR: too many offline disks in PST (grp 1)
Wed Sep 12 13:13:30 CST 2012
ERROR: PST-initiated MANDATORY DISMOUNT of group DATA
Wed Sep 12 13:13:30 CST 2012
WARNING: Disk 0 in group 1 in mode: 0x7,state: 0x2 was taken offline
Wed Sep 12 13:13:30 CST 2012
NOTE: halting all I/Os to diskgroup DATA
NOTE: active pin found: 0x0x65faf748
Wed Sep 12 13:13:30 CST 2012
Abort recovery for domain 1
Wed Sep 12 13:13:30 CST 2012
NOTE: cache dismounting group 1/0xB8984B57 (DATA)
Wed Sep 12 13:13:31 CST 2012
kjbdomdet send to node 0
detach from dom 1, sending detach message to node 0
Wed Sep 12 13:13:31 CST 2012
Dirty detach reconfiguration started (old inc 16, new inc 16)
List of nodes:
0 1
Global Resource Directory partially frozen for dirty detach
* dirty detach - domain 1 invalid = TRUE
99 GCS resources traversed, 0 cancelled
6104 GCS resources on freelist, 6124 on array, 6124 allocated
Dirty Detach Reconfiguration complete
Wed Sep 12 13:13:31 CST 2012
freeing rdom 1
Wed Sep 12 13:13:31 CST 2012
WARNING: dirty detached from domain 1
Wed Sep 12 13:13:31 CST 2012
SUCCESS: diskgroup DATA was dismounted
Wed Sep 12 13:13:31 CST 2012
WARNING: PST-initiated MANDATORY DISMOUNT of group DATA not performed - group not mounted
Wed Sep 12 13:13:31 CST 2012
Errors in file /u01/app/oracle/admin/+ASM/bdump/+asm2_b001_17967.trc:
ORA-15001: diskgroup "DATA" does not exist or is not mounted
Wed Sep 12 14:53:43 CST 2012
Shutting down instance (abort)
License high water mark = 4
Instance terminated by USER, pid = 25013
Wed Sep 12 14:56:02 CST 2012
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Interface type 1 eth1 10.185.3.0 configured from OCR for use as a cluster interconnect
Interface type 1 eth0 10.185.3.0 configured from OCR for use as a public interface
Picked latch-free SCN scheme 3
Using LOG_ARCHIVE_DEST_1 parameter default value as /u01/app/oracle/db_1/dbs/arch
Autotune of undo retention is turned off.
LICENSE_MAX_USERS = 0
SYS auditing is disabled
ksdpec: called for event 13740 prior to event group initialization
Starting up ORACLE RDBMS Version: 10.2.0.5.0.
System parameters with non-default values:
large_pool_size = 12582912
instance_type = asm
cluster_database = TRUE
instance_number = 2
remote_login_passwordfile= EXCLUSIVE
background_dump_dest = /u01/app/oracle/admin/+ASM/bdump
user_dump_dest = /u01/app/oracle/admin/+ASM/udump
core_dump_dest = /u01/app/oracle/admin/+ASM/cdump
asm_diskstring = ORCL:VOL*
asm_diskgroups = DATA
Cluster communication is configured to use the following interface(s) for this instance
10.185.3.79
Wed Sep 12 14:56:03 CST 2012
cluster interconnect IPC version:Oracle UDP/IP (generic)
IPC Vendor 1 proto 2
LMON started with pid=5, OS id=7171
PSP0 started with pid=4, OS id=7169
DIAG started with pid=3, OS id=7167
PMON started with pid=2, OS id=7165
LMD0 started with pid=6, OS id=7173
LMS0 started with pid=7, OS id=7175
MMAN started with pid=8, OS id=7179
DBW0 started with pid=9, OS id=7181
LGWR started with pid=10, OS id=7183
CKPT started with pid=11, OS id=7185
SMON started with pid=12, OS id=7187
RBAL started with pid=13, OS id=7189
GMON started with pid=14, OS id=7192
Wed Sep 12 14:56:04 CST 2012
lmon registered with NM - instance id 2 (internal mem no 1)
Wed Sep 12 14:56:04 CST 2012
Reconfiguration started (old inc 0, new inc 1)
ASM instance
List of nodes:
1
Global Resource Directory frozen
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Sep 12 14:56:04 CST 2012
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Wed Sep 12 14:56:04 CST 2012
LMS 0: 0 GCS shadows traversed, 0 replayed
Wed Sep 12 14:56:04 CST 2012
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=15, OS id=7198
Wed Sep 12 14:56:05 CST 2012
SQL> ALTER DISKGROUP ALL MOUNT
Wed Sep 12 14:56:05 CST 2012
NOTE: cache registered group DATA number=1 incarn=0xdd684d18
Wed Sep 12 14:56:05 CST 2012
Loaded ASM Library - Generic Linux, version 2.0.4 (KABI_V2) library for asmlib interface
Wed Sep 12 14:56:05 CST 2012
NOTE: Hbeat: instance first (grp 1)
Wed Sep 12 14:56:10 CST 2012
NOTE: start heartbeating (grp 1)
NOTE: cache opening disk 0 of grp 1: VOL1 label:VOL1
Wed Sep 12 14:56:10 CST 2012
NOTE: F1X0 found on disk 0 fcn 0.0
NOTE: cache mounting (first) group 1/0xDD684D18 (DATA)
* allocate domain 1, invalid = TRUE
Wed Sep 12 14:56:10 CST 2012
NOTE: attached to recovery domain 1
Wed Sep 12 14:56:10 CST 2012
NOTE: starting recovery of thread=1 ckpt=40.10147 group=1
NOTE: starting recovery of thread=2 ckpt=40.1147 group=1
NOTE: advancing ckpt for thread=2 ckpt=40.1147
NOTE: advancing ckpt for thread=1 ckpt=40.10159
NOTE: cache recovered group 1 to fcn 0.227416
Wed Sep 12 14:56:10 CST 2012
NOTE: LGWR attempting to mount thread 1 for disk group 1
NOTE: LGWR mounted thread 1 for disk group 1
NOTE: opening chunk 1 at fcn 0.227416 ABA
NOTE: seq=41 blk=10160
Wed Sep 12 14:56:10 CST 2012
NOTE: cache mounting group 1/0xDD684D18 (DATA) succeeded
SUCCESS: diskgroup DATA was mounted
Wed Sep 12 14:56:13 CST 2012
NOTE: recovering COD for group 1/0xdd684d18 (DATA)
SUCCESS: completed COD recovery for group 1/0xdd684d18 (DATA)
Wed Sep 12 14:56:16 CST 2012
Starting background process ASMB
ASMB started with pid=17, OS id=7358
Wed Sep 12 14:56:26 CST 2012
NOTE: ASMB process exiting due to lack of ASM file activity for 9 seconds
Wed Sep 12 14:56:40 CST 2012
Reconfiguration started (old inc 1, new inc 2)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
* domain 1 valid = 1 according to instance 0
Wed Sep 12 14:56:40 CST 2012
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Wed Sep 12 14:56:40 CST 2012
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Wed Sep 12 14:56:40 CST 2012
LMS 0: 98 GCS shadows traversed, 0 replayed
Wed Sep 12 14:56:40 CST 2012
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
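
Since the two logs above are long, a condensed timeline may be easier to compare across nodes. The following is only a rough sketch of how the key events could be pulled out, assuming the 10.2-style layout seen above, where a timestamp line precedes the messages it applies to. The function name `extract_events` and the `KEYWORDS` list are my own choices for illustration and are not anything defined by Oracle.

```python
import re
import sys

# Timestamp lines in these logs look like "Wed Sep 12 13:13:25 CST 2012".
TS_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} [A-Z]+ \d{4}$"
)

# Substrings treated as "interesting". This list is an assumption made for
# this sketch, chosen from the messages that appear in the logs above.
KEYWORDS = (
    "IO Failed",
    "initiating offline",
    "MANDATORY DISMOUNT",
    "was dismounted",
    "was mounted",
    "Shutting down instance",
    "Starting ORACLE instance",
    "ORA-",
)


def extract_events(lines):
    """Return (timestamp, message) pairs for lines containing a keyword.

    The timestamp is the most recent timestamp line seen before the
    message, or None if no timestamp line has appeared yet.
    """
    events = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            current_ts = line
            continue
        if any(keyword in line for keyword in KEYWORDS):
            events.append((current_ts, line))
    return events


if __name__ == "__main__":
    # Usage: python asm_timeline.py alert_+ASM1.log alert_+ASM2.log
    for path in sys.argv[1:]:
        print(f"== {path}")
        with open(path, errors="replace") as fh:
            for ts, msg in extract_events(fh):
                print(f"{ts}  {msg}")
```

Running it against both nodes' logs should make it easier to see whether the "IO Failed" warnings on VOL1 come before each forced dismount and restart, which is the pattern the excerpts above suggest.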