ASM diskgroups disappeared...

alp, Apr 6 2009 (edited Apr 6 2009)
Hello.
I have an awful situation. Today, after an update of our VMware ESX server, our RAC nodes failed. ASM failed to mount the diskgroups (I've checked the disk permissions, they are OK) with the following messages:

Mon Apr 6 15:11:41 2009
lmon registered with NM - instance id 2 (internal mem no 1)
Mon Apr 6 15:11:42 2009
Reconfiguration started (old inc 0, new inc 24)
ASM instance
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
* allocate domain 1, invalid = TRUE
* domain 1 valid = 1 according to instance 0
* allocate domain 2, invalid = TRUE
* domain 2 valid = 1 according to instance 0
* allocate domain 3, invalid = TRUE
* domain 3 valid = 1 according to instance 0
Mon Apr 6 15:11:42 2009
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Apr 6 15:11:42 2009
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon Apr 6 15:11:42 2009
LMS 0: 0 GCS shadows traversed, 0 replayed
Mon Apr 6 15:11:42 2009
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Reconfiguration complete
LCK0 started with pid=15, OS id=4303
Mon Apr 6 15:11:43 2009
SQL> ALTER DISKGROUP ALL MOUNT
Mon Apr 6 15:11:43 2009
NOTE: cache registered group DATA number=1 incarn=0x43283dc2
NOTE: cache registered group FRA number=2 incarn=0x43583dc3
NOTE: cache registered group LOGS number=3 incarn=0x43583dc4
Mon Apr 6 15:11:43 2009
ERROR: no PST quorum in group 1: required 2, found 0
Mon Apr 6 15:11:43 2009
NOTE: cache dismounting group 1/0x43283DC2 (DATA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup DATA was not mounted
Mon Apr 6 15:11:43 2009
ERROR: no PST quorum in group 2: required 2, found 0
Mon Apr 6 15:11:43 2009
NOTE: cache dismounting group 2/0x43583DC3 (FRA)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup FRA was not mounted
Mon Apr 6 15:11:43 2009
ERROR: no PST quorum in group 3: required 2, found 0
Mon Apr 6 15:11:43 2009
NOTE: cache dismounting group 3/0x43583DC4 (LOGS)
NOTE: dbwr not being msg'd to dismount
ERROR: diskgroup LOGS was not mounted
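
For readers skimming the log above, the failing diskgroups can be pulled out mechanically. The following is a minimal sketch, not an Oracle tool: it assumes the alert log lines have exactly the format quoted in this post, and the function name `pst_quorum_failures` and the `SAMPLE_LOG` excerpt are made up for illustration.

```python
import re

# Patterns copied from the alert log lines quoted in the post (assumed format).
REGISTERED = re.compile(r"NOTE: cache registered group (\S+) number=(\d+)")
NO_QUORUM = re.compile(
    r"ERROR: no PST quorum in group (\d+): required (\d+), found (\d+)"
)


def pst_quorum_failures(log_text):
    """Return (diskgroup_name, required, found) for each PST quorum error."""
    names = {}      # group number -> diskgroup name, from the "registered" notes
    failures = []
    for line in log_text.splitlines():
        m = REGISTERED.search(line)
        if m:
            names[int(m.group(2))] = m.group(1)
            continue
        m = NO_QUORUM.search(line)
        if m:
            number, required, found = (int(g) for g in m.groups())
            # "?" marks a group number with no preceding "registered" note.
            failures.append((names.get(number, "?"), required, found))
    return failures


# Hypothetical excerpt: the relevant lines from the log above, timestamps removed.
SAMPLE_LOG = """\
NOTE: cache registered group DATA number=1 incarn=0x43283dc2
NOTE: cache registered group FRA number=2 incarn=0x43583dc3
NOTE: cache registered group LOGS number=3 incarn=0x43583dc4
ERROR: no PST quorum in group 1: required 2, found 0
ERROR: no PST quorum in group 2: required 2, found 0
ERROR: no PST quorum in group 3: required 2, found 0
"""

for name, required, found in pst_quorum_failures(SAMPLE_LOG):
    print(f"{name}: quorum required={required}, found={found}")
```

On this excerpt the sketch reports all three diskgroups (DATA, FRA, LOGS) with the same "required 2, found 0" result, matching the errors in the log.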

After a reboot of each node, the ASM instances seem to have forgotten about the ASM diskgroups entirely.
select * from v$asm_diskgroup returned no rows.
I tried to recreate the LOGS diskgroup on the same disk, which made things worse: its data was lost (it is only redo; I hope I can resurrect the FRA diskgroup and recover the database...). v$asm_disk shows the following:
SQL> select distinct path,group_number,INCARNATION,MOUNT_STATUS,HEADER_STATUS,MODE_STATUS from V$ASM_DISK;

PATH      GROUP_NUMBER  INCARNATION  MOUNT_S  HEADER_STATU  MODE_ST
--------  ------------  -----------  -------  ------------  -------
/dev/sdc             0            0  CLOSED   PROVISIONED   ONLINE   // former DATA diskgroup member
/dev/sdd             1   3915939028  CACHED   MEMBER        ONLINE   // my attempt to recreate the diskgroup
/dev/sdf             0            0  CLOSED   PROVISIONED   ONLINE   // former FRA diskgroup member

kfed says the following on /dev/sdf:

kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 3801353830 ; 0x00c: 0xe2940e66
...
kfdhdb.driver.provstr: ORCLDISK ; 0x000: length=8
...
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: FRA_0000 ; 0x028: length=8
kfdhdb.grpname: FRA ; 0x048: length=3
kfdhdb.fgname: FRA_0000 ; 0x068: length=8
kfdhdb.capname: ; 0x088: length=0
....

kfed says the following on /dev/sdc:
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 0 ; 0x004: T=0 NUMB=0x0
kfbh.block.obj: 2147483648 ; 0x008: TYPE=0x8 NUMB=0x0
kfbh.check: 2743336053 ; 0x00c: 0xa383fc75
...
kfdhdb.compat: 168820736 ; 0x020: 0x0a100000
kfdhdb.dsknum: 0 ; 0x024: 0x0000
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0000 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.fgname: DATA_0000 ; 0x068: length=9
kfdhdb.capname: ; 0x088: length=0
...
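
To compare what the disk headers say against what V$ASM_DISK reports, the kfed dumps above can be reduced to the few fields of interest. This is only a sketch under the assumption that every line has the `name: value ; offset: decoded` shape shown in this post; `parse_kfed`, `header_summary`, and `SAMPLE_KFED` are hypothetical names, and the sample is a subset of the /dev/sdc dump above.

```python
def parse_kfed(text):
    """Map field name -> (raw value, decoded text) for kfed-style lines.

    Assumes the "name: value ; offset: decoded" layout shown in the post.
    Lines without that shape (such as "...") are skipped.
    """
    fields = {}
    for line in text.splitlines():
        if ";" not in line:
            continue
        left, _, right = line.partition(";")
        name, sep, value = left.partition(":")
        if not sep:
            continue
        # The right-hand side looks like " 0x027: KFDHDR_MEMBER".
        _, _, decoded = right.partition(":")
        fields[name.strip()] = (value.strip(), decoded.strip())
    return fields


def header_summary(text):
    """Pick out diskgroup name, disk name, and decoded header status."""
    fields = parse_kfed(text)
    return {
        "group": fields["kfdhdb.grpname"][0],
        "disk": fields["kfdhdb.dskname"][0],
        "header_status": fields["kfdhdb.hdrsts"][1],
    }


# Hypothetical sample: selected lines copied from the /dev/sdc dump above.
SAMPLE_KFED = """\
kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD
...
kfdhdb.grptyp: 1 ; 0x026: KFDGTP_EXTERNAL
kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER
kfdhdb.dskname: DATA_0000 ; 0x028: length=9
kfdhdb.grpname: DATA ; 0x048: length=4
kfdhdb.capname: ; 0x088: length=0
"""

print(header_summary(SAMPLE_KFED))
```

For both /dev/sdc and /dev/sdf the dumps show kfdhdb.hdrsts decoded as KFDHDR_MEMBER, while the V$ASM_DISK query lists the same paths with HEADER_STATUS PROVISIONED.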

We don't use ASMLib.
The only good thing is that this database has not gone into production yet. However, a lot of work was done on it and I have to recover it...
The backups were on the FRA diskgroup...

P.S. I know that I need to contact Oracle Support, but our bosses didn't want to spend money on Oracle support (so we don't even have Metalink)... I know it's silly, but that's how it is...