Problems Getting 11.2 RAC Node Online After Install (CRS Not Started)
I have a newly built 11.2 six-node RAC running on RHEL VMware VMs. The Grid Infrastructure is installed under user 'grid' and the Oracle DB software is installed under user 'oracle' per the instructions for an [Advanced Installation Oracle Grid Infrastructure installation|http://download.oracle.com/docs/cd/E11882_01/install.112/e10812/prelinux.htm#BABFDGHJ]. I have also set up role-allocated groups, users, and paths per those instructions. I am using ASM.
The Grid Infrastructure and subsequent database installation went well. However, when I rebooted one of the nodes as a test, it did not rejoin the cluster. I started investigating:
*[grid@dosfrdb06 ~]$ crsctl check crs*
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
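To see exactly which pieces of the lower stack came up on this node, the init-level resources can also be listed; something like this should work as the grid user (I haven't pasted that output here):
*[grid@dosfrdb06 ~]$ crsctl stat res -t -init*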
So, CRS is not up. I looked in <Grid home>/log/dosfrdb06/crsd/crsd.log and saw entries like this:
2010-06-24 17:27:02.019: [ OCRRAW][2598158928]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-06-24 17:27:02.019: [ OCRRAW][2598158928]proprioo: No OCR/OLR devices are usable
2010-06-24 17:27:02.019: [ OCRASM][2598158928]proprasmcl: asmhandle is NULL
2010-06-24 17:27:02.019: [ OCRRAW][2598158928]proprinit: Could not open raw device
2010-06-24 17:27:02.019: [ OCRASM][2598158928]proprasmcl: asmhandle is NULL
2010-06-24 17:27:02.019: [ OCRAPI][2598158928]a_init:16!: Backend init unsuccessful : [26]
2010-06-24 17:27:02.020: [ CRSOCR][2598158928] OCR context init failure. Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
2010-06-24 17:27:02.020: [ CRSD][2598158928][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-06-24 17:27:02.020: [ CRSD][2598158928] Done.
I recall reading that in 11.2 ASM starts before crsd.bin and automatically mounts the diskgroup that contains the OCR. I believe my OCR is in ASM (in the +DATA diskgroup, which is what crsd is failing to open above).
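For reference, the OCR location (and the state of the lower-stack ASM resource) can be confirmed even with the rest of the stack down; these are the checks I have in mind, run as grid on the affected node:
*[grid@dosfrdb06 ~]$ cat /etc/oracle/ocr.loc*
*[grid@dosfrdb06 ~]$ crsctl stat res ora.asm -init*
The first should show ocrconfig_loc pointing at +DATA, and the second shows whether the ora.asm init resource is ONLINE or OFFLINE on this node.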
So the question becomes: why didn't ASM start? I looked in the ASM instance's alert log and saw:
Starting ORACLE instance (normal)
Errors in file /u01/app/grid/diag/asm/+asm/+ASM6/trace/+ASM6_ora_5926.trc:
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
Thu Jun 24 16:57:19 2010
Here's +ASM6_ora_5926.trc:
Trace file /u01/app/grid/diag/asm/+asm/+ASM6/trace/+ASM6_ora_5926.trc
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
ORACLE_HOME = /u01/app/11.2.0/grid
System name: Linux
Node name: dosfrdb06.oracle.dos
Release: 2.6.18-164.el5
Version: #1 SMP Tue Aug 18 15:51:48 EDT 2009
Machine: x86_64
Instance name: +ASM6
Redo thread mounted by this instance: 0 <none>
Oracle process number: 0
Unix process pid: 5926, image: oracle@dosfrdb06.oracle.dos
*** 2010-06-24 16:49:18.896
Number of resource hash buckets is 1170
kewmnfy_1: gid=0, mxrwm=5, tsize=2960
kewmtotalchbsize_1(gid=0): mxent=1118,maxbuc=3,total_1=0
GetCHBSize: gid=0, dtype=0, mxbuc=3, mxrwm=3, mxent=1118, size=40248, kewmtotalchbsize_2(gid=0): (0, 40248)
GetCHBSize: gid=0, dtype=1, mxbuc=3, mxrwm=2, mxent=1118, size=53664, kewmtotalchbsize_2(gid=0): (1, 93912)
kewmtotalchbsize_3(gid=0): ESQBufsize=0, Total=93912
kewmtotalchbsize_4(gid=0): TBufsize=12, Final_total=93936
kewmnfy_2: gid=0, tsize=96896
kewmnfy_3: gid=0, mxrwm=5, RRMSize=16, tsize=96976
kewmnfy_4: gid=0, mxdrm=5, DRMSize=16, tsize=97056
kewmnfy_5: gid=0, mxTMStat=0, TMsiz=4, tsize=97056
kewmnfy_1: gid=1, mxrwm=5, tsize=97056
kewmtotalchbsize_1(gid=1): mxent=13,maxbuc=62,total_1=0
GetCHBSize: gid=1, dtype=0, mxbuc=62, mxrwm=2, mxent=13, size=6448, kewmtotalchbsize_2(gid=1): (0, 6448)
GetCHBSize: gid=1, dtype=1, mxbuc=62, mxrwm=3, mxent=13, size=19344, kewmtotalchbsize_2(gid=1): (1, 25792)
kewmtotalchbsize_3(gid=1): ESQBufsize=0, Total=25792
kewmtotalchbsize_4(gid=1): TBufsize=248, Final_total=26288
kewmnfy_2: gid=1, tsize=123344
kewmnfy_3: gid=1, mxrwm=5, RRMSize=16, tsize=123424
kewmnfy_4: gid=1, mxdrm=6, DRMSize=16, tsize=123520
kewmnfy_5: gid=1, mxTMStat=2, TMsiz=4, tsize=123528
kewmnfy_1: gid=11, mxrwm=10, tsize=123528
kewmtotalchbsize_1(gid=11): mxent=14,maxbuc=62,total_1=0
GetCHBSize: gid=11, dtype=0, mxbuc=62, mxrwm=0, mxent=14, size=0, kewmtotalchbsize_2(gid=11): (0, 0)
GetCHBSize: gid=11, dtype=1, mxbuc=62, mxrwm=10, mxent=14, size=69440, kewmtotalchbsize_2(gid=11): (1, 69440)
kewmtotalchbsize_3(gid=11): ESQBufsize=0, Total=69440
kewmtotalchbsize_4(gid=11): TBufsize=248, Final_total=69936
kewmnfy_2: gid=11, tsize=193464
kewmnfy_3: gid=11, mxrwm=10, RRMSize=16, tsize=193624
kewmnfy_4: gid=11, mxdrm=10, DRMSize=16, tsize=193784
kewmnfy_5: gid=11, mxTMStat=0, TMsiz=4, tsize=193784
kewmnfy_1: gid=2, mxrwm=110, tsize=193784
kewmtotalchbsize_1(gid=2): mxent=1,maxbuc=62,total_1=0
GetCHBSize: gid=2, dtype=0, mxbuc=62, mxrwm=89, mxent=1, size=22072, kewmtotalchbsize_2(gid=2): (0, 22072)
GetCHBSize: gid=2, dtype=1, mxbuc=62, mxrwm=21, mxent=1, size=10416, kewmtotalchbsize_2(gid=2): (1, 32488)
kewmtotalchbsize_3(gid=2): ESQBufsize=0, Total=32488
kewmtotalchbsize_4(gid=2): TBufsize=248, Final_total=32984
kewmnfy_2: gid=2, tsize=226768
kewmnfy_3: gid=2, mxrwm=110, RRMSize=16, tsize=228528
kewmnfy_4: gid=2, mxdrm=160, DRMSize=16, tsize=231088
kewmnfy_5: gid=2, mxTMStat=4, TMsiz=4, tsize=231104
kewmnfy_1: gid=3, mxrwm=110, tsize=231104
kewmtotalchbsize_1(gid=3): mxent=1,maxbuc=14,total_1=0
GetCHBSize: gid=3, dtype=0, mxbuc=14, mxrwm=89, mxent=1, size=4984, kewmtotalchbsize_2(gid=3): (0, 4984)
GetCHBSize: gid=3, dtype=1, mxbuc=14, mxrwm=21, mxent=1, size=2352, kewmtotalchbsize_2(gid=3): (1, 7336)
kewmtotalchbsize_3(gid=3): ESQBufsize=0, Total=7336
kewmtotalchbsize_4(gid=3): TBufsize=56, Final_total=7448
kewmnfy_2: gid=3, tsize=238552
kewmnfy_3: gid=3, mxrwm=110, RRMSize=16, tsize=240312
kewmnfy_4: gid=3, mxdrm=160, DRMSize=16, tsize=242872
kewmnfy_5: gid=3, mxTMStat=0, TMsiz=4, tsize=242872
kewmnfy_1: gid=4, mxrwm=1, tsize=242872
kewmtotalchbsize_1(gid=4): mxent=172,maxbuc=62,total_1=0
GetCHBSize: gid=4, dtype=0, mxbuc=62, mxrwm=1, mxent=172, size=42656, kewmtotalchbsize_2(gid=4): (0, 42656)
GetCHBSize: gid=4, dtype=1, mxbuc=62, mxrwm=0, mxent=172, size=0, kewmtotalchbsize_2(gid=4): (1, 42656)
kewmtotalchbsize_3(gid=4): ESQBufsize=42656, Total=85312
kewmtotalchbsize_4(gid=4): TBufsize=248, Final_total=85808
kewmnfy_2: gid=4, tsize=328680
kewmnfy_3: gid=4, mxrwm=1, RRMSize=16, tsize=328696
kewmnfy_4: gid=4, mxdrm=1, DRMSize=16, tsize=328712
kewmnfy_5: gid=4, mxTMStat=0, TMsiz=4, tsize=328712
kewmnfy_1: gid=5, mxrwm=10, tsize=328712
kewmtotalchbsize_1(gid=5): mxent=172,maxbuc=3,total_1=0
GetCHBSize: gid=5, dtype=0, mxbuc=3, mxrwm=9, mxent=172, size=18576, kewmtotalchbsize_2(gid=5): (0, 18576)
GetCHBSize: gid=5, dtype=1, mxbuc=3, mxrwm=1, mxent=172, size=4128, kewmtotalchbsize_2(gid=5): (1, 22704)
kewmtotalchbsize_3(gid=5): ESQBufsize=2064, Total=24768
kewmtotalchbsize_4(gid=5): TBufsize=12, Final_total=24792
kewmnfy_2: gid=5, tsize=353504
kewmnfy_3: gid=5, mxrwm=10, RRMSize=16, tsize=353664
kewmnfy_4: gid=5, mxdrm=10, DRMSize=16, tsize=353824
kewmnfy_5: gid=5, mxTMStat=2, TMsiz=4, tsize=353832
kewmnfy_1: gid=6, mxrwm=3, tsize=353832
kewmtotalchbsize_1(gid=6): mxent=118,maxbuc=62,total_1=0
GetCHBSize: gid=6, dtype=0, mxbuc=62, mxrwm=1, mxent=118, size=29264, kewmtotalchbsize_2(gid=6): (0, 29264)
GetCHBSize: gid=6, dtype=1, mxbuc=62, mxrwm=2, mxent=118, size=117056, kewmtotalchbsize_2(gid=6): (1, 146320)
kewmtotalchbsize_3(gid=6): ESQBufsize=29264, Total=175584
kewmtotalchbsize_4(gid=6): TBufsize=248, Final_total=176080
kewmnfy_2: gid=6, tsize=529912
kewmnfy_3: gid=6, mxrwm=3, RRMSize=16, tsize=529960
kewmnfy_4: gid=6, mxdrm=5, DRMSize=16, tsize=530040
kewmnfy_5: gid=6, mxTMStat=4, TMsiz=4, tsize=530056
kewmnfy_1: gid=7, mxrwm=6, tsize=530056
kewmtotalchbsize_1(gid=7): mxent=200,maxbuc=8,total_1=0
GetCHBSize: gid=7, dtype=0, mxbuc=8, mxrwm=4, mxent=200, size=25600, kewmtotalchbsize_2(gid=7): (0, 25600)
GetCHBSize: gid=7, dtype=1, mxbuc=8, mxrwm=2, mxent=200, size=25600, kewmtotalchbsize_2(gid=7): (1, 51200)
kewmtotalchbsize_3(gid=7): ESQBufsize=6400, Total=57600
kewmtotalchbsize_4(gid=7): TBufsize=32, Final_total=57664
kewmnfy_2: gid=7, tsize=587720
kewmnfy_3: gid=7, mxrwm=6, RRMSize=16, tsize=587816
kewmnfy_4: gid=7, mxdrm=6, DRMSize=16, tsize=587912
kewmnfy_5: gid=7, mxTMStat=0, TMsiz=4, tsize=587912
kewmnfy_1: gid=9, mxrwm=0, tsize=587912
kewmnfy_2: gid=9, tsize=587912
kewmnfy_3: gid=9, mxrwm=0, RRMSize=16, tsize=587912
kewmnfy_4: gid=9, mxdrm=2, DRMSize=16, tsize=587944
kewmnfy_5: gid=9, mxTMStat=0, TMsiz=4, tsize=587944
kewmnfy_1: gid=10, mxrwm=3, tsize=587944
kewmtotalchbsize_1(gid=10): mxent=118,maxbuc=26,total_1=0
GetCHBSize: gid=10, dtype=0, mxbuc=26, mxrwm=1, mxent=118, size=12272, kewmtotalchbsize_2(gid=10): (0, 12272)
GetCHBSize: gid=10, dtype=1, mxbuc=26, mxrwm=2, mxent=118, size=49088, kewmtotalchbsize_2(gid=10): (1, 61360)
kewmtotalchbsize_3(gid=10): ESQBufsize=12272, Total=73632
kewmtotalchbsize_4(gid=10): TBufsize=104, Final_total=73840
kewmnfy_2: gid=10, tsize=661784
kewmnfy_3: gid=10, mxrwm=3, RRMSize=16, tsize=661832
kewmnfy_4: gid=10, mxdrm=5, DRMSize=16, tsize=661912
kewmnfy_5: gid=10, mxTMStat=4, TMsiz=4, tsize=661928
kewmnfy_1: gid=12, mxrwm=9, tsize=661928
kewmtotalchbsize_1(gid=12): mxent=32,maxbuc=62,total_1=0
GetCHBSize: gid=12, dtype=0, mxbuc=62, mxrwm=0, mxent=32, size=0, kewmtotalchbsize_2(gid=12): (0, 0)
GetCHBSize: gid=12, dtype=1, mxbuc=62, mxrwm=9, mxent=32, size=142848, kewmtotalchbsize_2(gid=12): (1, 142848)
kewmtotalchbsize_3(gid=12): ESQBufsize=7936, Total=150784
kewmtotalchbsize_4(gid=12): TBufsize=248, Final_total=151280
kewmnfy_2: gid=12, tsize=813208
kewmnfy_3: gid=12, mxrwm=9, RRMSize=16, tsize=813352
kewmnfy_4: gid=12, mxdrm=9, DRMSize=16, tsize=813496
kewmnfy_5: gid=12, mxTMStat=0, TMsiz=4, tsize=813496
kewmnfy_1: gid=13, mxrwm=3, tsize=813496
kewmtotalchbsize_1(gid=13): mxent=60,maxbuc=62,total_1=0
GetCHBSize: gid=13, dtype=0, mxbuc=62, mxrwm=3, mxent=60, size=44640, kewmtotalchbsize_2(gid=13): (0, 44640)
GetCHBSize: gid=13, dtype=1, mxbuc=62, mxrwm=0, mxent=60, size=0, kewmtotalchbsize_2(gid=13): (1, 44640)
kewmtotalchbsize_3(gid=13): ESQBufsize=14880, Total=59520
kewmtotalchbsize_4(gid=13): TBufsize=248, Final_total=60016
kewmnfy_2: gid=13, tsize=873512
kewmnfy_3: gid=13, mxrwm=3, RRMSize=16, tsize=873560
kewmnfy_4: gid=13, mxdrm=3, DRMSize=16, tsize=873608
kewmnfy_5: gid=13, mxTMStat=0, TMsiz=4, tsize=873608
kewmnfy_1: gid=14, mxrwm=11, tsize=873608
kewmtotalchbsize_1(gid=14): mxent=50,maxbuc=26,total_1=0
GetCHBSize: gid=14, dtype=0, mxbuc=26, mxrwm=1, mxent=50, size=5200, kewmtotalchbsize_2(gid=14): (0, 5200)
GetCHBSize: gid=14, dtype=1, mxbuc=26, mxrwm=10, mxent=50, size=104000, kewmtotalchbsize_2(gid=14): (1, 109200)
kewmtotalchbsize_3(gid=14): ESQBufsize=5200, Total=114400
kewmtotalchbsize_4(gid=14): TBufsize=104, Final_total=114608
kewmnfy_2: gid=14, tsize=988216
kewmnfy_3: gid=14, mxrwm=11, RRMSize=16, tsize=988392
kewmnfy_4: gid=14, mxdrm=11, DRMSize=16, tsize=988568
kewmnfy_5: gid=14, mxTMStat=11, TMsiz=4, tsize=988612
kewmnfy_5a: mxent=172, BSesBufSize=688, tsize=989300
kewmnfy_6: TOTAL=989300
*** 2010-06-24 16:49:21.876
dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=0, mask=0x0)
----- Error Stack Dump -----
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpsemsper
So, now I'm stuck and waiting for advice. Which device has no space left? It's none of these:
*[grid@dosfrdb06 trace]$ df -h*
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VolGroup00-LogVol00
                       31G  7.8G   22G  27% /
/dev/sda1              99M   16M   79M  17% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/mapper/VolGroup01-u01
                       40G  7.5G   30G  20% /u01
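One thing I keep coming back to: status 28 from semget is ENOSPC, and for semget that errno means the kernel semaphore limits (SEMMNI or SEMMNS, set via kernel.sem) would be exceeded, not that a filesystem is actually full. Assuming that is what is happening here, this is roughly how I would check the limits and current usage on this node:
*[grid@dosfrdb06 ~]$ cat /proc/sys/kernel/sem*
*[grid@dosfrdb06 ~]$ ipcs -ls*
*[grid@dosfrdb06 ~]$ grep kernel.sem /etc/sysctl.conf*
If kernel.sem is lower than what the 11.2 preinstall steps call for (kernel.sem = 250 32000 100 128), or the setting is missing from /etc/sysctl.conf and so was lost on reboot, that would explain ASM failing to create its semaphore set, but I would appreciate confirmation before changing kernel parameters on a cluster node.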
My fear is that if I reboot any of the other five nodes, the same thing will happen. Keep in mind the RAC database is currently up on 5 of the 6 nodes.
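In the meantime, before rebooting any of the other nodes, I plan to compare the semaphore settings across the cluster with something like the loop below (the node names are placeholders for my real hostnames; this relies on the passwordless ssh already set up for the grid user during the Grid Infrastructure install):

for n in node1 node2 node3 node4 node5 dosfrdb06; do
    echo "== $n =="
    ssh "$n" 'cat /proc/sys/kernel/sem; grep kernel.sem /etc/sysctl.conf'
done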
Thanks for any advice.