Hi everyone,
This morning all the switches in our server room rebooted, which caused all the RAC servers to restart.
Since then, none of them has started successfully.
Environment: Oracle 11.2.0.1 on RHEL6.
Here are some log excerpts:
--------
crsd.log
--------
2011-05-20 12:13:10.782: [ CSSCLNT][2146903840]clssscConnect: gipc request failed with 29 (0x16)
2011-05-20 12:13:10.782: [ CSSCLNT][2146903840]clsssInitNative: connect failed, rc 29
2011-05-20 12:13:10.783: [ CRSRTI][2146903840] CSS is not ready. Received status 3 from CSS. Waiting for good status ..
----------------
alertstgrac1.log
----------------
[ohasd(2303)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'stgrac1'.
2011-05-20 12:11:15.452
[cssd(2661)]CRS-1713:CSSD daemon is started in clustered mode
2011-05-20 12:11:15.739
[cssd(2661)]CRS-1603:CSSD on node stgrac1 shutdown by user.
2011-05-20 12:12:14.033
[/u01/app/11.2.0/grid/bin/orarootagent.bin(2563)]CRS-5818:Aborted command 'start for resource: ora.diskmon 1 1' for resource 'ora.diskmon'. Details at (:CRSAGF00113:) in /u01/app/11.2.0/grid/log/stgrac1/agent/ohasd/orarootagent_root/orarootagent_root.log.
2011-05-20 12:12:18.039
[ohasd(2303)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.diskmon'. Details at (:CRSPE00111:) in /u01/app/11.2.0/grid/log/stgrac1/ohasd/ohasd.log
---------------------
orarootagent_root.log
---------------------
2011-05-20 12:12:23.162: [ora.diskmon][2684352256] [clean] execCmd ret = 0
2011-05-20 12:12:23.162: [ora.diskmon][2684352256] [clean] DiskmonAgent::clean } nopipe
2011-05-20 12:12:23.163: [ora.diskmon][2684352256] [clean] clsn_agent::clean }
2011-05-20 12:12:23.163: [ AGFW][2684352256] Command: clean for resource: ora.diskmon 1 1 completed with status: SUCCESS
2011-05-20 12:12:23.163: [ AGFW][2684352256] Executing command: check for resource: ora.diskmon 1 1
2011-05-20 12:12:23.164: [ AGFW][3066025728] Agent sending reply for: RESOURCE_CLEAN[ora.diskmon 1 1] ID 4100:826
2011-05-20 12:12:23.164: [ora.diskmon][2684352256] [check] DiskmonAgent::check {
2011-05-20 12:12:23.164: [ora.diskmon][2684352256] [check] DiskmonAgent::connect {
2011-05-20 12:12:23.165: [ora.diskmon][2684352256] [check] DiskmonAgent::connect: skgznp_connect failed with error 56815 and the timeout expired
2011-05-20 12:12:23.165: [ora.diskmon][2684352256] [check] (null) category: 56815, operation: connect, loc: skgznpcon6, OS error: 2, other:
2011-05-20 12:12:23.165: [ora.diskmon][2684352256] [check] DiskmonAgent::connect } error
2011-05-20 12:12:23.165: [ora.diskmon][2684352256] [check] DiskmonAgent::check } 2
2011-05-20 12:12:23.165: [ AGFW][2684352256] check for resource: ora.diskmon 1 1 completed with status: PLANNED_OFFLINE
2011-05-20 12:12:23.165: [ AGFW][3066025728] ora.diskmon 1 1 state changed from: CLEANING to: PLANNED_OFFLINE
2011-05-20 12:12:23.166: [ AGFW][3066025728] Agent sending last reply for: RESOURCE_CLEAN[ora.diskmon 1 1] ID 4100:826
---------
ohasd.log
---------
2011-05-20 12:12:23.167: [ AGFW][2053089024] Agfw Proxy Server sending the reply to PE for message:RESOURCE_CLEAN[ora.diskmon 1 1] ID 4100:825
2011-05-20 12:12:23.167: [ CRSPE][2042582784] Received reply to action [Clean] message ID: 825
2011-05-20 12:12:23.168: [ AGFW][2053089024] Received the reply to the message: RESOURCE_CLEAN[ora.diskmon 1 1] ID 4100:826 from the agent /u01/app/11.2.0/grid/bin/orarootagent_root
2011-05-20 12:12:23.168: [ AGFW][2053089024] Agfw Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[ora.diskmon 1 1] ID 4100:825
2011-05-20 12:12:23.169: [ CRSPE][2042582784] Received reply to action [Clean] message ID: 825
2011-05-20 12:12:23.169: [ CRSPE][2042582784] RI [ora.diskmon 1 1] new external state [OFFLINE] old value: [UNKNOWN] label = []
2011-05-20 12:12:23.169: [ CRSPE][2042582784] CRS-2681: Clean of 'ora.diskmon' on 'stgrac1' succeeded
2011-05-20 12:12:23.169: [ CRSPE][2042582784] Sequencer for [ora.diskmon 1 1] has completed with error: CRS-0215: Could not start resource 'ora.diskmon'.
--------------------------
./crsctl stat res -t -init
--------------------------
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  OFFLINE
ora.crsd
      1        ONLINE  INTERMEDIATE stgrac1
ora.cssd
      1        ONLINE  OFFLINE
ora.cssdmonitor
      1        ONLINE  ONLINE       stgrac1
ora.ctssd
      1        ONLINE  OFFLINE
ora.diskmon
      1        ONLINE  OFFLINE
ora.evmd
      1        ONLINE  ONLINE       stgrac1
ora.gipcd
      1        ONLINE  ONLINE       stgrac1
ora.gpnpd
      1        ONLINE  ONLINE       stgrac1
ora.mdnsd
      1        ONLINE  ONLINE       stgrac1
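To make the resource listing above easier to compare across nodes, here is a rough sketch that picks out the resources whose STATE does not match their TARGET. It assumes the crsctl output has been saved as plain text; the function name `mismatched_resources` and the `sample` string are made up for illustration and are not part of any Oracle tooling.

```python
def mismatched_resources(text):
    """Return (name, target, state) for resources whose STATE differs from TARGET."""
    results = []
    name = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0].startswith("ora."):
            # Resource name line, e.g. "ora.cssd"
            name = parts[0]
        elif name and parts[0].isdigit() and len(parts) >= 3:
            # Instance line, e.g. "1 ONLINE OFFLINE" (SERVER column may be empty)
            target, state = parts[1], parts[2]
            if target != state:
                results.append((name, target, state))
    return results


# Hypothetical sample: a few rows copied from the listing above.
sample = """\
ora.asm
1 ONLINE OFFLINE
ora.crsd
1 ONLINE INTERMEDIATE stgrac1
ora.cssd
1 ONLINE OFFLINE
ora.evmd
1 ONLINE ONLINE stgrac1
"""

for name, target, state in mismatched_resources(sample):
    print(f"{name}: target={target} state={state}")
```

Run against the full listing, this would flag ora.asm, ora.crsd, ora.cssd, ora.ctssd and ora.diskmon, which are the resources not reported as ONLINE on stgrac1.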
These errors look the same on two different RAC clusters (2 nodes per cluster).
Can anybody please give me some ideas on what I can check further?