Hi all,
I run a small ATP on my LAB where i have
- 2x nodes RAC 11.2.0.2 & ASM (my OCR & Voting files are stored on ASM)
- 1 public interface <> eth0
- 1 private interface <> eth1
- 1 SCAN IP defined in the /etc/hosts file (i'm not using DNS or GNS)
The test i run was to shutdown the private interface (eth1) on node 1 and i saw that
1) all cluster services and cluster daemons on node 2 were killed and node 2 was evicted from the cluster by node 1
2) all new connections were redirected to the survived node
3) Oracle OHASD daemon was restarted on node 2 and tried to start the cluster services without success because private network between cluster nodes was down
Up to here everything worked as expected but once i turn on eth1 it took ~ 9 minutes for the CSSD to startup and bring all the components up & running.
The node2 alert logs showes
[ctssd(12949)]CRS-2402:The Cluster Time Synchronization Service aborted on host node2. Details at (:ctss_css_init1:) in /u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/ctssd/octssd.log.
2011-04-13 08:09:40.978
[ohasd(5058)]CRS-2765:Resource 'ora.cssd' has failed on server 'node2'.
2011-04-13 08:09:40.985
[/u01/oracle/installed/oracle_cluster-11.2.0.2-1/bin/oraagent.bin(5764)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/agent/ohasd/oraagent_oracle/oraagent_oracle.log";
2011-04-13 08:09:41.169
[ohasd(5058)]CRS-2765:Resource 'ora.asm' has failed on server 'node2'.
2011-04-13 08:09:50.337
[cssd(13103)]CRS-1713:CSSD daemon is started in clustered mode
2011-04-13 08:10:05.833
[cssd(13103)]CRS-1707:Lease acquisition for node node2 number 2 completed
2011-04-13 08:10:07.119
[cssd(13103)]CRS-1605:CSSD voting file is online: ORCL:CRS_DISK1_2G; details in /u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/cssd/ocssd.log.
2011-04-13 08:10:07.121
[cssd(13103)]CRS-1605:CSSD voting file is online: ORCL:CRS_DISK2_2G; details in /u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/cssd/ocssd.log.
2011-04-13 08:10:07.143
[cssd(13103)]CRS-1605:CSSD voting file is online: ORCL:CRS_DISK1_2G; details in /u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/cssd/ocssd.log.
2011-04-13 08:19:49.386
[/u01/oracle/installed/oracle_cluster-11.2.0.2-1/bin/cssdagent(13091)]CRS-5818:Aborted command 'start for resource: ora.cssd 1 1' for resource 'ora.cssd'. Details at (:CRSAGF00113:) {0:6:7} in /u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/agent/ohasd/oracssdagent_root/oracssdagent_root.log.
2011-04-13 08:19:49.387
[cssd(13103)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/cssd/ocssd.log
2011-04-13 08:19:49.387
[cssd(13103)]CRS-1603:CSSD on node node2 shutdown by user.
2011-04-13 08:19:54.501
[ohasd(5058)]CRS-2765:Resource 'ora.cssdmonitor' has failed on server 'node2'.
2011-04-13 08:19:57.723
[cssd(17068)]CRS-1713:CSSD daemon is started in clustered mode
2011-04-13 08:20:01.177
[ohasd(5058)]CRS-2765:Resource 'ora.diskmon' has failed on server 'node2'.
2011-04-13 08:20:13.167
[cssd(17068)]CRS-1707:Lease acquisition for node node2 number 2 completed
pay attention at the timestamp 08:10:07.143 & 08:19:49.386
The error in the oracssdagent_root.log is
2011-04-13 08:09:49.286: [CLSFRAME][3014212592] New Framework state: 2
2011-04-13 08:09:49.286: [CLSFRAME][3014212592] M2M is starting...
2011-04-13 08:09:49.288: [ CRSCOMM][3014212592] Ipc: Starting send thread
2011-04-13 08:09:49.288: [ CRSCOMM][1092061504] Ipc: sendWork thread started.
2011-04-13 08:09:49.289: [ CRSCOMM][1105643840] IpcC: IPC Client thread started listening
2011-04-13 08:09:49.289: [ CRSCOMM][1105643840] IpcC: Received member number of 10
2011-04-13 08:09:49.290: [CLSFRAME][3014212592] New IPC Member:{Relative|Node:0|Process:0|Type:2}:OHASD:node2
2011-04-13 08:09:49.290: [CLSFRAME][3014212592] New process connected to us ID:{Relative|Node:0|Process:0|Type:2} Info:OHASD:node2
2011-04-13 08:09:49.291: [CLSFRAME][3014212592] Tints initialized with nodeId: 0 procId: 10
2011-04-13 08:09:49.291: [CLSFRAME][3014212592] Starting thread model named: MultiThread
2011-04-13 08:09:49.292: [CLSFRAME][3014212592] Starting thread model named: TimerSharedTM
2011-04-13 08:09:49.293: [CLSFRAME][3014212592] New Framework state: 3
2011-04-13 08:09:49.293: [ AGFW][3014212592] Agent Framework started successfully
2011-04-13 08:09:49.293: [ AGFW][1116150080] {0:10:2} Agfw engine module has enabled...
2011-04-13 08:09:49.293: [CLSFRAME][1116150080] {0:10:2} Module Enabling is complete
2011-04-13 08:09:49.293: [CLSFRAME][1116150080] {0:10:2} New Framework state: 6
2011-04-13 08:09:49.294: [CLSFRAME][3014212592] M2M is now powered by a doWork() thread.
2011-04-13 08:09:49.294: [ AGFW][1116150080] {0:10:2} Agent is started with userid: root , expected user: root
2011-04-13 08:09:49.294: [ AGENT][1116150080] {0:10:2} Static Version 11.2.0.2.0
2011-04-13 08:09:49.294: [ AGFW][1116150080] {0:10:2} Agent sending message to PE: AGENT_HANDSHAKE[Proxy] ID 20484:11
2011-04-13 08:09:49.302: [ AGFW][1116150080] {0:10:2} Agent received the message: RESTYPE_ADD[ora.cssd.type] ID 8196:12358
2011-04-13 08:09:49.302: [ AGFW][1116150080] {0:10:2} Added new restype: ora.cssd.type
2011-04-13 08:09:49.303: [ AGFW][1116150080] {0:10:2} Agent sending last reply for: RESTYPE_ADD[ora.cssd.type] ID 8196:12358
2011-04-13 08:09:49.305: [ AGFW][1116150080] {0:10:2} Agent received the message: RESOURCE_ADD[ora.cssd 1 1] ID 4356:12359
2011-04-13 08:09:49.305: [ AGFW][1116150080] {0:10:2} Added new resource: ora.cssd 1 1 to the agfw
2011-04-13 08:09:49.306: [ AGFW][1116150080] {0:10:2} Agent sending last reply for: RESOURCE_ADD[ora.cssd 1 1] ID 4356:12359
2011-04-13 08:09:49.308: [ AGFW][1116150080] {0:6:7} Agent received the message: RESOURCE_START[ora.cssd 1 1] ID 4098:12360
2011-04-13 08:09:49.308: [ AGFW][1116150080] {0:6:7} Preparing START command for: ora.cssd 1 1
2011-04-13 08:09:49.308: [ AGFW][1116150080] {0:6:7} ora.cssd 1 1 state changed from: UNKNOWN to: STARTING
2011-04-13 08:09:49.309: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_cssdstart: Start action called
2011-04-13 08:09:49.309: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_getattr: attr OMON_INITRATE, value 1000
2011-04-13 08:09:49.309: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_getattr: attr OMON_POLLRATE, value 500
2011-04-13 08:09:49.309: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_getattr: attr ORA_OPROCD_MODE, value
2011-04-13 08:09:49.310: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_getattr: attr PROCD_TIMEOUT, value 1000
2011-04-13 08:09:49.310: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_getattr: attr LOGGING_LEVEL, value 1
2011-04-13 08:09:49.310: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_cssdstart: loglevels CSSD=2,GIPCNM=2,GIPCGM=2,GIPCCM=2,CLSF=0,SKGFD=0,GPNP=1,OLR=0
2011-04-13 08:09:49.313: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_cssdstart: START action for resource /u01/oracle/installed/oracle_cluster-11.2.0.2-1/bin/ocssd: SUCCESS
2011-04-13 08:09:49.313: [ora.cssd][1114048832] {0:6:7} [start] clsncssd_waitomon: start waiting
2011-04-13 08:09:49.313: [ CSSCLNT][1098377536]clsssInitNative: Init for agent
2011-04-13 08:09:50.317: [ CSSCLNT][1098377536]clsssInitNative: Init for agent
2011-04-13 08:09:51.319: [ CSSCLNT][1098377536]clsssInitNative: Init for agent
2011-04-13 08:09:51.322: [ CSSCLNT][1098377536]clssnsqueryfatal: css is fatal = 0
2011-04-13 08:09:51.322: [ USRTHRD][1098377536] clsncssd_thrdspawn: spawn OPROCD succ
2011-04-13 08:09:51.322: [ USRTHRD][1098377536] clsncssd_thrdspawn: spawn POLLMSG succ
2011-04-13 08:09:51.323: [ USRTHRD][1099954496] clsnpollmsg_main: starting pollmsg thread
2011-04-13 08:09:51.323: [ USRTHRD][1107745088] clsnproc_main: timeout of procd cannot be 0, now we set to default 1000.
2011-04-13 08:09:51.323: [ USRTHRD][1117727040] clsnwork_main: starting worker thread
2011-04-13 08:09:51.323: [ USRTHRD][1098377536] clsncssd_thrdspawn: spawn WORKER succ
2011-04-13 08:09:51.323: [ USRTHRD][1107745088] clsnproc_main: starting oprocd
2011-04-13 08:09:51.323: [ USRTHRD][1098377536] clsncssd_thrdspawn: spawn KILL succ
2011-04-13 08:10:07.151: [ USRTHRD][1098377536] clsnomon_init: css init done, nodenum 2
2011-04-13 08:10:07.151: [ USRTHRD][1098377536] clsnomon_WaitToRegister: waiting for first reconfiguration and kgzf initialization
2011-04-13 08:19:49.385: [CLSFRAME][3014212592] TM [MultiThread] is changing desired thread # to 3. Current # is 2
2011-04-13 08:19:49.387: [ AGFW][1111947584] {0:6:7} Created alert : (:CRSAGF00113:) : Aborting the command: start for resource: ora.cssd 1 1
2011-04-13 08:19:49.387: [ora.cssd][1111947584] {0:6:7} [start] clsncssd_cssdabort: sending shutdown abort to CSS with new ctx
2011-04-13 08:19:49.387: [ CSSCLNT][1098377536]clsssRecvMsg: wrong type request (0) on 0xc9 ret 0
2011-04-13 08:19:49.387: [ CSSCLNT][1098377536]clssnskgzfdone: RPC failed rc 1
2011-04-13 08:19:49.387: [ USRTHRD][1098377536] clsnomon_WaitToRegister: exadata initialization completed with rc=1
2011-04-13 08:19:49.387: [ USRTHRD][1098377536] clsnomon_init: problems in the CSS to allow OMON registration 2
2011-04-13 08:19:49.387: [ USRTHRD][1098377536] clsnomon_cleanup: to exit status = 2
2011-04-13 08:19:49.387: [ USRTHRD][1098377536] clsnomon_cleanup: failure, sending shutdown immediate to CSS
2011-04-13 08:19:49.387: [ USRTHRD][1098377536] CHECK action is in progress, Rejecting the check action requested by entry point for ora.cssd
2011-04-13 08:19:49.426: [ AGFW][2008402928] Starting the agent: /u01/oracle/installed/oracle_cluster-11.2.0.2-1/log/node2/agent/ohasd/oracssdagent_root/
2011-04-13 08:19:49.426: [ AGENT][2008402928] Agent framework initialized, Process Id = 17013
2011-04-13 08:19:49.426: [ USRTHRD][2008402928] to enter agent main
2011-04-13 08:19:49.426: [ USRTHRD][2008402928] clsscssd_main: New soft limit for stack size is 1572864, hard limit is 4294967295
2011-04-13 08:19:49.434: [ USRTHRD][2008402928] clsncssd_main: setting priority to 4
2011-04-13 08:19:49.434: [ USRTHRD][2008402928] *** Agent Framework Started ***
Do you have any idea why it took so long to bring all the components up & running?
Thanks a lot!!
G