Blue screen blues
Hi all,
I was hoping someone might be able to help me with an Oracle RAC problem. I am running 10g R2 on Windows 2003. After doing some research, I patched the cluster with the opmd.exe patch. From what I found, the blue screen usually happens when the two nodes can't communicate for a period, so Oracle blue screens Windows to preserve data and avoid a split-brain scenario.
The Windows machines are actually VMs that live on a shared ESX server. There were network issues last week; these have since been resolved, but the machines continue to blue screen.
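To make sure I have the idea right, here is a rough sketch of how I understand the timeout check. It is a simplified illustration only, not Oracle's actual logic: the function name and parameters are hypothetical, and the default of 30 is simply the misscount value reported in the CSS log further down, which I am assuming is in seconds.

```python
def should_evict(last_heartbeat_s: float, now_s: float, misscount_s: float = 30.0) -> bool:
    """Return True when the peer has been silent for at least misscount_s seconds.

    Hypothetical helper: models only the "no communication for a period" idea,
    not the voting disk or any other part of the real eviction decision.
    """
    return (now_s - last_heartbeat_s) >= misscount_s


# A peer last heard from 31 seconds ago would be past the threshold,
# while one heard from 10 seconds ago would not.
print(should_evict(last_heartbeat_s=0.0, now_s=31.0))
print(should_evict(last_heartbeat_s=0.0, now_s=10.0))
```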
Any advice or interpretation would be much appreciated. If you need any other logs, please let me know.
Thanks very much.
Here is a dump of the relevant part of the logs:
2009-02-02 11:32:07.121: [ OCROSD]utgdv:11:could not read reg value ocrmirrorconfig_loc os error= The system could not find the environment option that was entered.
[ CSSD]2009-02-02 11:32:08.606 >USER: Oracle Database 10g CSS Release 10.2.0.3.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ CSSD]2009-02-02 11:32:08.606 >USER: CSS daemon log for node qa852im-racb, number 2, in cluster crs
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61180))
[ CSSD]2009-02-02 11:32:08.871 [2616] >TRACE: clssscmain: local-only set to false
[ CSSD]2009-02-02 11:32:10.419 [2616] >TRACE: clssnmReadNodeInfo: added node 1 (qa852im-raca) to cluster
[ CSSD]2009-02-02 11:32:12.591 [2616] >TRACE: clssnmReadNodeInfo: added node 2 (qa852im-racb) to cluster
[ CSSD]2009-02-02 11:32:12.826 [2692] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2009-02-02 11:32:12.826 [2616] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[ CSSD]2009-02-02 11:32:13.373 [2616] >TRACE: clssnmNMInitialize: misscount set to (30), impending reconfig threshold set to (26000)
[ CSSD]2009-02-02 11:32:13.451 [2616] >TRACE: clssnmNMInitialize: diskShortTimeout set to (27000)ms
[ CSSD]2009-02-02 11:32:13.529 [2616] >TRACE: clssnmNMInitialize: diskLongTimeout set to (200000)ms
[ CSSD]2009-02-02 11:32:14.201 [2616] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0/\\.\votedsk1)
[ CSSD]2009-02-02 11:32:14.201 [2696] >TRACE: clssnmvDPT: spawned for disk 0 (\\.\votedsk1)
[ CSSD]2009-02-02 11:32:16.295 [2696] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0/\\.\votedsk1)
[ CSSD]2009-02-02 11:32:17.217 [2700] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (\\.\votedsk1) initial sleep interval (1000)ms
[ CSSD]2009-02-02 11:32:17.248 [2696] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(2) wrtcnt(695) LATS(467359) Disk lastSeqNo(695)
[ CSSD]2009-02-02 11:32:18.171 [2616] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2009-02-02 11:32:18.186 [2708] >TRACE: clssnmconnect: connecting to node 2, flags 0x0001, connector 1
[ CSSD]2009-02-02 11:32:18.202 [2708] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=qa852im-racb-priv)(PORT=49895))
[ CSSD]2009-02-02 11:32:18.202 [2708] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2009-02-02 11:32:18.202 [2708] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 0
[ CSSD]2009-02-02 11:32:18.217 [2724] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=61102))
[ CSSD]2009-02-02 11:32:18.217 [2736] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=1136)(HOST=10.0.0.20)(PORT=1045))
[ CSSD]2009-02-02 11:32:19.202 [2708] >TRACE: clsc_send_msg: (01348B50) NS err (12571, 12560), transport (533, 57, 0)
[ CSSD]2009-02-02 11:32:19.218 [2740] >TRACE: clssnmPollingThread: Connection complete
[ CSSD]2009-02-02 11:32:19.218 [2744] >TRACE: clssnmSendingThread: Connection complete
[ CSSD]2009-02-02 11:32:19.218 [2748] >TRACE: clssnmRcfgMgrThread: Connection complete
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmDoSyncUpdate: Initiating sync 1
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmSendSync: syncSeqNo(1)
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(1)
[ CSSD]2009-02-02 11:32:26.219 [2708] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[qa852im-racb] seq[1] sync[1]
[ CSSD]2009-02-02 11:32:26.219 [2708] >TRACE: clssnmHandleSync: diskTimeout set to (27000)ms
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmSendVote: syncSeqNo(1)
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(1)
[ CSSD]2009-02-02 11:32:26.219 [2708] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(1)
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmCheckDskInfo: diskTimeout set to (200000)ms
[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmCheckDskInfo: node(1) timeout(8969) state_network(0) state_disk(3) misstime(476328)
[ CSSD]2009-02-02 11:32:26.313 [2616] >USER: NMEVENT_SUSPEND [00][00][00][00]
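In case it helps anyone compare this with their own logs, here is a small sketch that pulls the timeout settings and the clssnmCheckDskInfo values out of lines like the ones above. The regular expressions are only my guess at the line format based on this one log, and the function and variable names are my own, so treat it as a starting point rather than a proper parser.

```python
import re

# Patterns inferred from the log lines in this post (an assumption, not a documented format).
SETTING_RE = re.compile(r"(misscount|diskShortTimeout|diskLongTimeout) set to \((\d+)\)")
DISK_INFO_RE = re.compile(
    r"clssnmCheckDskInfo: node\((\d+)\) timeout\((\d+)\) "
    r"state_network\((\d+)\) state_disk\((\d+)\) misstime\((\d+)\)"
)


def summarize_css_log(lines):
    """Collect timeout settings and per-node disk info from CSSD trace lines."""
    settings = {}
    disk_info = []
    for line in lines:
        # A line may carry a setting such as "misscount set to (30)".
        for name, value in SETTING_RE.findall(line):
            settings[name] = int(value)
        # Or it may be a clssnmCheckDskInfo status line for one node.
        match = DISK_INFO_RE.search(line)
        if match:
            node, timeout, net, disk, misstime = map(int, match.groups())
            disk_info.append({
                "node": node,
                "timeout": timeout,
                "state_network": net,
                "state_disk": disk,
                "misstime": misstime,
            })
    return settings, disk_info


# Sample lines copied from the log above.
sample = [
    "[ CSSD]2009-02-02 11:32:13.373 [2616] >TRACE: clssnmNMInitialize: misscount set to (30), impending reconfig threshold set to (26000)",
    "[ CSSD]2009-02-02 11:32:13.451 [2616] >TRACE: clssnmNMInitialize: diskShortTimeout set to (27000)ms",
    "[ CSSD]2009-02-02 11:32:13.529 [2616] >TRACE: clssnmNMInitialize: diskLongTimeout set to (200000)ms",
    "[ CSSD]2009-02-02 11:32:26.219 [2748] >TRACE: clssnmCheckDskInfo: node(1) timeout(8969) state_network(0) state_disk(3) misstime(476328)",
]

settings, disk_info = summarize_css_log(sample)
print(settings)
print(disk_info)
```

I have deliberately not tried to interpret what the state_network and state_disk numbers mean, since I don't know; the sketch only extracts them.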