Node eviction problem (urgent, production)
598389, Sep 15 2008 (edited Sep 25 2008)
Hi there,
I am facing a node eviction problem in a two-node cluster (Oracle 10.2.0.3) on AIX 5L. Node 2 gets evicted frequently. Below is an excerpt from cssd.log on Node 1. I have some questions:
1. "clssnmReadDskHeartbeat: node(2) is down": does this mean the node eviction happened because of a disk timeout?
2. Memory consumption on Node 1 is always 95%. Can this contribute to node eviction? If so, why is Node 2 evicted and not Node 1?
3. I asked my sysadmin about the memory. He said the problem was that one interconnect was not connected, which is why the node was getting evicted. But even after the redundant cable was put back, the node keeps getting evicted. I checked, and only one interconnect is in use. Is there any way to use two interconnects (two private IPs) for failover, i.e. a redundant interconnect?
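Regarding question 1: the excerpt prints diskTimeout as 27000 ms while the sync (reconfiguration) is in progress and 200000 ms once it completes. A minimal sketch of that arithmetic, assuming the Oracle 10.2 defaults misscount = 30 s, reboottime = 3 s and disktimeout = 200 s, and assuming CSS uses misscount minus reboottime as the short disk timeout during a reconfiguration. The values actually configured on this cluster may differ (for example, check `crsctl get css misscount`); the constant and function names are hypothetical.

```python
# Hypothetical sketch: reproduce the two diskTimeout values printed in cssd.log.
# ASSUMPTIONS: 10.2 defaults misscount=30 s, reboottime=3 s, disktimeout=200 s,
# and short disk timeout during reconfiguration = misscount - reboottime.

MISSCOUNT_S = 30     # assumed default network heartbeat timeout (seconds)
REBOOTTIME_S = 3     # assumed default reboot time (seconds)
DISKTIMEOUT_S = 200  # assumed default long disk timeout (seconds)


def short_disk_timeout_ms(misscount_s: int, reboottime_s: int) -> int:
    """Disk timeout assumed to apply while a sync/reconfiguration is running."""
    return (misscount_s - reboottime_s) * 1000


def long_disk_timeout_ms(disktimeout_s: int) -> int:
    """Disk timeout assumed to apply once the cluster membership is stable."""
    return disktimeout_s * 1000


if __name__ == "__main__":
    # prints: during sync: 27000 ms
    print(f"during sync: {short_disk_timeout_ms(MISSCOUNT_S, REBOOTTIME_S)} ms")
    # prints: after sync: 200000 ms
    print(f"after sync: {long_disk_timeout_ms(DISKTIMEOUT_S)} ms")
```

If the computed values match the log, the cluster is most likely running with these defaults; if they do not, the configured CSS parameters are worth checking before drawing conclusions about disk timeouts.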
Any input would be highly appreciated. Many thanks
[ CSSD]2008-09-15 17:06:25.360 [2058] >TRACE: clssgmClientConnectMsg: Connect from con(11259ec30) proc(1125982d0) pid() proto(10:2:1:1)
[ CSSD]2008-09-15 17:06:27.307 [1030] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(1386516915) Disk lastSeqNo(1)
[ CSSD]2008-09-15 17:06:28.501 [1801] >TRACE: clssnmConnComplete: connected to node 2 (con 11259ec30), state 1 birth 0, unique 1221494782/1221494782 prevConuni(0)
[ CSSD]2008-09-15 17:06:28.776 [3600] >TRACE: clssnmDoSyncUpdate: Initiating sync 32
[ CSSD]2008-09-15 17:06:28.776 [3600] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms
[ CSSD]2008-09-15 17:06:28.776 [3600] >TRACE: clssnmSetupAckWait: Ack message type (11)
[ CSSD]2008-09-15 17:06:28.776 [3600] >TRACE: clssnmSetupAckWait: node(1) is ALIVE
[ CSSD]2008-09-15 17:06:28.776 [3600] >TRACE: clssnmSetupAckWait: node(2) is ALIVE
[ CSSD]2008-09-15 17:06:28.776 [3600] >TRACE: clssnmSendSync: syncSeqNo(32)
[ CSSD]2008-09-15 17:06:28.777 [3600] >TRACE: clssnmWaitForAcks: Ack message type(11), ackCount(2)
[ CSSD]2008-09-15 17:06:28.777 [1801] >TRACE: clssnmHandleSync: Acknowledging sync: src[1] srcName[kdb1-mp1-30] seq[93] sync[32]
[ CSSD]2008-09-15 17:06:28.777 [1801] >TRACE: clssnmHandleSync: diskTimeout set to (27000)ms
[ CSSD]2008-09-15 17:06:28.777 [1] >USER: NMEVENT_SUSPEND [00][00][00][02]
[ CSSD]2008-09-15 17:06:28.782 [3600] >TRACE: clssnmWaitForAcks: done, msg type(11)
[ CSSD]2008-09-15 17:06:28.782 [3600] >TRACE: clssnmDoSyncUpdate: node(2) is transitioning from joining state to active state
[ CSSD]2008-09-15 17:06:28.782 [3600] >TRACE: clssnmSetupAckWait: Ack message type (13)
[ CSSD]2008-09-15 17:06:28.782 [3600] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2008-09-15 17:06:28.782 [3600] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2008-09-15 17:06:28.782 [3600] >TRACE: clssnmSendVote: syncSeqNo(32)
[ CSSD]2008-09-15 17:06:28.782 [3600] >TRACE: clssnmWaitForAcks: Ack message type(13), ackCount(2)
[ CSSD]2008-09-15 17:06:28.782 [1801] >TRACE: clssnmSendVoteInfo: node(1) syncSeqNo(32)
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmWaitForAcks: done, msg type(13)
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmCheckDskInfo: Checking disk info...
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmEvict: Start
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmWaitOnEvictions: Start
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmSetupAckWait: Ack message type (15)
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmSetupAckWait: node(1) is ACTIVE
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmSetupAckWait: node(2) is ACTIVE
[ CSSD]2008-09-15 17:06:28.783 [3600] >TRACE: clssnmSendUpdate: syncSeqNo(32)
[ CSSD]2008-09-15 17:06:28.784 [1801] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2008-09-15 17:06:28.784 [1801] >TRACE: clssnmDeactivateNode: node 0 () left cluster
[ CSSD]2008-09-15 17:06:28.784 [1801] >TRACE: clssnmUpdateNodeState: node 1, state (3/3) unique (1209657900/1209657900) prevConuni(0) birth (8/8) (old/new)
[ CSSD]2008-09-15 17:06:28.784 [1801] >TRACE: clssnmUpdateNodeState: node 2, state (2/2) unique (1221494782/1221494782) prevConuni(0) birth (32/32) (old/new)
[ CSSD]2008-09-15 17:06:28.784 [1801] >USER: clssnmHandleUpdate: SYNC(32) from node(1) completed
[ CSSD]2008-09-15 17:06:28.784 [1801] >USER: clssnmHandleUpdate: NODE 1 (kdb1-mp1-30) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2008-09-15 17:06:28.784 [1801] >USER: clssnmHandleUpdate: NODE 2 (kdb2-mp1-30) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2008-09-15 17:06:28.784 [1801] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2008-09-15 17:06:28.784 [3600] >TRACE: clssnmWaitForAcks: Ack message type(15), ackCount(1)
[ CSSD]2008-09-15 17:06:28.784 [3881] >TRACE: clssgmReconfigThread: started for reconfig (32)
[ CSSD]2008-09-15 17:06:28.784 [3881] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2008-09-15 17:06:28.785 [3600] >TRACE: clssnmWaitForAcks: done, msg type(15)
[ CSSD]2008-09-15 17:06:28.785 [3600] >TRACE: clssnmDoSyncUpdate: Sync Complete!
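To follow a reconfiguration like this one without reading every trace line, one option is to filter cssd.log for the few messages relevant to the questions above: the disk heartbeat reporting a node down, nodes becoming active members, and diskTimeout changes. A minimal Python sketch, assuming the line layout shown in this excerpt (`[ CSSD]<timestamp> [<thread>] ><LEVEL>: <function>: <message>`) and assuming that lines not starting with `[` are wrapped continuations of the previous entry. The regexes, the `Entry` class and the function names are hypothetical helpers, not an Oracle tool.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional

# ASSUMED layout, taken from the excerpt above; the function name is optional
# because entries such as "NMEVENT_SUSPEND [00][00][00][02]" have none.
ENTRY_RE = re.compile(
    r"^\[\s*CSSD\]"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<thread>\d+)\] >(?P<level>[A-Z]+): "
    r"(?:(?P<func>\w+): )?(?P<msg>.*)$"
)
DSK_DOWN_RE = re.compile(r"node\((\d+)\) is down")
ACTIVE_RE = re.compile(r"NODE (\d+) \((\S+)\) IS ACTIVE MEMBER OF CLUSTER")
DISK_TIMEOUT_RE = re.compile(r"diskTimeout set to \((\d+)\)ms")


@dataclass
class Entry:
    ts: str
    thread: int
    level: str
    func: Optional[str]
    msg: str


def join_wrapped(lines: Iterable[str]) -> List[str]:
    """Glue continuation lines (not starting with '[') onto the previous entry.

    They are joined with one space, which may add a spurious space where a
    token was split mid-word; that does not affect the fields extracted here.
    """
    joined: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("[") or not joined:
            joined.append(line)
        else:
            joined[-1] += " " + line.strip()
    return joined


def parse(lines: Iterable[str]) -> List[Entry]:
    """Turn raw log lines into Entry records, silently skipping unknown lines."""
    entries = []
    for line in join_wrapped(lines):
        m = ENTRY_RE.match(line)
        if m:
            entries.append(Entry(m["ts"], int(m["thread"]), m["level"], m["func"], m["msg"]))
    return entries


def summarize(entries: List[Entry]) -> List[str]:
    """Return one readable line per heartbeat-down, membership or diskTimeout event."""
    out = []
    for e in entries:
        m = DSK_DOWN_RE.search(e.msg)
        if e.func == "clssnmReadDskHeartbeat" and m:
            out.append(f"{e.ts} disk heartbeat reports node {m.group(1)} down")
            continue
        m = ACTIVE_RE.search(e.msg)
        if m:
            out.append(f"{e.ts} node {m.group(1)} ({m.group(2)}) active member")
            continue
        m = DISK_TIMEOUT_RE.search(e.msg)
        if m:
            out.append(f"{e.ts} diskTimeout -> {int(m.group(1))} ms")
    return out


# A few lines copied from the excerpt above, including one wrapped entry.
SAMPLE = """\
[ CSSD]2008-09-15 17:06:27.307 [1030] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(1) wrtcnt(1) LATS(1386516915
) Disk lastSeqNo(1)
[ CSSD]2008-09-15 17:06:28.776 [3600] >TRACE: clssnmDoSyncUpdate: diskTimeout set to (27000)ms
[ CSSD]2008-09-15 17:06:28.784 [1801] >USER: clssnmHandleUpdate: NODE 2 (kdb2-mp1-30) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2008-09-15 17:06:28.784 [1801] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
"""

if __name__ == "__main__":
    for event in summarize(parse(SAMPLE.splitlines())):
        print(event)
```

On this sample it reports node 2 as down on the disk heartbeat at 17:06:27.307, the diskTimeout drop to 27000 ms, node 2 becoming an active member at 17:06:28.784, and diskTimeout returning to 200000 ms. The same approach could be extended to the three acknowledgement rounds visible in the excerpt (message types 11, 13 and 15, paired with clssnmSendSync, clssnmSendVote and clssnmSendUpdate).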