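RAC node goes down, suspected network problem
947524, Nov 11 2012 (edited Nov 13 2012)
Production system: a 2-node RAC, RAC software version 11.2.0.4, operating system RHEL 5.4 x86_64. The servers are HP DL380 G7.
The same problem has recurred every month recently: the heartbeat (interconnect) network reports errors, then the database on node 2 goes down, and only a server reboot resolves it.
Below are the error logs from the time of the failure (eth0 is the NIC that carries the heartbeat network).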
*##### messages #####*
Nov 9 15:01:39 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:01:42 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Nov 9 15:01:52 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:01:55 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Nov 9 15:02:02 mesdb2 avahi-daemon[9460]: Withdrawing address record for 192.168.21.47 on eth2.
Nov 9 15:02:02 mesdb2 avahi-daemon[9460]: Withdrawing address record for 192.168.21.7 on eth2.
Nov 9 15:02:05 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:02:07 mesdb2 avahi-daemon[9460]: Withdrawing address record for 169.254.8.102 on eth0.
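As a rough way to quantify the link flapping in the messages excerpt above, the following Python sketch pairs each "Link is Down" event with the next "Link is Up" on the same NIC. The function names (`link_events`, `outages`), the regular expression, and the assumed year 2012 (syslog timestamps carry no year) are illustrative assumptions, not part of any Oracle or Red Hat tool.

```python
import re
from datetime import datetime

# bnx2 lines copied from the messages excerpt above.
SAMPLE = """\
Nov 9 15:01:39 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:01:42 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Nov 9 15:01:52 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:01:55 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Nov 9 15:02:05 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
"""

# Hypothetical pattern for the bnx2 link messages shown in this thread.
LINK_RE = re.compile(
    r"^(\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: bnx2: (\w+) NIC Copper Link is (Up|Down)"
)


def link_events(text, year=2012):
    """Return (timestamp, nic, state) tuples; the year is an assumption."""
    events = []
    for line in text.splitlines():
        m = LINK_RE.match(line)
        if m:
            stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
            events.append((stamp, m.group(2), m.group(3)))
    return events


def outages(events):
    """Pair each Down with the next Up on the same NIC.

    Returns (nic, down_time, seconds_down); seconds_down is None when the
    excerpt ends before the link comes back.
    """
    result = []
    pending = {}
    for stamp, nic, state in events:
        if state == "Down":
            pending[nic] = stamp
        elif nic in pending:
            down = pending.pop(nic)
            result.append((nic, down, (stamp - down).total_seconds()))
    for nic, down in pending.items():
        result.append((nic, down, None))
    return result


if __name__ == "__main__":
    for nic, down, seconds in outages(link_events(SAMPLE)):
        print(nic, down.time(), seconds)
```

On the five bnx2 lines above this reports two 3-second outages on eth0 (starting 15:01:39 and 15:01:52) and a third Down at 15:02:05 with no matching Up inside the excerpt.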
*##### alertmesdb2.log #####*
2012-11-09 15:01:47.635
[cssd(10044)]CRS-1612:Network communication with node mesdb1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.540 seconds
2012-11-09 15:01:55.655
[cssd(10044)]CRS-1611:Network communication with node mesdb1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.520 seconds
2012-11-09 15:01:59.663
[cssd(10044)]CRS-1610:Network communication with node mesdb1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.510 seconds
2012-11-09 15:02:02.181
[cssd(10044)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /oracle/grid/11.2/log/mesdb2/cssd/ocssd.log.
2012-11-09 15:02:02.185
[cssd(10044)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /oracle/grid/11.2/log/mesdb2/cssd/ocssd.log
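The three countdown warnings above can be cross-checked with simple arithmetic: adding the "Removal ... in N seconds" value to each warning's timestamp gives the moment CSS expected to evict. The sketch below does that in Python; the messages are abbreviated to the part that matters, and the names `WARNINGS`, `REMOVAL_RE`, and `projected_deadlines` are hypothetical helpers for illustration only.

```python
import re
from datetime import datetime, timedelta

# (timestamp, abbreviated message) pairs taken from the alert log excerpt above.
WARNINGS = [
    ("2012-11-09 15:01:47.635", "CRS-1612: Removal of this node from cluster in 14.540 seconds"),
    ("2012-11-09 15:01:55.655", "CRS-1611: Removal of this node from cluster in 6.520 seconds"),
    ("2012-11-09 15:01:59.663", "CRS-1610: Removal of this node from cluster in 2.510 seconds"),
]

REMOVAL_RE = re.compile(
    r"(CRS-\d+).*Removal of this node from cluster in (\d+\.\d+) seconds"
)


def projected_deadlines(entries):
    """Return (message_code, projected_eviction_time) for each countdown warning."""
    out = []
    for stamp, msg in entries:
        m = REMOVAL_RE.search(msg)
        if not m:
            continue
        logged = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        out.append((m.group(1), logged + timedelta(seconds=float(m.group(2)))))
    return out


if __name__ == "__main__":
    for code, deadline in projected_deadlines(WARNINGS):
        # Trim microseconds to milliseconds to match the log's precision.
        print(code, deadline.strftime("%H:%M:%S.%f")[:-3])
```

All three warnings project to roughly the same instant (15:02:02.175, 15:02:02.175, and 15:02:02.173), which lines up with the CRS-1609 message logged at 15:02:02.181. In other words, the countdown ran to completion without heartbeat traffic from mesdb1 being seen again.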
*##### ocssd.log ######*
2012-11-09 15:00:58.288: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:00:58.288: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:02.296: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:02.296: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:07.306: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:07.306: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:12.316: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:12.316: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:17.326: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:17.326: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:21.334: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:21.334: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:25.342: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:25.342: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:29.350: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:29.350: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:34.360: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:34.360: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:39.370: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:39.370: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:40.182: [GIPCHGEN][1098344768] gipchaInterfaceFail: marking interface failing 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 1, numFail 0, flags 0x4d }
2012-11-09 15:01:40.190: [GIPCHGEN][1096767808] gipchaInterfaceFail: marking interface failing 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x6 }
2012-11-09 15:01:40.200: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 1, flags 0x1cd }
2012-11-09 15:01:40.212: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x86 }
2012-11-09 15:01:40.213: [GIPCHALO][1096767808] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0xa6 }
2012-11-09 15:01:40.213: [GIPCHGEN][1096767808] gipchaInterfaceReset: resetting interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0xa6 }
2012-11-09 15:01:40.222: [GIPCHDEM][1096767808] gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x1ed }
2012-11-09 15:01:40.222: [GIPCHTHR][1096767808] gipchaWorkerUpdateInterface: created remote interface for node 'mesdb1', haName 'CSS_scan', inf 'udp://172.16.21.9:56687'
2012-11-09 15:01:40.222: [GIPCHALO][1096767808] gipchaLowerCleanInterfaces: forcing interface purge due to loss of all comms node 0x2aaab01fca50 { host 'mesdb1', haName 'CSS_scan', srcLuid 787fbb33-023e177a, dstLuid ec6439a0-44129130 numInf 1, contigSeq 1140862, lastAck 1140845, lastValidAck 1140862, sendSeq [1140846 : 1140855], createTime 16963874, flags 0x4808 }
2012-11-09 15:01:40.222: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local (nil), ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x6 }
2012-11-09 15:01:40.232: [GIPCHALO][1096767808] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local (nil), ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x226 }
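To see at a glance which interconnect addresses the GIPC layer flagged, the gipcha lines above can be grouped by IP address. The following Python sketch is one way to do that; the regular expression and the `summarize` helper are assumptions built around the line format visible in this excerpt, not a documented Oracle log format, and only the first three gipcha lines are included as sample input.

```python
import re

# First three gipcha lines copied verbatim from the ocssd.log excerpt above.
SAMPLE = """\
2012-11-09 15:01:40.182: [GIPCHGEN][1098344768] gipchaInterfaceFail: marking interface failing 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 1, numFail 0, flags 0x4d }
2012-11-09 15:01:40.190: [GIPCHGEN][1096767808] gipchaInterfaceFail: marking interface failing 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x6 }
2012-11-09 15:01:40.200: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 1, flags 0x1cd }
"""

# Hypothetical pattern: gipcha function name, then the host and ip fields
# inside the braces. Lines without an "{ host ... ip ..." body are skipped.
GIPC_RE = re.compile(r"\]\s*(gipcha\w+): .*?\{ host '([^']*)'.*? ip '([^']*)'")


def summarize(text):
    """Map each IP (port stripped) to the ordered list of gipcha actions seen."""
    actions = {}
    for line in text.splitlines():
        m = GIPC_RE.search(line)
        if m:
            ip = m.group(3).split(":")[0]
            actions.setdefault(ip, []).append(m.group(1))
    return actions


if __name__ == "__main__":
    for ip, seen in summarize(SAMPLE).items():
        print(ip, seen)
```

On the sample lines this shows the local address 172.16.21.10 being marked failing and then disabled, and the remote mesdb1 address 172.16.21.9 being marked failing, all within about 20 ms of 15:01:40.182.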