RAC node goes down, suspected network problem

947524, Nov 11 2012 (edited Nov 13 2012)
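Production system, 2-node RAC, RAC stack version 11.2.0.4, operating system RHEL 5.4 x86_64. The servers are HP DL380 G7.

Recently the same problem has occurred every month: the heartbeat (private interconnect) network reports errors, then the database on node 2 goes down, and only rebooting the server resolves it.

Below are the error logs from the time of the failure (eth0 is the NIC that carries the heartbeat network).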

*##### messages #####*
Nov 9 15:01:39 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:01:42 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Nov 9 15:01:52 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:01:55 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON
Nov 9 15:02:02 mesdb2 avahi-daemon[9460]: Withdrawing address record for 192.168.21.47 on eth2.
Nov 9 15:02:02 mesdb2 avahi-daemon[9460]: Withdrawing address record for 192.168.21.7 on eth2.
Nov 9 15:02:05 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down
Nov 9 15:02:07 mesdb2 avahi-daemon[9460]: Withdrawing address record for 169.254.8.102 on eth0.
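
For anyone triaging the same symptom, here is a minimal Python sketch for pulling the link state changes of one NIC out of /var/log/messages. The sample lines are copied from the excerpt above. The function name `link_events` and the regular expression are illustrative assumptions that match only the bnx2 message format shown here; this is not a general syslog parser.

```python
import re

# Matches only the bnx2 link messages seen in the excerpt above (assumption).
LINK_RE = re.compile(r"kernel: bnx2: (\w+) NIC Copper Link is (Up|Down)")

def link_events(lines, nic="eth0"):
    """Return [(timestamp, state)] for link state changes of one NIC."""
    events = []
    for line in lines:
        m = LINK_RE.search(line)
        if m and m.group(1) == nic:
            # The syslog timestamp is the first three whitespace-separated fields.
            stamp = " ".join(line.split()[:3])
            events.append((stamp, m.group(2)))
    return events

sample = [
    "Nov 9 15:01:39 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down",
    "Nov 9 15:01:42 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON",
    "Nov 9 15:01:52 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down",
    "Nov 9 15:01:55 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Up, 1000 Mbps full duplex, receive & transmit flow control ON",
    "Nov 9 15:02:02 mesdb2 avahi-daemon[9460]: Withdrawing address record for 192.168.21.47 on eth2.",
    "Nov 9 15:02:05 mesdb2 kernel: bnx2: eth0 NIC Copper Link is Down",
]

events = link_events(sample)
downs = sum(1 for _, state in events if state == "Down")
for stamp, state in events:
    print(stamp, state)
print("link-down events on eth0:", downs)
```

In the excerpt this gives three link-down events on eth0 within about 26 seconds, which is the flapping pattern the question is about.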

*##### alertmesdb2.log #####*
2012-11-09 15:01:47.635
[cssd(10044)]CRS-1612:Network communication with node mesdb1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.540 seconds
2012-11-09 15:01:55.655
[cssd(10044)]CRS-1611:Network communication with node mesdb1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.520 seconds
2012-11-09 15:01:59.663
[cssd(10044)]CRS-1610:Network communication with node mesdb1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.510 seconds
2012-11-09 15:02:02.181
[cssd(10044)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /oracle/grid/11.2/log/mesdb2/cssd/ocssd.log.
2012-11-09 15:02:02.185
[cssd(10044)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /oracle/grid/11.2/log/mesdb2/cssd/ocssd.log
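
As a rough cross-check of the alert log entries above, each warning's timestamp plus its stated "Removal ... in N seconds" should point at the same deadline. The sketch below does that arithmetic in Python using only the values quoted in the CRS-1612/1611/1610 lines; the helper name `projected_removal` is made up for illustration.

```python
from datetime import datetime, timedelta

# (timestamp, seconds until removal) copied from the CRS-1612/1611/1610 lines.
warnings = [
    ("2012-11-09 15:01:47.635", 14.540),
    ("2012-11-09 15:01:55.655", 6.520),
    ("2012-11-09 15:01:59.663", 2.510),
]

def projected_removal(ts, remaining):
    """Timestamp of the warning plus the remaining seconds it reports."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S.%f") + timedelta(seconds=remaining)

deadlines = [projected_removal(ts, rem) for ts, rem in warnings]
for d in deadlines:
    # Trim microseconds to milliseconds for display.
    print(d.strftime("%H:%M:%S.%f")[:-3])

spread = max(deadlines) - min(deadlines)
print("spread between projections:", spread)
```

All three warnings project to roughly 15:02:02.17, which lines up with the CRS-1609 entry logged at 15:02:02.181. The countdown therefore ran uninterrupted from the first missed heartbeat, i.e. the brief link-up periods on eth0 did not restore communication with mesdb1.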

*##### ocssd.log ######*

2012-11-09 15:00:58.288: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:00:58.288: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:02.296: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:02.296: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:07.306: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:07.306: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:12.316: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:12.316: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:17.326: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:17.326: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:21.334: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:21.334: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:25.342: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:25.342: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:29.350: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:29.350: [ CSSD][1080224064]clssnmSendingThread: sent 4 status msgs to all nodes
2012-11-09 15:01:34.360: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:34.360: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:39.370: [ CSSD][1080224064]clssnmSendingThread: sending status msg to all nodes
2012-11-09 15:01:39.370: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes
2012-11-09 15:01:40.182: [GIPCHGEN][1098344768] gipchaInterfaceFail: marking interface failing 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 1, numFail 0, flags 0x4d }
2012-11-09 15:01:40.190: [GIPCHGEN][1096767808] gipchaInterfaceFail: marking interface failing 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x6 }
2012-11-09 15:01:40.200: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 1, flags 0x1cd }
2012-11-09 15:01:40.212: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x86 }
2012-11-09 15:01:40.213: [GIPCHALO][1096767808] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0xa6 }
2012-11-09 15:01:40.213: [GIPCHGEN][1096767808] gipchaInterfaceReset: resetting interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0xa6 }
2012-11-09 15:01:40.222: [GIPCHDEM][1096767808] gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x1ed }
2012-11-09 15:01:40.222: [GIPCHTHR][1096767808] gipchaWorkerUpdateInterface: created remote interface for node 'mesdb1', haName 'CSS_scan', inf 'udp://172.16.21.9:56687'
2012-11-09 15:01:40.222: [GIPCHALO][1096767808] gipchaLowerCleanInterfaces: forcing interface purge due to loss of all comms node 0x2aaab01fca50 { host 'mesdb1', haName 'CSS_scan', srcLuid 787fbb33-023e177a, dstLuid ec6439a0-44129130 numInf 1, contigSeq 1140862, lastAck 1140845, lastValidAck 1140862, sendSeq [1140846 : 1140855], createTime 16963874, flags 0x4808 }
2012-11-09 15:01:40.222: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local (nil), ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x6 }
2012-11-09 15:01:40.232: [GIPCHALO][1096767808] gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local (nil), ip '172.16.21.9:56687', subnet '172.16.21.0', mask '255.255.255.0', numRef 0, numFail 0, flags 0x226 }
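
To line the ocssd.log entries up against the link events in messages, it can help to reduce each gipcha record to its timestamp, function name, and IP address. The following Python sketch does that for a few records copied from the excerpt above (the tail of each record is trimmed for brevity). The regular expression and the name `gipc_events` are illustrative assumptions based only on the line format visible in this post.

```python
import re

# timestamp: [COMPONENT][thread] function: ... ip 'a.b.c.d[:port]'
GIPC_RE = re.compile(
    r"^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{3}): "
    r"\[\s*(\w+)\]\[\d+\]\s*(\w+): .*?ip '([\d.]+)"
)

def gipc_events(lines):
    """Return [(timestamp, function, ip)] for records that carry an ip field."""
    out = []
    for line in lines:
        m = GIPC_RE.match(line)
        if m:
            out.append((m.group(1), m.group(3), m.group(4)))
    return out

# Records copied from the excerpt; everything after the ip field is trimmed.
sample = [
    "2012-11-09 15:01:39.370: [ CSSD][1080224064]clssnmSendingThread: sent 5 status msgs to all nodes",
    "2012-11-09 15:01:40.182: [GIPCHGEN][1098344768] gipchaInterfaceFail: marking interface failing 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10'",
    "2012-11-09 15:01:40.190: [GIPCHGEN][1096767808] gipchaInterfaceFail: marking interface failing 0x11fbea70 { host 'mesdb1', haName 'CSS_scan', local 0x2aaab02c8980, ip '172.16.21.9:56687'",
    "2012-11-09 15:01:40.200: [GIPCHGEN][1096767808] gipchaInterfaceDisable: disabling interface 0x2aaab02c8980 { host '', haName 'CSS_scan', local (nil), ip '172.16.21.10'",
]

parsed = gipc_events(sample)
for ts, func, ip in parsed:
    print(ts, func, ip)
```

Read this way, the local interface 172.16.21.10 is marked failing at 15:01:40.182, about one second after the first "eth0 NIC Copper Link is Down" message at 15:01:39.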
This post has been answered by LiuMaclean(刘相兵) on Nov 12 2012
Locked on Dec 11 2012
Added on Nov 11 2012