Oracle RAC crs无法启动的问题
974089Nov 13 2012 — edited Nov 14 2012这两个节点的RAC是做为DataGuard备库。
版本:Red Linux 5.6,Oracle 10.2.0.3.0
node1->$ crsctl check crs
CSS appears healthy
Cannot communicate with CRS
EVM appears healthy
node1->$ crsctl query css votedisk
0. 0 /dev/raw/raw1
located 1 votedisk(s).
node1->$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 497744
Used space (kbytes) : 3820
Available space (kbytes) : 493924
ID : 1682116375
Device/File Name : /dev/raw/raw4
Device/File integrity check succeeded
Device/File not configured
Cluster registry integrity check succeeded
# *./oifcfg getif*
eth0 10.17.19.0 global cluster_interconnect
eth1 172.17.19.0 global public
# */etc/init.d/init.crs start*
node1->$ ps -ef|grep crs
root 5083 1 0 15:10 ? 00:00:00 /bin/su -l oracle -c sh -c 'ulimit -c unlimited; cd /app/oracle/product/10.2.0/crs_1/log/node1/evmd; exec /app/oracle/product/10.2.0/crs_1/bin/evmd '
oracle 17459 4769 0 16:09 pts/1 00:00:00 grep crs
oracle 26397 5083 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/evmd.bin
root 26619 26370 0 15:51 ? 00:00:00 /bin/su -l oracle -c /bin/sh -c 'cd /app/oracle/product/10.2.0/crs_1/log/node1/cssd/oclsomon; ulimit -c unlimited; /app/oracle/product/10.2.0/crs_1/bin/oclsomon || exit $?'
oracle 26626 26619 0 15:51 ? 00:00:00 /bin/sh -c cd /app/oracle/product/10.2.0/crs_1/log/node1/cssd/oclsomon; ulimit -c unlimited; /app/oracle/product/10.2.0/crs_1/bin/oclsomon || exit $?
oracle 26672 26626 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/oclsomon.bin
oracle 26691 26371 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
oracle 27094 26397 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/evmlogger.bin -o /app/oracle/product/10.2.0/crs_1/evm/log/evmlogger.info -l /app/oracle/product/10.2.0/crs_1/evm/log/evmlogger.log
alertnode1.log 文件部份内容:
2012-11-13 15:51:07.152
[cssd(26691)]CRS-1605:CSSD voting file is online: /dev/raw/raw1. Details in /app/oracle/product/10.2.0/crs_1/log/node1/cssd/ocssd.log.
2012-11-13 15:51:08.084
[cssd(26691)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 .
2012-11-13 15:51:08.320
[evmd(26397)]CRS-1401:EVMD started on node node1.
ocssd.log 文件内容:
[ CSSD]2012-11-13 15:51:05.037 >USER: Oracle Database 10g CSS Release 10.2.0.3.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ CSSD]2012-11-13 15:51:05.037 >USER: CSS daemon log for node node1, number 1, in cluster crs
[ CSSD]2012-11-13 15:51:05.040 [2246605696] >TRACE: clssscmain: local-only set to false
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=node1DBG_CSSD))
[ CSSD]2012-11-13 15:51:05.065 [2246605696] >TRACE: clssnmReadNodeInfo: added node 1 (node1) to cluster
[ CSSD]2012-11-13 15:51:05.074 [2246605696] >TRACE: clssnmReadNodeInfo: added node 2 (node2) to cluster
[ CSSD]2012-11-13 15:51:05.077 [1120115008] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2012-11-13 15:51:05.077 [2246605696] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[ CSSD]2012-11-13 15:51:05.079 [2246605696] >TRACE: clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56000)
[ CSSD]2012-11-13 15:51:05.079 [2246605696] >TRACE: clssnmNMInitialize: diskShortTimeout set to (57000)ms
[ CSSD]2012-11-13 15:51:05.080 [2246605696] >TRACE: clssnmNMInitialize: diskLongTimeout set to (200000)ms
[ CSSD]2012-11-13 15:51:05.082 [2246605696] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw1)
[ CSSD]2012-11-13 15:51:05.082 [1120115008] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/raw/raw1)
[ CSSD]2012-11-13 15:51:07.127 [1120115008] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw1)
[ CSSD]2012-11-13 15:51:07.153 [1130604864] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/raw/raw1) initial sleep interval (1000)ms
[ CSSD]2012-11-13 15:51:07.161 [2246605696] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2012-11-13 15:51:07.161 [1151584576] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[ CSSD]2012-11-13 15:51:07.161 [1120115008] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(12) wrtcnt(78619) LATS(1830084) Disk lastSeqNo(78619)
[ CSSD]2012-11-13 15:51:07.162 [1151584576] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=node1-priv)(PORT=49895))
[ CSSD]2012-11-13 15:51:07.162 [1151584576] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2012-11-13 15:51:07.162 [1151584576] >TRACE: clssnmClusterListener: Probing node 2, con (0x2aaaac10c320)
[ CSSD]2012-11-13 15:51:07.171 [1162074432] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2012-11-13 15:51:07.171 [1162074432] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_crs))
[ CSSD]2012-11-13 15:51:07.172 [1193544000] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=19)(HOST=10.17.19.20)(PORT=18701))
[ CSSD]2012-11-13 15:51:07.198 [1151584576] >TRACE: clssnmConnComplete: connected to node 2 (con 0x2aaaac163b50), state 3 birth 0, unique 1352712566/1352712566 prevConuni(0)
[ CSSD]2012-11-13 15:51:07.673 [1204033856] >TRACE: clssnmPollingThread: Connection complete
[ CSSD]2012-11-13 15:51:07.673 [1214523712] >TRACE: clssnmSendingThread: Connection complete
[ CSSD]2012-11-13 15:51:07.673 [1225013568] >TRACE: clssnmRcfgMgrThread: Connection complete
[ CSSD]2012-11-13 15:51:08.003 [1151584576] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[node2] seq[45] sync[12]
[ CSSD]2012-11-13 15:51:08.003 [1151584576] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2012-11-13 15:51:08.003 [1151584576] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(12)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmDeactivateNode: node 0 () left cluster
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1352793064/1352793064) prevConuni(0) birth (0/12) (old/new)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1352712566/1352712566) prevConuni(0) birth (0/1) (old/new)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >USER: clssnmHandleUpdate: SYNC(12) from node(2) completed
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >USER: clssnmHandleUpdate: NODE 1 (node1) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >USER: clssnmHandleUpdate: NODE 2 (node2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2012-11-13 15:51:08.081 [2246605696] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2012-11-13 15:51:08.081 [1235503424] >TRACE: clssgmReconfigThread: started for reconfig (12)
[ CSSD]2012-11-13 15:51:08.081 [1235503424] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2012-11-13 15:51:08.081 [1235503424] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 12
[ CSSD]2012-11-13 15:51:08.082 [1193544000] >TRACE: clssgmInitialRecv: (0xd9ae050) accepted a new connection from node 2 born at 1 active (2, 2), vers (10,3,1,2)
[ CSSD]2012-11-13 15:51:08.082 [1193544000] >TRACE: clssgmInitialRecv: conns done (2/2)
[ CSSD]2012-11-13 15:51:08.082 [1235503424] >TRACE: clssgmEstablishMasterNode: MASTER for 12 is node(2) birth(1)
[ CSSD]2012-11-13 15:51:08.082 [1235503424] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2012-11-13 15:51:08.083 [1193544000] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 12
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 12 with 2 nodes
[ CSSD]CLSS-3001: local node number 1, master node number 2
[ CSSD]2012-11-13 15:51:08.084 [1235503424] >TRACE: clssgmReconfigThread: completed for reconfig(12), with status(1)
[ CSSD]2012-11-13 15:51:08.268 [1162074432] >TRACE: clssgmClientConnectMsg: Connect from con(0xd9b4d50) proc(0xd9b9d50) pid() proto(10:2:1:1)
[ CSSD]2012-11-13 15:51:08.268 [1193544000] >TRACE: clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)
[ CSSD]2012-11-13 15:51:08.269 [1162074432] >TRACE: clssgmClientConnectMsg: Connect from con(0xd9b7910) proc(0xd9ba0a0) pid() proto(10:2:1:1)
查看ocr,表决磁盘,存储,网络,裸设备权限,都没有发现问题,有时候执行/etc/init.d/init.crs start还会导致服务器重启,日志内容如下:
/var/log/message重启时的日志
Nov 13 15:51:03 node1 logger: Cluster Ready Services completed waiting on dependencies.
Nov 13 15:51:03 node1 logger: Cluster Ready Services completed waiting on dependencies.
Nov 13 16:10:54 node1 auditd[3667]: Audit daemon rotating log files
Nov 13 16:49:14 node1 auditd[3667]: Audit daemon rotating log files
Nov 13 16:50:37 node1 root: Cluster Ready Services completed waiting on dependencies.
Nov 13 16:52:07 node1 logger: Oracle CSS family monitor shutting down. 3
Nov 13 16:52:07 node1 root: Oracle CRSD 5797 set to stop
Nov 13 16:52:07 node1 root: Oracle CRSD 5797 shutdown completed
Nov 13 16:52:07 node1 root: Oracle EVMD set to stop
Nov 13 16:52:07 node1 root: Oracle CSSD being stopped
Nov 13 16:52:17 node1 root: Oracle CSSD being stopped
Nov 13 16:52:27 node1 root: Oracle EVMD set to stop
Nov 13 16:52:45 node1 root: Oracle CSSD being stopped
Nov 13 17:03:14 node1 root: Oracle CRSD 5797 set to stop
Nov 13 17:03:14 node1 root: Oracle CRSD 5797 shutdown completed
Nov 13 17:03:14 node1 root: Oracle EVMD set to stop
Nov 13 17:03:14 node1 root: Oracle CSSD being stopped
Nov 13 17:03:26 node1 root: Oracle Cluster Ready Services starting by user request.
Nov 13 17:03:35 node1 logger: Cluster Ready Services completed waiting on dependencies.
Nov 13 17:03:36 node1 logger: Oracle CSSD shell script failure. Duplicate CSSD.
Nov 13 17:03:36 node1 kernel: md: stopping all md devices.
Nov 13 17:21:49 node1 syslogd 1.4.1: restart.
Nov 13 17:21:49 node1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
出现 Nov 13 17:03:36 node1 logger: Oracle CSSD shell script failure. Duplicate CSSD. 之后,服务器就重启了
在网上查了不少类似问题,其他网友无法启动CRS主要集中在几个方面:
1、/tmp权限不正确
2、删除/var/tmp/.oracle下的文件,再重启
3、oifcfg查看到网卡设置问题
但我遇到的问题,以上3项都是正常的,跟这个http://www.itpub.net/thread-1330782-1-1.html 问题类似。
请问这个问题是什么原因导致的?
帖子经 user1738965编辑过
帖子经 user1738965编辑过