Oracle RAC: CRS fails to start

974089, Nov 13 2012 (edited Nov 14 2012)
This two-node RAC serves as a Data Guard standby database.

Versions: Red Hat Linux 5.6, Oracle 10.2.0.3.0

node1->$ crsctl check crs
CSS appears healthy
Cannot communicate with CRS
EVM appears healthy
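
The check above shows CSS and EVM healthy while CRS cannot be reached. As a minimal sketch, the output of `crsctl check crs` could be filtered down to just the unreachable daemons. The function name `unhealthy_daemons` is hypothetical, and the sketch assumes only the two message shapes that appear in this post ("X appears healthy" and "Cannot communicate with X").

```shell
# Hypothetical helper: reads `crsctl check crs` output on stdin and prints
# the name of every daemon reported as "Cannot communicate with <NAME>".
# Lines of the form "<NAME> appears healthy" are ignored.
unhealthy_daemons() {
    awk '/^Cannot communicate with / { print $4 }'
}

# Usage (on a cluster node):
#   crsctl check crs | unhealthy_daemons
```

For the output shown in this post, the filter would print only the CRS line's daemon name.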

node1->$ crsctl query css votedisk
0. 0 /dev/raw/raw1

located 1 votedisk(s).

node1->$ ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 2
Total space (kbytes) : 497744
Used space (kbytes) : 3820
Available space (kbytes) : 493924
ID : 1682116375
Device/File Name : /dev/raw/raw4
Device/File integrity check succeeded

Device/File not configured

Cluster registry integrity check succeeded

# ./oifcfg getif
eth0 10.17.19.0 global cluster_interconnect
eth1 172.17.19.0 global public



# /etc/init.d/init.crs start

node1->$ ps -ef|grep crs
root 5083 1 0 15:10 ? 00:00:00 /bin/su -l oracle -c sh -c 'ulimit -c unlimited; cd /app/oracle/product/10.2.0/crs_1/log/node1/evmd; exec /app/oracle/product/10.2.0/crs_1/bin/evmd '
oracle 17459 4769 0 16:09 pts/1 00:00:00 grep crs
oracle 26397 5083 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/evmd.bin
root 26619 26370 0 15:51 ? 00:00:00 /bin/su -l oracle -c /bin/sh -c 'cd /app/oracle/product/10.2.0/crs_1/log/node1/cssd/oclsomon; ulimit -c unlimited; /app/oracle/product/10.2.0/crs_1/bin/oclsomon || exit $?'
oracle 26626 26619 0 15:51 ? 00:00:00 /bin/sh -c cd /app/oracle/product/10.2.0/crs_1/log/node1/cssd/oclsomon; ulimit -c unlimited; /app/oracle/product/10.2.0/crs_1/bin/oclsomon || exit $?
oracle 26672 26626 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/oclsomon.bin
oracle 26691 26371 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/ocssd.bin
oracle 27094 26397 0 15:51 ? 00:00:00 /app/oracle/product/10.2.0/crs_1/bin/evmlogger.bin -o /app/oracle/product/10.2.0/crs_1/evm/log/evmlogger.info -l /app/oracle/product/10.2.0/crs_1/evm/log/evmlogger.log
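
In the process list above, evmd.bin and ocssd.bin are present but no CRS daemon process shows up. A rough sketch for making that check explicit follows. It assumes the three daemons run as `crsd.bin`, `ocssd.bin` and `evmd.bin` under the Clusterware home's bin directory (only the last two are confirmed by this post), and `missing_daemons` is a hypothetical name.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and prints each
# expected clusterware binary that has no matching process line.
# Assumption: the CRS daemon binary is named crsd.bin.
missing_daemons() {
    ps_output=$(cat)
    for bin in crsd.bin ocssd.bin evmd.bin; do
        # -F: match the path fragment literally, -q: only the exit status matters
        printf '%s\n' "$ps_output" | grep -qF "/bin/$bin" || echo "$bin"
    done
}

# Usage (on a cluster node):
#   ps -ef | missing_daemons
```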


Excerpt from alertnode1.log:

2012-11-13 15:51:07.152
[cssd(26691)]CRS-1605:CSSD voting file is online: /dev/raw/raw1. Details in /app/oracle/product/10.2.0/crs_1/log/node1/cssd/ocssd.log.
2012-11-13 15:51:08.084
[cssd(26691)]CRS-1601:CSSD Reconfiguration complete. Active nodes are node1 node2 .
2012-11-13 15:51:08.320
[evmd(26397)]CRS-1401:EVMD started on node node1.

Contents of ocssd.log:

[ CSSD]2012-11-13 15:51:05.037 >USER: Oracle Database 10g CSS Release 10.2.0.3.0 Production Copyright 1996, 2004 Oracle. All rights reserved.
[ CSSD]2012-11-13 15:51:05.037 >USER: CSS daemon log for node node1, number 1, in cluster crs
[ CSSD]2012-11-13 15:51:05.040 [2246605696] >TRACE: clssscmain: local-only set to false
[ clsdmt]Listening to (ADDRESS=(PROTOCOL=ipc)(KEY=node1DBG_CSSD))
[ CSSD]2012-11-13 15:51:05.065 [2246605696] >TRACE: clssnmReadNodeInfo: added node 1 (node1) to cluster
[ CSSD]2012-11-13 15:51:05.074 [2246605696] >TRACE: clssnmReadNodeInfo: added node 2 (node2) to cluster
[ CSSD]2012-11-13 15:51:05.077 [1120115008] >TRACE: clssnm_skgxnmon: skgxn init failed
[ CSSD]2012-11-13 15:51:05.077 [2246605696] >TRACE: clssnm_skgxnonline: Using vacuous skgxn monitor
[ CSSD]2012-11-13 15:51:05.079 [2246605696] >TRACE: clssnmNMInitialize: misscount set to (60), impending reconfig threshold set to (56000)
[ CSSD]2012-11-13 15:51:05.079 [2246605696] >TRACE: clssnmNMInitialize: diskShortTimeout set to (57000)ms
[ CSSD]2012-11-13 15:51:05.080 [2246605696] >TRACE: clssnmNMInitialize: diskLongTimeout set to (200000)ms
[ CSSD]2012-11-13 15:51:05.082 [2246605696] >TRACE: clssnmDiskStateChange: state from 1 to 2 disk (0//dev/raw/raw1)
[ CSSD]2012-11-13 15:51:05.082 [1120115008] >TRACE: clssnmvDPT: spawned for disk 0 (/dev/raw/raw1)
[ CSSD]2012-11-13 15:51:07.127 [1120115008] >TRACE: clssnmDiskStateChange: state from 2 to 4 disk (0//dev/raw/raw1)
[ CSSD]2012-11-13 15:51:07.153 [1130604864] >TRACE: clssnmvKillBlockThread: spawned for disk 0 (/dev/raw/raw1) initial sleep interval (1000)ms
[ CSSD]2012-11-13 15:51:07.161 [2246605696] >TRACE: clssnmFatalInit: fatal mode enabled
[ CSSD]2012-11-13 15:51:07.161 [1151584576] >TRACE: clssnmconnect: connecting to node 1, flags 0x0001, connector 1
[ CSSD]2012-11-13 15:51:07.161 [1120115008] >TRACE: clssnmReadDskHeartbeat: node(2) is down. rcfg(12) wrtcnt(78619) LATS(1830084) Disk lastSeqNo(78619)
[ CSSD]2012-11-13 15:51:07.162 [1151584576] >TRACE: clssnmClusterListener: Listening on (ADDRESS=(PROTOCOL=tcp)(HOST=node1-priv)(PORT=49895))

[ CSSD]2012-11-13 15:51:07.162 [1151584576] >TRACE: clssnmconnect: connecting to node 0, flags 0x0000, connector 1
[ CSSD]2012-11-13 15:51:07.162 [1151584576] >TRACE: clssnmClusterListener: Probing node 2, con (0x2aaaac10c320)
[ CSSD]2012-11-13 15:51:07.171 [1162074432] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=Oracle_CSS_LclLstnr_crs_1))
[ CSSD]2012-11-13 15:51:07.171 [1162074432] >TRACE: clssgmclientlsnr: listening on (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_node1_crs))
[ CSSD]2012-11-13 15:51:07.172 [1193544000] >TRACE: clssgmPeerListener: Listening on (ADDRESS=(PROTOCOL=tcp)(DEV=19)(HOST=10.17.19.20)(PORT=18701))
[ CSSD]2012-11-13 15:51:07.198 [1151584576] >TRACE: clssnmConnComplete: connected to node 2 (con 0x2aaaac163b50), state 3 birth 0, unique 1352712566/1352712566 prevConuni(0)
[ CSSD]2012-11-13 15:51:07.673 [1204033856] >TRACE: clssnmPollingThread: Connection complete
[ CSSD]2012-11-13 15:51:07.673 [1214523712] >TRACE: clssnmSendingThread: Connection complete
[ CSSD]2012-11-13 15:51:07.673 [1225013568] >TRACE: clssnmRcfgMgrThread: Connection complete
[ CSSD]2012-11-13 15:51:08.003 [1151584576] >TRACE: clssnmHandleSync: Acknowledging sync: src[2] srcName[node2] seq[45] sync[12]
[ CSSD]2012-11-13 15:51:08.003 [1151584576] >TRACE: clssnmHandleSync: diskTimeout set to (57000)ms
[ CSSD]2012-11-13 15:51:08.003 [1151584576] >TRACE: clssnmSendVoteInfo: node(2) syncSeqNo(12)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmUpdateNodeState: node 0, state (0/0) unique (0/0) prevConuni(0) birth (0/0) (old/new)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmDeactivateNode: node 0 () left cluster

[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmUpdateNodeState: node 1, state (1/2) unique (1352793064/1352793064) prevConuni(0) birth (0/12) (old/new)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmUpdateNodeState: node 2, state (4/3) unique (1352712566/1352712566) prevConuni(0) birth (0/1) (old/new)
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >USER: clssnmHandleUpdate: SYNC(12) from node(2) completed
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >USER: clssnmHandleUpdate: NODE 1 (node1) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >USER: clssnmHandleUpdate: NODE 2 (node2) IS ACTIVE MEMBER OF CLUSTER
[ CSSD]2012-11-13 15:51:08.004 [1151584576] >TRACE: clssnmHandleUpdate: diskTimeout set to (200000)ms
[ CSSD]2012-11-13 15:51:08.081 [2246605696] >USER: NMEVENT_SUSPEND [00][00][00][00]
[ CSSD]2012-11-13 15:51:08.081 [1235503424] >TRACE: clssgmReconfigThread: started for reconfig (12)
[ CSSD]2012-11-13 15:51:08.081 [1235503424] >USER: NMEVENT_RECONFIG [00][00][00][06]
[ CSSD]2012-11-13 15:51:08.081 [1235503424] >TRACE: clssgmEstablishConnections: 2 nodes in cluster incarn 12
[ CSSD]2012-11-13 15:51:08.082 [1193544000] >TRACE: clssgmInitialRecv: (0xd9ae050) accepted a new connection from node 2 born at 1 active (2, 2), vers (10,3,1,2)
[ CSSD]2012-11-13 15:51:08.082 [1193544000] >TRACE: clssgmInitialRecv: conns done (2/2)
[ CSSD]2012-11-13 15:51:08.082 [1235503424] >TRACE: clssgmEstablishMasterNode: MASTER for 12 is node(2) birth(1)
[ CSSD]2012-11-13 15:51:08.082 [1235503424] >TRACE: clssgmChangeMasterNode: requeued 0 RPCs
[ CSSD]2012-11-13 15:51:08.083 [1193544000] >TRACE: clssgmHandleDBDone(): src/dest (2/65535) size(72) incarn 12
[ CSSD]CLSS-3000: reconfiguration successful, incarnation 12 with 2 nodes

[ CSSD]CLSS-3001: local node number 1, master node number 2

[ CSSD]2012-11-13 15:51:08.084 [1235503424] >TRACE: clssgmReconfigThread: completed for reconfig(12), with status(1)
[ CSSD]2012-11-13 15:51:08.268 [1162074432] >TRACE: clssgmClientConnectMsg: Connect from con(0xd9b4d50) proc(0xd9b9d50) pid() proto(10:2:1:1)
[ CSSD]2012-11-13 15:51:08.268 [1193544000] >TRACE: clssgmCommonAddMember: clsomon joined (1/0x1000000/#CSS_CLSSOMON)
[ CSSD]2012-11-13 15:51:08.269 [1162074432] >TRACE: clssgmClientConnectMsg: Connect from con(0xd9b7910) proc(0xd9ba0a0) pid() proto(10:2:1:1)
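
Most of the ocssd.log excerpt above is TRACE output. As a small sketch, the USER-level lines and the CLSS- status messages could be pulled out on their own to see the membership result at a glance. The function name `css_summary` is hypothetical, and the patterns are taken only from the line formats visible in this post.

```shell
# Hypothetical helper: keeps only ">USER:" lines and "CLSS-nnnn:" messages
# from an ocssd.log. Reads the files given as arguments, or stdin if none.
css_summary() {
    grep -E '>USER:|CLSS-[0-9]+:' "$@"
}

# Usage (log path as shown in alertnode1.log):
#   css_summary /app/oracle/product/10.2.0/crs_1/log/node1/cssd/ocssd.log
```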


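I checked the OCR, the voting disk, storage, the network, and the raw device permissions and found no problems. Sometimes running /etc/init.d/init.crs start even causes the server to reboot. The log from that time is below.
/var/log/messages around the reboot: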

Nov 13 15:51:03 node1 logger: Cluster Ready Services completed waiting on dependencies.
Nov 13 15:51:03 node1 logger: Cluster Ready Services completed waiting on dependencies.
Nov 13 16:10:54 node1 auditd[3667]: Audit daemon rotating log files
Nov 13 16:49:14 node1 auditd[3667]: Audit daemon rotating log files
Nov 13 16:50:37 node1 root: Cluster Ready Services completed waiting on dependencies.
Nov 13 16:52:07 node1 logger: Oracle CSS family monitor shutting down. 3
Nov 13 16:52:07 node1 root: Oracle CRSD 5797 set to stop
Nov 13 16:52:07 node1 root: Oracle CRSD 5797 shutdown completed
Nov 13 16:52:07 node1 root: Oracle EVMD set to stop
Nov 13 16:52:07 node1 root: Oracle CSSD being stopped
Nov 13 16:52:17 node1 root: Oracle CSSD being stopped
Nov 13 16:52:27 node1 root: Oracle EVMD set to stop
Nov 13 16:52:45 node1 root: Oracle CSSD being stopped
Nov 13 17:03:14 node1 root: Oracle CRSD 5797 set to stop
Nov 13 17:03:14 node1 root: Oracle CRSD 5797 shutdown completed
Nov 13 17:03:14 node1 root: Oracle EVMD set to stop
Nov 13 17:03:14 node1 root: Oracle CSSD being stopped
Nov 13 17:03:26 node1 root: Oracle Cluster Ready Services starting by user request.
Nov 13 17:03:35 node1 logger: Cluster Ready Services completed waiting on dependencies.
Nov 13 17:03:36 node1 logger: Oracle CSSD shell script failure. Duplicate CSSD.
Nov 13 17:03:36 node1 kernel: md: stopping all md devices.
Nov 13 17:21:49 node1 syslogd 1.4.1: restart.
Nov 13 17:21:49 node1 kernel: klogd 1.4.1, log source = /proc/kmsg started.
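
To line up the CSSD script failure with the reboot that follows it, the syslog could be filtered for just those two kinds of entries. This is only a sketch: `cssd_failures` is a hypothetical name, and the two patterns are copied from the messages in the excerpt above rather than from any complete list of init.cssd messages.

```shell
# Hypothetical helper: prints "CSSD shell script failure" entries and
# syslogd restart entries, so a failure followed by a reboot stands out.
# Reads the files given as arguments, or stdin if none.
cssd_failures() {
    grep -E 'CSSD shell script failure|syslogd [0-9.]+: restart' "$@"
}

# Usage:
#   cssd_failures /var/log/messages
```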

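Right after the entry "Nov 13 17:03:36 node1 logger: Oracle CSSD shell script failure. Duplicate CSSD." appeared, the server rebooted.
I searched online and found many similar reports. For other users, CRS failing to start mostly came down to a few things:
1. Incorrect permissions on /tmp
2. Leftover files under /var/tmp/.oracle (delete them, then restart)
3. Network interface configuration problems shown by oifcfg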

In my case, however, all three of the above are fine. My problem looks similar to this one: http://www.itpub.net/thread-1330782-1-1.html
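
For reference, the first two checks in the list above can be sketched roughly as follows. Both function names are hypothetical, and the sketch assumes that "correct" /tmp permissions means the usual sticky, world-writable mode (drwxrwxrwt). The third check is simply the oifcfg getif output already shown earlier.

```shell
# Returns 0 if the directory is sticky and world-writable (drwxrwxrwt),
# which is the usual mode for /tmp; returns 1 otherwise.
has_sticky_world_writable() {
    case "$(ls -ld "$1" | awk '{ print $1 }')" in
        drwxrwxrwt*) return 0 ;;   # trailing "." or "+" (SELinux/ACL) is tolerated
        *)           return 1 ;;
    esac
}

# Prints the number of entries (hidden ones included) in a directory,
# or 0 if the directory does not exist.
count_socket_files() {
    [ -d "$1" ] || { echo 0; return; }
    ls -A "$1" | wc -l | tr -d ' '
}

# Usage:
#   has_sticky_world_writable /tmp || echo "/tmp permissions look wrong"
#   count_socket_files /var/tmp/.oracle
```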

What is causing this problem?

Edited by: user1738965
Locked on Dec 12 2012
Added on Nov 13 2012