System crash of a two-node 11g R2 RAC

858557Apr 29 2011 — edited May 2 2011
Hi,
I have a surprising problem: the RAC system crashes every 5 or 6 days.
There is no useful information on rac1, but ocssd.log contains some errors:
2011-08-26 16:15:15.592: [ CSSD][1137830208]clssgmExecuteClientRequest: Node name request from client ((nil))
2011-08-26 16:15:15.594: [ CSSD][1137830208]clssgmExecuteClientRequest: Node name request from client ((nil))
2011-08-26 16:15:15.595: [ CSSD][1137830208]clssgmExecuteClientRequest: NODELIST request from client ((nil))
2011-08-26 16:15:15.595: [ CSSD][1137830208]clssgmNodeList: proc(0x2aaab4203350), client((nil)) with option 4
2011-08-26 16:15:15.595: [ SKGFD][1232238912]BigInit

2011-08-26 16:15:15.596: [ SKGFD][1211259200]BigInit

2011-08-26 16:15:15.596: [ SKGFD][1179789632]BigInit

2011-08-26 16:15:15.596: [ SKGFD][1232238912]kgfkrq (0x2aaab016af10) of status 0 dump:

2011-08-26 16:15:15.596: [ SKGFD][1211259200]kgfkrq (0x2aaaac7a3210) of status 0 dump:

2011-08-26 16:15:15.596: [ SKGFD][1179789632]kgfkrq (0x2aaab41c4a00) of status 0 dump:

2011-08-26 16:15:15.596: [ SKGFD][1232238912]0x2aaab4010410 524304

2011-08-26 16:15:15.596: [ SKGFD][1211259200]0x2aaaac3647a0 524304

2011-08-26 16:15:15.596: [ SKGFD][1179789632]0x2aaab00482b0 524304

2011-08-26 16:15:15.596: [ SKGFD][1232238912]0x2aaaab407400 512

2011-08-26 16:15:15.596: [ SKGFD][1211259200]0x2aaaab489400 512

2011-08-26 16:15:15.596: [ SKGFD][1179789632]0x2aaaab50b400 512

2011-08-26 16:15:15.596: [ SKGFD][1211259200] BigInit

2011-08-26 16:15:15.596: [ SKGFD][1232238912] BigInit

2011-08-26 16:15:15.596: [ SKGFD][1211259200] kgfkrq (0x2aaaac458de0) of status 1 dump:

2011-08-26 16:15:15.596: [ SKGFD][1179789632] BigInit

2011-08-26 16:15:15.596: [ SKGFD][1232238912] kgfkrq (0x2aaab0149810) of status 1 dump:

2011-08-26 16:15:15.596: [ SKGFD][1211259200] 0x2aaaac3647a0 524304

2011-08-26 16:15:15.596: [ SKGFD][1179789632] kgfkrq (0x2aaab42ad700) of status 1 dump:

2011-08-26 16:15:15.596: [ SKGFD][1232238912] 0x2aaab4010410 524304

2011-08-26 16:15:15.596: [ SKGFD][1211259200] 0x2aaaab489400 256

2011-08-26 16:15:15.596: [ SKGFD][1179789632] 0x2aaab00482b0 524304

2011-08-26 16:15:15.596: [ SKGFD][1211259200] BigInit

2011-08-26 16:15:15.596: [ SKGFD][1232238912] 0x2aaaab407400 256

2011-08-26 16:15:15.596: [ SKGFD][1211259200] kgfkrq (0x2aaaac458f48) of status 1 dump:

2011-08-26 16:15:15.596: [ SKGFD][1179789632] 0x2aaaab50b400 256

2011-08-26 16:15:15.596: [ SKGFD][1232238912] BigInit

2011-08-26 16:15:15.596: [ SKGFD][1179789632] BigInit

2011-08-26 16:15:15.596: [ SKGFD][1232238912] kgfkrq (0x2aaab0149978) of status 1 dump:

2011-08-26 16:15:15.596: [ SKGFD][1211259200] 0x2aaaac3647a0 524560

2011-08-26 16:15:15.596: [ SKGFD][1179789632] kgfkrq (0x2aaab42ad868) of status 1 dump:

2011-08-26 16:15:15.596: [ SKGFD][1211259200] 0x2aaaab4a9400 256

2011-08-26 16:15:15.596: [ SKGFD][1232238912] 0x2aaab4010410 524560

2011-08-26 16:15:15.596: [ SKGFD][1179789632] 0x2aaab00482b0 524560

2011-08-26 16:15:15.596: [ SKGFD][1232238912] 0x2aaaab427400 256

2011-08-26 16:15:15.596: [ SKGFD][1179789632] 0x2aaaab52b400 256
....
2011-08-26 16:15:22.877: [ CSSD][1263708480]clssgmTagize: version(1), type(13), tagizer(0x494dfe)
2011-08-26 16:15:22.878: [ CSSD][1263708480]clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 7
2011-08-26 16:15:23.612: [ CSSD][1284688192]clssnmSendingThread: sending status msg to all nodes
2011-08-26 16:15:23.612: [ CSSD][1284688192]clssnmSendingThread: sent 5 status msgs to all nodes
2011-08-26 16:15:24.967: [ CSSD][1263708480]clssgmTagize: version(1), type(13), tagizer(0x494dfe)
2011-08-26 16:15:24.967: [ CSSD][1263708480]clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 7
2011-08-26 16:15:27.067: [ CSSD][1263708480]clssgmTagize: version(1), type(13), tagizer(0x494dfe)
2011-08-26 16:15:27.068: [ CSSD][1263708480]clssgmHandleDataInvalid: grock HB+ASM, member 2 node 2, birth 7
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmTagize: version(1), type(3), tagizer(0x4929ba)
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmHandleMasterMemberAdd: [s(2) d(1)]
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmGrockOpTagProcess: clssgmCommonAddMember failed, member(-1/CLSN.FAN.racdb.FANPROC[3]) on node(2)
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmGrockOpTagProcess: Operation(3) unsuccessful grock(CLSN.FAN.racdb.FANPROC[3])
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmHandleMasterJoin: clssgmProcessJoinUpdate failed with status(-10)
2011-08-26 16:15:28.046: [ CSSD][1263708480]clssgmTagize: version(1), type(3), tagizer(0x4929ba)
2011-08-26 16:15:28.046: [ CSSD][1263708480]clssgmHandleMasterMemberAdd: [s(2) d(1)]
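A log like this is easier to read once the routine CSSD chatter is filtered out. The following is only a minimal sketch: it assumes the `timestamp: [ COMPONENT][thread]function: message` layout seen in the excerpt above, and the keyword list (`failed`, `unsuccessful`) and the function name `failure_lines` are illustrative choices, not part of any Oracle tool.

```python
import re
from collections import Counter

# A few lines copied from the ocssd.log excerpt above.
OCSSD_LOG = """\
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmTagize: version(1), type(3), tagizer(0x4929ba)
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmHandleMasterMemberAdd: [s(2) d(1)]
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmGrockOpTagProcess: clssgmCommonAddMember failed, member(-1/CLSN.FAN.racdb.FANPROC[3]) on node(2)
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmGrockOpTagProcess: Operation(3) unsuccessful grock(CLSN.FAN.racdb.FANPROC[3])
2011-08-26 16:15:28.044: [ CSSD][1263708480]clssgmHandleMasterJoin: clssgmProcessJoinUpdate failed with status(-10)
"""

# Assumed line layout: "<date> <time>: [ <component>][<thread>]<function>: <message>"
LINE = re.compile(r"^(\S+ \S+): \[\s*(\w+)\]\[(\d+)\]\s*(\w+): (.*)$")

# Illustrative keywords only; extend as needed for other failure wording.
KEYWORDS = ("failed", "unsuccessful")


def failure_lines(text):
    """Return (timestamp, function, message) for lines that mention a failure."""
    hits = []
    for raw in text.splitlines():
        match = LINE.match(raw)
        if match is None:
            continue  # lines without a "function:" prefix are skipped
        stamp, _component, _thread, function, message = match.groups()
        if any(word in message for word in KEYWORDS):
            hits.append((stamp, function, message))
    return hits


hits = failure_lines(OCSSD_LOG)
for stamp, function, message in hits:
    print(stamp, function, message)

# Count failures per reporting function to see where they cluster.
print(Counter(function for _stamp, function, _message in hits))
```

Running the same filter over the complete ocssd.log from both nodes would show whether these join failures recur before each crash or only appear in this excerpt.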

The alert log on rac2 shows that rac1 was removed from the cluster, but there is no eviction message:
2011-08-26 16:24:14.953
[cssd(8381)]CRS-1612:Network communication with node rac1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.740 seconds
2011-08-26 16:24:22.969
[cssd(8381)]CRS-1611:Network communication with node rac1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.720 seconds
2011-08-26 16:24:26.977
[cssd(8381)]CRS-1610:Network communication with node rac1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.720 seconds
2011-08-26 16:24:29.695
[cssd(8381)]CRS-1632:Node rac1 is being removed from the cluster in cluster incarnation 206618768
2011-08-26 16:24:29.713
[cssd(8381)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac2 .
2011-08-26 16:24:29.739
[ctssd(8550)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac2.
2011-08-26 16:24:31.487
[crsd(8829)]CRS-5504:Node down event reported for node 'rac1'.
2011-08-26 16:24:36.671
[crsd(8829)]CRS-2773:Server 'rac1' has been removed from pool 'Generic'.
2011-08-26 16:24:36.672
[crsd(8829)]CRS-2773:Server 'rac1' has been removed from pool 'ora.racdb'.
2011-08-26 16:39:28.152
[ctssd(8550)]CRS-2406:The Cluster Time Synchronization Service timed out on host rac2. Details in /oracle/app/grid/product/11.2.0/log/rac2/ctssd/octssd.log.
2011-08-27 10:08:29.454
[ohasd(7688)]CRS-2112:The OLR service started on node rac2.
2011-08-27 10:08:29.762
[ohasd(7688)]CRS-8017:location: /etc/oracle/lastgasp has 26 reboot advisory log files, 0 were announced and 0 errors occurred
2011-08-27 10:08:34.860
[ohasd(7688)]CRS-2772:Server 'rac2' has been assigned to pool 'Free'.
2011-08-27 10:08:39.040
[cssd(8183)]CRS-1713:CSSD daemon is started in clustered mode
2011-08-27 10:08:58.811
[cssd(8183)]CRS-1707:Lease acquisition for node rac2 number 2 completed
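The CRS-1612, CRS-1611 and CRS-1610 warnings each report how many seconds remain before the node is removed, so adding that value to the entry's timestamp gives the expected removal time. The sketch below does that arithmetic for the three warnings in the alert log above. It assumes the two-line layout shown (timestamp line, then message line); the function name `removal_estimates` is made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Warning entries copied from the rac2 alert log above.
ALERT_LOG = """\
2011-08-26 16:24:14.953
[cssd(8381)]CRS-1612:Network communication with node rac1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 14.740 seconds
2011-08-26 16:24:22.969
[cssd(8381)]CRS-1611:Network communication with node rac1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 6.720 seconds
2011-08-26 16:24:26.977
[cssd(8381)]CRS-1610:Network communication with node rac1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 2.720 seconds
"""

WARNING = re.compile(
    r"CRS-161[0-2]:Network communication with node (\S+) \(\d+\) missing for "
    r"(\d+)% of timeout interval\.\s+Removal of this node from cluster in "
    r"([\d.]+) seconds"
)


def removal_estimates(text):
    """Return (node, percent_missed, estimated_removal_time) per warning."""
    rows = []
    lines = text.splitlines()
    # Each message line is preceded by its timestamp line.
    for stamp, message in zip(lines, lines[1:]):
        match = WARNING.search(message)
        if match is None:
            continue
        logged_at = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S.%f")
        seconds_left = float(match.group(3))
        removal_at = logged_at + timedelta(seconds=seconds_left)
        rows.append((match.group(1), int(match.group(2)), removal_at))
    return rows


for node, percent, removal_at in removal_estimates(ALERT_LOG):
    print(f"{node}: {percent}% of timeout missed, removal expected at {removal_at.time()}")
```

All three estimates fall within about 10 ms of the CRS-1632 entry logged at 16:24:29.695, so rac2 removed rac1 exactly when the network heartbeat timeout expired. That points to when heartbeats from rac1 stopped arriving, not to why they stopped.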

What is the problem?

Environment:
Two servers + SAN
OS: Red Hat Enterprise Linux 5.4

Post edited by 855554
Locked on May 30 2011
Added on Apr 29 2011
14 comments
1,450 views