Hello folks.
Environment: Oracle RAC 10.2.0.1 Standard Edition with 2 nodes.
OS: Red Hat 5, kernel 2.6.18-308.el5
I just joined this company, and I have observed a problem on ONE of the nodes:
Alert.log:
Sat Oct 26 00:30:04 2013
Thread 2 advanced to log sequence 1187
Current log# 6 seq# 1187 mem# 0: +DGDATA/dbprod/onlinelog/group_6.317.824732563
Current log# 6 seq# 1187 mem# 1: +DGFRA/dbprod/onlinelog/group_6.262.824732565
Sat Oct 26 00:58:27 2013
ALTER SYSTEM SET service_names='dbprod' SCOPE=MEMORY SID='dbprod2';
Sat Oct 26 00:58:27 2013
Shutting down instance (abort)
License high water mark = 166
Instance terminated by USER, pid = 28935
Sat Oct 26 00:58:30 2013
Starting ORACLE instance (normal)
As the log above shows, the instance was shut down (abort) and restarted within about three seconds.
This happens at irregular intervals, with no pattern in date or time, so I have ruled out scheduled jobs as the cause. It has occurred at:
Mon Sep 9 16:10:22 2013, Fri Sep 20 03:38:23, Wed Sep 25 15:02:59, Wed Oct 2 16:08:39, Tue Oct 8 05:53:16, Thu Oct 24 06:16:21, Sat Oct 26 00:58:27.
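To check that these restarts really follow no fixed schedule, here is a minimal Python sketch that computes the gap between consecutive occurrences. The names RESTARTS, parse and gaps are my own, and since only the first timestamp in the list carries a year, 2013 is assumed for all of them.

```python
from datetime import datetime

# Restart timestamps copied from the list above. Only the first entry
# carries a year there, so 2013 is ASSUMED for every entry.
RESTARTS = [
    "Sep 9 16:10:22",
    "Sep 20 03:38:23",
    "Sep 25 15:02:59",
    "Oct 2 16:08:39",
    "Oct 8 05:53:16",
    "Oct 24 06:16:21",
    "Oct 26 00:58:27",
]


def parse(stamp, year=2013):
    """Turn 'Mon DD HH:MM:SS' into a datetime, using the assumed year."""
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def gaps(stamps):
    """Return the timedelta between each restart and the one before it."""
    times = [parse(s) for s in stamps]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    for stamp, gap in zip(RESTARTS[1:], gaps(RESTARTS)):
        print(f"{stamp}: {gap} since the previous restart")
```

By this calculation the gaps range from under two days to about sixteen days, which does not look like a cron or scheduler interval.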
I then checked crsd.log:
2013-10-26 00:58:26.978: [ OCRSRV][1284614464]th_select_handler: Failed to retrieve procctx from ht. constr = [373133728] retval lht [-27] Signal CV.
2013-10-26 00:58:27.237: [ CRSAPP][1546860864]0CheckResource error for ora.dbprod.dbprod2.inst error code = 139
2013-10-26 00:58:27.240: [ CRSRES][1546860864]0In stateChanged, ora.dbprod.dbprod2.inst target is ONLINE
2013-10-26 00:58:27.240: [ CRSRES][1546860864]0ora.dbprod.dbprod2.inst on xxxx went OFFLINE unexpectedly
2013-10-26 00:58:27.240: [ CRSRES][1546860864]0StopResource: setting CLI values
2013-10-26 00:58:27.251: [ CRSRES][1546860864]0Attempting to stop `ora.dbprod.dbprod2.inst` on member `xxxx`
2013-10-26 00:58:27.251: [ CRSD][1546860864]0entries=
2013-10-26 00:58:27.251: [ CRSD][1546860864]0entry=owner:oracle:rwx |
2013-10-26 00:58:27.252: [ CRSD][1546860864]0entry=pgrp:oinstall:rwx |
2013-10-26 00:58:27.252: [ CRSD][1546860864]0entry=other::r-- |
2013-10-26 00:58:27.252: [ CRSD][1546860864]0
2013-10-26 00:58:27.312: [ CRSRES][1567840576]0In stateChanged, ora.dbprod.teste.dbprod2.srv target is ONLINE
2013-10-26 00:58:27.313: [ CRSRES][1567840576]0ora.dbprod.teste.dbprod2.srv on xxxx went OFFLINE unexpectedly
2013-10-26 00:58:27.313: [ CRSRES][1567840576]0StopResource: setting CLI values
2013-10-26 00:58:27.348: [ OCRSRV][1284614464]th_select_handler: Failed to retrieve procctx from ht. constr = [372844896] retval lht [-27] Signal CV.
2013-10-26 00:58:27.353: [ CRSRES][1567840576]0Attempting to stop `ora.dbprod.teste.dbprod2.srv` on member `xxxx`
2013-10-26 00:58:27.353: [ CRSD][1567840576]0entries=
2013-10-26 00:58:27.353: [ CRSD][1567840576]0entry=owner:oracle:rwx |
2013-10-26 00:58:27.354: [ CRSD][1567840576]0entry=pgrp:oinstall:rwx |
2013-10-26 00:58:27.354: [ CRSD][1567840576]0entry=other::r-- |
2013-10-26 00:58:27.354: [ CRSD][1567840576]0
2013-10-26 00:58:27.549: [ CRSRES][1567840576]0Stop of `ora.dbprod.teste.dbprod2.srv` on member `xxxx` succeeded.
2013-10-26 00:58:27.549: [ CRSRES][1567840576]0ora.dbprod.teste.dbprod2.srv RESTART_COUNT=0 RESTART_ATTEMPTS=0
2013-10-26 00:58:27.560: [ CRSRES][1567840576]0ora.dbprod.teste.dbprod2.srv failed on xxxx relocating.
2013-10-26 00:58:27.864: [ CRSRES][1567840576]0Cannot relocate ora.dbprod.teste.dbprod2.srvStopping dependents
2013-10-26 00:58:28.080: [ CRSD][1567840576]0entries=
2013-10-26 00:58:28.080: [ CRSD][1567840576]0entry=owner:oracle:rwx |
2013-10-26 00:58:28.080: [ CRSD][1567840576]0entry=pgrp:oinstall:rwx |
2013-10-26 00:58:28.080: [ CRSD][1567840576]0entry=other::r-- |
2013-10-26 00:58:28.080: [ CRSD][1567840576]0
2013-10-26 00:58:28.081: [ CRSRES][1567840576]0StopResource: setting CLI values
2013-10-26 00:58:30.040: [ CRSRES][1546860864]0Stop of `ora.dbprod.dbprod2.inst` on member `xxxx` succeeded.
2013-10-26 00:58:30.041: [ CRSRES][1546860864]0ora.dbprod.dbprod2.inst RESTART_COUNT=1 RESTART_ATTEMPTS=5
2013-10-26 00:58:30.041: [ CRSRES][1546860864]0ora.dbprod.dbprod2.inst Uptime does not exceed uptime_threshold
2013-10-26 00:58:30.041: [ CRSRES][1546860864]0Restarting ora.dbprod.dbprod2.inst on xxxx
I checked cssd.log too:
[ CSSD]2013-10-26 00:58:27.892 [1222052160] >TRACE: clscsendx: (0x2aaaac0b09b0) Connection not active
[ CSSD]2013-10-26 00:58:27.892 [1222052160] >TRACE: clssgmSendClient: Send failed rc 6, con (0x2aaaac0b09b0), client (0x2aaaac0b0d60), proc ((nil))
[ CSSD]2013-10-26 00:58:27.906 [1222052160] >TRACE: clscsendx: (0x2aaaac0d2790) Connection not active
[ CSSD]2013-10-26 00:58:27.906 [1222052160] >TRACE: clssgmSendClient: Send failed rc 6, con (0x2aaaac0d2790), client (0x2aaaac0cd220), proc ((nil))
[ CSSD]2013-10-26 00:58:27.907 [1222052160] >TRACE: clscsendx: (0x2aaaac0ba2c0) Connection not active
[ CSSD]2013-10-26 00:58:27.907 [1222052160] >TRACE: clssgmSendClient: Send failed rc 6, con (0x2aaaac0ba2c0), client (0x2aaaac0ba5c0), proc ((nil))
[ CSSD]2013-10-26 00:58:27.915 [1222052160] >TRACE: clscsendx: (0x2aaaac0af160) Connection not active
[ CSSD]2013-10-26 00:58:27.915 [1222052160] >TRACE: clssgmSendClient: Send failed rc 6, con (0x2aaaac0af160), client (0x2aaaac0cfb80), proc ((nil))
[ CSSD]2013-10-26 00:58:27.915 [1222052160] >TRACE: clscsendx: (0x2aaaac0c43d0) Connection not active
[ CSSD]2013-10-26 00:58:27.915 [1222052160] >TRACE: clssgmSendClient: Send failed rc 6, con (0x2aaaac0c43d0), client (0x2aaaac0c46d0), proc ((nil))
[ CSSD]2013-10-26 00:58:27.915 [1222052160] >TRACE: clscsendx: (0x2aaaac0c4250) Connection not active
[ CSSD]2013-10-26 00:58:27.915 [1222052160] >TRACE: clssgmSendClient: Send failed rc 6, con (0x2aaaac0c4250), client (0x2aaaac0c4bd0), proc ((nil))
[ CSSD]2013-10-26 00:58:27.916 [1222052160] >TRACE: clscsendx: (0x2aaaac0c48f0) Connection not active
[ CSSD]2013-10-26 00:58:27.916 [1222052160] >TRACE: clssgmSendClient: Send failed rc 6, con (0x2aaaac0c48f0), client (0x2aaaac0c51a0), proc ((nil))
[ CSSD]2013-10-26 00:58:28.070 [1190582592] >TRACE: clssgmClientConnectMsg: Connect from con(0x2aaab0003870) proc(0x2aaab00058b0) pid() proto(10:2:1:1)
In /var/log/messages we have this:
Oct 2 16:08:34 xxxx kernel: racgmain[20258]: segfault at 0000000000000090 rip 000000000042d334 rsp 00000000422100d0 error 4
Oct 8 05:53:10 xxxx kernel: racgmain[19588]: segfault at 0000000000453004 rip 0000003179c08d41 rsp 00000000421d8098 error 7
Oct 24 06:16:15 xxxx kernel: racgmain[943]: segfault at 000000000000001c rip 00002afded1cb5c4 rsp 0000000040f0c0a0 error 4
Oct 26 00:58:27 xxxx kernel: racgmain[28827]: segfault at 0000000000453004 rip 0000003179c08d41 rsp 000000004239c098 error 7
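To make the segfault lines easier to read, here is a minimal Python sketch that parses them and decodes the low three bits of the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The helper names PATTERN, decode_error and summarize are my own. This only describes what kind of access faulted; it does not identify the root cause.

```python
import re

# Lines copied from /var/log/messages (host name masked as in the post).
LINES = [
    "Oct 2 16:08:34 xxxx kernel: racgmain[20258]: segfault at 0000000000000090 rip 000000000042d334 rsp 00000000422100d0 error 4",
    "Oct 8 05:53:10 xxxx kernel: racgmain[19588]: segfault at 0000000000453004 rip 0000003179c08d41 rsp 00000000421d8098 error 7",
    "Oct 24 06:16:15 xxxx kernel: racgmain[943]: segfault at 000000000000001c rip 00002afded1cb5c4 rsp 0000000040f0c0a0 error 4",
    "Oct 26 00:58:27 xxxx kernel: racgmain[28827]: segfault at 0000000000453004 rip 0000003179c08d41 rsp 000000004239c098 error 7",
]

PATTERN = re.compile(
    r"kernel: (?P<prog>\w+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
)


def decode_error(code):
    """Describe the low three bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user mode" if code & 4 else "kernel mode"
    return f"{access} in {mode}, {cause}"


def summarize(line):
    """Return the parsed fields of one segfault line, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return {
        "prog": m["prog"],
        "pid": int(m["pid"]),
        "addr": int(m["addr"], 16),
        "rip": int(m["rip"], 16),
        # The kernel prints the error code in hexadecimal.
        "error": decode_error(int(m["err"], 16)),
    }


if __name__ == "__main__":
    for line in LINES:
        info = summarize(line)
        print(f"{info['prog']}[{info['pid']}] addr={info['addr']:#x} "
              f"rip={info['rip']:#x}: {info['error']}")
```

Two of the faults are reads at very small addresses (0x90 and 0x1c), which would be consistent with a NULL pointer dereference, and the other two are writes to the same address from the same instruction pointer. I am only reading the bits here, so treat that interpretation as a guess.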
I also checked the RACG log:
/u01/app/oracle/product/10.2.0/db_1/log/xxxx/racg/mdb.log
2013-10-02 16:08:41.956: [ RACG][4009281104] [24610][4009281104][ora.dbprod.dbprod2.inst]: CLSR-0001: Oracle error -1034 encountered
2013-10-08 05:53:18.161: [ RACG][2297886288] [19894][2297886288][ora.dbprod.dbprod2.inst]: CLSR-0001: Oracle error -1034 encountered
2013-10-24 06:16:23.186: [ RACG][1637254736] [1302][1637254736][ora.dbprod.dbprod2.inst]: CLSR-0001: Oracle error -1034 encountered
2013-10-26 00:58:29.870: [ RACG][3002013264] [28933][3002013264][ora.dbprod.dbprod2.inst]: CLSR-0001: Oracle error -1034 encountered
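Each CLSR-0001 entry (Oracle error -1034, i.e. ORA-01034 "ORACLE not available") appears shortly after a racgmain segfault in /var/log/messages. Here is a small Python sketch that measures that delay for the four pairs present in both logs. The names EVENTS and delay_seconds are my own, and because syslog lines omit the year, 2013 is assumed from the mdb.log entries.

```python
from datetime import datetime

# (syslog segfault time, mdb.log CLSR-0001 time) pairs copied from the post.
# syslog omits the year; 2013 is ASSUMED from the mdb.log timestamps.
EVENTS = [
    ("Oct 2 16:08:34", "2013-10-02 16:08:41.956"),
    ("Oct 8 05:53:10", "2013-10-08 05:53:18.161"),
    ("Oct 24 06:16:15", "2013-10-24 06:16:23.186"),
    ("Oct 26 00:58:27", "2013-10-26 00:58:29.870"),
]


def delay_seconds(syslog_stamp, racg_stamp, year=2013):
    """Seconds between the kernel segfault line and the CLSR-0001 entry."""
    crash = datetime.strptime(f"{year} {syslog_stamp}", "%Y %b %d %H:%M:%S")
    error = datetime.strptime(racg_stamp, "%Y-%m-%d %H:%M:%S.%f")
    return (error - crash).total_seconds()


if __name__ == "__main__":
    for syslog_stamp, racg_stamp in EVENTS:
        delay = delay_seconds(syslog_stamp, racg_stamp)
        print(f"{syslog_stamp}: CLSR-0001 logged {delay:.3f}s after the segfault")
```

In every pair the segfault comes first and the CLSR-0001 entry follows within a few seconds, which suggests (but does not prove) that the racgmain crash triggers the instance restart rather than the other way around.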
Finally, I checked gv$resource_limit:
INST_ID RESOURCE_NAME CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_ALLOCATION LIMIT_VALUE
1 processes 98 150 500 500
1 sessions 99 151 555 555
1 enqueue_locks 58 136 6772 6772
1 enqueue_resources 59 166 2660 UNLIMITED
1 ges_procs 97 148 501 501
1 ges_ress 0 0 10920 UNLIMITED
1 ges_locks 0 0 16700 UNLIMITED
1 ges_cache_ress 915 1315 0 UNLIMITED
1 ges_reg_msgs 115 510 1750 UNLIMITED
1 ges_big_msgs 35 1018 1750 UNLIMITED
1 ges_rsv_msgs 0 0 1000 1000
1 gcs_resources 2222615 3706965 4242092 4242092
1 gcs_shadows 1941634 2410463 4242092 4242092
1 dml_locks 0 256 2440 UNLIMITED
1 temporary_table_locks 0 0 UNLIMITED UNLIMITED
1 transactions 2 14 610 UNLIMITED
1 branches 0 1 610 UNLIMITED
1 cmtcallbk 0 2 610 UNLIMITED
1 sort_segment_locks 3 21 UNLIMITED UNLIMITED
1 max_rollback_segments 13 13 610 65535
1 max_shared_servers 0 0 UNLIMITED UNLIMITED
1 parallel_max_servers 7 11 0 3600
2 processes 101 163 500 500
2 sessions 105 182 555 555
2 enqueue_locks 56 85 6772 6772
2 enqueue_resources 57 150 2660 UNLIMITED
2 ges_procs 100 161 501 501
2 ges_ress 0 0 10920 UNLIMITED
2 ges_locks 0 0 16700 UNLIMITED
2 ges_cache_ress 916 12581 0 UNLIMITED
2 ges_reg_msgs 121 541 1750 UNLIMITED
2 ges_big_msgs 32 543 1750 UNLIMITED
2 ges_rsv_msgs 0 0 1000 1000
2 gcs_resources 3080550 3478452 4466739 4466739
2 gcs_shadows 2134783 2435608 4466739 4466739
2 dml_locks 0 236 2440 UNLIMITED
2 temporary_table_locks 0 3 UNLIMITED UNLIMITED
2 transactions 1 14 610 UNLIMITED
2 branches 0 1 610 UNLIMITED
2 cmtcallbk 0 1 610 UNLIMITED
2 sort_segment_locks 2 22 UNLIMITED UNLIMITED
2 max_rollback_segments 17 17 610 65535
2 max_shared_servers 0 0 UNLIMITED UNLIMITED
2 parallel_max_servers 7 11 0 3600
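To see whether any resource came close to its limit, here is a minimal Python sketch that computes peak utilization (MAX_UTILIZATION divided by LIMIT_VALUE) for a few rows that have a numeric limit. The row selection and the names ROWS, peak_pct and near_limit are my own, and the 90% threshold is an arbitrary choice for illustration.

```python
# (inst_id, resource_name, max_utilization, limit_value), copied from the
# gv$resource_limit output above for rows that have a numeric limit.
ROWS = [
    (1, "processes", 150, 500),
    (1, "sessions", 151, 555),
    (1, "gcs_resources", 3706965, 4242092),
    (2, "processes", 163, 500),
    (2, "sessions", 182, 555),
    (2, "gcs_resources", 3478452, 4466739),
]


def peak_pct(max_used, limit):
    """Peak utilization as a percentage of the configured limit."""
    return 100.0 * max_used / limit


def near_limit(rows, threshold=90.0):
    """Rows whose peak utilization reached the (arbitrary) threshold."""
    return [r for r in rows if peak_pct(r[2], r[3]) >= threshold]


if __name__ == "__main__":
    for inst, name, max_used, limit in ROWS:
        print(f"inst {inst} {name:<14} peak {peak_pct(max_used, limit):5.1f}%")
    print("near limit:", near_limit(ROWS))
```

By this calculation processes and sessions peaked at roughly a third of their limits on both instances, so hitting the processes or sessions limit does not look like the cause.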
So far I have no idea what is causing this. I searched for CLSR-0001 but did not find much.
Does anybody have a clue? I've never seen this problem before.
I'd appreciate any help.
Thanks in advance.
Regards,