RAC node frozen without any apparent reason

Samuel Rabini, Aug 26 2012 (edited Aug 28 2012)

Hi,
I have an Oracle Database 11g Release 11.1.0.6.0 - 64bit Production with the Real Application Clusters option.

It is a 2-node RAC.
I have 5 active services:
- EVODB, EVODB1, EVODB2 (the RAC services: the cluster-wide one, the one for node1, and the one for node2)
- IPGW, active on node1 (for write-intensive sessions)
- EVOREAD, active on node2 (for read-intensive sessions)

For the past week I have been experiencing strange behavior.
From time to time, roughly once per day, node1 freezes.
I really do not know how to explain it better.
The symptoms are that EVODB1 and IPGW stop responding.
If I try to connect with a simple sqlplus, the connection hangs, and when I hit ctrl+c to stop it I get the message below:

[oracle@dcsrv-evodb01 ~]$ sqlplus scott/tiger@IPGW

SQL*Plus: Release 11.1.0.6.0 - Production on Sun Aug 26 03:17:37 2012

[....here hangs hangs and hangs.... then after ctrl+c...]

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

Error accessing PRODUCT_USER_PROFILE
Warning:  Product user profile information not loaded!
You may need to run PUPBLD.SQL as SYSTEM
Disconnected from Oracle Database 11g Release 11.1.0.6.0 - 64bit Production
With the Real Application Clusters option

What does this error mean?

I tried running some commands to check the health of the services:

the crs_stat:

HA Resource                                   Target     State             
-----------                                   ------     -----             
ora.EVODB.EVODB1.inst                         ONLINE     ONLINE on dcsrv-evodb01
ora.EVODB.EVODB2.inst                         ONLINE     ONLINE on dcsrv-evodb02
ora.EVODB.EVOREAD.EVODB2.srv                  ONLINE     ONLINE on dcsrv-evodb02
ora.EVODB.EVOREAD.cs                          ONLINE     ONLINE on dcsrv-evodb02
ora.EVODB.IPGW.EVODB1.srv                     ONLINE     ONLINE on dcsrv-evodb01
ora.EVODB.IPGW.cs                             ONLINE     ONLINE on dcsrv-evodb01
ora.EVODB.db                                  ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.ASM1.asm                    ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.LISTENER_ASM_DCSRV-EVODB01.lsnr ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.LISTENER_DB_DCSRV-EVODB01.lsnr ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.gsd                         ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.ons                         ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.vip                         ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb02.ASM2.asm                    ONLINE     ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.LISTENER_ASM_DCSRV-EVODB02.lsnr ONLINE     ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.LISTENER_DB_DCSRV-EVODB02.lsnr ONLINE     ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.gsd                         ONLINE     ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.ons                         ONLINE     ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.vip                         ONLINE     ONLINE on dcsrv-evodb02
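
To make this check repeatable, a minimal Python sketch along these lines could flag any resource whose state differs from its target. It assumes the three-column layout pasted above (resource name, target, then state followed by "on <host>"); the function names `parse_crs_stat` and `mismatched` are my own and not part of any Oracle tool.

```python
# Sample rows copied from the crs_stat output above.
SAMPLE = """\
HA Resource                                   Target     State
-----------                                   ------     -----
ora.EVODB.EVODB1.inst                         ONLINE     ONLINE on dcsrv-evodb01
ora.EVODB.IPGW.EVODB1.srv                     ONLINE     ONLINE on dcsrv-evodb01
ora.dcsrv-evodb02.vip                         ONLINE     ONLINE on dcsrv-evodb02
"""


def parse_crs_stat(text):
    """Parse resource rows into dicts with name, target, state and host."""
    resources = []
    for line in text.splitlines():
        parts = line.split()
        # Only resource rows start with "ora."; this skips the header and dashes.
        if len(parts) < 3 or not parts[0].startswith("ora."):
            continue
        name, target, state = parts[0], parts[1], parts[2]
        # The host is only present when the row reads "<state> on <host>".
        host = parts[4] if len(parts) >= 5 and parts[3] == "on" else None
        resources.append(
            {"name": name, "target": target, "state": state, "host": host}
        )
    return resources


def mismatched(resources):
    """Return the resources whose current state differs from the target."""
    return [r for r in resources if r["target"] != r["state"]]


if __name__ == "__main__":
    for res in mismatched(parse_crs_stat(SAMPLE)):
        print(res["name"], res["target"], res["state"])
```

For the rows above every target matches its state, so the loop has nothing to print, which agrees with what I see by eye.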

the lsnrctl status

[oracle@dcsrv-evodb01 ~]$ lsnrctl status

LSNRCTL for Linux: Version 11.1.0.6.0 - Production on 26-AUG-2012 03:20:55

Copyright (c) 1991, 2007, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER_DB_DCSRV-EVODB01
Version                   TNSLSNR for Linux: Version 11.1.0.6.0 - Production
Start Date                18-AUG-2012 06:43:09
Uptime                    7 days 20 hr. 37 min. 46 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /u01/app/oracle/product/11.1.0/db1/network/admin/listener.ora
Listener Log File         /u01/app/oracle/diag/tnslsnr/dcsrv-evodb01/listener_db_dcsrv-evodb01/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.81.10.130)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.81.10.30)(PORT=1521)))
Services Summary...
Service "EVODB" has 2 instance(s).
  Instance "EVODB1", status READY, has 3 handler(s) for this service...
  Instance "EVODB2", status READY, has 2 handler(s) for this service...
Service "EVODBXDB" has 2 instance(s).
  Instance "EVODB1", status READY, has 1 handler(s) for this service...
  Instance "EVODB2", status READY, has 1 handler(s) for this service...
Service "EVODB_XPT" has 2 instance(s).
  Instance "EVODB1", status READY, has 3 handler(s) for this service...
  Instance "EVODB2", status READY, has 2 handler(s) for this service...
Service "EVOREAD" has 1 instance(s).
  Instance "EVODB2", status READY, has 2 handler(s) for this service...
Service "IPGW" has 1 instance(s).
  Instance "EVODB1", status READY, has 3 handler(s) for this service...
The command completed successfully

the lsnrctl service

[oracle@dcsrv-evodb01 ~]$ lsnrctl service

LSNRCTL for Linux: Version 11.1.0.6.0 - Production on 26-AUG-2012 03:21:05

Copyright (c) 1991, 2007, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "EVODB" has 2 instance(s).
  Instance "EVODB1", status READY, has 3 handler(s) for this service...
    Handler(s):
      "N000" established:0 refused:0 current:0 max:679 state:ready
         CMON <machine: dcsrv-evodb01, pid: 8070>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=33035))
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.altea.net)(PORT=1521))
      "DEDICATED" established:348 refused:0 state:ready
         LOCAL SERVER
  Instance "EVODB2", status READY, has 2 handler(s) for this service...
    Handler(s):
      "N000" established:4242 refused:0 current:133 max:679 state:ready
         CMON <machine: dcsrv-evodb02, pid: 9915>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=32066))
      "DEDICATED" established:5453 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.altea.net)(PORT=1521))
Service "EVODBXDB" has 2 instance(s).
  Instance "EVODB1", status READY, has 1 handler(s) for this service...
    Handler(s):
      "D000" established:0 refused:0 current:0 max:972 state:ready
         DISPATCHER <machine: dcsrv-evodb01, pid: 7835>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=62644))
  Instance "EVODB2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "D000" established:0 refused:0 current:0 max:972 state:ready
         DISPATCHER <machine: dcsrv-evodb02, pid: 1312>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=14541))
Service "EVODB_XPT" has 2 instance(s).
  Instance "EVODB1", status READY, has 3 handler(s) for this service...
    Handler(s):
      "N000" established:0 refused:0 current:0 max:679 state:ready
         CMON <machine: dcsrv-evodb01, pid: 8070>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=33035))
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.altea.net)(PORT=1521))
      "DEDICATED" established:348 refused:0 state:ready
         LOCAL SERVER
  Instance "EVODB2", status READY, has 2 handler(s) for this service...
    Handler(s):
      "N000" established:4242 refused:0 current:133 max:679 state:ready
         CMON <machine: dcsrv-evodb02, pid: 9915>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=32066))
      "DEDICATED" established:5453 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.altea.net)(PORT=1521))
Service "EVOREAD" has 1 instance(s).
  Instance "EVODB2", status READY, has 2 handler(s) for this service...
    Handler(s):
      "N000" established:4242 refused:0 current:133 max:679 state:ready
         CMON <machine: dcsrv-evodb02, pid: 9915>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=32066))
      "DEDICATED" established:5453 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.altea.net)(PORT=1521))
Service "IPGW" has 1 instance(s).
  Instance "EVODB1", status READY, has 3 handler(s) for this service...
    Handler(s):
      "N000" established:0 refused:0 current:0 max:679 state:ready
         CMON <machine: dcsrv-evodb01, pid: 8070>
         (ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=33035))
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
         (ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.altea.net)(PORT=1521))
      "DEDICATED" established:348 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
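
Since this output is long, a rough Python sketch like the following could pull out the established and refused counters per handler so they can be compared between a normal run and a frozen one. It assumes the text layout of `lsnrctl service` shown above; `parse_lsnrctl_services` is a name I made up for illustration, not an Oracle API.

```python
import re

# Patterns matching the line shapes seen in the lsnrctl output above.
SERVICE_RE = re.compile(r'^Service "([^"]+)" has')
INSTANCE_RE = re.compile(r'^\s*Instance "([^"]+)", status (\w+)')
HANDLER_RE = re.compile(r'^\s*"([^"]+)" established:(\d+) refused:(\d+)')

# The IPGW section copied from the output above.
SAMPLE = '''\
Service "IPGW" has 1 instance(s).
  Instance "EVODB1", status READY, has 3 handler(s) for this service...
    Handler(s):
      "N000" established:0 refused:0 current:0 max:679 state:ready
         CMON <machine: dcsrv-evodb01, pid: 8070>
      "DEDICATED" established:0 refused:0 state:ready
         REMOTE SERVER
      "DEDICATED" established:348 refused:0 state:ready
         LOCAL SERVER
'''


def parse_lsnrctl_services(text):
    """Return one dict per handler line, tagged with its service and instance."""
    handlers = []
    service = instance = None
    for line in text.splitlines():
        m = SERVICE_RE.match(line)
        if m:
            service, instance = m.group(1), None
            continue
        m = INSTANCE_RE.match(line)
        if m:
            instance = m.group(1)
            continue
        m = HANDLER_RE.match(line)
        if m and service and instance:
            handlers.append({
                "service": service,
                "instance": instance,
                "handler": m.group(1),
                "established": int(m.group(2)),
                "refused": int(m.group(3)),
            })
    return handlers


if __name__ == "__main__":
    for h in parse_lsnrctl_services(SAMPLE):
        print(h["service"], h["instance"], h["handler"],
              h["established"], h["refused"])
```

The counters are cumulative since the listener started, so on their own they only say that nothing was refused; comparing two snapshots taken a few minutes apart would show whether new connections are still being handed off.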

Everything seems to be working perfectly.

The only action I can take is to shut down the instance and start it up again.
For this I use the srvctl utility, which shuts down the instance in abort mode.

I do not see anything in the alert log.
This is an extract of the log around the freeze time and the restart time:

Sun Aug 26 01:47:25 2012
Thread 1 advanced to log sequence 311059
  Current log# 13 seq# 311059 mem# 0: +ONLINELOG/evodb/onlinelog/group_13.268.729333867
Sun Aug 26 01:47:25 2012
SUCCESS: diskgroup ARCHIVELOG was mounted
Sun Aug 26 01:47:35 2012
SUCCESS: diskgroup ARCHIVELOG was dismounted
Sun Aug 26 02:29:35 2012
SUCCESS: diskgroup ARCHIVELOG was mounted
Sun Aug 26 02:29:40 2012
SUCCESS: diskgroup ARCHIVELOG was dismounted
Sun Aug 26 02:47:23 2012
Thread 1 advanced to log sequence 311060
  Current log# 10 seq# 311060 mem# 0: +ONLINELOG/evodb/onlinelog/group_10.265.729333851
Sun Aug 26 02:47:23 2012
SUCCESS: diskgroup ARCHIVELOG was mounted
*Sun Aug 26 02:47:31 2012*
*SUCCESS: diskgroup ARCHIVELOG was dismounted*
*Sun Aug 26 03:24:12 2012*
*Shutting down instance (abort)*
*License high water mark = 331*
*USER (ospid: 1940): terminating the instance*
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (386_46984511656352)
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (403_47991363436960)
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (719_47102804537760)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (28062_46969640935840)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (8123_47601996018080)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (20083_47947053900192)
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (727_47348591145376)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (332_47161478062496)
Sun Aug 26 03:24:14 2012
ORA-28 : opidrv aborting process unknown ospid (380_47227700429216)
Sun Aug 26 03:24:14 2012
ORA-28 : opidrv aborting process unknown ospid (564_47044764156320)
Sun Aug 26 03:24:15 2012
ORA-28 : opidrv aborting process unknown ospid (725_47850787893664)
…… [other aborting process entries]........
Sun Aug 26 03:24:23 2012
Instance terminated by USER, pid = 1940
Sun Aug 26 03:24:25 2012
Instance shutdown complete
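
To spot quiet periods in the alert log, a minimal Python sketch such as this one could list the gaps between consecutive timestamp lines. It assumes the timestamp format seen in the extract above and an English locale for the day and month names; `find_gaps` and the 30-minute threshold are my own illustrative choices.

```python
from datetime import datetime

# Timestamp format of the alert log lines above, e.g. "Sun Aug 26 02:47:31 2012".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# A few lines copied from the extract above.
SAMPLE = [
    "Sun Aug 26 02:47:23 2012",
    "SUCCESS: diskgroup ARCHIVELOG was mounted",
    "*Sun Aug 26 02:47:31 2012*",
    "*SUCCESS: diskgroup ARCHIVELOG was dismounted*",
    "*Sun Aug 26 03:24:12 2012*",
    "*Shutting down instance (abort)*",
    "Sun Aug 26 03:24:13 2012",
]


def parse_timestamp(line):
    """Return a datetime if the line is an alert log timestamp, else None."""
    # Strip the '*' emphasis markers that appear around some lines in this post.
    text = line.strip().strip("*")
    try:
        return datetime.strptime(text, TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, min_gap_seconds=1800):
    """Yield (previous, current, seconds) for timestamps at least min_gap apart."""
    previous = None
    for line in lines:
        current = parse_timestamp(line)
        if current is None:
            continue
        if previous is not None:
            delta = (current - previous).total_seconds()
            if delta >= min_gap_seconds:
                yield previous, current, delta
        previous = current


if __name__ == "__main__":
    for start, end, seconds in find_gaps(SAMPLE):
        print(f"no entries between {start} and {end} ({seconds:.0f} s)")
```

A gap by itself does not prove a hang, since the earlier part of the extract shows similar intervals between log switches, but it narrows down the time window to compare against other traces.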

I really don't know what is happening or what to check to identify the problem.
Do you have any suggestions?

Thanks in advance,
Samuel

Locked on Sep 25 2012

Added on Aug 26 2012

#performance-availability, #real-application-clusters
