Hi,
I have an Oracle Database 11g Release 11.1.0.6.0 - 64bit Production with the Real Application Clusters option.
It is a 2-node RAC.
I have 5 active services:
- EVODB, EVODB1, EVODB2 (the RAC services: the cluster-wide one, the one for node1, and the one for node2)
- IPGW, active on node1 (for write-intensive sessions)
- EVOREAD, active on node2 (for read-intensive sessions)
For the past week I have been experiencing strange behavior.
From time to time, roughly once per day, node1 freezes.
I really do not know how to explain it better.
The symptoms are that EVODB1 and IPGW stop responding.
If I try to connect with a simple sqlplus, the connection hangs, and when I hit Ctrl+C to stop it I get the message below:
[oracle@dcsrv-evodb01 ~]$ sqlplus scott/tiger@IPGW
SQL*Plus: Release 11.1.0.6.0 - Production on Sun Aug 26 03:17:37 2012
[... here it hangs indefinitely; then, after Ctrl+C ...]
Copyright (c) 1982, 2007, Oracle. All rights reserved.
Error accessing PRODUCT_USER_PROFILE
Warning: Product user profile information not loaded!
You may need to run PUPBLD.SQL as SYSTEM
Disconnected from Oracle Database 11g Release 11.1.0.6.0 - 64bit Production
With the Real Application Clusters option
What does this error mean?
I ran some commands to check the health of the services.
The crs_stat output:
HA Resource Target State
----------- ------ -----
ora.EVODB.EVODB1.inst ONLINE ONLINE on dcsrv-evodb01
ora.EVODB.EVODB2.inst ONLINE ONLINE on dcsrv-evodb02
ora.EVODB.EVOREAD.EVODB2.srv ONLINE ONLINE on dcsrv-evodb02
ora.EVODB.EVOREAD.cs ONLINE ONLINE on dcsrv-evodb02
ora.EVODB.IPGW.EVODB1.srv ONLINE ONLINE on dcsrv-evodb01
ora.EVODB.IPGW.cs ONLINE ONLINE on dcsrv-evodb01
ora.EVODB.db ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.ASM1.asm ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.LISTENER_ASM_DCSRV-EVODB01.lsnr ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.LISTENER_DB_DCSRV-EVODB01.lsnr ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.gsd ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.ons ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.vip ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb02.ASM2.asm ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.LISTENER_ASM_DCSRV-EVODB02.lsnr ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.LISTENER_DB_DCSRV-EVODB02.lsnr ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.gsd ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.ons ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.vip ONLINE ONLINE on dcsrv-evodb02
The lsnrctl status output:
[oracle@dcsrv-evodb01 ~]$ lsnrctl status
LSNRCTL for Linux: Version 11.1.0.6.0 - Production on 26-AUG-2012 03:20:55
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias LISTENER_DB_DCSRV-EVODB01
Version TNSLSNR for Linux: Version 11.1.0.6.0 - Production
Start Date 18-AUG-2012 06:43:09
Uptime 7 days 20 hr. 37 min. 46 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File /u01/app/oracle/product/11.1.0/db1/network/admin/listener.ora
Listener Log File /u01/app/oracle/diag/tnslsnr/dcsrv-evodb01/listener_db_dcsrv-evodb01/alert/log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.81.10.130)(PORT=1521)))
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.81.10.30)(PORT=1521)))
Services Summary...
Service "EVODB" has 2 instance(s).
Instance "EVODB1", status READY, has 3 handler(s) for this service...
Instance "EVODB2", status READY, has 2 handler(s) for this service...
Service "EVODBXDB" has 2 instance(s).
Instance "EVODB1", status READY, has 1 handler(s) for this service...
Instance "EVODB2", status READY, has 1 handler(s) for this service...
Service "EVODB_XPT" has 2 instance(s).
Instance "EVODB1", status READY, has 3 handler(s) for this service...
Instance "EVODB2", status READY, has 2 handler(s) for this service...
Service "EVOREAD" has 1 instance(s).
Instance "EVODB2", status READY, has 2 handler(s) for this service...
Service "IPGW" has 1 instance(s).
Instance "EVODB1", status READY, has 3 handler(s) for this service...
The command completed successfully
The lsnrctl service output:
[oracle@dcsrv-evodb01 ~]$ lsnrctl service
LSNRCTL for Linux: Version 11.1.0.6.0 - Production on 26-AUG-2012 03:21:05
Copyright (c) 1991, 2007, Oracle. All rights reserved.
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "EVODB" has 2 instance(s).
Instance "EVODB1", status READY, has 3 handler(s) for this service...
Handler(s):
"N000" established:0 refused:0 current:0 max:679 state:ready
CMON <machine: dcsrv-evodb01, pid: 8070>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=33035))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.altea.net)(PORT=1521))
"DEDICATED" established:348 refused:0 state:ready
LOCAL SERVER
Instance "EVODB2", status READY, has 2 handler(s) for this service...
Handler(s):
"N000" established:4242 refused:0 current:133 max:679 state:ready
CMON <machine: dcsrv-evodb02, pid: 9915>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=32066))
"DEDICATED" established:5453 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.altea.net)(PORT=1521))
Service "EVODBXDB" has 2 instance(s).
Instance "EVODB1", status READY, has 1 handler(s) for this service...
Handler(s):
"D000" established:0 refused:0 current:0 max:972 state:ready
DISPATCHER <machine: dcsrv-evodb01, pid: 7835>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=62644))
Instance "EVODB2", status READY, has 1 handler(s) for this service...
Handler(s):
"D000" established:0 refused:0 current:0 max:972 state:ready
DISPATCHER <machine: dcsrv-evodb02, pid: 1312>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=14541))
Service "EVODB_XPT" has 2 instance(s).
Instance "EVODB1", status READY, has 3 handler(s) for this service...
Handler(s):
"N000" established:0 refused:0 current:0 max:679 state:ready
CMON <machine: dcsrv-evodb01, pid: 8070>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=33035))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.altea.net)(PORT=1521))
"DEDICATED" established:348 refused:0 state:ready
LOCAL SERVER
Instance "EVODB2", status READY, has 2 handler(s) for this service...
Handler(s):
"N000" established:4242 refused:0 current:133 max:679 state:ready
CMON <machine: dcsrv-evodb02, pid: 9915>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=32066))
"DEDICATED" established:5453 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.altea.net)(PORT=1521))
Service "EVOREAD" has 1 instance(s).
Instance "EVODB2", status READY, has 2 handler(s) for this service...
Handler(s):
"N000" established:4242 refused:0 current:133 max:679 state:ready
CMON <machine: dcsrv-evodb02, pid: 9915>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.altea.net)(PORT=32066))
"DEDICATED" established:5453 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.altea.net)(PORT=1521))
Service "IPGW" has 1 instance(s).
Instance "EVODB1", status READY, has 3 handler(s) for this service...
Handler(s):
"N000" established:0 refused:0 current:0 max:679 state:ready
CMON <machine: dcsrv-evodb01, pid: 8070>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.altea.net)(PORT=33035))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.altea.net)(PORT=1521))
"DEDICATED" established:348 refused:0 state:ready
LOCAL SERVER
The command completed successfully
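To compare the handler counters between the two instances more easily, the output above could be tabulated with something like the following. This is only a minimal sketch: the function name `parse_handlers` and the dictionary keys are made up for illustration, and it assumes the output has exactly the line layout shown above.

```python
import re

# Matches handler lines such as:
#   "N000" established:4242 refused:0 current:133 max:679 state:ready
#   "DEDICATED" established:348 refused:0 state:ready
# The current/max counters are optional because DEDICATED handlers omit them.
HANDLER_RE = re.compile(
    r'"(?P<name>[^"]+)"\s+established:(?P<established>\d+)\s+refused:(?P<refused>\d+)'
    r'(?:\s+current:(?P<current>\d+)\s+max:(?P<max>\d+))?\s+state:(?P<state>\w+)'
)
SERVICE_RE = re.compile(r'Service "(?P<service>[^"]+)"')
INSTANCE_RE = re.compile(r'Instance "(?P<instance>[^"]+)"')


def parse_handlers(text):
    """Return one dict per handler line, tagged with its service and instance."""
    rows = []
    service = instance = None
    for line in text.splitlines():
        m = SERVICE_RE.search(line)
        if m:
            service = m.group("service")
            continue
        m = INSTANCE_RE.search(line)
        if m:
            instance = m.group("instance")
            continue
        m = HANDLER_RE.search(line)
        if m:
            rows.append({
                "service": service,
                "instance": instance,
                "handler": m.group("name"),
                "established": int(m.group("established")),
                "refused": int(m.group("refused")),
                # None when the handler line carries no current/max counters
                "current": int(m.group("current")) if m.group("current") else None,
                "max": int(m.group("max")) if m.group("max") else None,
                "state": m.group("state"),
            })
    return rows


if __name__ == "__main__":
    import sys
    # Hypothetical usage: lsnrctl service | python parse_handlers.py
    for row in parse_handlers(sys.stdin.read()):
        print(row)
```

On the output above, this makes it easier to see that the "N000" handler of EVODB1 reports established:0 and current:0, while the same handler on EVODB2 reports established:4242 and current:133.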
Everything seems to be working perfectly.
The only action I can take is to shut down the instance and start it up again.
For this I use the srvctl utility, which shuts down the instance in abort mode.
I do not see anything in the alert log.
This is an extract of the log covering the time of the freeze and the time of the restart:
Sun Aug 26 01:47:25 2012
Thread 1 advanced to log sequence 311059
Current log# 13 seq# 311059 mem# 0: +ONLINELOG/evodb/onlinelog/group_13.268.729333867
Sun Aug 26 01:47:25 2012
SUCCESS: diskgroup ARCHIVELOG was mounted
Sun Aug 26 01:47:35 2012
SUCCESS: diskgroup ARCHIVELOG was dismounted
Sun Aug 26 02:29:35 2012
SUCCESS: diskgroup ARCHIVELOG was mounted
Sun Aug 26 02:29:40 2012
SUCCESS: diskgroup ARCHIVELOG was dismounted
Sun Aug 26 02:47:23 2012
Thread 1 advanced to log sequence 311060
Current log# 10 seq# 311060 mem# 0: +ONLINELOG/evodb/onlinelog/group_10.265.729333851
Sun Aug 26 02:47:23 2012
SUCCESS: diskgroup ARCHIVELOG was mounted
*Sun Aug 26 02:47:31 2012*
*SUCCESS: diskgroup ARCHIVELOG was dismounted*
*Sun Aug 26 03:24:12 2012*
*Shutting down instance (abort)*
*License high water mark = 331*
*USER (ospid: 1940): terminating the instance*
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (386_46984511656352)
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (403_47991363436960)
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (719_47102804537760)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (28062_46969640935840)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (8123_47601996018080)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (20083_47947053900192)
Sun Aug 26 03:24:13 2012
ORA-30006 : opidrv aborting process unknown ospid (727_47348591145376)
Sun Aug 26 03:24:13 2012
ORA-28 : opidrv aborting process unknown ospid (332_47161478062496)
Sun Aug 26 03:24:14 2012
ORA-28 : opidrv aborting process unknown ospid (380_47227700429216)
Sun Aug 26 03:24:14 2012
ORA-28 : opidrv aborting process unknown ospid (564_47044764156320)
Sun Aug 26 03:24:15 2012
ORA-28 : opidrv aborting process unknown ospid (725_47850787893664)
…… [other aborting process entries]........
Sun Aug 26 03:24:23 2012
Instance terminated by USER, pid = 1940
Sun Aug 26 03:24:25 2012
Instance shutdown complete
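To see how long the alert log stays silent before the abort, the timestamps in the extract above could be compared with a small script. This is a rough sketch only: the helper names `parse_timestamps` and `gaps` are hypothetical, and it assumes the timestamp lines use the English day and month abbreviations shown above, optionally wrapped in the `*` emphasis markers.

```python
from datetime import datetime

# Alert-log timestamp layout as it appears above, e.g. "Sun Aug 26 02:47:31 2012".
# Assumption: the process runs with an English (C) locale for %a and %b.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def parse_timestamps(lines):
    """Collect every line that is a bare timestamp, ignoring all other lines."""
    stamps = []
    for line in lines:
        text = line.strip().strip("*")
        try:
            stamps.append(datetime.strptime(text, TS_FORMAT))
        except ValueError:
            continue  # not a timestamp line
    return stamps


def gaps(stamps):
    """Return (start, end, seconds) for each pair of consecutive timestamps."""
    return [
        (a, b, int((b - a).total_seconds()))
        for a, b in zip(stamps, stamps[1:])
    ]


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python alert_gaps.py < alert_extract.txt
    for start, end, seconds in gaps(parse_timestamps(sys.stdin)):
        print(start, "->", end, seconds, "s")
```

For the last two entries before the abort (02:47:31 and 03:24:12), this gives a gap of 2201 seconds, roughly 37 minutes with nothing written to the alert log.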
I really don't know what is happening or what to check to identify the problem.
Do you have any suggestions?
Thanks in advance,
Samuel