Hi,
I work on an Oracle Database 11g Release 11.1.0.6.0 - 64bit Production With the Real Application Clusters option.
Tonight I ran into a very strange situation that I am not able to understand.
My RAC has these services:
EVODB (the RAC-wide one)
EVODB1 (points to node 1)
EVODB2 (points to node 2)
EVOREAD (node 2 is preferred, node 1 is available)
IPGW (node 1 is preferred, node 2 is available)
Read-intensive applications use EVOREAD.
Write-intensive applications use IPGW.
The web application uses EVOREAD.
Tonight the web application (via PHP) was not able to connect to the database and returned this error:
ORA-03135: connection lost contact
I then checked the services, but crsstat showed every resource ONLINE:
[oracle@dcsrv-evodb02 ~]$ crsstat
HA Resource Target State
----------- ------ -----
ora.EVODB.EVODB1.inst ONLINE ONLINE on dcsrv-evodb01
ora.EVODB.EVODB2.inst ONLINE ONLINE on dcsrv-evodb02
ora.EVODB.EVOREAD.EVODB2.srv ONLINE ONLINE on dcsrv-evodb02
ora.EVODB.EVOREAD.cs ONLINE ONLINE on dcsrv-evodb02
ora.EVODB.IPGW.EVODB1.srv ONLINE ONLINE on dcsrv-evodb01
ora.EVODB.IPGW.cs ONLINE ONLINE on dcsrv-evodb01
ora.EVODB.db ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.ASM1.asm ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.LISTENER_ASM_DCSRV-EVODB01.lsnr ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.LISTENER_DB_DCSRV-EVODB01.lsnr ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.gsd ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.ons ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb01.vip ONLINE ONLINE on dcsrv-evodb01
ora.dcsrv-evodb02.ASM2.asm ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.LISTENER_ASM_DCSRV-EVODB02.lsnr ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.LISTENER_DB_DCSRV-EVODB02.lsnr ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.gsd ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.ons ONLINE ONLINE on dcsrv-evodb02
ora.dcsrv-evodb02.vip ONLINE ONLINE on dcsrv-evodb02
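Instead of reading the listing by eye, the same check can be sketched as a small filter. This is only a rough sketch: crs_not_online is a hypothetical helper name, and it assumes the three-column layout shown above (resource name, Target, State).

```shell
# Hypothetical helper: reads crsstat-style output on stdin and prints the name
# of every "ora." resource whose Target or State column is not ONLINE.
# Assumes whitespace-separated columns: name, Target, State.
crs_not_online() {
  awk '$1 ~ /^ora\./ && ($2 != "ONLINE" || $3 != "ONLINE") { print $1 }'
}

# Usage (assumes the crsstat wrapper script is on the PATH):
#   crsstat | crs_not_online
```

With the output above it prints nothing, which matches what I saw: every resource had both Target and State ONLINE.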
The output of lsnrctl services looked fine as well:
Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "EVODB" has 2 instance(s).
Instance "EVODB1", status READY, has 2 handler(s) for this service...
Handler(s):
"N000" established:0 refused:0 current:0 max:679 state:ready
CMON <machine: dcsrv-evodb01, pid: 6985>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.xxxxx.xxx)(PORT=46498))
"DEDICATED" established:52 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.xxxxx.xxx)(PORT=1521))
Instance "EVODB2", status READY, has 3 handler(s) for this service...
Handler(s):
"N000" established:29 refused:0 current:70 max:679 state:ready
CMON <machine: dcsrv-evodb02, pid: 26709>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.xxxxx.xxx)(PORT=61966))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.xxxxx.xxx)(PORT=1521))
"DEDICATED" established:290 refused:0 state:ready
LOCAL SERVER
Service "EVODBXDB" has 2 instance(s).
Instance "EVODB1", status READY, has 1 handler(s) for this service...
Handler(s):
"D000" established:0 refused:0 current:0 max:972 state:ready
DISPATCHER <machine: dcsrv-evodb01, pid: 4494>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.xxxxx.xxx)(PORT=57381))
Instance "EVODB2", status READY, has 1 handler(s) for this service...
Handler(s):
"D000" established:0 refused:0 current:0 max:972 state:ready
DISPATCHER <machine: dcsrv-evodb02, pid: 26499>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.xxxxx.xxx)(PORT=34877))
Service "EVODB_XPT" has 2 instance(s).
Instance "EVODB1", status READY, has 2 handler(s) for this service...
Handler(s):
"N000" established:0 refused:0 current:0 max:679 state:ready
CMON <machine: dcsrv-evodb01, pid: 6985>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.xxxxx.xxx)(PORT=46498))
"DEDICATED" established:52 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.xxxxx.xxx)(PORT=1521))
Instance "EVODB2", status READY, has 3 handler(s) for this service...
Handler(s):
"N000" established:29 refused:0 current:70 max:679 state:ready
CMON <machine: dcsrv-evodb02, pid: 26709>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.xxxxx.xxx)(PORT=61966))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.xxxxx.xxx)(PORT=1521))
"DEDICATED" established:290 refused:0 state:ready
LOCAL SERVER
Service "EVOREAD" has 1 instance(s).
Instance "EVODB2", status READY, has 3 handler(s) for this service...
Handler(s):
"N000" established:29 refused:0 current:70 max:679 state:ready
CMON <machine: dcsrv-evodb02, pid: 26709>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb02.xxxxx.xxx)(PORT=61966))
"DEDICATED" established:0 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb02-vip.xxxxx.xxx)(PORT=1521))
"DEDICATED" established:290 refused:0 state:ready
LOCAL SERVER
Service "IPGW" has 1 instance(s).
Instance "EVODB1", status READY, has 2 handler(s) for this service...
Handler(s):
"N000" established:0 refused:0 current:0 max:679 state:ready
CMON <machine: dcsrv-evodb01, pid: 6985>
(ADDRESS=(PROTOCOL=tcp)(HOST=dcsrv-evodb01.xxxxx.xxx)(PORT=46498))
"DEDICATED" established:52 refused:0 state:ready
REMOTE SERVER
(ADDRESS=(PROTOCOL=TCP)(HOST=dcsrv-evodb01-vip.xxxxx.xxx)(PORT=1521))
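Because the listener output is long, a condensed view of which instance each service is registered with may be easier to read. The following is a rough sketch only; lsnr_service_instances is a hypothetical helper name, and it assumes the Service "..." and Instance "..." line format shown above.

```shell
# Hypothetical helper: reads "lsnrctl services" output on stdin and prints one
# "SERVICE INSTANCE" pair per registered instance.
# Splitting on double quotes makes field 2 the quoted name on each line.
lsnr_service_instances() {
  awk -F'"' '
    /^[ \t]*Service "/  { svc = $2 }        # remember the current service
    /^[ \t]*Instance "/ { print svc, $2 }   # one line per instance under it
  '
}

# Usage (assumes lsnrctl is on the PATH):
#   lsnrctl services | lsnr_service_instances
```

For the output above this would list EVOREAD only against EVODB2 and IPGW only against EVODB1, which is consistent with the preferred-instance setup of the two services.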
I tried connecting to the database from the web server machine using sqlplus with @EVOREAD and had no problem!
I was able to query normally.
Another app that was using EVOREAD was running without any problem.
There were no errors in the alert log of either node.
I then restarted the EVOREAD service. Once it was up again, sqlplus from the web server machines stopped working and returned:
ORA-30006: resource busy; acquire with WAIT timeout expired
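For clarity, a service restart with srvctl can be sketched roughly as below. This is an illustration rather than a transcript of my session: restart_service is a hypothetical wrapper name, and the database name EVODB is taken from the ora.EVODB.db resource in the crsstat output.

```shell
# Hypothetical wrapper: stop a RAC service and start it again with srvctl.
#   $1 = database unique name, $2 = service name
# The start is attempted only if the stop succeeded.
restart_service() {
  srvctl stop service -d "$1" -s "$2" &&
    srvctl start service -d "$1" -s "$2"
}

# Usage (assumes srvctl is on the PATH of the oracle user):
#   restart_service EVODB EVOREAD
#   srvctl status service -d EVODB -s EVOREAD
```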
While the EVOREAD service was restarting, dozens of these errors were written to the alert log of node 2:
Tue Apr 03 02:51:46 2012
ORA-30006 : opiodr aborting process L001 ospid (21941_46960412512672)
Tue Apr 03 02:51:46 2012
Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x1D72F3ED4] [PC:0xE03F9D, opitsk()+6977]
Errors in file /u01/app/oracle/diag/rdbms/evodb/EVODB2/trace/EVODB2_l001_21941.trc (incident=451907):
ORA-07445: exception encountered: core dump [opitsk()+6977] [SIGSEGV] [ADDR:0x1D72F3ED4] [PC:0xE03F9D] [Address not mapped to object] []
ORA-30006: resource busy; acquire with WAIT timeout expired
ORA-30006: resource busy; acquire with WAIT timeout expired
Incident details in: /u01/app/oracle/diag/rdbms/evodb/EVODB2/incident/incdir_451907/EVODB2_l001_21941_i451907.trc
Tue Apr 03 02:52:11 2012
k2g_dtp_stop_svc(): Error occured while stopping service [EVOREAD]; some transactions might not have been completely cleaned up
In the end I restarted the instance on node 2 using the srvctl utility, and unexpectedly the instance was shut down with abort (without first attempting a normal shutdown, as it usually does):
Tue Apr 03 02:58:36 2012
Shutting down instance (abort)
License high water mark = 74
USER (ospid: 25761): terminating the instance
Instance terminated by USER, pid = 25761
Tue Apr 03 02:58:41 2012
Instance shutdown complete
Once the instance was up and I had moved EVOREAD back to node 2 (during the instance restart it had been moved to node 1), everything started to work fine again.
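Moving a service back to its preferred instance can be sketched roughly as follows. Again this is only an illustration: relocate_service is a hypothetical wrapper name, and the instance names EVODB1 and EVODB2 are the ones shown in the crsstat output.

```shell
# Hypothetical wrapper: relocate a RAC service from one instance to another.
#   $1 = database unique name, $2 = service name,
#   $3 = instance currently running the service, $4 = target instance
relocate_service() {
  srvctl relocate service -d "$1" -s "$2" -i "$3" -t "$4"
}

# Usage (assumes srvctl is on the PATH of the oracle user):
#   relocate_service EVODB EVOREAD EVODB1 EVODB2
```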
I really don't understand the problem: at first look everything seemed to work fine (sqlplus, crsstat, a lot of other apps).
What does the ORA-30006 returned to sqlplus mean?
Why did I start getting that error only after restarting the service?
And yes, an instance restart usually solves any kind of "process" or server resource problem, no doubt about that.
Any suggestions on how to diagnose the problem?
Thanks in advance