Hello all.
We are having a few issues with a specific set up here at work involving Oracle 11, and Oracle 9 databases and I was hoping someone with a fair idea of how Oracle configurations work when it comes to network connectivity and data transfer would mind sharing their opinion on the matter.
First off, a bit of background. I'm a network security engineer by trade and my experience when it comes to the application side of things, specifically databases is inherently weak; so I apologise if my terminology or logic is slightly off here.
Basically what I'm trying to determine is where a fault lies between our users using a terminal server and a remote Oracle SQL database that should service their requests.
The problem lies wherein the user will utilise the 'sqlplus' application invoked from a Windows command prompt window, and expect to be able login and query a database. I believe we have two versions available to use, version 9 which is not actually in production but able to be used for testing and version 11 which is active in production.
When accessing Oracle 11 servers will hang where we expect to see a successful connection followed by a healthy looking "SQL>" prompt data transfer appears to stall as follows:
C:\>sqlplus username/password@blah.world
SQL*Plus: Release 10.2.0.1.0 - Production on Wed Sep 22 18:12:17 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
*hangs here*
If we try on the Oracle 9 setup things look fine initially:
C:\>sqlplus username/password@blah.world
SQL*Plus: Release 10.2.0.1.0 - Production on Wed Sep 22 18:19:20 2010
Copyright (c) 1982, 2005, Oracle. All rights reserved.
Connected to:
Oracle9i Enterprise Edition Release 9.2.0.6.0 - Production
With the Partitioning, OLAP and Oracle Data Mining options
JServer Release 9.2.0.6.0 - Production
However once connected to the Oracle 9 box; if we run a query similar to:
sqlplus username/password@blah.world
select * from <database> where rownum < 10;
This will again hang.
That said however, if we try and run a query similar to:
sqlplus username/password@blah.world
select * from <database> where rownum < 5;
This will return 4 rows of usable data, without issue.
Our systems engineer provided me with a SQLNET trace from the server side and believes he's identified where it occurs:
[21-SEP-2010 16:06:42:989] nsdo: entry
[21-SEP-2010 16:06:42:989] nsdo: cid=0, opcode=85, *bl=0, *what=0, uflgs=0x0, cflgs=0x3
[21-SEP-2010 16:06:42:989] nsdo: rank=64, nsctxrnk=0
[21-SEP-2010 16:06:42:990] nsdo: nsctx: state=8, flg=0x420c, mvd=0
[21-SEP-2010 16:06:42:990] nsdo: gtn=156, gtc=156, ptn=10, ptc=2011
[21-SEP-2010 16:06:42:990] nsdo: switching to application buffer
[21-SEP-2010 16:06:42:990] nsrdr: entry
[21-SEP-2010 16:06:42:990] nsrdr: recving a packet
[21-SEP-2010 16:06:42:990] nsprecv: entry
[21-SEP-2010 16:06:42:990] nsprecv: reading from transport...
[21-SEP-2010 16:06:42:990] nttrd: entry
#
# HANG OCCURS HERE
#
[21-SEP-2010 16:10:13:347] ntt2err: entry
[21-SEP-2010 16:10:13:347] ntt2err: soc 25 error - operation=5, ntresnt[0]=517, ntresnt[1]=131, ntresnt[2]=0
[21-SEP-2010 16:10:13:347] ntt2err: exit
[21-SEP-2010 16:10:13:347] nttrd: exit
[21-SEP-2010 16:10:13:347] nsprecv: transport read error
[21-SEP-2010 16:10:13:347] nsprecv: error exit
[21-SEP-2010 16:10:13:347] nserror: entry
[21-SEP-2010 16:10:13:347] nserror: nsres: id=0, op=68, ns=12547, ns2=12560; nt[0]=517, nt[1]=131, nt[2]=0; ora[0]=0, ora[1]=0, ora[2]=0
[21-SEP-2010 16:10:13:348] nsrdr: error exit
[21-SEP-2010 16:10:13:348] nsdo: nsctxrnk=0
[21-SEP-2010 16:10:13:348] nsdo: error exit
[21-SEP-2010 16:10:13:348] nioqrc: wanted 1 got 0, type 0
[21-SEP-2010 16:10:13:348] nioqper: error from nioqrc
[21-SEP-2010 16:10:13:348] nioqper: nr err code: 0
[21-SEP-2010 16:10:13:348] nioqper: ns main err code: 12547
[21-SEP-2010 16:10:13:348] nioqper: ns (2) err code: 12560
[21-SEP-2010 16:10:13:348] nioqper: nt main err code: 517
[21-SEP-2010 16:10:13:348] nioqper: nt (2) err code: 131
[21-SEP-2010 16:10:13:349] nioqper: nt OS err code: 0
[21-SEP-2010 16:10:13:349] nioqer: entry
[21-SEP-2010 16:10:13:349] nioqer: incoming err = 12151
[21-SEP-2010 16:10:13:349] nioqce: entry
[21-SEP-2010 16:10:13:349] nioqce: exit
[21-SEP-2010 16:10:13:349] nioqer: returning err = 3113
[21-SEP-2010 16:10:13:349] nioqer: exit
[21-SEP-2010 16:10:13:349] nioqrc: exit
[21-SEP-2010 16:10:13:349] nioqds: entry
[21-SEP-2010 16:10:13:349] nioqds: disconnecting...
[21-SEP-2010 16:10:13:349] nsdo: entry
[21-SEP-2010 16:10:13:349] nsdo: cid=0, opcode=67, *bl=0, *what=1, uflgs=0x2, cflgs=0x3
[21-SEP-2010 16:10:13:350] nsdo: rank=64, nsctxrnk=0
[21-SEP-2010 16:10:13:350] nsdo: nsctx: state=1, flg=0x420c, mvd=0
[21-SEP-2010 16:10:13:350] nsdo: nsctxrnk=0
[21-SEP-2010 16:10:13:350] nsdo: error exit
From the client log side, it looks like this:
[21-SEP-2010 16:06:42:886] nsdo: entry
[21-SEP-2010 16:06:42:886] nsdo: cid=0, opcode=84, *bl=0, *what=1, uflgs=0x20, cflgs=0x3
[21-SEP-2010 16:06:42:886] nsdo: rank=64, nsctxrnk=0
[21-SEP-2010 16:06:42:886] nsdo: nsctx: state=8, flg=0x400d, mvd=0
[21-SEP-2010 16:06:42:886] nsdo: gtn=127, gtc=127, ptn=10, ptc=2011
[21-SEP-2010 16:06:42:886] nsdofls: entry
[21-SEP-2010 16:06:42:886] nsdofls: DATA flags: 0x0
[21-SEP-2010 16:06:42:886] nsdofls: sending NSPTDA packet
[21-SEP-2010 16:06:42:886] nspsend: entry
[21-SEP-2010 16:06:42:886] nspsend: plen=17, type=6
[21-SEP-2010 16:06:42:886] nttwr: entry
[21-SEP-2010 16:06:42:886] nttwr: socket 1724 had bytes written=17
[21-SEP-2010 16:06:42:886] nttwr: exit
[21-SEP-2010 16:06:42:886] nspsend: packet dump
[21-SEP-2010 16:06:42:886] nspsend: 00 11 00 00 06 00 00 00 |........|
[21-SEP-2010 16:06:42:886] nspsend: 00 00 03 05 1C 01 01 01 |........|
[21-SEP-2010 16:06:42:886] nspsend: 0F |. |
[21-SEP-2010 16:06:42:886] nspsend: 17 bytes to transport
[21-SEP-2010 16:06:42:886] nspsend: normal exit
[21-SEP-2010 16:06:42:886] nsdofls: exit (0)
[21-SEP-2010 16:06:42:886] nsdo: nsctxrnk=0
[21-SEP-2010 16:06:42:886] nsdo: normal exit
[21-SEP-2010 16:06:42:886] nsdo: entry
[21-SEP-2010 16:06:42:886] nsdo: cid=0, opcode=85, *bl=0, *what=0, uflgs=0x0, cflgs=0x3
[21-SEP-2010 16:06:42:886] nsdo: rank=64, nsctxrnk=0
[21-SEP-2010 16:06:42:886] nsdo: nsctx: state=8, flg=0x400d, mvd=0
[21-SEP-2010 16:06:42:886] nsdo: gtn=127, gtc=127, ptn=10, ptc=2011
[21-SEP-2010 16:06:42:886] nsdo: switching to application buffer
[21-SEP-2010 16:06:42:886] nsrdr: entry
[21-SEP-2010 16:06:42:886] nsrdr: recving a packet
[21-SEP-2010 16:06:42:886] nsprecv: entry
[21-SEP-2010 16:06:42:886] nsprecv: reading from transport...
[21-SEP-2010 16:06:42:886] nttrd: entry
#
#
# HANG OCCURS HERE
#
# Need to <CTRL C> twice to kill
#
I've tried searching the net for similar occurrences of some of the interesting looking trace data but there appears to be limited information available, none of which is terribly helpful.
What I'm really after is either someone who has had this issue before, or someone who can better interpret the error output from the trace files and perhaps give me an idea of what's causing it to occur. Specifically whether that error text above relates to a failed connection on the underlying network connectivity side of things or whether it may be something on a higher level within the application layers. We have done packet dumps on firewalls to check the traffic as it traverses the firewall but there are no anomalies that I can see which may be contributing to the issue at hand.
I have organised for some testing to occur within the next 24 hours as there is a Cisco ASA Firewall that sits in the network path that is performing inspection on packets travelling through it. The inspection for SQLNET specifically is disabled, but we intend to enable this once more for testing to see whether it makes a difference. I'm not entirely confident it will however, and until we do get a chance to test any constructive input or alternate ideas will be greatly appreciated. I'm trying to cover as many bases as possible here.
Cheers,
Josh.