Socket read error: connection reset by peer
843854, Dec 12 2002 (edited Dec 12 2002)
Hi.
Has anybody experienced the error message "Socket read error: connection reset by peer"?
Please see below for detailed information.
Appreciate your help
Regards
RT
Environment specification
Server: HP/UX 11.00 64-bit, Oracle RDBMS 8.1.6.0.0 64-bit
2 firewalls between client and db.
Client:
Win 2000,
SP3,
Oracle Client 8.1.7.0.0, JDBC OCI (thin JDBC driver, classes12.zip)
JDK 1.3
JRUN3.0
The TCP protocol is being used in the communication
Error messages
Web Users receive: Socket read error: connection reset by peer
Trace files on the server: Read unexpected EOF ERROR on 18.
Explanation: The error in the server sqlnet trace file suggests that a client connection terminated abnormally, e.g. the client machine was powered off, a cable was removed, or a network connection was aborted without warning. No user has complained of such a problem, and there is no client trace with an error.
The problem
The users of the Java web application experience an exception once or twice a day.
The JRun web server reports broken connections to the db, and clients receive "connection reset by peer".
When the error occurs, the users just have to wait a while (2-10 min) and then they can use the web application again. No action is taken.
This problem cannot be reproduced. It happens only occasionally, when the network is under heavy load and a new DB connection is being created.
The application
The Java web application uses a customized connection pool against the database. This pool is shared among all the users of the website. Whenever a user process needs to fetch data from the database, a free connection from this pool is allocated. The application tests whether the connection is valid before starting a transaction (select '1' from dual). When the error occurs, an ORA-03113 "end-of-file on communication channel" is returned to the application.
The path between the client and the db involves at least two firewalls. The firewalls are open for SQL*Net traffic. The network group can tell that requests from the app server are not getting a response from the db. They have not, however, identified whether the requests reach the db server or are stopped earlier in the network.
Around 1000 users are using other applications that use dedicated sqlnet connections against the db, and they have not experienced any problems.
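To make the validation step concrete, here is a minimal sketch of such a check. It is not the actual pool code: the class name ConnectionCheck, the method isUsable and the timeoutSeconds parameter are made up for illustration, and it is written for a current JDK (try-with-resources), not JDK 1.3. Whether the driver honours Statement.setQueryTimeout() is driver-specific, so treat that call as a best-effort guard only.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper, not taken from the pool described in this post.
final class ConnectionCheck {
    private static final String VALIDATION_SQL = "select '1' from dual";

    private ConnectionCheck() {}

    /**
     * Returns true only if the validation query completes and yields a row.
     * Any SQLException (for example ORA-03113 or a socket reset) is taken
     * to mean the connection is dead and must be discarded by the pool.
     */
    static boolean isUsable(Connection conn, int timeoutSeconds) {
        if (conn == null) {
            return false;
        }
        try (Statement st = conn.createStatement()) {
            // Best effort: not every driver enforces the query timeout.
            st.setQueryTimeout(timeoutSeconds);
            try (ResultSet rs = st.executeQuery(VALIDATION_SQL)) {
                return rs.next();
            }
        } catch (SQLException e) {
            return false;
        }
    }
}
```

A pool would call this on checkout and, on false, close the connection and open a replacement instead of handing the broken one to the caller.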
Issues considered
Connection pooling
It is a customized connection pooling, developed by Lindorff developers.
I have read through the source code for the connection pooling and it does the job as it should; in case of a bad connection, it tries to create a new connection.
The log file shows that the call to DriverManager.getConnection() hangs until the server goes down. This is probably because DriverManager.setLoginTimeout() does not take effect, so the timeout value stays at zero, which means no limit. (According to Oracle, Oracle JDBC does not support login timeouts, and calling the static DriverManager.setLoginTimeout() method has no effect.)
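If the driver really ignores the login timeout, one possible workaround is to run the connect call on a worker thread and give up waiting after a deadline. The following is only a sketch under that assumption: BoundedCall and call are hypothetical names, and it uses java.util.concurrent, which is not available on JDK 1.3. Note also that a thread blocked in a socket connect may not react to the interrupt, so the stuck attempt can linger in the background (as a daemon thread) even after the caller has moved on.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long the caller waits for a blocking task.
final class BoundedCall {
    private BoundedCall() {}

    /**
     * Runs the task on a separate daemon thread and waits at most
     * timeoutMillis for its result. Throws TimeoutException if the task
     * has not finished in time; rethrows the task's own exception otherwise.
     */
    static <T> T call(Callable<T> task, long timeoutMillis) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "bounded-call");
            t.setDaemon(true); // a stuck connect must not keep the JVM alive
            return t;
        });
        try {
            Future<T> result = worker.submit(task);
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            // Surface the original failure (e.g. SQLException) to the caller.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            // Interrupts the worker if it is still running; best effort only.
            worker.shutdownNow();
        }
    }
}
```

The pool could then wrap its connect call, e.g. BoundedCall.call(() -> DriverManager.getConnection(url, user, password), 10000), where url, user and password stand for whatever the application already uses, and treat a TimeoutException like any other failed connect.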
Firewall
One thing to consider is that the firewall may decide to shut down the socket after a long period of inactivity on a connection. This causes problems for the JDBC connection pool because the pool is not aware of the disconnection at the TCP/IP level until someone checks out the connection from the pool and tries to use it. The user then gets a "Socket read error: connection reset by peer".
The JRun timeout parameter is less than the firewall's timeout, so the firewall will not close a connection before JRun does.
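The idle-timeout concern can also be handled inside the pool itself: retire any connection that has been idle longer than some limit chosen safely below the firewall's idle timeout. A minimal sketch of the bookkeeping follows. IdleTracker is a hypothetical name, the limit is whatever value the network group confirms for the firewall (I do not know it), and timestamps are passed in explicitly so the logic is easy to check.

```java
// Hypothetical per-connection bookkeeping for an idle limit.
final class IdleTracker {
    private final long maxIdleMillis;
    private long lastUsedMillis;

    /** maxIdleMillis should be chosen below the firewall's idle timeout. */
    IdleTracker(long maxIdleMillis, long nowMillis) {
        if (maxIdleMillis <= 0) {
            throw new IllegalArgumentException("maxIdleMillis must be positive");
        }
        this.maxIdleMillis = maxIdleMillis;
        this.lastUsedMillis = nowMillis;
    }

    /** Record that the connection was just used or successfully validated. */
    synchronized void touch(long nowMillis) {
        lastUsedMillis = nowMillis;
    }

    /** True once the connection has been idle for at least maxIdleMillis. */
    synchronized boolean isStale(long nowMillis) {
        return nowMillis - lastUsedMillis >= maxIdleMillis;
    }
}
```

On checkout, the pool would pass System.currentTimeMillis() to isStale() and, if it returns true, close the connection and open a fresh one rather than risk using a socket the firewall has already forgotten.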
Number of processes the DB can handle
The processes parameter is 1300, and they have not experienced the Oracle error message "max # of processes reached".
Port redirection through a firewall:
Since the firewall has a SQL*Net proxy, port redirection through the firewall is not a problem. Problems with port redirection only appear at connect time, but in this situation the connections fail long after they are established.
The network group
The network people who investigated the problem at Lindorff report a significant number of dropped packets between the database server and the JDBC client (web application) over 24 hrs. The reason given is "unknown established TCP packet", which means that the firewall does not consider these packets to be part of an already established session. The network group believes this happens because one of the hosts sends a RESET or FIN signal which the firewall has noticed but which is not received by the other host.
It seems that the firewall is dropping packets because of "Unknown established TCP packet" from both the JDBC client and the TNSLISTENER on the database server. The dropped packets are SQL*Net v2 traffic, so clearly Oracle products are involved.