ETIMEDOUT on socket read()

807557, Nov 24 2009 (edited Nov 25 2009)
Hello Folks,

I've edited this message now that a bit more information is to hand. It was too vague before; apologies for posting a woolly question.

I'm working on a maintenance project here. It's a client-server system: a Solaris 10 server on an Ultra 45, written in C++ and compiled with Sun Studio 9, and Windows XP clients running an ILOG Views-based client written in Visual Studio C++ 6.

The problem is happening at our customer's soak test facility, with about 30 clients.

Each client makes a permanent socket connection to the server. The client subscribes to certain data feeds, and the server pushes data to each client on a couple of threads: one thread for timed updates every minute, and one thread for immediate updates on certain events. (Is it safe for two threads to write to the same socket?)

The server runs 24 hours a day, 365 days a year. Most of the time, all is well.

The 'read' side of the server socket has no traffic on it, as the clients are just soaking up data sent to them, and there's no need for them to interact with the server once the subscriptions have been sent.

Occasionally, POSSIBLY when things get busy, both client and server hit a problem on the connection. It affects most, if not all, of the clients at more or less the same time. After the error, the clients reconnect once the client application finally times out on its own 'no traffic for 10 minutes' timer, and all is well again.

Now, the server is looking for incoming messages from the clients by setting up a non-blocking select() for the set of subscribed clients (i.e. I/O multiplexing). We're just setting up read descriptors here, no write or exception descriptors, and a timeout period of zero.

In midstream, as it were, instead of just returning from the select() with 'nothing to do', the server-side select() suddenly returns a positive number, indicating that many sockets are ready to be read, but most likely with a pending error on a subset of the read descriptors. When read() is called on the first server-side socket we look at, it takes 65 microseconds to return ETIMEDOUT. As we read all of the sockets that indicated 'ready to read', they all return with errno set to ETIMEDOUT. As we're not resetting errno to 0 between calls, this may be a red herring, but all of the socket reads also return 0 bytes read.

Meanwhile, on the Client application, the sockets are reporting EWOULDBLOCK while trying to complete the reading of the data that the server was in the process of sending.

So that all seems consistent, except that we cannot find out why the socket should be asserting ETIMEDOUT on what seems like a working, established socket. While some hints suggest 'trouble' on the network, in the soak test environment this seems unlikely. However, the evidence does seem to suggest a problem on the network. Our customer would be extremely embarrassed by a fault on the network, given the nature of their business, so we would need strong evidence to push the problem back to them.

Does anyone have a view on the significance of the ETIMEDOUT in this scenario, so that we can understand what the library is trying to tell us?

Thanks very much

Jeff Adams

Edited by: JeffAdams on 25-Nov-2009 09:53
Locked on Dec 22 2009
Added on Nov 24 2009