Hi,
We're using NIO for reading and writing from Telnet sockets, and occasionally we're finding that java.nio.channels.Selector.select(timeout) seems to return prematurely. It's supposed to return "only after at least one channel is selected, this selector's wakeup method is invoked, the current thread is interrupted, or the given timeout period expires, whichever comes first" but we're seeing it return when none of those conditions are true. With a 15-second timeout it's returning after about 3 seconds.
I'm posting here before making a bug report to see if any of you all can catch something we're doing wrong. Also because we frustratingly do not have a repro in house - we are only seeing this on a customer's site. Distressingly, the bug only happens if we use a
newer jre. With version 1.6.0_07 there is no repro; with 1.6.0_26 and 1.7.0_04 there is.
I see a similar bug here which was resolved as not a bug: http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=5035864
However we're not using the return value the same way.
We're creating the selector like this:
public void connect(String hostname, int port) {
// hook up the underlying socket
InetSocketAddress addr;
try {
// open socket
addr = new InetSocketAddress(InetAddress.getByName(hostname), port);
channel = SocketChannel.open();
channel.configureBlocking(true); // while connecting
channel.connect(addr);
// switch to non-blocking mode now that we've connected okay
channel.configureBlocking(false);
// open selector
selector = Selector.open();
channel.register(selector, SelectionKey.OP_READ);
} catch (Exception e) {
//handle
}
}
And using it thus:
private void selectRead(Selector selector, long requestStartMilli, long timeoutMilli) throws IOException {
long remainingMilli;
if (timeoutMilli == 0) { // timeout of 0 also means block indefinitely
remainingMilli = 0;
} else {
remainingMilli = timeoutMilli - (System.currentTimeMillis() - requestStartMilli) ;
if (remainingMilli < 0) {
// timeout elapsed since the last time we blocked, so short circuit
log.debug("[TSC] Timeout elapsed since last read");
throw new IOTimeoutException("Timeout elapsed since last read");
}
}
// block for an event
log.debug("requestStartMilli " + requestStartMilli + " timeoutMilli " + timeoutMilli + " remainingMilli " + remainingMilli);
log.debug("current nanos before calling connect " + System.nanoTime() );
Log.debug(" total number of keys "+ selector.keys().size() + " selected keys " + selector.selectedKeys());
int numkeys = selector.select(remainingMilli);
log.debug("current nanos after calling connect " + System.nanoTime() + " got num keys " + numkeys );
// has something bad happened?
if (!selector.isOpen() || !channel.isOpen()) throw new IOException("Channel or selector not open");
if (Thread.interrupted()) throw new InterruptedIOException();
Set<SelectionKey> selectedKeys = selector.selectedKeys();
Iterator<SelectionKey> selectedKeysItr = selectedKeys.iterator();
if (selectedKeysItr.hasNext()) {
SelectionKey selected = selectedKeysItr.next();
selectedKeysItr.remove();
// do stuff
} else {
// the timeout elapsed with no data from the server
log.debug("[TSC] Timeout elapsed while blocking");
throw new IOTimeoutException("Timeout elapsed blocking");
}
}
About once every 20th time we connect and read, we see the inappropriately fast return.
2012-06-08 13:37:40,703 DEBUG requestStartMilli 1339177060687 timeoutMilli 15000 remainingMilli 14984 additionalTimeoutMilli 0
2012-06-08 13:37:40,703 DEBUG current nanos before calling connect 341011346120112
2012-06-08 13:37:40,703 DEBUG total number of keys 1 selected keys []
2012-06-08 13:37:40,718 DEBUG current nanos after calling connect 341011349665502 got num keys 0
2012-06-08 13:37:40,718 DEBUG [TSC] Timeout elapsed while blocking
Are we doing something wrong? Can anybody guess why we'd be unable to repro in house? Any other theories?
Thanks very much.