We are using the Java 1.5 SSLEngine for non-blocking SSL network communication, and we are seeing a Java-level deadlock that doesn't seem to be covered by the "concurrency notes" in the SSLEngine javadocs.
Our software has multiple threads calling into methods that perform either a wrap or an unwrap operation. After either operation, if the SSLEngineResult reports a handshake status of NEED_TASK, we run all delegated tasks on the current thread. As I understand the javadocs, a wrap and an unwrap may execute concurrently in different threads; what must be prevented is two concurrent calls to the same method (two wraps at once, or two unwraps at once). I see no mention that a delegated task must not run concurrently with a wrap or an unwrap, but that is exactly where we are deadlocking.
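For reference, the pattern described above (run every delegated task on the calling thread whenever a wrap or unwrap returns NEED_TASK) looks roughly like this. It is a minimal sketch using only the public javax.net.ssl API; the class and method names (DelegatedTaskRunner, runDelegatedTasks, afterOperation) are hypothetical and not taken from our actual code.

```java
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLEngineResult.HandshakeStatus;

/** Hypothetical helper for illustration; not part of the JDK or of our real code. */
final class DelegatedTaskRunner {
    private DelegatedTaskRunner() {}

    /** Runs every pending delegated task on the calling thread and returns how many ran. */
    static int runDelegatedTasks(SSLEngine engine) {
        int count = 0;
        Runnable task;
        // getDelegatedTask() returns null once no more tasks are pending.
        while ((task = engine.getDelegatedTask()) != null) {
            // On the JDK in the thread dump, this call runs Handshaker$DelegatedTask.
            task.run();
            count++;
        }
        return count;
    }

    /**
     * Called after each wrap or unwrap: if the result asks for tasks, run them
     * inline and report the engine's updated handshake status.
     */
    static HandshakeStatus afterOperation(SSLEngine engine, SSLEngineResult result) {
        HandshakeStatus status = result.getHandshakeStatus();
        if (status == HandshakeStatus.NEED_TASK) {
            runDelegatedTasks(engine);
            status = engine.getHandshakeStatus();
        }
        return status;
    }
}
```

Nothing in this loop coordinates with wraps that other threads may be performing on the same engine at the same moment, and that is the window in which the deadlock below occurs.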
Here is the deadlock as reported in our thread dump:
Found one Java-level deadlock:
=============================
"nbcsWriteWorker-6":
  waiting to lock monitor 0x0013c378 (object 0xbb064200, a com.sun.net.ssl.internal.ssl.SSLEngineImpl),
  which is held by "nbcsReadWorker-0"
"nbcsReadWorker-0":
  waiting to lock monitor 0x0013c3c0 (object 0xbb076290, a java.lang.Object),
  which is held by "nbcsWriteWorker-6"

Java stack information for the threads listed above:
===================================================
"nbcsWriteWorker-6":
        at com.sun.net.ssl.internal.ssl.SSLEngineImpl.getConnectionState(SSLEngineImpl.java:472)
        - waiting to lock <0xbb064200> (a com.sun.net.ssl.internal.ssl.SSLEngineImpl)
        at com.sun.net.ssl.internal.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1067)
        at com.sun.net.ssl.internal.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1026)
        - locked <0xbb076290> (a java.lang.Object)
        at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:411)
        at (our method to encode outbound data)
"nbcsReadWorker-0":
        at com.sun.net.ssl.internal.ssl.Handshaker.sendChangeCipherSpec(Handshaker.java:594)
        - waiting to lock <0xbb076290> (a java.lang.Object)
        at com.sun.net.ssl.internal.ssl.ClientHandshaker.sendChangeCipherAndFinish(ClientHandshaker.java:698)
        at com.sun.net.ssl.internal.ssl.ClientHandshaker.serverHelloDone(ClientHandshaker.java:624)
        at com.sun.net.ssl.internal.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:160)
        at com.sun.net.ssl.internal.ssl.Handshaker.processLoop(Handshaker.java:495)
        at com.sun.net.ssl.internal.ssl.Handshaker$1.run(Handshaker.java:437)
        at java.security.AccessController.doPrivileged(Native Method)
        at com.sun.net.ssl.internal.ssl.Handshaker$DelegatedTask.run(Handshaker.java:932)
        - locked <0xbb064200> (a com.sun.net.ssl.internal.ssl.SSLEngineImpl)
        at (our method to run all delegated tasks for the given socket)
        at (our method to encode inbound data, wherein we performed an unwrap and determined there are tasks to run)
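As you can see, this is a lock-ordering deadlock. Inside wrap, nbcsWriteWorker-6 holds an internal lock object (the java.lang.Object at 0xbb076290, acquired in SSLEngineImpl.wrap) and is waiting for the SSLEngineImpl monitor (0xbb064200). Meanwhile, in nbcsReadWorker-0, Handshaker$DelegatedTask.run holds the SSLEngineImpl monitor and, while trying to send the ChangeCipherSpec message, waits for that same internal lock object. I can guard against this by adding our own locking so that delegated tasks never run at the same time as a wrap or an unwrap, but I'm concerned that this might hurt performance.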
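A minimal sketch of that workaround, assuming a hypothetical wrapper class GuardedEngine that every thread goes through instead of calling the SSLEngine directly. A ReentrantReadWriteLock lets wraps and unwraps share the read side, so they can still overlap each other, while delegated tasks take the write side and therefore only run when no wrap or unwrap is in progress. Separate per-direction locks keep two wraps, or two unwraps, from overlapping, as the javadocs require. This should rule out the particular interleaving in the dump above; it is not a claim about every possible lock order inside SSLEngineImpl. The cost is that all wraps and unwraps on that engine stall while tasks run, which is the performance concern; delegated tasks are only produced during handshaking.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLEngineResult;
import javax.net.ssl.SSLException;

/** Hypothetical wrapper for illustration: wrap/unwrap may overlap, but never with a delegated task. */
final class GuardedEngine {
    private interface EngineOp {
        SSLEngineResult apply() throws SSLException;
    }

    private final SSLEngine engine;
    // Read side: shared by wrap and unwrap. Write side: exclusive, held while tasks run.
    private final ReentrantReadWriteLock taskGuard = new ReentrantReadWriteLock();
    // Serialize calls to the same method (two wraps, or two unwraps).
    private final ReentrantLock wrapLock = new ReentrantLock();
    private final ReentrantLock unwrapLock = new ReentrantLock();

    GuardedEngine(SSLEngine engine) {
        this.engine = engine;
    }

    SSLEngineResult wrap(ByteBuffer src, ByteBuffer dst) throws SSLException {
        return guarded(wrapLock, () -> engine.wrap(src, dst));
    }

    SSLEngineResult unwrap(ByteBuffer src, ByteBuffer dst) throws SSLException {
        return guarded(unwrapLock, () -> engine.unwrap(src, dst));
    }

    private SSLEngineResult guarded(ReentrantLock directionLock, EngineOp op) throws SSLException {
        SSLEngineResult result;
        taskGuard.readLock().lock();
        try {
            directionLock.lock();
            try {
                result = op.apply();
            } finally {
                directionLock.unlock();
            }
        } finally {
            taskGuard.readLock().unlock();
        }
        // Both locks are released before running tasks (see the note below).
        if (result.getHandshakeStatus() == SSLEngineResult.HandshakeStatus.NEED_TASK) {
            runDelegatedTasks();
        }
        return result;
    }

    /** Runs all pending tasks while no wrap or unwrap is in progress; returns how many ran. */
    int runDelegatedTasks() {
        taskGuard.writeLock().lock();
        try {
            int count = 0;
            Runnable task;
            while ((task = engine.getDelegatedTask()) != null) {
                task.run();
                count++;
            }
            return count;
        } finally {
            taskGuard.writeLock().unlock();
        }
    }
}
```

The read lock is released before the tasks run because ReentrantReadWriteLock cannot upgrade a read lock to a write lock; a thread that requested the write lock while still holding the read lock would block on itself.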
Is this a bug in SSLEngine, or just a documentation problem?
(We are seeing this occur between 2 Solaris machines, for what it's worth.)