"Deadlock in safepoint code. Should have called back to the VM"
837445Feb 7 2011 — edited Feb 9 2011We experience a VM crash in Safepoint.cpp with the following message:
# An unexpected error has been detected by Java Runtime Environment:
#
# Internal Error (safepoint.cpp:558), pid=5452, tid=1900
# Error: Deadlock in safepoint code. Should have called back to the VM before blocking.
#
# Java VM: Java HotSpot(TM) Client VM (11.0-b16 mixed mode windows-x86)
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
This happens in JRE 1.6.0.11 + Microsoft VS2008 C++, on Windows Server 2003. Happens in a busy customer site once in every few days, cannot reporoduce in the lab.
There are two relevant threads:
* Thread A: Created in Java right after system start up. It makes some preparations and then calls C++ code via JNI. In the JNI, Java data types are converted to C++ data types by GetBooleanArrayElements() and GetStringUTFChars(). It remains in C++ forever (until shutdown) and calls back the Java code via JNI every few seconds (that is: the stack contains Java frames, then C++ frames, then from time to time more Java frames and even further C++ frames).
* Thread B: Created upon need, invokes a C++ function via JNI, converts parameters by GetStringUTFChars() and at return by ReleaseStringUTFChars(). The function may take few milliseconds and up to few minutes to complete, eventually it returns to Java and the thread is deleted.
At time of crash we see the following:
* Most of the threads in the system are in state threadblocked - we guess that the VM blocked them for GC.
* Thread A is the active one. As far as we can tell it is in Java (called back from C++), maybe trying to invoke a C++ function via JNI. It locked and released a mutex ("synchronized") just before the crash (we suspect that while performing this lock its thread's Safepoint state may have changed to callback). The thread state is threadin_native_trans.
* Thread B is blocked in the JNI trying to return to Java, maybe while invoking ReleaseStringUTFChars(). It's state is threadblocked.
Another important detail: the crash happens usually certain time (few seconds or even few minutes) after thread B is blocked, and always while thread A is in Java called back from C++. We guess that something caused the Safepoint locking to wait forever while thread B tried to returns to from C++ to Java, and that the activity of thread A make the Safepoint realize that there's a deadlock (so its activity it's not the root cause but a side-effect).
We always see two mutex / monitors with name UNKNOWN at time of error, and 100% "eden" memory utilization.
hotspot error dump fragment:
--------------- T H R E A D ---------------
Current thread (0x26d51400): JavaThread "DialogicBridge" [_thread_in_native_trans, id=1900, stack(0x28490000,0x284e0000)]
Stack: [0x28490000,0x284e0000]
[error occurred during error reporting (printing stack bounds), id 0xc0000005]
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j telemessage.telephony.DialogicBridge.initializeDialogic(Ljava/lang/String;IIIIIIIIIIIILjava/lang/String;Ljava/lang/String;ILjava/lang/String;Ljava/lang/String;Ljava/lang/String;ILjava/lang/String;Ljava/lang/String;IZZ[Z)I+0
j telemessage.telephony.DialogicBridge$1.run()V+293
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
...
0x2ed00400 JavaThread "<< Call#16207 >> " [_thread_blocked, id=8604, stack(0x2e5c0000,0x2e610000)] // this is thread B
...
=>0x26d51400 JavaThread "DialogicBridge" [_thread_in_native_trans, id=1900, stack(0x28490000,0x284e0000)] // this is thread A
...
Other Threads:
0x269f7000 VMThread [stack: 0x26a80000,0x26ad0000] [id=2104]
0x26a29000 WatcherThread [stack: 0x26cb0000,0x26d00000] [id=9300]
VM state:synchronizing (normal execution)
VM Mutex/Monitor currently owned by a thread: ([mutex/lock_event])
[0x002a8be0] UNKNOWN - owner thread: 0x269f7000
[0x002a9040] UNKNOWN - owner thread: 0x2ed00400
Heap
def new generation total 2816K, used 2707K [0x02730000, 0x02a30000, 0x04e90000)
eden space 2560K, 100% used [0x02730000, 0x029b0000, 0x029b0000)
from space 256K, 57% used [0x029f0000, 0x02a14f50, 0x02a30000)
to space 256K, 0% used [0x029b0000, 0x029b0000, 0x029f0000)
tenured generation total 30272K, used 17667K [0x04e90000, 0x06c20000, 0x22730000)
the space 30272K, 58% used [0x04e90000, 0x05fd0d88, 0x05fd0e00, 0x06c20000)
compacting perm gen total 12288K, used 10995K [0x22730000, 0x23330000, 0x26730000)
the space 12288K, 89% used [0x22730000, 0x231ecdf8, 0x231ece00, 0x23330000)
No shared spaces configured.