Understanding LockTimeoutExceptions better
Hi,
Background: We use BDB JE as the storage backend for Voldemort. We run on SSDs and use a shared cache for multiple environments (up to 25 on a single server). Our writes go through a txn with an RMW lock. We use duplicates, so we use a transactional cursor to perform a get(), delete(), put() cycle. (I am working on getting rid of the duplicates, though, after which we will use a simple get and put in a txn.) Reads use READ_UNCOMMITTED isolation. We don't have secondary indexes or anything similar.
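To make the write path concrete, here is a minimal stdlib-only Java model of it. It does not use the JE API. Assumptions: a fair ReentrantLock per key stands in for JE's per-record WRITE lock, tryLock with a timeout stands in for the 500 ms lock timeout, and the class and method names are hypothetical. What it makes explicit is that the lock is taken at the get() and released only after the whole get/delete/put cycle.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Stdlib-only model of the write path; NOT the JE API. RmwCycleModel/replace() are hypothetical.
class RmwCycleModel {
    // Stand-in for a database with duplicates: one key maps to several values.
    private final Map<String, List<String>> store = new ConcurrentHashMap<>();
    // One fair, exclusive lock per key, standing in for a per-record WRITE lock.
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();
    private final long lockTimeoutMillis;

    RmwCycleModel(long lockTimeoutMillis) {
        this.lockTimeoutMillis = lockTimeoutMillis;
    }

    /** get (with RMW) + delete + put as one "txn"; false means the lock wait timed out. */
    boolean replace(String key, String oldValue, String newValue) {
        ReentrantLock lock = locks.computeIfAbsent(key, k -> new ReentrantLock(true));
        boolean acquired;
        try {
            // RMW: take the write lock at read time instead of upgrading a read lock later.
            acquired = lock.tryLock(lockTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!acquired) {
            return false; // plays the role of a LockTimeoutException
        }
        try {
            List<String> dups = store.computeIfAbsent(key, k -> new CopyOnWriteArrayList<>());
            dups.remove(oldValue); // delete() the old duplicate
            dups.add(newValue);    // put() the new one
            return true;
        } finally {
            lock.unlock();         // released only at "commit", after the whole cycle
        }
    }

    /** Unlocked read, loosely analogous to READ_UNCOMMITTED: may observe in-flight writes. */
    List<String> get(String key) {
        return List.copyOf(store.getOrDefault(key, List.of()));
    }
}
```

Every writer on the same key queues on one lock for the full duration of its cycle, which is the queueing pattern visible in the Waiters list of the log below.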
What I am seeing is that we sometimes get heavy lock timeouts, which drive latency up sharply. Our lock timeout is 500 ms. Since we are on SSDs, I would expect any operation to finish much faster than 500 ms. For example, the following instance shows 10 waiters. Assuming a 4-level tree and at most 1 ms of access time for each node fetch (data or index), all 10 waiters should have been granted the lock in under 40 ms (just guessing).
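The estimate above is simple arithmetic. A small Java sketch makes its inputs explicit and turns it around to ask how long each locker would have to hold the lock for the timeout to be reached. All numbers are the assumptions stated in the paragraph (10 queued lockers, 4 tree levels, at most 1 ms per node fetch, 500 ms timeout); the class and method names are hypothetical.

```java
// Back-of-envelope arithmetic for the estimate above; every input is an assumption from the post.
class WaitEstimate {
    /** Estimate: each queued locker needs `levels` node fetches of at most `fetchMillis` each. */
    static long estimatedQueueMillis(int queuedLockers, int levels, long fetchMillis) {
        return (long) queuedLockers * levels * fetchMillis;
    }

    /** Average hold time per locker ahead in the queue that would exhaust the timeout. */
    static double holdMillisToTimeOut(long timeoutMillis, int lockersAhead) {
        return (double) timeoutMillis / lockersAhead;
    }

    public static void main(String[] args) {
        long estimate = estimatedQueueMillis(10, 4, 1); // 10 lockers * 4 levels * 1 ms
        double needed = holdMillisToTimeOut(500, 10);   // 500 ms spread over 10 lockers
        System.out.println("estimated queue drain: " + estimate + " ms");
        System.out.println("hold time per locker to hit 500 ms: " + needed + " ms");
    }
}
```

Running it prints 40 ms and 50.0 ms: for a locker with 10 others ahead of it to time out, each of them would have to hold the lock about 50 ms on average, more than ten times the 4 ms of node fetches assumed per locker.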
[BdbStorageEngine] [voldemort-niosocket-server34] [voldemort] com.sleepycat.je.LockTimeoutException: (JE 4.0.92) Lock expired. Locker 1317118982 164639096_voldemort-niosocket-server34_Txn: waited for lock on database=message_sent_history LockAddr:265706381 node=170810678 type=WRITE grant=WAIT_NEW timeoutMillis=500 startTime=1348704310016 endTime=1348704310535
Owners: [<LockInfo locker="606847962 164639091_voldemort-niosocket-server44_Txn" type="WRITE"/>]
Waiters: [<LockInfo locker="893857731 164639092_voldemort-niosocket-server28_Txn" type="WRITE"/>, <LockInfo locker="1826240023 164639093_voldemort-niosocket-server1_Txn" type="WRITE"/>, <LockInfo locker="2142263792 164639094_voldemort-niosocket-server17_Txn" type="WRITE"/>, <LockInfo locker="1710362032 164639095_voldemort-niosocket-server33_Txn" type="WRITE"/>, <LockInfo locker="1317822219 164639098_voldemort-niosocket-server10_Txn" type="WRITE"/>, <LockInfo locker="153266016 164639099_voldemort-niosocket-server35_Txn" type="WRITE"/>, <LockInfo locker="861632969 164639101_voldemort-niosocket-server11_Txn" type="WRITE"/>, <LockInfo locker="1385676009 164639103_voldemort-niosocket-server31_Txn" type="WRITE"/>, <LockInfo locker="1744015195 164639104_voldemort-niosocket-server19_Txn" type="WRITE"/>, <LockInfo locker="1933099921 164639107_voldemort-niosocket-server32_Txn" type="WRITE"/>]
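The interesting numbers in a message like this are easy to miss by eye, so a small parser helps when scanning many of them. This is a hypothetical helper, assuming the JE 4.0.92 message format shown in the log above; other JE versions may format the message differently.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading timing fields out of a LockTimeoutException message.
// Assumes the JE 4.0.92 format shown in the log above; other versions may differ.
class LockTimeoutLog {
    private static final Pattern TIMEOUT = Pattern.compile("timeoutMillis=(\\d+)");
    private static final Pattern START = Pattern.compile("startTime=(\\d+)");
    private static final Pattern END = Pattern.compile("endTime=(\\d+)");
    private static final Pattern LOCK_INFO = Pattern.compile("<LockInfo locker=");

    private static long field(Pattern p, String msg) {
        Matcher m = p.matcher(msg);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found: " + p.pattern());
        }
        return Long.parseLong(m.group(1));
    }

    /** Configured lock timeout reported in the message. */
    static long timeoutMillis(String msg) {
        return field(TIMEOUT, msg);
    }

    /** How long the failing locker actually waited: endTime - startTime. */
    static long waitedMillis(String msg) {
        return field(END, msg) - field(START, msg);
    }

    /** Number of LockInfo entries after the "Waiters:" marker (owners come before it). */
    static int waiterCount(String msg) {
        int idx = msg.indexOf("Waiters:");
        if (idx < 0) {
            return 0;
        }
        Matcher m = LOCK_INFO.matcher(msg.substring(idx));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }
}
```

Fed the full log above, waitedMillis returns 519 (1348704310535 - 1348704310016), timeoutMillis returns 500, and waiterCount returns 10, all queued behind a single WRITE owner.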
In other words, even a lock on a node higher up the tree should be resolved very quickly, right?
I know lowering the lock timeout could help, but I would like to understand why we get LockTimeoutExceptions in the first place.
Thanks
Vinoth