EnvironmentFailureException question
773442 · May 17 2010 (edited May 17 2010)

Hi,
I'm relatively new to BDB... I've (part-)written a distributed crawler that uses BDB JE to store a persistent on-disk queue of URIs. This is the setup:
/* Open a transactional Berkeley DB JE environment. */
EnvironmentConfig envConfig = new EnvironmentConfig();
envConfig.setAllowCreate(true);
envConfig.setTransactional(true);
env = new Environment(envDir, envConfig);

/* Open a transactional entity store. */
StoreConfig storeConfig = new StoreConfig();
storeConfig.setAllowCreate(true);
storeConfig.setTransactional(true);
store = new EntityStore(env, this.getClass().getSimpleName(), storeConfig);

/* Primary index of the queue. */
urlIndex = store.getPrimaryIndex(String.class, URLObject.class);

/* Secondary index of the queue, keyed on "count". */
countIndex = store.getSecondaryIndex(urlIndex, Integer.class, "count");
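For context, the entity class behind these two indexes would look something like the sketch below, using the JE Direct Persistence Layer annotations. The post never shows URLObject, so everything here beyond the key types and the "count" key name is an assumption:

```java
import com.sleepycat.persist.model.Entity;
import com.sleepycat.persist.model.PrimaryKey;
import com.sleepycat.persist.model.SecondaryKey;
import static com.sleepycat.persist.model.Relationship.MANY_TO_ONE;

/* Hypothetical entity class: only the String primary key and the
 * "count" secondary key are implied by the index declarations above. */
@Entity
public class URLObject {

    @PrimaryKey
    private String url;    // matches getPrimaryIndex(String.class, URLObject.class)

    @SecondaryKey(relate = MANY_TO_ONE)
    private int count;     // matches getSecondaryIndex(urlIndex, Integer.class, "count")

    private URLObject() {} // no-arg constructor required by the DPL

    public URLObject(String url, int count) {
        this.url = url;
        this.count = count;
    }
}
```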
The machine is running Java 1.6.0_12 on Debian 5.0.4. The environment directory is on a local partition (in fact, everything is pretty much local as far as BDB can see).
This setup seems to work quite well. However, after about 30 hours of crawling, one server (of eight) dies with this initial exception:
<DaemonThread name="Cleaner-1"/> caught exception: com.sleepycat.je.EnvironmentFailureException: (JE 4.0.71) /data/webdb/may10/crawl/q java.io.IOException: Input/output error LOG_READ: IOException on read, log is likely invalid. Environment is invalid and must be closed.
com.sleepycat.je.EnvironmentFailureException: (JE 4.0.71) /data/webdb/may10/crawl/q java.io.IOException: Input/output error LOG_READ: IOException on read, log is likely invalid. Environment is invalid and must be closed.
at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1516)
at com.sleepycat.je.log.FileReader$ReadWindow.fillFromFile(FileReader.java:1116)
at com.sleepycat.je.log.FileReader$ReadWindow.fillNext(FileReader.java:1074)
at com.sleepycat.je.log.FileReader.readData(FileReader.java:759)
at com.sleepycat.je.log.FileReader.readNextEntryAllowExceptions(FileReader.java:315)
at com.sleepycat.je.cleaner.FileProcessor.processFile(FileProcessor.java:396)
at com.sleepycat.je.cleaner.FileProcessor.doClean(FileProcessor.java:236)
at com.sleepycat.je.cleaner.FileProcessor.onWakeup(FileProcessor.java:141)
at com.sleepycat.je.utilint.DaemonThread.run(DaemonThread.java:161)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Input/output error
at java.io.RandomAccessFile.readBytes(Native Method)
at java.io.RandomAccessFile.read(RandomAccessFile.java:322)
at com.sleepycat.je.log.FileManager.readFromFileInternal(FileManager.java:1551)
at com.sleepycat.je.log.FileManager.readFromFile(FileManager.java:1506)
... 9 more
Exiting
This is followed by a plethora of similar exceptions from different lookup threads:
Exception in thread "LookupThread-XXX" ...
Also, in the je.info.0 file in the environment directory, I found the following:
SEVERE [/data/webdb/may10/crawl/q] Halted log file reading at file 0x997 offset 0x1edcb offset(decimal)=126411 prev=0x1ed84:
entry=BINDelta (type=22,version=7)
prev=0x1ed84
size=886
Next entry should be at 0x1f14f
I'm generally at a loss as to why this happened. There's no obvious cause (disk space is fine, etc.), and it looks more like the index is corrupted. At the time of the exception, the environment directory held about 1.5 GB (~150 *.jdb files of ~9.6 MB each).
I can't really reproduce the error or create a test case, so I don't know whether updating JE would help. I'd like to gather as much information on the bug as possible, and put as solid a fix as possible in place, before I try restarting the crawl.
Any help/thoughts greatly appreciated.