Is it possible to load 1 billion key-value pairs into BerkeleyDB database?
May 3 2010 (edited May 4 2010)
Hello,
I am experimenting with loading huge datasets into a BerkeleyDB database. The procedure is as follows:
1. Generate a dump-like file using a script. The file contains key-value pairs (on separate lines, in exactly the format that db_dump produces). The index is a hash.
2. Use db_load to create the database. The OS is Windows Server 2003.
Both keys and values are 64-bit longs.
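For reference, the generation script produces something like the sketch below. The header fields and the hex encoding follow my understanding of the db_dump text format (hash-specific fields such as h_nelem are optional); the function name and paths are illustrative, not the actual script:

```python
import struct

def write_dump(pairs, path, nelem=None):
    """Write 64-bit int key-value pairs in db_dump's text format (type=hash).

    Illustrative sketch: each key and value is hex-encoded as a big-endian
    8-byte integer, one per line, prefixed with a single space, between
    the HEADER=END and DATA=END markers.
    """
    with open(path, "w") as f:
        f.write("VERSION=3\n")
        f.write("format=bytevalue\n")
        f.write("type=hash\n")
        if nelem is not None:
            # Pre-sizing the hash table via h_nelem may help bulk loads.
            f.write("h_nelem=%d\n" % nelem)
        f.write("HEADER=END\n")
        for key, value in pairs:
            f.write(" %s\n" % struct.pack(">Q", key).hex())
            f.write(" %s\n" % struct.pack(">Q", value).hex())
        f.write("DATA=END\n")
```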
Using this procedure, I succeeded in loading 25 million pairs into the database. It took about 1-2 hours.
Next, I tried to load 250 million pairs into an empty database. db_load has already been running for 15 hours. Its memory consumption is very low: private bytes ~2 MB, working set ~2 MB, virtual size ~13 MB. db_load has already read all the pairs from disk, since I/O is now very low: ~4 MB per second. I am not sure whether db_load will finish in the next 24 hours.
My goal is to load eventually 3 billion key-value pairs into one DB.
I would appreciate advice on the following:
1. Is BerkeleyDB capable of dealing with such a database volume?
2. Is my procedure good, and how can I optimize it? Is it possible to allocate more RAM to db_load? Are there other ways to reduce loading time?
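For example, my understanding is that db_load uses the cache of the environment it runs in, so the cache could be enlarged via a DB_CONFIG file in the environment home directory. The sketch below is what I would try (paths and sizes are illustrative; I have not confirmed this is the right approach):

```shell
# DB_CONFIG in the environment home directory (illustrative):
#   set_cachesize 2 0 1     <- 2 GB cache in one region
#
# Then load the db_dump-format file into a database in that environment:
db_load -h C:\bdb_env -f pairs.dump mydb.db
```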
Thank you,
Gregory.