Is it possible to load 1 billion key-value pairs into BerkeleyDB database?
May 3 2010 (edited May 4 2010)
Hello,
I am experimenting with loading huge datasets into a BerkeleyDB database. The procedure is as follows:
1. Generate a dump-like file using a script. The file contains key-value pairs (on separate lines, in exactly the format that db_dump produces). The index is a hash.
2. Use db_load to create the database. The OS is Windows Server 2003.
Both keys and values are 64-bit longs.
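For reference, the generation script produces something like the sketch below. The header fields and the hex encoding follow my understanding of the db_dump text format (hash-specific fields such as h_nelem are optional); the function name and paths are illustrative, not the actual script:

```python
import struct

def write_dump(pairs, path, nelem=None):
    """Write 64-bit int key-value pairs in db_dump's text format (type=hash).

    Illustrative sketch: each key and value is hex-encoded as a big-endian
    8-byte integer, one per line, prefixed with a single space, between
    the HEADER=END and DATA=END markers.
    """
    with open(path, "w") as f:
        f.write("VERSION=3\n")
        f.write("format=bytevalue\n")
        f.write("type=hash\n")
        if nelem is not None:
            # Pre-sizing the hash table via h_nelem may help bulk loads.
            f.write("h_nelem=%d\n" % nelem)
        f.write("HEADER=END\n")
        for key, value in pairs:
            f.write(" %s\n" % struct.pack(">Q", key).hex())
            f.write(" %s\n" % struct.pack(">Q", value).hex())
        f.write("DATA=END\n")
```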
Using this procedure, I succeeded in loading 25 million pairs into the database. It took about 1-2 hours.
Next, I tried to load 250 million pairs into an empty database. db_load has already been running for 15 hours. Its memory consumption is very low: private bytes ~2 MB, working set ~2 MB, virtual size ~13 MB. db_load has already read all the pairs from disk, since I/O is now very low: ~4 MB per second. I am not sure whether db_load will finish in the next 24 hours.
My goal is to load eventually 3 billion key-value pairs into one DB.
I would appreciate advice on the following:
1. Is BerkeleyDB capable of dealing with such a database volume?
2. Is my procedure good, and how can I optimize it? Is it possible to allocate more RAM to db_load? Are there other ways to reduce loading time?
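For example, my understanding is that db_load uses the cache of the environment it runs in, so the cache could be enlarged via a DB_CONFIG file in the environment home directory. The sketch below is what I would try (paths and sizes are illustrative; I have not confirmed this is the right approach):

```shell
# DB_CONFIG in the environment home directory (illustrative):
#   set_cachesize 2 0 1     <- 2 GB cache in one region
#
# Then load the db_dump-format file into a database in that environment:
db_load -h C:\bdb_env -f pairs.dump mydb.db
```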
Thank you,
Gregory.