
Best caching policies for almost-FIFO database?

1048394 · Oct 17 2013

We are trying to use Berkeley DB to support the queues of a crawling process. The database holds on the order of 100M–1B records, arranged in a number of FIFO queues. Each queue is identified by a key (a host), and the elements of a queue (multiple values per key) are URIs prefixed with an increasing timestamp, so that they are retrieved in FIFO order.
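To make the setup concrete, here is a minimal sketch of the timestamp-prefix encoding described above. It assumes the database's default byte-wise (unsigned lexicographic) key ordering, so that a fixed-width big-endian timestamp prefix makes entries sort in insertion (FIFO) order; the class and method names are illustrative, not part of any Berkeley DB API:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class FifoKeySketch {
    // Prefix each URI with an 8-byte big-endian timestamp so that the
    // default lexicographic (unsigned byte) ordering of the B-tree
    // returns entries in insertion (FIFO) order.
    static byte[] encode(long timestamp, String uri) {
        byte[] uriBytes = uri.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(8 + uriBytes.length)
                .putLong(timestamp)   // big-endian by default
                .put(uriBytes)
                .array();
    }

    // Unsigned lexicographic comparison, mimicking Berkeley DB's
    // default byte-wise key comparator.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] first = encode(1000L, "http://example.com/b");
        byte[] second = encode(2000L, "http://example.com/a");
        // The earlier timestamp sorts first regardless of URI content.
        System.out.println(compareUnsigned(first, second) < 0); // true
    }
}
```

The fixed-width prefix matters: a variable-length (e.g. decimal string) timestamp would not sort correctly across magnitudes.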

The access pattern to the database is peculiar: we steadily append to all queues elements that will be accessed *much* later (the queues tend to be long, and dequeuing takes time). Once in a while, we pick a key (more or less at random) and quickly read and remove a burst of, say, 10–20 values.

We were wondering whether there is any obvious optimization for this kind of access pattern (we are not experts). We have a large cache, but we can barely keep the internal B-tree nodes in memory.

For instance, maybe it would be good to use EVICT_LN as the cache mode when appending, since records will only be re-read at a much later time.

Thank you for any suggestion!
