
Berkeley DB Family


Advice needed: is BDB a good fit for what I aim at?

979004 · Dec 10 2012, edited Jan 2 2013
Hello everyone,

I'm not a BDB user (yet), but I really think that the BDB library
is the perfect fit for my needs.

I'm designing an application with a "tricky" part, that requires a very fast
data storage/retrieval solution, mainly for writes (but for reads too).

Here's a quick summary of this tricky part, that should at least use
2 databases:
- the first db will hold references to contents, with a few writes per hour
(the references being "pushed" to it from a separate admin back end), but
expected high numbers of reads
- the second db will log requests and other events on the references
contained in the first db: it is planned that, on average, one read from DB1
will produce five times as many writes into DB2.

To illustrate:
DB1 => ~25 writes / ~100 000 reads per hour
DB2 => ~500 000 writes / *(60?) reads per hour

(*will explain about reads on DB2 later in this post)

Reads and writes on both DBs are not linear: for 500 000 writes
per hour, you could have the first 250 000 done within 20 minutes,
for instance. There will be peaks of activity, and low-activity phases
as well.
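To put rough numbers on the load these figures imply (all figures taken from the estimates above; the arithmetic is just a back-of-envelope check):

```python
# Back-of-envelope rates implied by the figures above.
DB2_WRITES_PER_HOUR = 500_000

avg_db2_writes_per_sec = DB2_WRITES_PER_HOUR / 3600   # average write rate
# Bursty case described above: 250,000 writes within 20 minutes.
burst_db2_writes_per_sec = 250_000 / (20 * 60)        # sustained burst rate

print(f"average DB2 write rate: {avg_db2_writes_per_sec:.0f}/s")
print(f"burst DB2 write rate:   {burst_db2_writes_per_sec:.0f}/s")
```

So the burst case is roughly 1.5x the average rate, which is well within what an embedded key/value store is normally expected to sustain.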

That being said, do the BDB experts here think that BDB is a good fit for
such a need? Either way, could you please let me know the reasoning behind
your opinion? Many thanks in advance.

Now, about the "*(60?) reads per hour" for DB2: actually, data from DB2
should be accessed in real time for reporting. As of now, here is what
I think I should do to ensure a high write throughput and not
miss any write in DB2 => once per minute, another "DB2" is created that will
record new events from then on. The "previous" DB2 is then dumped/exported into
another database, which will be queried for real-time reporting (not exactly
real-time, but up to five minutes is an acceptable delay).
So, in my first approach, DB2 is "stopped" then dumped each minute to another
DB (not necessarily BDB, by the way - the data could probably be re-structured
another way into another kind of NoSQL storage to facilitate querying and
retrieval from the admin back end), which would make 60 reads per hour (but
"entire" reads, of the full db).
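The per-minute rotation described above can be sketched as follows. This is only a minimal illustration: Python's stdlib `dbm.dumb` stands in for a real BDB handle (the rotation logic is the point, not the storage engine), and the names `bucket_path` and `RotatingEventLog` are invented for the example:

```python
import dbm.dumb
import os
import time

def bucket_path(base_dir, epoch_seconds):
    """Map a timestamp to the per-minute database file it belongs to."""
    minute = int(epoch_seconds) // 60
    return os.path.join(base_dir, f"db2_{minute}")

class RotatingEventLog:
    """Append events to the current minute's database; when the minute
    rolls over, close the old file so it can be dumped/exported."""

    def __init__(self, base_dir):
        self.base_dir = base_dir
        self.current_path = None
        self.db = None
        self.closed_buckets = []   # finished files, ready for export

    def log(self, key, value, now=None):
        now = time.time() if now is None else now
        path = bucket_path(self.base_dir, now)
        if path != self.current_path:
            # Minute rolled over: close the old bucket and open a new one.
            if self.db is not None:
                self.db.close()
                self.closed_buckets.append(self.current_path)
            self.db = dbm.dumb.open(path, "c")
            self.current_path = path
        self.db[key] = value
```

With real BDB, each bucket would be a `DB` handle in a shared environment, but the scheme is the same: the writer never blocks on the export, because exports only ever touch buckets that are already closed.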

The questions are:
- do you think that renewing DB2 this often would improve or strain performance?
- is BDB good and fast at doing massive dumps/exports? (OK: 500 000 entries per
hour would make ~8300 entries per minute on average, so let's say that a dump's
max size is 24 000 rows of data)
- would it or wouldn't it be better to read directly from the current DB2 while
it is (intensively) storing new rows, which would avoid the need to dump each
minute and provide more real-time reporting? (then a daily dump would suffice,
to archive the "old" data)
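For the dump/export step in the second question, one possible shape (sketched here with stdlib modules only: `sqlite3` stands in for whatever query-side store is chosen, and the function name `export_bucket` and table name `events` are invented for the example) is to walk a closed per-minute database and bulk-insert its rows:

```python
import dbm.dumb
import sqlite3

def export_bucket(bucket_file, conn):
    """Copy every key/value pair from a closed per-minute database
    into a SQL table that the reporting back end can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events (key TEXT PRIMARY KEY, value TEXT)"
    )
    db = dbm.dumb.open(bucket_file, "r")
    try:
        rows = [(k.decode(), db[k].decode()) for k in db.keys()]
    finally:
        db.close()
    conn.executemany("INSERT OR REPLACE INTO events VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)
```

At ~24 000 rows per dump this is a single sequential scan plus one bulk insert per minute, which is the access pattern B-tree stores handle best.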

Anyone who has already faced such questions is welcome to answer, as is
any BDB user who thinks they can help on this topic!

Many thanks in advance for your advice and knowledge.

Cheers,
Jimshell
Post Details
Locked on Jan 14 2013
Added on Dec 10 2012
4 comments
597 views