Which data structures are used to represent duplicates in secondary DBs?
May 7 2010 (edited May 11 2010)

Hello,
I am currently running an evaluation comparing two methods of representing textual data: Oracle BDB XML versus an approach of my own that relies on Berkeley DB.
I use a secondary database (DB type HASH) with DUP_SORT to index the words of texts, in order to allow fast search and retrieval. As far as I understand, the hash is used to look up the hits of (in my case) a given word. Or, more precisely: it maps from a word to a set/list of keys in the primary database. Since a word can appear multiple times, I must use a secondary database which allows duplicates. So far so good.
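To make the setup concrete, here is a minimal sketch in plain Python of the idea (this is not the Berkeley DB API; the dict/list names are placeholders for the two databases): the "primary database" maps document keys to text, and the "secondary database" maps each word to a sorted list of primary keys, i.e. sorted duplicate entries under one word key, as with DUP_SORT.

```python
from collections import defaultdict
from bisect import insort

primary = {}                    # "primary DB": doc_key -> text
secondary = defaultdict(list)   # "secondary DB": word -> sorted doc_keys

def put(doc_key, text):
    """Insert into the primary DB and maintain the word index."""
    primary[doc_key] = text
    for word in set(text.split()):
        # keep the duplicate entries for each word sorted (like DUP_SORT)
        insort(secondary[word], doc_key)

def lookup(word):
    """Return all primary keys ('hits') for a word."""
    return secondary.get(word, [])

put(1, "the quick brown fox")
put(2, "the lazy dog")
put(3, "quick quick dog")
print(lookup("quick"))   # -> [1, 3]
print(lookup("dog"))     # -> [2, 3]
```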
Compared to a similar setting where I use the indexing mechanisms of Oracle BDB XML, I discovered that the time to extract all hits of a lookup scales pretty badly with my Berkeley DB based approach; it looks something like O(log n) per hit. In contrast, Oracle BDB XML needs roughly the same time to retrieve 5,000, 30,000, or even 900,000 hits.
My question (to the developers, I guess): which data structures are used to store the duplicates in secondary databases, and which approach does Oracle BDB XML use?
Peter