Skip to Main Content

Berkeley DB Family

Announcement

For appeals, questions and feedback about Oracle Forums, please email oracle-forums-moderators_us@oracle.com. Technical questions should be asked in the appropriate category. Thank you!

Which data structures are used to represent duplicates in secondary DBs?

527197May 7 2010 — edited May 11 2010
Hello,
Iam currently running an evaluation where I compare two methods of representing textual data structures. It's Oracle BDB XML vs. an approach of mine which relies on Berkeley DB.

I use a secondary database (DB_Type HASH) with DUP_SORT to index words of texts in order to allow fast search and retrieval. As far as I understood, the Hash is used to lookup the hits of (in my case) a given word. Or more precisely: It maps from a word to a set/list of keys in the primary database. Since a word can appear multiple times I must use a secondary database which allows for duplicates. So far so good.

Compared to a similar setting where I use the indexing mechanisms of Oracle BDB XML I discovered that the time complexity to extract all hits of a lookup scales pretty bad using my BerkeleyDB based approach. It is something like O(log_n). In contrast the Oracle BDB XML needs roughly the same time to retrieve 5,000 30,000 or even 900,000 hits.

My question (to the developers I guess): Which data structures are used to store the duplicates in secondary databases, and which approach is used in Oracle BDB XML?

Peter
Comments
Locked Post
New comments cannot be posted to this locked post.
Post Details
Locked on Jun 8 2010
Added on May 7 2010
3 comments
1,415 views