load an existing Berkeley DB file into memory
997783, Mar 19 2013 (edited Mar 25 2013)
Dear Experts,
I have created some Berkeley DB (BDB) files on disk.
I noticed that when I issue key-value retrievals, the number of page faults is substantial and the CPU utilization is low.
One sample of the time command output is as follows:
1.36user 1.45system 0:10.83elapsed 26%CPU (0avgtext+0avgdata 723504maxresident)k
108224inputs+528outputs (581major+76329minor)pagefaults 0swaps
I suspect that the bottleneck is frequent file I/O.
This may be because pages of the BDB file are faulted in from disk, and evicted again, fairly frequently.
I would like to explore how to reduce these page faults and thereby speed up retrieval.
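For reference, the 26%CPU figure in the time output is (user + system) / elapsed, so roughly three quarters of the wall-clock time was presumably spent waiting rather than computing. The short C sketch below reproduces that arithmetic. The function names are my own, and the 512-byte unit for the "inputs" counter is an assumption about how GNU time reports block reads on Linux, so please treat the byte figure as approximate.

```c
#include <assert.h>

/* Sketch only: helper names are illustrative, not part of any library. */

/* Percentage of wall-clock time spent on the CPU (user + system). */
double cpu_share_percent(double user_s, double system_s, double elapsed_s)
{
    assert(elapsed_s > 0.0);
    return 100.0 * (user_s + system_s) / elapsed_s;
}

/* Bytes read from disk, ASSUMING the "inputs" counter is in 512-byte blocks. */
long long input_bytes(long long input_blocks)
{
    assert(input_blocks >= 0);
    return input_blocks * 512LL;
}
```

With the sample above, cpu_share_percent(1.36, 1.45, 10.83) comes to about 25.9, which time rounds to 26%. Under the 512-byte assumption, input_bytes(108224) is 55,410,688 bytes, which is the same order of magnitude as my 66,813,952-byte BDB file.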
One approach I have read about is to load the entire BDB file into main memory.
There are some example programs on docs.oracle.com, under the heading "Writing In-Memory Berkeley DB Applications".
However, I could not get them to work with my existing file.
My code is enclosed below:
--------------- start of code snippets ---------------
/* Initialize our handles */
DB *dbp = NULL;
DB_ENV *envp = NULL;
DB_MPOOLFILE *mpf = NULL;
const char *db_name = "db.id_url"; // A BDB file on disk, size 66,813,952
u_int32_t open_flags;
/* Create the environment */
db_env_create(&envp, 0);
open_flags =
    DB_CREATE     | /* Create the environment if it does not exist */
    DB_INIT_LOCK  | /* Initialize the locking subsystem */
    DB_INIT_LOG   | /* Initialize the logging subsystem */
    DB_INIT_MPOOL | /* Initialize the memory pool (in-memory cache) */
    DB_INIT_TXN   | /* Initialize the transactional subsystem */
    DB_PRIVATE;     /* Region files are not backed by the filesystem.
                     * Instead, they are backed by heap memory. */
/*
* Specify the size of the in-memory cache.
*/
envp->set_cachesize(envp, 0, 70 * 1024 * 1024, 1); // 70 Mbytes, more than the BDB file size of 66,813,952
/*
* Now actually open the environment. Notice that the environment home
* directory is NULL. This is required for an in-memory only application.
*/
envp->open(envp, NULL, open_flags, 0);
/* Open the MPOOL file in the environment. */
envp->memp_fcreate(envp, &mpf, 0);
int pagesize = 4096;
if ((ret = mpf->open(mpf, "db.id_url", 0, 0, pagesize)) != 0) {
    envp->err(envp, ret, "DB_MPOOLFILE->open: ");
    goto err;
}
int cnt, hits = 66813952/pagesize;
void *p=0;
for (cnt = 0; cnt < hits; ++cnt) {
    db_pgno_t pageno = cnt;
    mpf->get(mpf, &pageno, NULL, 0, &p);
}
fprintf(stderr,"\n\nretrieve %5d pages\n",cnt);
/* Initialize the DB handle */
db_create(&dbp, envp, 0);
/*
* Set the database open flags. Autocommit is used because we are
* transactional.
*/
open_flags = DB_CREATE | DB_AUTO_COMMIT;
dbp->open(dbp,        // Pointer to the database
          NULL,       // Txn pointer
          NULL,       // File name -- NULL for in-memory
          db_name,    // Logical db name
          DB_BTREE,   // Database type (using btree)
          open_flags, // Open flags
          0);         // File mode; 0 selects the default
DBT key,data; int test_key=103456;
memset(&key, 0, sizeof(key));
memset(&data, 0, sizeof(data));
key.data = (int*)&test_key;
key.size = sizeof(test_key);
dbp->get(dbp, NULL, &key, &data, 0);
printf("%d --> %s ", *((int*)key.data),(char*)data.data );
/* Close our database handle, if it was opened. */
if (dbp != NULL) {
    dbp->close(dbp, 0);
}
if (mpf != NULL)
    (void)mpf->close(mpf, 0);
/* Close our environment, if it was opened. */
if (envp != NULL) {
    envp->close(envp, 0);
}
/* Final status message and return. */
printf("I'm all done.\n");
--------------- end of code snippets ---------------
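As a sanity check on the sizes used in the snippet, the sketch below works out how many whole 4096-byte pages fit in the file and whether the requested cache is at least as large as the file. The helper names are my own, and the 4096-byte page size is an assumption (I have not confirmed it against the database's actual page size). The comparison is on raw byte counts only and ignores any overhead the cache itself needs.

```c
#include <assert.h>

/* Sketch only: helper names are illustrative, not part of Berkeley DB. */

/* Number of whole pages in a file; a trailing partial page is not counted. */
long whole_pages(long long file_bytes, long page_bytes)
{
    assert(page_bytes > 0);
    return (long)(file_bytes / page_bytes);
}

/* Non-zero if the requested cache is at least as large as the file. */
int cache_covers_file(long long cache_bytes, long long file_bytes)
{
    return cache_bytes >= file_bytes;
}
```

For my file, whole_pages(66813952LL, 4096) gives 16312 with no remainder, and cache_covers_file(70LL * 1024 * 1024, 66813952LL) is true because 70 * 1024 * 1024 is 73,400,320 bytes.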
After compiling and running the code, the output is:
retrieve 16312 pages
103456 --> (null) I'm all done.
However, the lookup for test_key did not return the correct value.
I have been reading about this and trying things for the past 3 days.
I would appreciate any help or tips.
Thank you for your kind attention.
WAN
Singapore