Understanding compact and database file size
755735 Mar 17 2010, edited Mar 19 2010
Hi there,
My understanding is that Berkeley DB manages a database file much as a file system manages a hard-disk partition. When you delete a file, the file system does not erase the file's bytes; it only updates its own data structures to record the deletion, and the data remains on the disk. I assume Berkeley DB treats database files the same way: deleting a record does not shrink the file, because the record's bytes may still be scattered inside it, and Berkeley DB only adjusts its data structures to reflect the deletion. Is this correct?
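To make the model I have in mind concrete, here is a minimal sketch in C. It is a toy, not Berkeley DB code and not its real on-disk layout: `ToyFile`, `toy_init`, `toy_alloc`, and `toy_free` are hypothetical names, and the fixed 64-page limit is an arbitrary assumption. It only illustrates the idea that freeing a page changes bookkeeping while the file keeps its size, and that a later allocation reuses the freed page instead of growing the file.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_PAGES 64  /* arbitrary limit for the toy model */

/* Toy model of a paged file; NOT Berkeley DB's real layout. */
typedef struct {
    size_t npages;                    /* current file size, in pages */
    size_t free_list[TOY_MAX_PAGES];  /* stack of reusable page numbers */
    size_t nfree;                     /* number of entries on the free list */
} ToyFile;

static void toy_init(ToyFile *f) {
    f->npages = 0;
    f->nfree = 0;
}

/* Allocate a page: reuse a freed page if one exists, else grow the file. */
static size_t toy_alloc(ToyFile *f) {
    if (f->nfree > 0)
        return f->free_list[--f->nfree];
    assert(f->npages < TOY_MAX_PAGES);
    return f->npages++;
}

/* Free a page: only the bookkeeping changes; npages stays the same. */
static void toy_free(ToyFile *f, size_t page) {
    assert(page < f->npages && f->nfree < TOY_MAX_PAGES);
    f->free_list[f->nfree++] = page;
}
```

In this model, allocating three pages, freeing the middle one, and allocating again hands back the freed page number, and the file never grows past three pages. If Berkeley DB behaves roughly like this, deleted space is not lost, only not returned to the file system.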
If so, the problem becomes managing the size of the database file. DB->compact() can reduce the file size, but only when it finds free space at the tail of the file that can be returned to the file system. What about fragmentation inside the file? I also know that Berkeley DB can be configured with a limit on the file size. If records keep getting deleted, won't fragmentation leave me with less and less usable space under that limit? Is fragmentation an issue at all?
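The distinction I am asking about can be sketched with another toy model in C. Again, this is not Berkeley DB code: `toy_truncate_tail` and `toy_compact` are hypothetical names, and the `in_use` array is an assumed stand-in for whatever page bookkeeping the library really keeps. The first function only cuts off free pages at the end of the file, which is what I understand is the only space that can go back to the file system. The second first moves live pages from the tail into free slots nearer the front, so that interior holes end up at the tail where they can be cut off.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: in_use[i] is true when page i holds live data. */

/* Cut off trailing free pages only; returns the new page count. */
static size_t toy_truncate_tail(const bool *in_use, size_t npages) {
    assert(in_use != NULL);
    while (npages > 0 && !in_use[npages - 1])
        npages--;
    return npages;
}

/* Move live tail pages into the lowest free slots, then truncate. */
static size_t toy_compact(bool *in_use, size_t npages) {
    assert(in_use != NULL);
    size_t lo = 0, hi = npages;
    while (lo < hi) {
        if (in_use[lo]) { lo++; continue; }       /* slot already occupied */
        if (!in_use[hi - 1]) { hi--; continue; }  /* tail page already free */
        /* lo is a free slot and hi - 1 is a live page further back. */
        in_use[lo] = true;
        in_use[hi - 1] = false;
        lo++;
        hi--;
    }
    return toy_truncate_tail(in_use, npages);
}
```

For a seven-page file with live pages at 0, 2, and 4, truncating the tail alone leaves five pages, while moving pages first leaves three. My question is essentially how close DB->compact() comes to the second behavior.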
How do you tackle this issue in your applications?
Thanks