Low level handling of compression inside ZFS and deduplication

user7113810, Nov 8 2013 (edited Nov 12 2013)

Hello,

I am wondering about the interaction between compression and deduplication in ZFS. I'm aware of most of the issues with deduplication (memory requirements etc.) and that it is only useful in some situations. I've also read the horror stories.

In my situation I am going to back up some very large files regularly (around 50GB each). Only very small parts of these files actually change between backups, so I am hoping this is an ideal candidate for deduplication and could save terabytes of space. (I've also put as much RAM in my server as it will take and installed a large, fast SSD as a cache device.)

I can see that with no compression enabled, each time these huge files are divided into 128k blocks inside ZFS, the blocks will all start and end at the same offsets in the file. However, I am not sure how ZFS handles compression and whether it could affect the file offsets at which the blocks are carved out.

For example, say I first copy a 50GB file to ZFS, and that originally the first 1MB of the file is all zeros and so highly compressible. I then update just the first 1MB, so the compressibility of this first segment changes. Will this affect the file offsets at which ZFS starts each block (i.e. does it stream in the data, compressing it and filling up 128k blocks as it goes)? If so, the blocks carved out of the second file would never match those of the first, purely because of the different offsets, even though 99% of the data is still the same between the two files.

However, I am hoping that with compression on, ZFS still first slices the file into the same 128k blocks at the same file offsets and only then compresses the data inside each block, which would mean the blocks always stay aligned between files. I suspect this is the case.
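To make the two possibilities concrete, here is a minimal Python sketch of both models. It is only a toy illustration of the question, not ZFS code: zlib and SHA-256 are stand-ins for whatever compressor and checksum a pool actually uses, the function names are made up for this example, and the file is shrunk from 50GB to 4MB so it runs quickly.

```python
import hashlib
import os
import zlib

RECORD_SIZE = 128 * 1024  # the 128k block size from the question


def fixed_offset_fingerprints(data: bytes, size: int = RECORD_SIZE) -> list[str]:
    """Hoped-for model: slice at fixed file offsets, then compress each block alone."""
    return [
        hashlib.sha256(zlib.compress(data[off:off + size])).hexdigest()
        for off in range(0, len(data), size)
    ]


def streamed_fingerprints(data: bytes, size: int = RECORD_SIZE) -> list[str]:
    """Feared model: compress the whole stream, then cut the output into blocks."""
    packed = zlib.compress(data)
    return [
        hashlib.sha256(packed[off:off + size]).hexdigest()
        for off in range(0, len(packed), size)
    ]


def shared_blocks(a: list[str], b: list[str]) -> int:
    """Count distinct block fingerprints present in both files (dedup candidates)."""
    return len(set(a) & set(b))


# Hypothetical test data: a 1MB head that changes between backups
# and a 3MB tail that stays identical.
MB = 1024 * 1024
tail = os.urandom(3 * MB)
first_backup = bytes(MB) + tail        # head is all zeros (highly compressible)
second_backup = os.urandom(MB) + tail  # head rewritten with incompressible data

print("fixed offsets:", shared_blocks(
    fixed_offset_fingerprints(first_backup),
    fixed_offset_fingerprints(second_backup)))
print("streamed:     ", shared_blocks(
    streamed_fingerprints(first_backup),
    streamed_fingerprints(second_backup)))
```

In the fixed-offset model every block of the unchanged tail keeps the same fingerprint, so only the blocks covering the rewritten first 1MB are new. In the streamed model the change in compressibility of the head shifts everything after it, which is exactly the misalignment I am worried about.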

I can run some experiments but I thought I would ask in case anyone knows.

Thanks for any info,

David

Locked on Dec 10 2013
Added on Nov 8 2013