Hello,
I am wondering about the interaction between compression and deduplication in ZFS. I'm aware of most of the issues with deduplication (memory requirements, etc.) and that it is only useful in certain situations; I've also read the horror stories.
In my situation I am going to back up some very large files regularly (around 50GB each). Only very small parts of these files actually change between backups, so I am hoping this is an ideal candidate for deduplication and could save TBs of space. (I've also installed as much RAM as my server will take, plus a large, fast SSD as a cache device.)
I can see that with no compression enabled, each time these huge files are divided into 128K blocks inside ZFS, the blocks will all start and end at the same offsets within the file. However, I am not sure how ZFS handles compression and whether it could affect the offsets at which blocks are carved out of the file.
For example, say I first copy a 50GB file to ZFS, and the first 1MB of the file is all zeros and therefore highly compressible. I then update just that first 1MB, which changes the compressibility of that segment. Will this affect how ZFS lays out blocks within the file (i.e. does it stream the data in, compressing it and filling up 128K blocks as it goes)? If so, the blocks carved out of the second copy would never match the first, purely because of different offsets, even though 99% of the data is still the same.
I am hoping, however, that with compression on, ZFS still first slices the file into the same 128K blocks at the same offsets and then compresses the data inside each block, which would mean the blocks would always stay aligned between copies. I suspect this is the case.
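To make the question concrete, here is a toy sketch of the two behaviours I'm imagining. It just slices a made-up buffer into 128K "records" and uses zlib as a stand-in compressor; the buffer size, the all-zero first 1MB, and the 1MB edit are purely illustrative and this is not meant to reflect ZFS internals.

```python
import hashlib
import os
import zlib

RECORD = 128 * 1024                          # assuming recordsize=128K
tail   = os.urandom(7 * 1024 * 1024)         # 7MB that never changes
v1     = b"\x00" * (1024 * 1024) + tail      # first copy: first 1MB all zeros
v2     = os.urandom(1024 * 1024) + tail      # second copy: only the first 1MB edited

def slice_then_compress(data):
    """What I'm hoping ZFS does: cut the file into fixed 128K records at the
    same offsets, then compress each record on its own."""
    return [hashlib.sha256(zlib.compress(data[o:o + RECORD])).hexdigest()
            for o in range(0, len(data), RECORD)]

def compress_then_slice(data):
    """What I'm worried about: compress the stream first, then cut 128K
    blocks out of the compressed output, so offsets depend on compressibility."""
    comp = zlib.compress(data)
    return [hashlib.sha256(comp[o:o + RECORD]).hexdigest()
            for o in range(0, len(comp), RECORD)]

for name, fn in [("slice-then-compress", slice_then_compress),
                 ("compress-then-slice", compress_then_slice)]:
    h1, h2 = fn(v1), fn(v2)
    shared = sum(1 for a, b in zip(h1, h2) if a == b)
    print(f"{name}: {shared} of {max(len(h1), len(h2))} blocks would still dedup")
```

In the first case only the 8 records covering the edited 1MB change, so the rest still match; in the second case the shorter compressed prefix shifts every later offset and essentially nothing matches.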
I can run some experiments (rough outline below), but I thought I would ask in case anyone knows.
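For reference, the test I had in mind is roughly the following. The pool and dataset names ("tank", "tank/dduptest") are placeholders for a scratch pool, the 512MB file stands in for my real 50GB ones, and it needs to run as root on a test system, not production.

```python
import os
import subprocess

POOL, DS = "tank", "tank/dduptest"           # assumed scratch pool / dataset
MNT = f"/{DS}"                               # default mountpoint

subprocess.run(["zfs", "create",
                "-o", "dedup=on",
                "-o", "compression=on", DS], check=True)

ONE_MB = 1024 * 1024
tail = os.urandom(511 * ONE_MB)              # the bulk of the "file", unchanged

# "Backup 1": 1MB of zeros followed by the shared tail.
with open(f"{MNT}/backup1", "wb") as f:
    f.write(b"\x00" * ONE_MB)
    f.write(tail)

# "Backup 2": the same file with only the first 1MB rewritten.
with open(f"{MNT}/backup2", "wb") as f:
    f.write(os.urandom(ONE_MB))
    f.write(tail)

os.sync()

# If compression doesn't shift the record boundaries, dedupratio should end
# up near 2.00x on this otherwise-empty pool; if it does, closer to 1.00x.
ratio = subprocess.run(["zpool", "get", "-H", "-o", "value", "dedupratio", POOL],
                       capture_output=True, text=True, check=True).stdout.strip()
print("dedupratio:", ratio)
```

(dedupratio is pool-wide, so any other data already on the pool would dilute the number; hence the scratch pool.)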
Thanks for any info,
David