Hi Henk,
First of all, I want to pay tribute to you and all the folks developing VDbench. I personally appreciate the work and use VDbench frequently. It's invaluable to my tasks and projects.
The question I have relates to my testing of a vendor's ability to perform inline de-duplication and compression. In most cases this works perfectly for me, and I have come to trust VDbench and the parameters.
One of the tests I like to perform as a baseline is to re-write all my data using shred (latest Linux coreutils) with direct I/O and 64KB writes. This usually results in no data reduction (1:1). I then perform the same test using VDbench with fully random data patterns (no compression or de-dupe set in the general parameters).
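For illustration, the two runs look roughly like this (the device name, run length and file names are placeholders, and the shred line is a simplified version of my actual pass):

    # one pass of pseudo-random data over the whole device
    shred -v -n 1 /dev/sdb

and the VDbench parameter file is along these lines, with dedupratio= and compratio= deliberately left out of the general parameters:

    * raw device, 100% random 64KB writes, fully random data
    sd=sd1,lun=/dev/sdb,openflags=o_direct
    wd=wd1,sd=sd1,xfersize=64k,rdpct=0,seekpct=100
    rd=rd1,wd=wd1,iorate=max,elapsed=600,interval=5

run with ./vdbench -f parmfile.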
In a recent test with a Vendor I got the following data reduction results:
VDbench (Random Unique): 16:1
Shred (Random Unique): 1:1
I was puzzled by this and decided to write out the same patterns to test files and compare them in a hex editor. I am no expert in this field, but I can see in the hex editor that there is very little difference between the two patterns.
Could you offer any theories as to why there is such a discrepancy between the reduction numbers? I did try to zip the test files with various compression tools, and none of them could make the test file smaller in either the shred or the VDbench fully random example. So I believe the discrepancy is more related to de-duplication.
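For reference, one quick way I can check for exact duplicate blocks in a captured copy of the test data is to chop it into fixed-size chunks and count repeats (the file name and chunk size below are just placeholders):

    # split into 64KB chunks and look for identical blocks
    split -a 6 -b 64k testfile.bin chunk_
    md5sum chunk_* | awk '{print $1}' | sort | uniq -c | sort -rn | head

Any hash with a count above 1 is a block the array could de-duplicate.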
I use raw block devices in RHEL 6.5, and the Vendor's equipment is performing WAN optimization during the replication of this data.
I basically want to be able to trust VDbench with random unique data as a worst-case scenario for data replication.
Thanks for any pointers,
Eoin