We've been running complex vdbench (050406) workloads using multiple Linux hosts and multiple shared disks, where each disk is split into ranges so that every host can drive I/O to the same disk. However, this can cause vdbench to report false data validation errors if the hosts use different xfersize options when generating I/O to the same disk. The reason is that vdbench apparently takes the xfersize into account when it calculates the logical block address (LBA) range that it actually uses.
To understand how vdbench calculates the LBA range, I ran this vdbench config with a 1% range on a 10GB disk (total number of 4KiB blocks is 2441472) that had been zeroed prior to running vdbench:
sd=s0000,lun=/dev/sdb,range=(1,2),openflags=o_direct
wd=w0000,sd=(s0000),rdpct=15,seekpct=0,xfersize=64k
rd=r0000,wd=(w0000),iorate=max,distribution=exponential,elapsed=120,interval=2,forthreads=4
After the run I see zeros at LBA 24399 and random data written to LBA 24400 through 48799, but my calculations put the range boundaries at:
2441472 * .01 = 24414.72
2441472 * .02 = 48829.44
2441472 * .03 = 73244.16
2441472 * .04 = 97658.88
I wasn't expecting any data to be written until LBA 24414 or 24415, and I was expecting data to be written to LBA 48800 through 48828 or so. It turns out the 64KiB transfer size makes vdbench round the start and end LBAs down to a multiple of 64KiB, i.e. a multiple of sixteen 4KiB blocks; LBAs 24400 and 48800 are both multiples of 16.
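As a sanity check, here is a small Python sketch of the rounding I'm inferring from these observations (my reconstruction, not taken from the vdbench source):
# Reproduce the observed start LBA for range=(1,2) with xfersize=64k.
# Assumption (inferred from the observations above): the range start is
# rounded down to a multiple of the xfersize.
total_blocks = 2441472                    # 4KiB blocks on the 10GB disk
xfer_blocks  = 64 * 1024 // 4096          # 64KiB = sixteen 4KiB blocks
start = int(total_blocks * 0.01)          # 24414, the unaligned 1% boundary
aligned_start = start // xfer_blocks * xfer_blocks
print(aligned_start)                      # 24400, the first LBA actually written
# The observed end boundary, LBA 48800 (last written block 48799), is also a
# multiple of 16, consistent with the end being rounded down the same way.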
I repeated the test using a 1MiB transfer size (256 4KiB blocks) and found that LBAs 24320 through 24399 had now been written with random data; LBA 24320 is a multiple of 256.
This means that for a 10GB volume with 2441472 total 4KiB blocks, the 1% boundary is LBA 24414 when the transfer size is 4KiB, LBA 24400 when the transfer size is 64KiB, and LBA 24320 when the transfer size is 1MiB. If one client is doing 4KiB transfers and another client is doing 1MiB transfers to the same volume, their LBA ranges overlap by 94 blocks (376 KiB), which would cause false data validation errors.
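A quick sketch of that overlap calculation (again assuming the rounding inferred above):
# Overlap at the 1% boundary between a 4KiB client and a 1MiB client.
total_blocks = 2441472                   # 4KiB blocks on the 10GB disk
boundary     = int(total_blocks * 0.01)  # 24414
end_4k   = boundary                      # 4KiB client: boundary stays at LBA 24414
start_1m = boundary // 256 * 256         # 1MiB client: rounded down to LBA 24320
print(end_4k - start_1m, "blocks,", (end_4k - start_1m) * 4, "KiB")  # 94 blocks, 376 KiB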
I then created this vdbench config to see if I could reproduce a false data validation error with only a single host:
# cat /opt/vdbench/tests/oneLunVariableRange
data_errors=0
sd=default,openflags=o_direct
sd=s0000,lun=/dev/sdb,range=(0,1)
sd=s0001,lun=/dev/sdb,range=(1,25)
sd=s0002,lun=/dev/sdb,range=(25,26)
sd=s0003,lun=/dev/sdb,range=(26,50)
sd=s0004,lun=/dev/sdb,range=(50,51)
sd=s0005,lun=/dev/sdb,range=(51,75)
sd=s0006,lun=/dev/sdb,range=(75,76)
sd=s0007,lun=/dev/sdb,range=(76,100)
wd=w0000,sd=(s0000),rdpct=15,seekpct=100,xfersize=4k
wd=w0001,sd=(s0001),rdpct=15,seekpct=100,xfersize=1m
wd=w0002,sd=(s0002),rdpct=15,seekpct=100,xfersize=4k
wd=w0003,sd=(s0003),rdpct=15,seekpct=100,xfersize=1m
wd=w0004,sd=(s0004),rdpct=15,seekpct=100,xfersize=4k
wd=w0005,sd=(s0005),rdpct=15,seekpct=100,xfersize=1m
wd=w0006,sd=(s0006),rdpct=15,seekpct=100,xfersize=4k
wd=w0007,sd=(s0007),rdpct=15,seekpct=100,xfersize=1m
rd=r0000,wd=(w0000,w0001,w0002,w0003,w0004,w0005,w0006,w0007),iorate=max,distribution=exponential,elapsed=1200,interval=2,forthreads=8
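Based on the rounding above, the 1MiB workloads (w0001, w0003, w0005, w0007) should each start below the 1%, 26%, 51% and 76% boundaries and overwrite the tail of the preceding 4KiB range. A rough Python sketch of those predicted overlap windows (byte offsets; same inferred rounding, so treat the numbers as approximate):
# Predict where the 1MiB workloads in the config above should overlap the
# preceding 4KiB ranges.
total_bytes = 2441472 * 4096            # 10GB disk
MiB = 1024 * 1024
for pct in (1, 26, 51, 76):             # boundaries where a 1MiB range follows a 4KiB range
    boundary = int(total_bytes * pct / 100)
    end_4k   = boundary // 4096 * 4096  # approximate end of the preceding 4KiB range
    start_1m = boundary // MiB * MiB    # rounded-down start of the 1MiB range
    print(f"{pct}%: 1MiB range starts at byte {start_1m}, "
          f"overlapping the 4KiB range up to byte {end_4k} "
          f"({(end_4k - start_1m) // 4096} blocks)")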
I ran it like this:
# /opt/vdbench/vdbench -f /opt/vdbench/tests/oneLunVariableRange -i 5 -e 1200 -v
After 90 seconds I got this data validation error:
21:51:19.798 localhost-4: 21:51:19.796
21:51:19.798 localhost-4: 21:51:19.796 Time of first corruption: Tue Aug 04 2020 21:51:19.796 UTC
21:51:19.798 localhost-4: 21:51:19.796
21:51:19.799 localhost-4: 21:51:19.796 At least one Data Validation error has been detected.
21:51:19.799 localhost-4: 21:51:19.796
21:51:19.799 localhost-4: 21:51:19.796 Terminology:
21:51:19.799 localhost-4: 21:51:19.796 - Data block: a block of xfersize= bytes.
21:51:19.800 localhost-4: 21:51:19.796 - Key block: the smallest xfersize specified by the user which is the unit of
21:51:19.843 localhost-4: 21:51:19.796 data that Data Validation keeps track of.
21:51:19.844 localhost-4: 21:51:19.796 - Sector: 512 bytes of disk storage, regardless of actual storage sector size.
21:51:19.844 localhost-4: 21:51:19.796 - Lba: Logical Byte Address, not to be confused with Logical Block Address.
21:51:19.844 localhost-4: 21:51:19.796
21:51:19.844 localhost-4: 21:51:19.796
21:51:19.844 localhost-4: 21:51:19.796 The output starts with a summary of a data block, followed by a summary of each
21:51:19.845 localhost-4: 21:51:19.796 key block. If all sectors in a key block show a similar type of data corruption
21:51:19.845 localhost-4: 21:51:19.797 only the FIRST sector of the key block will be reported.
21:51:19.845 localhost-4: 21:51:19.797 For all other cases, ALL sectors will be reported.
21:51:19.845 localhost-4: 21:51:19.797
21:51:19.846 localhost-4: 21:51:19.797 Contents of the first 32 bytes of each sector:
21:51:19.846 localhost-4: 21:51:19.797
21:51:19.846 localhost-4: 21:51:19.797 Byte 0x00 - 0x07: Byte offset of this disk block
21:51:19.846 localhost-4: 21:51:19.797 Byte 0x08 - 0x0f: Timestamp: number of milliseconds since 1/1/1970
21:51:19.846 localhost-4: 21:51:19.797 Byte 0x10 : Data Validation key from 1 - 126
21:51:19.847 localhost-4: 21:51:19.797 Byte 0x11 : Checksum of timestamp
21:51:19.847 localhost-4: 21:51:19.797 Byte 0x12 - 0x13: Reserved
21:51:19.847 localhost-4: 21:51:19.797 Byte 0x14 - 0x1b: SD or FSD name in ASCII hexadecimal
21:51:19.847 localhost-4: 21:51:19.797 Byte 0x1c - 0x1f: Process-id when written
21:51:19.847 localhost-4: 21:51:19.797 Byte 0x20 - 0x1ff: 480 bytes of compression data pattern
21:51:19.848 localhost-4: 21:51:19.797
21:51:19.848 localhost-4: 21:51:19.797 On the left: the data that was expected ('.' marks unknown value).
21:51:19.848 localhost-4: 21:51:19.797 On the right: the data that was found.
21:51:19.848 localhost-4: 21:51:19.797
21:51:19.849 localhost-4: 21:51:19.803
21:51:19.849 localhost-4: 21:51:19.804 Corrupted data block for sd=s0004,lun=/dev/sdb; lba: 5,099,782,144 (0x12ff88000) xfersize=4096
21:51:19.849 localhost-4: 21:51:19.804
21:51:19.849 localhost-4: 21:51:19.804 Data block has 1 key block(s) of 4096 bytes each.
21:51:19.849 localhost-4: 21:51:19.804 All key blocks are corrupted.
21:51:19.850 localhost-4: 21:51:19.804 Key block lba: 0x12ff88000
21:51:19.850 localhost-4: 21:51:19.804 Key block of 4,096 bytes has 8 512-byte sectors.
21:51:19.850 localhost-4: 21:51:19.804 Timeline:
21:51:19.850 localhost-4: 21:51:19.808 Tue Aug 04 2020 21:51:03.228 UTC Sector last written. (As found in the first corrupted sector, timestamp is taken just BEFORE the actual write).
21:51:19.850 localhost-4: 21:51:19.808 Tue Aug 04 2020 21:51:19.789 UTC Key block first found to be corrupted during a read-before-write.
21:51:19.851 localhost-4: 21:51:19.808
21:51:19.851 localhost-4: 21:51:19.808 All 8 sectors in this key block are corrupted.
21:51:19.851 localhost-4: 21:51:19.808 All corruptions are of the same type:
21:51:19.851 localhost-4: 21:51:19.808 ===> SD or FSD name miscompare. Expecting 's0004 ', receiving 's0005 ' (0x3030307320202035)
21:51:19.851 localhost-4: 21:51:19.808 Only the FIRST sector will be reported:
21:51:19.852 localhost-4: 21:51:19.809
21:51:19.852 localhost-4: 21:51:19.809 Data Validation error for sd=s0004,lun=/dev/sdb
21:51:19.852 localhost-4: 21:51:19.809 Block lba: 0x12ff88000; sector lba: 0x12ff88000; Key block size: 4096; relative sector in data block: 0x00 ( 0); current pid: 7233 (0x1c41)
21:51:19.852 localhost-4: 21:51:19.817 SD or FSD name in block expected: 's0004'; received: 's0005 '.
21:51:19.852 localhost-4: 21:51:19.820 0x000 00000001 2ff88000 ........ ........ 00000001 2ff88000 00000173 bb74d63c
21:51:19.852 localhost-4: 21:51:19.820 0x010 01..0000 30303073 20202034 00000000 01b50000 30303073 20202035 00001c65
21:51:19.853 localhost-4: 21:51:19.820 There are no mismatches in bytes 32-511
21:51:19.853 localhost-4: 21:51:19.836 21:51:19.835 op: read lun: /dev/sdb lba: 5099782144 0x12FF88000 xfer: 4096 errno: 60003: '60003 A Data Validation error was discovered'
This matches what I expected: the 1MiB workloads are corrupting the end of the preceding 4KiB LBA ranges because their starting LBAs are rounded down into the range used by the 4KiB workloads.
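The reported corruption is consistent with this: the corrupted block belongs to sd=s0004 (range 50-51%) but carries s0005's name, and its lba 0x12ff88000 sits inside the window where s0005's rounded-down 1MiB range reaches back below the 51% boundary. A quick check in Python (again assuming the rounding inferred above):
# Check that the corrupted lba falls inside the predicted s0004/s0005 overlap.
total_bytes   = 2441472 * 4096          # 10GB disk
corrupted_lba = 0x12ff88000             # 5,099,782,144 from the error report
boundary_51 = int(total_bytes * 0.51)                        # s0004/s0005 boundary
s0005_start = boundary_51 // (1024 * 1024) * (1024 * 1024)   # rounded down to 1MiB
print(s0005_start <= corrupted_lba < boundary_51)            # True: inside the overlap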
I checked the vdbench user guide but could not find anything about aligning the start LBA of an I/O to a multiple of the xfersize, although that is apparently what vdbench does. There is an align option, but it does not seem to affect the starting LBA of the range.