I found a few small problems in Data Validation for Vdbench raw i/o testing (SD/WD).
Vdbench file system functionality (FSD/FWD) works just fine.
I discovered two places where data blocks were NOT validated, even though they were counted as having been validated.
This is how Vdbench Data Validation works:
1. Every time a block is written, Vdbench writes a unique data pattern and remembers what was written where.
2. Before a block is written, the code checks whether the block has been written before. If so, a pre-read is issued before the write and the block is validated to make sure that the previous data is still valid. Then a new write is done using a different unique data pattern.
3. When the 'read immediately after write (-vr)' option is used, the block is immediately read again and its data contents are validated.
4. When Vdbench generates a read (not a pre-read), which happens with any read percentage other than rdpct=0, it validates the contents after the read, of course only when it knows what was written there.
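The steps above can be modeled in a few lines. This is only a toy sketch of the intended behavior, not Vdbench's actual code: the class `ToyValidator`, its methods, the integer "patterns" and the in-memory `disk` dictionary are all made up for illustration.

```python
class ToyValidator:
    """Toy model of Data Validation as described above; not Vdbench's real code."""

    def __init__(self, read_after_write=False):
        self.disk = {}          # block number -> pattern currently on "storage"
        self.expected = {}      # block number -> pattern last written by us
        self.next_pattern = 1   # every write gets a new, unique pattern
        self.read_after_write = read_after_write  # stands in for the -vr option
        self.validated = 0      # number of validations performed
        self.errors = []        # (block, kind of read that found the mismatch)

    def _validate(self, block, why):
        self.validated += 1
        if self.disk.get(block) != self.expected[block]:
            self.errors.append((block, why))

    def write(self, block):
        # Pre-read: a block that was written before is validated first.
        if block in self.expected:
            self._validate(block, "pre-read")
        # Write a unique pattern and remember what was written where.
        pattern = self.next_pattern
        self.next_pattern += 1
        self.disk[block] = pattern
        self.expected[block] = pattern
        # Optional read immediately after the write.
        if self.read_after_write:
            self._validate(block, "read-after-write")

    def read(self, block):
        # A normal read is validated only when the block's contents are known.
        if block in self.expected:
            self._validate(block, "read")
        return self.disk.get(block)


def demo():
    dv = ToyValidator()
    dv.write(7)
    dv.disk[7] = 999            # simulate corruption behind the tool's back
    dv.read(7)                  # a correct read path reports it right away
    return dv.errors
```

In this model a corrupted block is reported by whichever validation touches it first: a normal read, a read after write, or the pre-read of the next write.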
And there in #3 and #4 above I discovered a problem, all because of an email with subject "Vdbench: anyone using Vdbench to validate snap/clone?" that I sent out six weeks ago.
- The 'read immediately' for #3 is done, but Validation, though counted, is not done after the FIRST write of that block. Further writes are properly validated.
- The read for #4 is done, and if Vdbench has written that block earlier it will COUNT that block as validated, but alas, the Validation itself is NOT done. Remember, this is for a READ request, not a PRE-READ request which works just fine.
A reminder: this problem only exists with raw i/o (SD/WD), and not with file system i/o (FSD/FWD).
This does not mean that a possible data corruption was missed; it just took a little while longer to recognize and report it. We just had to wait for the next pre-read.
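To see why a corruption is reported late rather than missed, here is a small simulation. It is a sketch under assumptions: the `buggy` flag simply mimics a read path that counts a validation without comparing any data, and none of the names come from Vdbench itself.

```python
def run(buggy):
    """Return (validations counted, operation that first reported corruption)."""
    disk, expected = {}, {}
    validated = 0
    detected_at = None

    def check(block, op, compare=True):
        nonlocal validated, detected_at
        validated += 1                      # counted whether or not we compare
        if compare and disk[block] != expected[block] and detected_at is None:
            detected_at = op

    def write(block, pattern, op):
        if block in expected:
            check(block, op)                # pre-read: always really compared
        disk[block] = expected[block] = pattern

    def read(block, op):
        if block in expected:
            check(block, op, compare=not buggy)

    write(0, "A", "write#1")
    disk[0] = "garbage"                     # corruption happens here
    read(0, "read#1")
    write(0, "B", "write#2")                # the pre-read happens here
    return validated, detected_at
```

In this model `run(buggy=False)` reports the corruption at `read#1`, while `run(buggy=True)` reports it only at `write#2`, even though both runs count the same number of validations.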
There is one exception though: doing a 100% read and nothing else.
Wait, hold on: if all we do is READ, Vdbench never WRITES, so it does not know what it wrote and therefore cannot validate anyway.
Correct.
But what if, in the same Vdbench test, you first do writes and then, in a separate Run Definition (RD), do a 100% read?
Then indeed, in that Run Definition, the data is read but not validated.
I do not know if anyone is running a 100% read workload, so please double check your Vdbench parameter files.
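One rough way to do that double check is to scan a parameter file for `rdpct=100`. The helper below is a hypothetical sketch, not part of Vdbench: it only skips comment lines starting with `*`, and it does not understand defaults, continuation lines, lists of values, or options passed on the command line.

```python
import re

# Matches a standalone rdpct=100 (or 100.0) parameter, but not rdpct=1000.
RDPCT_100 = re.compile(r"(?:^|,)\s*rdpct=100(?:\.0+)?\s*(?:,|$)", re.IGNORECASE)


def find_full_read_lines(text):
    """Return (line number, line) for every non-comment line with rdpct=100."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue                        # blank line or '*' comment
        if RDPCT_100.search(stripped):
            hits.append((number, stripped))
    return hits
```

A hit only matters when Data Validation is active and the blocks being read were written earlier in the same Vdbench test, so treat the output as a list of lines to look at by hand.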
The only workload that I know of that does a 100% sequential read is Journal Recovery, but in that case, Data Validation is properly done.
And this last item is indeed why and how I discovered this problem. Remember the "Vdbench: anyone using Vdbench to validate snap/clone?" email I mentioned above?
The result of that email is that I DID find a much better and quicker way to have Vdbench validate snap/clone operations, and it DOES 100% read operations.
I'll explain the what and how in another email.
Vdbench50404rc2 contains a fix for the validation problem described above.
Below is a parameter file I created to quickly prove problem #4 above. Just read the comments and change the file names.
Henk.
*
* This parameter file identifies a bug in Data Validation code:
* Data blocks requested as READS in rd=rd2 have been corrupted by the copy operation
* at the end of rd1 but are not identified as such until they are requested as pre-READS in rd3.
*
*
data_errors=1
validate=yes
sd=sd1,lun=w:\temp\vdb1,size=40m,journal=w:\temp\jnl1
sd=sd2,lun=w:\temp\vdb2,size=40m,journal=w:\temp\jnl1
* Do random writes to both files.
* After this, copy file2 over file1, causing file1 contents to be corrupted.
rd=default,xfersize=1024,elapsed=30,interval=1
rd=rd1,sd=sd*,iorate=10000,rdpct=0,
end_cmd="cp w:\temp\vdb2 w:\temp\vdb1"
* We now sequentially read all of file1, but Vdbench sees no errors
* even though file1 has been corrupted.
rd=rd2,sd=sd1,iorate=10000,rdpct=100,seek=-1
* We now sequentially WRITE file1 and Data Validation now finally discovers the corruption.
* rd=rd3 will FAIL, while rd=rd2 should have failed also.
rd=rd3,sd=sd1,iorate=100,rdpct=0,seek=-1