Hi,
I am running the following OLTP file I/O workload against four file systems in a multi-host environment. Vdbench starts without any issue, but after a few data points it stops with an error indicating that the file systems are not properly formatted, even though the previous data points used the same file systems. There are only four file systems, and each one is mounted on all of the servers via different IPs. Also, when I run with fewer threads, the OPS rate drops significantly.
Please help me to resolve the issue. Let me know if you need any further information.
Thanks,
Abhishek
OS: RHEL6.4
Vdbench: 5.04.02
Vdbench Parameter File:
---------------------------------------
# Host/Client Definition
hd=default,vdbench=/home/bccc/vdbench50402,shell=ssh,user=root
hd=hd1,system=172.17.50.125
hd=hd2,system=172.17.50.126
hd=hd3,system=172.17.50.127
hd=hd4,system=172.17.50.128
# File system Definition
fsd=default,depth=1,width=1,files=64,size=6g,openflags=(o_direct)
fsd=f1,anchor=/HNAS_MNT1/nfs1
fsd=f2,anchor=/HNAS_MNT2/nfs1
fsd=f3,anchor=/HNAS_MNT3/nfs1
fsd=f4,anchor=/HNAS_MNT4/nfs1
# File System Workload Definition
fwd=wd1,fsd=(f*),host=hd1,xfersize=8k,fileio=random,fileselect=random,threads=128,skew=20
fwd=wd2,fsd=(f*),host=hd1,xfersize=8k,fileio=sequential,fileselect=random,threads=128,skew=5
fwd=wd3,fsd=(f*),host=hd2,xfersize=8k,fileio=random,fileselect=random,threads=128,skew=20
fwd=wd4,fsd=(f*),host=hd2,xfersize=8k,fileio=sequential,fileselect=random,threads=128,skew=5
fwd=wd5,fsd=(f*),host=hd3,xfersize=8k,fileio=random,fileselect=random,threads=128,skew=20
fwd=wd6,fsd=(f*),host=hd3,xfersize=8k,fileio=sequential,fileselect=random,threads=128,skew=5
fwd=wd7,fsd=(f*),host=hd4,xfersize=8k,fileio=random,fileselect=random,threads=128,skew=20
fwd=wd8,fsd=(f*),host=hd4,xfersize=8k,fileio=sequential,fileselect=random,threads=128,skew=5
# Run Definition
rd=create,fwd=(wd*),format=yes,fwdrate=max,threads=128,interval=5
rd=rd1,fwd=(wd*),fwdrate=(1000,5000,10000-110000,1000),format=no,forrdpct=80,elapsed=180,interval=5,pause=180
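For reference, the number of data points that rd1 requests can be worked out with a short Python sketch. It assumes that the trailing 1000 in fwdrate=(1000,5000,10000-110000,1000) is the step for the 10000-110000 range, and that the pause applies only between consecutive data points; the helper name expand_rates is hypothetical and not part of Vdbench.

```python
# Sketch: expand the fwdrate list from rd1 and estimate the total run time.
# Assumption: "10000-110000,1000" means 10000 to 110000 in steps of 1000.

def expand_rates(singles, start, end, step):
    """Return the single rates followed by every rate in the inclusive range."""
    return list(singles) + list(range(start, end + 1, step))

ELAPSED_SECONDS = 180  # elapsed=180 in rd1
PAUSE_SECONDS = 180    # pause=180 in rd1 (assumed to apply between data points only)

rates = expand_rates([1000, 5000], 10_000, 110_000, 1_000)

# One elapsed period per data point, one pause between each pair of points.
total_seconds = len(rates) * ELAPSED_SECONDS + (len(rates) - 1) * PAUSE_SECONDS

print(f"data points: {len(rates)}")
print(f"estimated run time: {total_seconds / 3600:.2f} hours")
```

Under those assumptions the run definition asks for far more data points than the few that completed before the abort.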
Mount Points:
---------------------
172.17.50.125
192.168.0.161:/EXP01 3.0T 400G 2.6T 14% /HNAS_MNT1
192.168.0.162:/EXP02 3.0T 400G 2.6T 14% /HNAS_MNT2
192.168.0.165:/EXP03 3.0T 400G 2.6T 14% /HNAS_MNT3
192.168.0.166:/EXP04 3.0T 400G 2.6T 14% /HNAS_MNT4
172.17.50.126
192.168.0.163:/EXP01 3.0T 400G 2.6T 14% /HNAS_MNT1
192.168.0.164:/EXP02 3.0T 400G 2.6T 14% /HNAS_MNT2
192.168.0.167:/EXP03 3.0T 400G 2.6T 14% /HNAS_MNT3
192.168.0.168:/EXP04 3.0T 400G 2.6T 14% /HNAS_MNT4
172.17.50.127
192.168.0.161:/EXP01 3.0T 400G 2.6T 14% /HNAS_MNT1
192.168.0.162:/EXP02 3.0T 400G 2.6T 14% /HNAS_MNT2
192.168.0.165:/EXP03 3.0T 400G 2.6T 14% /HNAS_MNT3
192.168.0.166:/EXP04 3.0T 400G 2.6T 14% /HNAS_MNT4
172.17.50.128
192.168.0.163:/EXP01 3.0T 400G 2.6T 14% /HNAS_MNT1
192.168.0.164:/EXP02 3.0T 400G 2.6T 14% /HNAS_MNT2
192.168.0.167:/EXP03 3.0T 400G 2.6T 14% /HNAS_MNT3
192.168.0.168:/EXP04 3.0T 400G 2.6T 14% /HNAS_MNT4
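To make the mount layout easier to see, here is a small Python sketch that groups the four hosts by the set of NFS server IPs they mount through. The addresses are copied from the df output above; the variable names are only illustrative.

```python
# Sketch: group the Vdbench hosts by the NFS server IPs they mount through.
from collections import defaultdict

# host IP -> NFS server IPs for /HNAS_MNT1 .. /HNAS_MNT4, in that order
MOUNTS = {
    "172.17.50.125": ["192.168.0.161", "192.168.0.162", "192.168.0.165", "192.168.0.166"],
    "172.17.50.126": ["192.168.0.163", "192.168.0.164", "192.168.0.167", "192.168.0.168"],
    "172.17.50.127": ["192.168.0.161", "192.168.0.162", "192.168.0.165", "192.168.0.166"],
    "172.17.50.128": ["192.168.0.163", "192.168.0.164", "192.168.0.167", "192.168.0.168"],
}

groups = defaultdict(list)
for host, servers in MOUNTS.items():
    groups[tuple(servers)].append(host)

# Hosts that reach the exports through the same set of server IPs
shared = sorted(groups.values())
for hosts in shared:
    print(", ".join(hosts))
```

All four hosts mount the same four exports (EXP01 to EXP04); only the server IP used to reach them differs between the two pairs of hosts.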
Error:
--------
09:17:03.386 Waiting 180 seconds; requested by 'pause' parameter
09:20:03.734 hd3-0:
fwd=wd3
when='not in control file'
old depth=0; new depth=1
old width=0; new width=1
old files=0; new files=64
old dist=null; new dist=bottom
also check the sizes=() parameters from previous and current execution.
The FWD parameters defined for 'fwd=wd3' do not
match the parameters used in the previous run.
- Correct the parameters, or
- use the 'format=' RD parameter, or
- Add '-c' execution parameter
Make sure you also specify 'format=yes' in the Run Definition (RD)
09:20:03.763
09:20:03.763 **********************************************************
09:20:03.763 Slave hd3-0 aborting: Parameter definition error
09:20:03.763 **********************************************************
09:20:03.763
09:20:03.763 Slave hd1-0 killed by master
09:20:03.763 Slave hd2-0 killed by master
09:20:03.763 Slave hd4-0 killed by master
09:20:05.595
09:20:05.595 Slave hd3-0 prematurely terminated.
09:20:05.595
09:20:05.595 Slave aborted. Abort message received:
09:20:05.595 Parameter definition error
09:20:05.595
09:20:05.595 Look at file hd3-0.stdout.html for more information.
09:20:05.595
09:20:05.595 Slave hd3-0 prematurely terminated.
09:20:05.596
- java.lang.RuntimeException: Slave hd3-0 prematurely terminated.
at Vdb.common.failure(common.java:306)
at Vdb.SlaveStarter.startSlave(SlaveStarter.java:179)
at Vdb.SlaveStarter.run(SlaveStarter.java:50)