Incredibly poor zfs recv performance
Bones, Dec 19, 2011 (edited Mar 4, 2012)

While doing my backups, I ran into a case of zfs recv performance not meeting my expectations. This is on Solaris 11, zfs version 33.
On my primary server I had a set of snapshots, taken after periods of intense activity and updating:
storage/avmedia@201006140624 0 - 8.81G -
storage/avmedia@201006191633 0 - 8.81G -
storage/avmedia@201007201232 0 - 8.81G -
storage/avmedia@201112111350 33K - 1.33T -
storage/avmedia@201112150024 33K - 1.33T -
storage/home@200903241123 21.7M - 425G -
storage/home@200903241355 5.96M - 425G -
storage/home@200906292018 498M - 407G -
storage/home@201004171421 72.0M - 421G -
storage/home@201004190940 74.5M - 421G -
storage/home@201006140624 1.30G - 1.05T -
storage/home@201006191633 1.22G - 1.05T -
storage/home@201007201232 8.70G - 1.08T -
storage/home@201112111350 47K - 1.29T -
storage/home@201112150024 47K - 1.29T -
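For reference, the listing above is the default snapshot view (name, used, avail, refer, mountpoint), produced by something like this (exact invocation from memory):

```shell
# List all snapshots under the storage pool with their space usage;
# AVAIL and MOUNTPOINT show as "-" for snapshots, as in the table above
zfs list -t snapshot -r storage
```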
Copying all the storage/avmedia snapshots over the network to the backup server took just under a day. I use:
zfs send -i <snap1> <snap2> | ssh <user>@<host> zfs recv -d <zfs name>
to perform the backup.
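To rule out ssh as the bottleneck, a buffered variant along these lines could be tried instead (mbuffer sketch; the hostname, port, and buffer sizes are hypothetical, not what I actually ran):

```shell
# Sender: stream the incremental send through mbuffer over raw TCP
# (port 9090 is arbitrary; -m 1G gives a 1 GB in-memory buffer)
zfs send -i <snap1> <snap2> | mbuffer -s 128k -m 1G -O backuphost:9090

# Receiver: accept the stream and feed zfs recv through the same-size buffer
mbuffer -s 128k -m 1G -I 9090 | zfs recv -d <zfs name>
```

The buffer on each end smooths out the bursty write pattern of zfs recv, so the network link is not stalled every time the receiver pauses to commit a transaction group.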
The next set of snapshots, from storage/home, also took less than a day up through storage/home@201007201232. Since then, it has been running for three days copying the storage/home@201112111350 snapshot, with no end in sight. In fact, because of the poor performance, I gave up on the network transfer and did a local dump instead:
zfs send -i <snap1> <snap2> | compress - > snap1_2.Z
This local dump completed in 14 hours. I then copied the resulting 488 GB file to the backup host and restored it with:
zcat snap1_2.Z | zfs recv -d <zfs name>
Since then, the typical zfs iostat behavior shows:
root@snapshot:/storage/mbuffer# zpool iostat 60
capacity operations bandwidth
pool alloc free read write read write
rpool 22.7G 349G 0 1 26.4K 939K
storage 3.18T 2.26T 23 530 1.09M 9.71M
rpool 22.7G 349G 0 0 1.62K 0
storage 3.18T 2.26T 292 285 14.2M 4.42M
rpool 22.7G 349G 0 0 1.62K 0
storage 3.18T 2.26T 287 282 14.2M 4.10M
rpool 22.7G 349G 0 0 1.62K 0
storage 3.18T 2.26T 279 278 13.2M 4.06M
This has been going on for hours now. If it is indeed moving forward at 13.2M read and 4.06M write per second (zpool iostat reports per-second averages over each interval), I'm in for a long wait.
edit: it just completed:
ls -% all
zfs-increment.Z
timestamp: atime Dec 20 17:17:44 2011
timestamp: ctime Dec 18 23:24:09 2011
timestamp: mtime Dec 18 23:24:09 2011
timestamp: crtime Dec 18 09:58:39 2011
The zfs send started Dec 18 at 09:58, and the zcat finished Dec 20 at 17:17: 2 days, 7 hours, 19 minutes for 488 GB of backup data, or roughly 2.5 MB/s, substantially below the capacity of the drives and controller.
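For the record, the rate works out as follows (GNU date arithmetic for illustration; Solaris /usr/bin/date lacks -d, so this is just to show the math):

```shell
# Elapsed time between the zfs send start (crtime) and the zcat finish (atime)
start=$(date -d "2011-12-18 09:58:39" +%s)
end=$(date -d "2011-12-20 17:17:44" +%s)
elapsed=$(( end - start ))    # 199145 seconds = 2d 7h 19m 5s

# 488 GB moved in that window, expressed in MB/s
awk -v gb=488 -v s="$elapsed" 'BEGIN { printf "%.1f MB/s\n", gb * 1024 / s }'
# prints 2.5 MB/s
```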
Can anybody shed some light on this behavior? Could it be this slow because the volume has lots of little files that all need to be checked? Is there a way to monitor the progress of zfs recv?
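On the last question: the one workaround I know of is to insert pv on the sending side to at least watch the bytes flow (hypothetical sketch; pv is not part of Solaris and would have to be installed separately):

```shell
# pv shows cumulative bytes and current throughput of the stream;
# with -s and an estimated stream size it also shows a progress bar and ETA
zfs send -i <snap1> <snap2> | pv -s 488g | ssh <user>@<host> zfs recv -d <zfs name>
```

This only measures how fast the stream leaves the sender, not how far zfs recv has actually committed it, but it is better than staring at zpool iostat.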
Kind regards.
B.
Edited by: Bones on Dec 19, 2011 1:54 PM
Edited by: Bones on Dec 20, 2011 8:40 AM