Performance Issues with ZFS + STMF
I've set up Solaris 11 Express as an iSCSI target server using the STMF framework, with ZFS for filesystem and volume management. I'm experiencing horrendous performance between my iSCSI clients and the Solaris target, and I'm looking for hints as to what might be wrong with the system.
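(For context, the target side was put together following roughly the standard COMSTAR flow; the pool and volume names below are placeholders rather than my actual config:)

    # Create a zvol and export it as an iSCSI logical unit (names are placeholders)
    zfs create -V 100G tank/iscsivol
    stmfadm create-lu /dev/zvol/rdsk/tank/iscsivol
    # Make the LU visible (no host/target groups); <lu-guid> is the GUID
    # reported by create-lu above
    stmfadm add-view <lu-guid>
    # Create an iSCSI target with an auto-generated IQN
    itadm create-target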
The hardware platform is Intel - a Dell PowerEdge 1950, to be exact, with 2 x Xeon processors and 8 GB of RAM. It has two on-board Broadcom network controllers that I use for administrative purposes, plus an add-in quad-port Intel GigE PCIe network card, with two of the four ports set up in an LACP aggregation; I use these ports for the iSCSI target. The back-end storage is FC-attached (2 Gb) via a dual-port QLogic PCIe FC controller, connected through a McData Sphereon FC switch. The array is a 12-disk EonStor configured as RAID 5. On the network side, all of the iSCSI systems are connected to a Foundry EIF48G switch.
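(For reference, the aggregation state on the Solaris side can be checked with dladm; I'm showing the generic commands rather than my exact link names:)

    # Show aggregations, their member ports, and LACP state
    dladm show-aggr -x
    dladm show-aggr -L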
The software platform is, of course, Solaris 11 Express, using ZFS for volume and filesystem management and STMF for the target side, particularly iSCSI. I do have deduplication enabled on most of the volumes in the pool, though, as I'll explain later, I'm not sure this is really a factor in the performance of my system, or at least it isn't the bottleneck.
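(Per-volume dedup and related settings can be confirmed with zfs get; the pool/volume names here are placeholders:)

    # Check dedup and related properties on a given volume
    zfs get dedup,compression,volblocksize tank/iscsivol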
To test the performance right now, I'm simply using dd on a Linux client with the open-iscsi initiator, with /dev/zero as the input and the iSCSI disk as the output.
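The test looks roughly like this (the device name is a placeholder for the actual iSCSI LUN; oflag=direct bypasses the client's page cache so the numbers reflect the wire and the target rather than local caching):

    # Sequential write test from the Linux initiator
    dd if=/dev/zero of=/dev/sdX bs=1M count=1024 oflag=direct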
The main problem I'm having is that the performance from my iSCSI clients to the ZFS system is horrible - I'm getting data transfer rates of around 2 MB/s (roughly 16 Mbit/s). Furthermore, I cannot find anything that should be causing this type of bottleneck. Analysis of my switch, the Foundry EIF48G, does not show anything at the switch level that would cause this (e.g., excessive CPU usage on the switch). I've also used a different pathway through a different switch, and that made no significant difference in performance. I am using flow control (802.3x) on the switch and the hosts, but toggling it doesn't seem to make a difference. I am not using jumbo frames, primarily because I'm still verifying that everything in the iSCSI path supports them - network cards, the switch itself, etc. The network cards on most of the systems use the Intel e1000 driver; the Solaris system sees its ports as "igb" interfaces. I've performed some network analysis and don't see anything out of the ordinary - lots of segmentation, but that's not unexpected given the block sizes relative to the 1500-byte MTU.
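(When I do get around to jumbo frames, the path MTU can be probed end-to-end from a Linux client with ping's don't-fragment flag; the target address below is a placeholder:)

    # 8972 = 9000 - 20 (IP header) - 8 (ICMP header); this fails if any
    # hop in the path can't pass ~9000-byte frames unfragmented
    ping -M do -s 8972 192.168.100.10
    # Baseline check at the standard 1500 MTU (1472 = 1500 - 28)
    ping -M do -s 1472 192.168.100.10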
On the storage side, the monitoring on the array shows hardly any I/O load - FC or disk - and very little cache usage. Furthermore, on the Solaris side, if I watch top, I don't see any I/O wait. So, I can rule out a bottleneck on the FC side, the array, or the individual disks.
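(On the Solaris side I'm also watching per-device service times, along these lines:)

    # Extended per-device stats at 10-second intervals; asvc_t is the
    # active service time and %b the percent-busy for each device
    iostat -xn 10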
Also, looking at top on the Solaris system, I see no userspace CPU usage and only 3-5% kernel usage. Memory is a little harder for me to track down on the Solaris side, but given that I'm not seeing I/O wait and the like, I'd say memory pressure probably isn't affecting performance.
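(To sanity-check memory, the ZFS ARC size can be read out of kstat; if the ARC were being squeezed, I'd expect to see it pinned well below its target:)

    # Current ARC size, target size, and maximum, in bytes
    kstat -p zfs:0:arcstats:size zfs:0:arcstats:c zfs:0:arcstats:c_max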
I mentioned that I don't think dedup is to blame for the problems. The actual throughput is about the same whether dedup is enabled or not for the ZFS volume I'm using. Furthermore, I've watched zpool iostat at 10-second intervals on the Solaris target while running the tests. Against a volume with dedup disabled, reads per 10-second interval run between 5 and 20 and write operations between 120 and 145; read bandwidth is between 10 KB and 50 KB and write bandwidth between 3 MB and 4.5 MB. Against a dedup-enabled volume, reads per interval run between 40 and 70 and writes between 120 and 150; read bandwidth is between 2.5 MB and 3.5 MB and write bandwidth between 4.5 MB and 6.0 MB. While the rates with dedup enabled are certainly higher, none of these numbers are at all extraordinary for either GigE networking or FC-attached storage.
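(Those numbers came from running something along these lines; the pool name is a placeholder:)

    # Per-vdev ops and bandwidth, sampled every 10 seconds
    zpool iostat -v tank 10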
On the client side, most of the clients are Linux-based, with a couple of Windows initiators. Most are software initiators, but there are a couple of hardware initiators. There is a mix of physical machines and virtual machines. As far as I can tell, they're all experiencing the same performance issues.
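(On the Linux clients, the negotiated session parameters can be dumped with open-iscsi's iscsiadm, in case a negotiation mismatch is relevant:)

    # Show full session details, including negotiated parameters such as
    # MaxRecvDataSegmentLength and the NIC each session is bound to
    iscsiadm -m session -P 3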
So, I'm wondering if anyone has any hints or suggestions for improving performance on the system, or where I should look next for the bottleneck. I've tried Linux-based targets and am not seeing the same low performance, so at this point I'm thinking it's something Solaris-based. Any hints on tweaks to the system that would help performance would be greatly appreciated!
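I'm happy to post more configuration if it would help - for example, the target-side view from COMSTAR:

    # Logical units and their properties (block size, write-cache state, etc.)
    stmfadm list-lu -v
    # iSCSI target configuration and status
    itadm list-target -v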
Thanks!
-Nick