Sun AVS4 with ZFS failover
I'm testing to gain some experience with the Sun Availability Suite on two hosts, using a very simple setup:
A primary host, called fokke:
SunOS fokke 5.10 Generic_139555-08 sun4v sparc SUNW,SPARC-Enterprise-T5220
and a secondary host, called sukke:
SunOS sukke 5.10 Generic_139555-08 sun4v sparc SUNW,SPARC-Enterprise-T5220
On both hosts I'm using /dev/rdsk/c1t0d0s6 (about 80 GB) as the data volume and /dev/rdsk/c1t0d0s7 (about 100 MB) as the bitmap volume.
I set up the configuration by running the following command on both hosts:
sndradm -e fokke /dev/rdsk/c1t0d0s6 /dev/rdsk/c1t0d0s7 sukke /dev/rdsk/c1t0d0s6 /dev/rdsk/c1t0d0s7 ip async
This all seems to work fine: after a sndradm -m (full sync) I can see data being copied, and dsstat -m sndr went from 100% down to 0%.
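For completeness, the sync and monitoring steps looked roughly like this (typed from memory, so treat it as a sketch rather than an exact transcript):

# on fokke: start a full forward sync (primary -> secondary), then wait for it to finish
sndradm -n -m
sndradm -n -w
# check replication status; 0% means the secondary is fully synced
dsstat -m sndr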
Next, I made a zpool on the data volume (I am aware it's only a single disk, but this is just a test setup):
zpool create tpool /dev/dsk/c1t0d0s6
This also worked fine. Next, I copied some files to the zpool and waited until dsstat -m sndr showed 0%.
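Nothing special there, roughly (the source path is just an example):

# copy some test data into the pool's default mountpoint and watch the sync drain
cp -r /var/tmp/testdata /tpool
dsstat -m sndr 5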
Next, I put the set into logging mode using sndradm -l, to test a situation where we would need to switch over to the secondary node.
After sndradm -l, dsstat -m sndr showed the state as "L".
I then imported the zpool on sukke using zpool import -f tpool without problems; the pool was available and I could read from and write to it.
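So the failover itself, which works fine, was essentially (again a sketch):

# on fokke: put the set into logging mode so the secondary volume is no longer written to
sndradm -n -l
# on sukke: force-import the pool (it was last active on fokke, hence -f)
zpool import -f tpool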
The problem now is failing back from the secondary host sukke to the primary host fokke. I've tried several approaches, but none seem to work.
The first, and to me the most obvious, was simply to do a reverse update sync using sndradm -u -r.
Since you cannot have the same zpool imported on both hosts, I first exported the pool on sukke using zpool export tpool. (This would not be acceptable in a production environment, as it means service downtime; the whole reason we're using AVS is high availability, and being down while AVS copies several terabytes back is not ideal.) In any case, the approach didn't work: the commands completed fine, but after the reverse update I couldn't import the zpool on the primary host:
  pool: tpool
    id: 5341753121485047760
 state: UNAVAIL
status: One or more devices contains corrupted data.
action: The pool cannot be imported due to damaged devices or data.
   see: http://www.sun.com/msg/ZFS-8000-5E
config:

        tpool       UNAVAIL  insufficient replicas
          c1t0d0s6  UNAVAIL  corrupted data
I tried the same procedure, this time using a full reverse sync instead of a reverse update, but that didn't work either.
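To be explicit, both failback attempts were along these lines (a sketch from memory, not an exact transcript):

# on sukke: stop using the pool so its blocks are no longer changing
zpool export tpool
# reverse the replication direction: pull sukke's changes back to fokke
sndradm -n -u -r
# (or, for the second attempt, a full reverse sync: sndradm -n -m -r)
sndradm -n -w
# on fokke: this is the step that fails with the "corrupted data" status shown above
zpool import tpool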
I also tried using zpool destroy instead of zpool export and several other crazy ideas, but nothing seems to work.
My guess is that ZFS and AVS are both making block-level changes, and that this ends up corrupting the on-disk ZFS metadata that zpool import needs, but I am in no way certain of this.
At the moment, the only option that seems to be available is to disable the AVS set, make the secondary host sukke the new primary, and then create a new AVS set the other way around, from sukke to fokke.
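Roughly, that workaround would look like this (untested as written):

# disable the existing set (on both hosts)
sndradm -n -d
# recreate the set in the opposite direction, with sukke as the new primary (again on both hosts)
sndradm -e sukke /dev/rdsk/c1t0d0s6 /dev/rdsk/c1t0d0s7 fokke /dev/rdsk/c1t0d0s6 /dev/rdsk/c1t0d0s7 ip async
# full sync from sukke back to fokke
sndradm -n -m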
We will be using asynchronous replication because the two nodes are in different datacenters, and we prefer the primary host to be in a particular one of them, as that is where we have field engineers and other staff on-site.
I am wondering: is there really no way around this? We would simply like to be able to switch from host A to host B, make changes while the pool is active on B, and then switch back to host A using B's data.
Any help would be greatly appreciated,
Reinder.