I have a T2000 set up with the primary domain presented with a number of LUNs from a CLARiiON, which I have used as boot and data devices for the guest domains. This worked for a while, but the guest domains now hang regularly. The primary domain is fine, but a hung guest domain needs 2-3 reboots to become usable again, and then it may last only 20 minutes before requiring another reboot.
Sometimes the entire VDS configuration also seems to be stuck, even after rebooting the control/service domain. For example, even after a guest domain is unbound, its disk devices don't reappear in format on the control domain.
Here is a stack trace from a core file taken during one of the guest domain hangs:
0t943::pid2proc
300084a53b0
300084a53b0::ps
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R    943    641    943    641    101 0x4a004000 00000300084a53b0 vi
300084a53b0::print proc_t p_tlist
p_tlist = 0x3000744c6a0
0x3000744c6a0::findstack
stack pointer for thread 3000744c6a0: 2a1016f4c41
[ 000002a1016f4c41 cv_wait+0x38() ]
000002a1016f4cf1 vdc_send_request+0x2c()
000002a1016f4dc1 vdc_strategy+0x88()
000002a1016f4e91 vdev_mirror_io_start+0x1b4()
000002a1016f4f71 zil_lwb_write_start+0x20c()
000002a1016f5021 zil_commit+0x21c()
000002a1016f50d1 zfs_fsync+0xa8()
000002a1016f5181 fop_fsync+0x14()
000002a1016f5231 fdsync+0x20()
000002a1016f52e1 syscall_trap32+0xcc()
>
0x3000744c6a0::print kthread_t t_lwpchan
{ t_lwpchan.lc_wchan0 = 0
t_lwpchan.lc_wchan = 0x300035d76c8
}
0x300035d76c8::wchaninfo -v
ADDR             TYPE NWAITERS   THREAD           PROC
00000300035d76c8 cond        1:  000003000744c6a0 vi
Here are the bindings for the domains:
solprdinfs001[]# ldm ls-bindings
Name: primary
State: active
Flags: transition,control,vio service
OS:
Util: 0.1%
Uptime: 9m
Vcpu: 16
vid pid util strand
0 0 0.9% 100%
1 1 0.0% 100%
2 2 0.1% 100%
3 3 0.0% 100%
4 4 0.2% 100%
5 5 0.1% 100%
6 6 0.5% 100%
7 7 0.2% 100%
8 8 0.1% 100%
9 9 0.1% 100%
10 10 0.1% 100%
11 11 0.1% 100%
12 12 0.1% 100%
13 13 0.2% 100%
14 14 0.3% 100%
15 15 0.2% 100%
Mau: 4
mau cpuset (0, 1, 2, 3)
mau cpuset (4, 5, 6, 7)
mau cpuset (8, 9, 10, 11)
mau cpuset (12, 13, 14, 15)
Memory: 3968M
real-addr phys-addr size
0x4000000 0x4000000 3968M
Vars: reboot-command=boot
IO: pci@780 (bus_a)
pci@7c0 (bus_b)
Vldc: primary-vldc0
(HV Control channel)]
[LDC: 0x1]
[LDom primary (Domain Services channel)]
[LDC: 0x3]
[LDom primary (FMA Services channel)]
[LDC: 0xb]
[LDom ender-dev (Domain Services channel)]
[LDC: 0x11]
[LDom ipgdrpinfs001 (Domain Services channel)]
Vldc: primary-vldc3
(SP channel)]
(SP channel)]
(SP channel)]
(SP channel)]
(SP channel)]
(SP channel)]
(SP channel)]
Vds: san
vdsdev: ender-boot device=/dev/dsk/c6t60060160B944130042C5D7DB5CE2DB11d0s2
vdsdev: ender-data device=/dev/dsk/c6t60060160B944130028DDB6665EE2DB11d0s2
vdsdev: ipgdrp-data device=/dev/dsk/c6t60060160B94413008072F232D4E0DB11d0s2
vdsdev: ipgdrp-boot device=/dev/dsk/c6t60060160B944130043C5D7DB5CE2DB11d0s2
[LDom ender-dev, dev-name: ender-boot]
[LDC: 0xe]
[LDom ender-dev, dev-name: ender-data]
[LDC: 0xf]
[LDom ipgdrpinfs001, dev-name: ipgdrp-boot]
[LDC: 0x18]
[LDom ipgdrpinfs001, dev-name: ipgdrp-data]
[LDC: 0x19]
Vcc: cons
[LDC: 0x10]
[LDom ender-dev, group: ender-dev, port: 2000]
[LDC: 0x1a]
[LDom ipgdrpinfs001, group: ipgdrpinfs001, port: 2001]
port-range=2000-2020
Vsw: admin-a
mac-addr=0:14:4f:fb:5b:ff
net-dev=e1000g0
[LDC: 0xc]
[LDom ender-dev, name: admin-a, mac-addr:0x144ffb8030]
[LDC: 0x12]
[LDom ipgdrpinfs001, name: admin-a, mac-addr:0x144ffa0d2e]
mode=prog,promisc
Vsw: admin-b
mac-addr=0:14:4f:f9:ab:44
net-dev=e1000g1
[LDC: 0x13]
[LDom ipgdrpinfs001, name: admin-b, mac-addr:0x144ffae5b0]
mode=prog,promisc
Vsw: backup
mac-addr=0:14:4f:fb:88:77
net-dev=e1000g2
[LDC: 0x15]
[LDom ipgdrpinfs001, name: backup, mac-addr:0x144ffb4ee2]
mode=prog,promisc
Vldcc: vldcc1 [FMA Services]
service: ldmfma
service: primary-vldc0 @ primary
[LDC: 0x4]
Vldcc: vldcc2 [SP Channel]
service: spfma
Vldcc: vldcc0 [Domain Services]
service: primary-vldc0 @ primary
[LDC: 0x2]
Vldcc: hvctl [Hypervisor Control]
service: primary-vldc0 @ primary
[LDC: 0x0]
Vcons: SP
----------------------------------------------------------------------------
Name: ipgdrpinfs001
State: active
Flags: transition
OS:
Util: 27%
Uptime: 1m
Vcpu: 8
vid pid util strand
0 24 93% 100%
1 25 92% 100%
2 26 92% 100%
3 27 91% 100%
4 28 92% 100%
5 29 92% 100%
6 30 92% 100%
7 31 92% 100%
Mau: 2
mau cpuset (24, 25, 26, 27)
mau cpuset (28, 29, 30, 31)
Memory: 2G
real-addr phys-addr size
0xc800000 0x17c800000 2G
Vars: nvramrc=devalias net /virtual-devices@100/channel-devices@200/network@0
boot-device=/virtual-devices@100/channel-devices@200/disk@0:a disk net
use-nvramrc?=true
Vldcc: vldcc0 [Domain Services]
service: primary-vldc0 @ primary
[LDC: 0x0]
Vnet: admin-a [LDC: 0x2]
[Peer LDom: ender-dev, mac-addr 0x144ffb8030]
mac-addr=0:14:4f:fa:d:2e
service: admin-a @ primary
[LDC: 0x1]
Vnet: admin-b
mac-addr=0:14:4f:fa:e5:b0
service: admin-b @ primary
[LDC: 0x3]
Vnet: backup
mac-addr=0:14:4f:fb:4e:e2
service: backup @ primary
[LDC: 0x4]
Vdisk: boot ipgdrp-boot@san
service: san @ primary
[LDC: 0x5]
Vdisk: data ipgdrp-data@san
service: san @ primary
[LDC: 0x6]
Vcons: [via LDC:7]
ipgdrpinfs001@cons [port:2001]
----------------------------------------------------------------------------
Name: ender-dev
State: active
Flags: transition
OS:
Util: 0.4%
Uptime: 5m
Vcpu: 8
vid pid util strand
0 16 49% 100%
1 17 48% 100%
2 18 48% 100%
3 19 48% 100%
4 20 48% 100%
5 21 48% 100%
6 22 48% 100%
7 23 48% 100%
Mau: 2
mau cpuset (16, 17, 18, 19)
mau cpuset (20, 21, 22, 23)
Memory: 2G
real-addr phys-addr size
0xc000000 0xfc000000 2G
Vars: nvramrc=devalias net /virtual-devices@100/channel-devices@200/network@0
boot-device=/virtual-devices@100/channel-devices@200/disk@0:a disk net
auto-boot?=false
use-nvramrc?=true
Vldcc: vldcc0 [Domain Services]
service: primary-vldc0 @ primary
[LDC: 0x0]
Vnet: admin-a [LDC: 0x5]
[Peer LDom: ipgdrpinfs001, mac-addr 0x144ffa0d2e]
mac-addr=0:14:4f:fb:80:30
service: admin-a @ primary
[LDC: 0x1]
Vdisk: boot ender-boot@san
service: san @ primary
[LDC: 0x2]
Vdisk: data ender-data@san
service: san @ primary
[LDC: 0x3]
Vcons: [via LDC:4]
ender-dev@cons [port:2000]
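To see at a glance which CLARiiON LUN backs each guest disk, a small Python 3.8+ sketch like the one below can cross-reference the `vdsdev:` exports of the `san` service with the `Vdisk:` entries of each guest. It assumes the field layout shown in the bindings above (`Name:`, `Vds:`, `vdsdev: ... device=...`, `Vdisk: <name> <volume>@<service>`); the embedded text is just an excerpt of that output, and the optional file argument is a hypothetical way to feed it a full `ldm ls-bindings` dump. It deliberately ignores the `[LDC: ...]` lines, because in the paste above their pairing with the `[LDom ..., dev-name: ...]` lines is ambiguous.

```python
import re
import sys
from typing import Dict, List, Optional, Tuple

# Excerpt of the `ldm ls-bindings` output above (only the lines this sketch reads).
BINDINGS = """\
Name: primary
Vds: san
vdsdev: ender-boot device=/dev/dsk/c6t60060160B944130042C5D7DB5CE2DB11d0s2
vdsdev: ender-data device=/dev/dsk/c6t60060160B944130028DDB6665EE2DB11d0s2
vdsdev: ipgdrp-data device=/dev/dsk/c6t60060160B94413008072F232D4E0DB11d0s2
vdsdev: ipgdrp-boot device=/dev/dsk/c6t60060160B944130043C5D7DB5CE2DB11d0s2
Name: ipgdrpinfs001
Vdisk: boot ipgdrp-boot@san
Vdisk: data ipgdrp-data@san
Name: ender-dev
Vdisk: boot ender-boot@san
Vdisk: data ender-data@san
"""

NAME_RE = re.compile(r"^Name:\s+(\S+)")
VDSDEV_RE = re.compile(r"^vdsdev:\s+(\S+)\s+device=(\S+)")
VDISK_RE = re.compile(r"^Vdisk:\s+(\S+)\s+([^@\s]+)@(\S+)")


def map_vdisks(text: str) -> List[Tuple[str, str, str, str]]:
    """Return (domain, vdisk, volume, backend device) for every guest vdisk."""
    backends: Dict[Tuple[str, str], str] = {}  # (service, volume) -> device path
    vdisks: List[Tuple[str, str, str, str]] = []  # (domain, vdisk, volume, service)
    domain: Optional[str] = None
    vds: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()  # real ldm output is indented; ignore the indentation
        if m := NAME_RE.match(line):
            domain = m.group(1)
        elif line.startswith("Vds:"):
            vds = line.split(":", 1)[1].strip()
        elif m := VDSDEV_RE.match(line):
            backends[(vds, m.group(1))] = m.group(2)
        elif m := VDISK_RE.match(line):
            vdisks.append((domain, m.group(1), m.group(2), m.group(3)))
    # Join each guest vdisk to the device exported under the same service/volume.
    return [(dom, disk, vol, backends.get((svc, vol), "<not exported>"))
            for dom, disk, vol, svc in vdisks]


if __name__ == "__main__":
    text = open(sys.argv[1]).read() if len(sys.argv) > 1 else BINDINGS
    for row in map_vdisks(text):
        print("{:<14} {:<5} {:<12} {}".format(*row))
```

Any vdisk that shows `<not exported>` would point at a volume name the `san` service no longer knows about; with the bindings above, all four guest disks resolve to distinct LUNs.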