Hi, I have a problem with link aggregation (LAG) between Solaris 11.3 and Juniper QFX5100 switches.
Sometimes Solaris does not respond to LACP PDUs in time, and the link aggregation becomes unavailable.
Solaris configuration:
The physical processor has 14 cores and 28 virtual processors (0-13,28-41)
x86 (GenuineIntel 306F2 family 6 model 63 step 2 clock 2600 MHz)
Intel(r) Xeon(r) CPU E5-2697 v3 @ 2.60GHz
The physical processor has 14 cores and 28 virtual processors (14-27,42-55)
x86 (GenuineIntel 306F2 family 6 model 63 step 2 clock 2600 MHz)
Intel(r) Xeon(r) CPU E5-2697 v3 @ 2.60GHz
root@srv-da-zfs-02:~# dladm show-aggr -x aggr3
LINK PORT SPEED DUPLEX STATE ADDRESS PORTSTATE
aggr3 -- 10000Mb full up 90:e2:ba:86:54:c0 --
net6 10000Mb full up 90:e2:ba:86:54:c0 attached
net8 10000Mb full up 90:e2:ba:86:5b:28 attached
root@srv-da-zfs-02:/etc/driver/drv# dladm show-aggr -L aggr3
LINK PORT AGGREGATABLE SYNC COLL DIST DEFAULTED EXPIRED
aggr3 net8 yes yes yes yes no no
-- net6 yes yes yes yes no no
root@srv-da-zfs-02:/etc/driver/drv# dladm show-aggr -P aggr3
LINK MODE POLICY ADDRPOLICY LACPACTIVITY LACPTIMER
aggr3 trunk L3,L4 auto passive short
root@srv-da-zfs-02:/etc/driver/drv# dladm show-vlan vlan616
LINK VID SVID PVLAN-TYPE FLAGS OVER
vlan616 616 -- -- ----- aggr3
root@srv-da-zfs-02:~# ipmpstat -a
ADDRESS STATE GROUP INBOUND OUTBOUND
:: down sc_ipmp3 -- --
zclu01-616-1 up sc_ipmp3 vlan616 vlan616
root@srv-da-zfs-02:~# kstat -p | grep alloc_fail | ggrep -v '0$'
root@srv-da-zfs-02:~#
Juniper Configuration
The switches use MC-LAG.
jpqfx5100-1-sdn> show forwarding-options enhanced-hash-key
Slot 0
Current RTAG7 Settings
----------------------
Hash-Mode : layer2-payload
inet RTAG7 settings-
inet packet fields
Protocol : Yes
Destination L4 Port : Yes
Source L4 Port : Yes
Destination IPv4 Addr : Yes
Source IPv4 Addr : Yes
Vlan id : No
jpqfx5100-1-sdn> show configuration interfaces ae2
description da-zfs02;
mtu 9216;
aggregated-ether-options {
minimum-links 1;
link-speed 10g;
lacp {
active;
periodic fast;
system-id 00:01:02:03:04:05;
admin-key 4;
}
mc-ae {
mc-ae-id 2;
chassis-id 0;
mode active-active;
status-control active;
init-delay-time 2;
}
}
unit 0 {
family ethernet-switching {
interface-mode trunk;
vlan {
members SDN_DC_STORE_DVLP;
}
}
}
jpqfx5100-1-sdn> show configuration interfaces xe-0/0/20
description ae2;
ether-options {
802.3ad ae2;
}
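For context on the timing: with LACPTIMER short on the Solaris side and periodic fast on the Juniper side, LACPDUs are expected about once per second in both directions, and a peer is declared timed out after three missed intervals. The sketch below only spells out that arithmetic using the timer constants from IEEE 802.1AX (formerly 802.3ad); the function name is made up for illustration and is not part of any tool.

```python
# Timer constants as defined by IEEE 802.1AX (formerly 802.3ad).
FAST_PERIODIC_S = 1      # "short" / "fast": one LACPDU per second
SLOW_PERIODIC_S = 30     # "long" / "slow": one LACPDU every 30 seconds
TIMEOUT_MULTIPLIER = 3   # a peer expires after 3 missed periodic intervals


def lacp_timeout_seconds(timer: str) -> int:
    """Return how long a peer waits without LACPDUs before it expires.

    `timer` is "short" or "long", as shown in the LACPTIMER column of
    `dladm show-aggr -P` (hypothetical helper, for illustration only).
    """
    periodic = {"short": FAST_PERIODIC_S, "long": SLOW_PERIODIC_S}[timer]
    return periodic * TIMEOUT_MULTIPLIER


if __name__ == "__main__":
    for timer in ("short", "long"):
        print(timer, lacp_timeout_seconds(timer), "s")
```

So with the settings above, the switch only needs to miss about 3 seconds of LACPDUs from Solaris before it takes the member link out of the bundle.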
ERRORS
Juniper side
Aug 18 04:12:18.555 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:18.554 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:18.680 2016 jpqfx5100-1-sdn lacpd[1321]: LACPD_TIMEOUT: xe-0/0/20: lacp current while timer expired current Receive State: CURRENT
Aug 18 04:12:18.684 2016 jpqfx5100-1-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - ATTACHED state - acting as standby link
Aug 18 04:12:18.684 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:18.682 2016 jpqfx5100-1-sdn lacpd[1321]: LACP_INTF_DOWN: ae2: Interface marked down due to lacp timeout on member xe-0/0/20
Aug 18 04:12:18.683 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:18.684 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:18.684 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:19.274 2016 jpqfx5100-1-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - CD state - ready to carry traffic
Aug 18 04:12:19.274 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:19.269 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:19.275 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:19.275 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:19.308 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:19.310 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:26.278 2016 jpqfx5100-1-sdn lacpd[1321]: LACPD_TIMEOUT: xe-0/0/20: lacp current while timer expired current Receive State: CURRENT
Aug 18 04:12:26.283 2016 jpqfx5100-1-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - ATTACHED state - acting as standby link
Aug 18 04:12:26.283 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:26.280 2016 jpqfx5100-1-sdn lacpd[1321]: LACP_INTF_DOWN: ae2: Interface marked down due to lacp timeout on member xe-0/0/20
Aug 18 04:12:26.281 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:26.282 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:26.282 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:26.336 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:26.334 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:29.334 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:29.338 2016 jpqfx5100-1-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - DETACHED state - will not carry traffic
Aug 18 04:12:29.338 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:29.335 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:29.335 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:30.553 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:30.553 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:30.571 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:30.569 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:31.542 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:31.543 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:32.559 2016 jpqfx5100-1-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - ATTACHED state - acting as standby link
Aug 18 04:12:32.559 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:32.559 2016 jpqfx5100-1-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - CD state - ready to carry traffic
Aug 18 04:12:32.559 2016 jpqfx5100-1-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:32.554 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:32.555 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:32.555 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:32.558 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:32.559 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:32.559 2016 jpqfx5100-1-sdn mcsnoopd[1352]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:32.565 2016 jpqfx5100-1-sdn rpd[1327]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:32.566 2016 jpqfx5100-1-sdn mcsnoopd[1352]: Decode ifd xe-0/0/20 index 741: ifdm_flags 0xc000
Aug 18 04:12:18.512 2016 jpqfx5100-2-sdn lacpd[2664]: LACPD_TIMEOUT: xe-0/0/20: lacp current while timer expired current Receive State: CURRENT
Aug 18 04:12:18.516 2016 jpqfx5100-2-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - ATTACHED state - acting as standby link
Aug 18 04:12:18.516 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:18.514 2016 jpqfx5100-2-sdn lacpd[2664]: LACP_INTF_DOWN: ae2: Interface marked down due to lacp timeout on member xe-0/0/20
Aug 18 04:12:18.515 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:18.515 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:18.515 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:18.727 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:18.729 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:19.271 2016 jpqfx5100-2-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - CD state - ready to carry traffic
Aug 18 04:12:19.271 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:19.270 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:19.277 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:19.277 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:19.306 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:19.305 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:26.287 2016 jpqfx5100-2-sdn lacpd[2664]: LACPD_TIMEOUT: xe-0/0/20: lacp current while timer expired current Receive State: CURRENT
Aug 18 04:12:26.291 2016 jpqfx5100-2-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - ATTACHED state - acting as standby link
Aug 18 04:12:26.291 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:26.289 2016 jpqfx5100-2-sdn lacpd[2664]: LACP_INTF_DOWN: ae2: Interface marked down due to lacp timeout on member xe-0/0/20
Aug 18 04:12:26.320 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:26.290 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:26.290 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:26.290 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:26.318 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:29.291 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:29.294 2016 jpqfx5100-2-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - DETACHED state - will not carry traffic
Aug 18 04:12:29.294 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:29.291 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 0
Aug 18 04:12:29.291 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:30.554 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:30.554 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:30.583 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:30.582 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:31.542 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:31.542 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:32.560 2016 jpqfx5100-2-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - ATTACHED state - acting as standby link
Aug 18 04:12:32.560 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:32.560 2016 jpqfx5100-2-sdn /kernel: KERN_LACP_INTF_STATE_CHANGE: lacp_update_state_userspace: cifd xe-0/0/20 - CD state - ready to carry traffic
Aug 18 04:12:32.560 2016 jpqfx5100-2-sdn /kernel: if_pfe_mcae_color_pfe_update: xe-0/0/20.0: need to send mcae color to pfe
Aug 18 04:12:32.555 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:32.555 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:32.555 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:32.558 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:32.558 2016 jpqfx5100-2-sdn mcsnoopd[1353]: krt_decode_iflogical: xe-0/0/20.0 has got color 2
Aug 18 04:12:32.559 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:32.568 2016 jpqfx5100-2-sdn mcsnoopd[1353]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
Aug 18 04:12:32.568 2016 jpqfx5100-2-sdn rpd[1327]: Decode ifd xe-0/0/20 index 740: ifdm_flags 0xc000
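The switch log above is noisy and not strictly in time order, so here is a minimal Python sketch for pulling out only the LACP events and sorting them per switch. It assumes the line layout shown above (timestamp with milliseconds and year, hostname, process, message); the function name and the choice of event tags to keep are my own.

```python
import re
from datetime import datetime

# Layout assumed from the log above:
# "Aug 18 04:12:18.680 2016 <host> <process>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d\.\d{3} \d{4}) "
    r"(?P<host>\S+) (?P<proc>\S+): (?P<msg>.*)$"
)
# Only the LACP-related tags that appear in the log are kept.
EVENT_RE = re.compile(
    r"LACPD_TIMEOUT|LACP_INTF_DOWN|KERN_LACP_INTF_STATE_CHANGE"
)


def lacp_events(lines):
    """Return (timestamp, host, message) tuples for LACP events,
    sorted by host and then by timestamp."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or not EVENT_RE.search(m["msg"]):
            continue
        ts = datetime.strptime(m["ts"], "%b %d %H:%M:%S.%f %Y")
        events.append((ts, m["host"], m["msg"]))
    return sorted(events, key=lambda e: (e[1], e[0]))


if __name__ == "__main__":
    import sys
    for ts, host, msg in lacp_events(sys.stdin):
        print(ts.strftime("%H:%M:%S.%f")[:-3], host, msg)
```

Fed with the log on standard input, it prints one line per LACP event, which makes it easier to see the order of timeout, ATTACHED, CD and DETACHED on each switch.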
Solaris side
root@srv-da-zfs-02:/etc/driver/drv# egrep "mpath|mac:" /var/adm/messages
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru mac: [ID 486395 kern.info] NOTICE: aggr2 link down
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru mac: [ID 486395 kern.info] NOTICE: vlan612 link down
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 215189 daemon.error] The link has gone down on vlan612
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 773107 daemon.error] All IP interfaces in group sc_ipmp0 are now unusable
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru mac: [ID 486395 kern.info] NOTICE: aggr3 link down
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru mac: [ID 486395 kern.info] NOTICE: vlan616 link down
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 215189 daemon.error] The link has gone down on vlan616
Aug 18 04:12:30 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 773107 daemon.error] All IP interfaces in group sc_ipmp3 are now unusable
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru mac: [ID 435574 kern.info] NOTICE: aggr3 link up, 10000 Mbps, full duplex
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru mac: [ID 435574 kern.info] NOTICE: vlan616 link up, 10000 Mbps, unknown duplex
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 820239 daemon.error] The link has come up on vlan616
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 561795 daemon.error] At least 1 IP interface (vlan616) in group sc_ipmp3 is now usable
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru mac: [ID 435574 kern.info] NOTICE: aggr2 link up, 10000 Mbps, full duplex
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru mac: [ID 435574 kern.info] NOTICE: vlan612 link up, 10000 Mbps, unknown duplex
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 820239 daemon.error] The link has come up on vlan612
Aug 18 04:12:32 srv-da-zfs-02.net.billing.ru in.mpathd[115]: [ID 561795 daemon.error] At least 1 IP interface (vlan612) in group sc_ipmp0 is now usable
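To put a number on the outage as Solaris sees it, the following sketch pairs each "link down" notice from the mac module with the next "link up" for the same link. It assumes the /var/adm/messages format shown above; since syslog lines carry no year, the year is passed in as a parameter, and the function name is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Aug 18 04:12:30 <host> mac: [ID ...] NOTICE: aggr3 link down"
MAC_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) \S+ mac: \[.*?\] NOTICE: "
    r"(?P<link>\S+) link (?P<state>up|down)"
)


def outage_seconds(lines, year=2016):
    """Return a list of (link, seconds_down) for each down/up pair."""
    down_at = {}   # link name -> time of the pending "link down"
    outages = []
    for line in lines:
        m = MAC_RE.match(line.strip())
        if not m:
            continue
        ts = datetime.strptime(f"{year} {m['ts']}", "%Y %b %d %H:%M:%S")
        if m["state"] == "down":
            down_at[m["link"]] = ts
        elif m["link"] in down_at:
            start = down_at.pop(m["link"])
            outages.append((m["link"], (ts - start).total_seconds()))
    return outages


if __name__ == "__main__":
    import sys
    for link, secs in outage_seconds(sys.stdin):
        print(f"{link}: down for {secs:.0f} s")
```

The resolution is limited to whole seconds because that is all the Solaris log records.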
1-minute load average when the LACP timeout appeared. The timeouts only happen when the system is under high load.

Network Traffic
