iSCSI multipath not working over 10G Mellanox (mlx5_core) after OLVM update to 4.4 with OEL 8.8/8.9

Andreas Stainfels, Jan 18 2024 (edited Jan 18 2024)

Hi all,

We have a strange problem. For three years we have been running an OLVM 4.3 installation with two Dell R640 hosts. In addition to the integrated 10 Gbit Broadcom network card, each host has a 2-port 10 Gbit adapter from Mellanox (Type 4x, mlx5_core driver).
The hosts ran OEL 7.9, with the iSCSI network running over the Mellanox adapter. This setup was completely problem-free the whole time.

We have now updated the OLVM engine (standalone) to OLVM 4.4, also without any problems.

On the (updated) hosts, however, iSCSI now only works via the integrated Broadcom adapters. As soon as we run the iSCSI networks (VLAN, MTU 9000) via the Mellanox adapter, we keep getting errors from the multipath daemon.

The strange thing is that the connection is established successfully (iscsid: login response status 0000), but the TUR (Test Unit Ready) checker then times out.

...
Jan 18 11:50:10 srv-olkvm02 iscsid[1864]: iscsid: connecting to 10.100.11.12:3260
Jan 18 11:50:10 srv-olkvm02 iscsid[1864]: iscsid: connected local port 55660 to 10.100.11.12:3260
Jan 18 11:50:10 srv-olkvm02 iscsid[1864]: iscsid: login response status 0000
Jan 18 11:50:10 srv-olkvm02 iscsid[1864]: iscsid: deleting a scheduled/waiting thread!
Jan 18 11:50:10 srv-olkvm02 iscsid[1864]: iscsid: connection4:0 is operational after recovery (1 attempts)
Jan 18 11:50:10 srv-olkvm02 kernel: sd 17:0:0:25: Power-on or device reset occurred
Jan 18 11:50:11 srv-olkvm02 multipathd[1126]: 360050763808184ae4000000000000043: sdq - tur checker timed out
Jan 18 11:50:11 srv-olkvm02 multipathd[1126]: checker failed path 65:0 in map 360050763808184ae4000000000000043
Jan 18 11:50:14 srv-olkvm02 kernel: connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4335948769, last ping 4335953920, now 4335959040
Jan 18 11:50:14 srv-olkvm02 kernel: connection3:0: detected conn error (1022)
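
To make the timing in the ping timeout line easier to read, here is a minimal Python sketch that converts the three counters into seconds. It assumes the counters are kernel jiffies and that the kernel tick rate is HZ=1000; neither is stated in the log, and the function name `ping_timeout_gaps` is only illustrative.

```python
import re

# Kernel line copied from the log above.
LINE = ("connection3:0: ping timeout of 5 secs expired, recv timeout 5, "
        "last rx 4335948769, last ping 4335953920, now 4335959040")

HZ = 1000  # assumption: kernel tick rate, not stated in the log


def ping_timeout_gaps(line, hz=HZ):
    """Return the gaps in seconds between last rx, last ping and now."""
    m = re.search(r"last rx (\d+), last ping (\d+), now (\d+)", line)
    if m is None:
        raise ValueError("not an iSCSI ping timeout line")
    last_rx, last_ping, now = (int(g) for g in m.groups())
    return {
        "rx_to_ping": (last_ping - last_rx) / hz,   # silence before the ping was sent
        "ping_to_now": (now - last_ping) / hz,      # time the ping went unanswered
        "rx_to_now": (now - last_rx) / hz,          # total time without any receive
    }


if __name__ == "__main__":
    for name, seconds in ping_timeout_gaps(LINE).items():
        print(f"{name}: {seconds:.3f} s")
```

Under those assumptions, the gaps come out at roughly 5.15 s and 5.12 s, which matches the "recv timeout 5" and "ping timeout of 5 secs" values in the message: nothing was received on connection3:0 for about 10 seconds before the error was raised.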

The networks themselves can be pinged without any problems. Neither the interfaces nor the switch ports show any abnormalities in terms of errors, discards or drops ...
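
If those pings used the default packet size, they would not exercise the 9000-byte MTU on the iSCSI VLAN. A full-size ping with fragmentation prohibited may be a useful extra check. The following Python sketch only builds such a command for the Linux iputils `ping`; the helper name `jumbo_ping_command` is hypothetical, and the payload size assumes IPv4 without IP options.

```python
import shlex

IPV4_HEADER = 20  # bytes, IPv4 header without options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def jumbo_ping_command(target, mtu=9000, count=3):
    """Build a ping command whose packet exactly fills the given MTU."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    # -M do prohibits fragmentation, so a hop with a smaller MTU
    # produces an error instead of silently fragmenting the packet.
    args = ["ping", "-M", "do", "-s", str(payload), "-c", str(count), target]
    return shlex.join(args)


if __name__ == "__main__":
    # 10.100.11.12 is the portal address from the log above.
    print(jumbo_ping_command("10.100.11.12"))
```

The resulting command would have to be run on the host so that it leaves through the Mellanox-backed iSCSI interface; if the large packet fails while a default-size ping succeeds, that would point at an MTU mismatch somewhere on the path rather than at multipathd itself.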

Does anyone have similar problems?

Andreas
