We just reinstalled an OLVM node from scratch (OL8 with the UEK kernel).
We were not able to connect to the SAN at all.
We have these Fibre Channel cards:
'Fibre Channel: QLogic Corp. ISP2532-based 8Gb Fibre Channel to PCI Express HBA (rev 02)'
In OLVM we got this message:
VDSM olvm-host2 command ConnectStoragePoolVDS failed: Cannot find master domain: 'spUUID=091986b4-2952-4501-a4dc-57ce2e261394, msdUUID=e8b6d7eb-8d68-4917-aca7-4a9bcefab410'
On the node, dmesg showed this error looping constantly:
[ 59.440128] qla2xxx [0000:05:00.0]-500b:1: LOOP DOWN detected (2 3 0 0).
[ 59.844773] qla2xxx [0000:05:00.1]-500b:2: LOOP DOWN detected (2 3 0 0).
[ 60.387016] qla2xxx [0000:05:00.0]-500a:1: LOOP UP detected (8 Gbps).
[ 60.508611] qla2xxx [0000:05:00.0]-503f:1: Driver ELS logo IOCB Done hdl=41 comp_status=0x15
[ 60.508624] qla2xxx [0000:05:00.0]-503f:1: subcode 1=0x0 subcode 2=0x0 bytes=0x8 000001 -> 000002
[ 60.508672] qla2xxx [0000:05:00.0]-5037:1: Async-prli failed: handle=42 pid=000002 wwpn=21:70:00:c0:ff:28:0b:ae comp_status=31 iop0=9 iop1=707
[ 60.508787] qla2xxx [0000:05:00.0]-2119:1: qla24xx_handle_prli_done_event 2396 21:70:00:c0:ff:28:0b:ae Unable to reconnect
[ 60.792691] qla2xxx [0000:05:00.1]-500a:2: LOOP UP detected (8 Gbps).
and also, for the second port (05:00.1):
[Tue Sep 19 12:59:42 2023] qla2xxx [0000:05:00.1]-503f:2: Driver ELS logo IOCB Done hdl=1ae comp_status=0x15
[Tue Sep 19 12:59:42 2023] qla2xxx [0000:05:00.1]-503f:2: subcode 1=0x0 subcode 2=0x0 bytes=0x8 000001 -> 000002
[Tue Sep 19 12:59:42 2023] qla2xxx [0000:05:00.1]-5037:2: Async-prli failed: handle=1af pid=000002 wwpn=25:70:00:c0:ff:28:0b:ae comp_status=31 iop0=9 iop1=707
[Tue Sep 19 12:59:42 2023] qla2xxx [0000:05:00.1]-2119:2: qla24xx_handle_prli_done_event 2396 25:70:00:c0:ff:28:0b:ae Unable to reconnect
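If you want to check quickly whether a host is hitting the same loop, a small script can count the "Async-prli failed" lines per HBA port and target WWPN. This is only a rough sketch: the regular expression is inferred from the sample lines above (not from the driver source), and `count_prli_failures` is a name I made up for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the "Async-prli failed" sample lines in this post;
# it is an assumption that other kernel builds log the same format.
PRLI_FAILED = re.compile(
    r"qla2xxx \[(?P<pci>[0-9a-f:.]+)\]-5037:\d+: Async-prli failed: "
    r".*?wwpn=(?P<wwpn>[0-9a-f:]+)"
)


def count_prli_failures(lines):
    """Count failed PRLI attempts, keyed by (PCI address, target WWPN)."""
    counts = Counter()
    for line in lines:
        match = PRLI_FAILED.search(line)
        if match:
            counts[(match["pci"], match["wwpn"])] += 1
    return counts
```

For example, save the output of dmesg to a file and pass the open file object to `count_prli_failures`; a count that keeps growing for the same port and WWPN suggests the same reconnect loop.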
All the other nodes are also running a UEK 5.15.x kernel, but an older one (5.15.0-102.110.5.el8uek.x86_64), and that works fine.
So the issue appeared somewhere between kernel-uek-5.15.0-102.110.5.el8uek.x86_64 and the latest kernel-uek-105.125.6.2.1.el8uek.
Looking at https://www.linuxcompatible.org/story/elba202312794-oracle-linux-8-unbreakable-enterprise-kernel-bug-fix-update/ it looks like Oracle has recently applied many patches to the qla2xxx driver in the UEK kernel, and this broke our node.
And if we were to update any of the other nodes, they would break too…
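To flag hosts that have already moved past the kernel that works for us, something like the following could be run on each node. It is a minimal sketch and makes two assumptions: that UEK release strings always look like the known-good one quoted above, and that comparing the numeric fields in order reflects release order. The function names are hypothetical.

```python
import re

# The last UEK release that works with our FC cards, as quoted in this post.
KNOWN_GOOD = "5.15.0-102.110.5.el8uek.x86_64"


def uek_version_key(release):
    """Turn an el8uek release string into a tuple of ints, or None if not UEK."""
    match = re.match(r"([0-9.\-]+)\.el8uek", release)
    if match is None:
        return None
    return tuple(int(part) for part in re.split(r"[.\-]", match.group(1)))


def is_newer_than_known_good(release, known_good=KNOWN_GOOD):
    """True if release is a UEK kernel newer than the known-good one."""
    key = uek_version_key(release)
    return key is not None and key > uek_version_key(known_good)
```

On a node, `is_newer_than_known_good(platform.release())` (after `import platform`) would tell you whether the running kernel is newer than the one that works here. A non-UEK kernel such as 4.18.x returns False.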
My solution is to remove the UEK kernel and go back to the RHEL 8 kernel (4.18.x).
As well as not breaking the FC cards, this also fixes the issues I had with adding VLANs to bonds → https://forums.oracle.com/ords/apexds/post/olvm-ol8-x-help-some-nodes-we-cannot-do-any-network-changes-6280
If you are using the same driver/card, I strongly advise NOT using the latest UEK kernel.