Hi,
I just wanted to document some issues I've been having with the Oracle Linux 9 boxes at https://yum.oracle.com/boxes/
I was using the official CentOS 7 boxes for development, for binary compatibility with RHEL 7.
With that no longer possible, I've been looking at alternatives, so I wanted to give OL9 a go.
I migrated my Vagrant provisioner script from CentOS 7 to OL9 pretty easily and was working away, but I've had some strange
behavior.
First, when rebooting the VM it will often get stuck on boot. It seemed to be some kind of filesystem corruption.
The first thing I noticed is that LVM is used for the root filesystem, but LVM commands are broken, e.g.:
sudo vgdisplay
Devices file sys_wwid t10.ATA_VBOX_HARDDISK_VBb7a57793-09889c58 PVID tOH9wlnxSVA15YQUBxinyJcV8c997NSg last seen on /dev/sda2 not found.
This in itself does not seem fatal unless, like me, you need to add more disks and grow the volumes or add new ones.
I think this could be fixed by setting use_devicesfile = 0 in /etc/lvm/lvm.conf, which disables the LVM devices file that is used by default. The UUID will always
change when Vagrant clones the box, so I don't see much in the way of alternatives. See https://portal.nutanix.com/page/documents/kbs/details?targetId=kA07V000000LaGrSAK
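For anyone wanting to try the same lvm.conf change, here is a minimal sketch of how it could be scripted inside the guest. The function name and the .orig backup suffix are just illustrative choices, and it assumes the file already contains a use_devicesfile line (possibly commented out) and that GNU sed is available.

```shell
#!/usr/bin/env bash
# Sketch: force "use_devicesfile = 0" in an lvm.conf-style file.
# disable_lvm_devicesfile is a hypothetical helper name, not an LVM tool.
disable_lvm_devicesfile() {
    local conf="${1:-/etc/lvm/lvm.conf}"

    # Keep a copy of the original file next to it before editing.
    cp -p "$conf" "$conf.orig"

    # Rewrite the existing (possibly commented-out) setting line,
    # keeping its indentation. Comment lines that merely mention the
    # option name are not touched because they have no "=" after it.
    sed -i -E \
        's/^([[:space:]]*)#?[[:space:]]*use_devicesfile[[:space:]]*=.*/\1use_devicesfile = 0/' \
        "$conf"

    # Print the matching lines so the result can be checked by eye.
    grep -n 'use_devicesfile' "$conf"
}
```

Run as root inside the VM (for example from the Vagrant shell provisioner), calling disable_lvm_devicesfile with no argument edits /etc/lvm/lvm.conf; running sudo vgdisplay afterwards should show whether the "Devices file ... not found" message is gone.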
I made this change hoping it would solve my issues but unfortunately I still had them.
As time passed I noticed the issues not only on reboot but also under high load: for example, when k8s was pulling lots of images, the
storage would just disappear altogether, crashing the VM.
I see these kinds of errors in dmesg and journal logs.
[ 96.356459] ata3.00: exception Emask 0x0 SAct 0x200 SErr 0x0 action 0x6 frozen
[ 96.356637] ata3.00: failed command: WRITE FPDMA QUEUED
[ 96.356793] ata3.00: cmd 61/00:48:20:69:90/0a:00:04:00:00/40 tag 9 ncq dma 1310720 ou
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 96.357106] ata3.00: status: { DRDY }
[ 96.357298] ata3: hard resetting link
[ 96.702971] ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 96.703489] ata3.00: configured for UDMA/133
[ 96.703702] ata3.00: device reported invalid CHS sector 0
[ 96.703892] ata3: EH complete
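A quick way to check whether a given boot has hit these errors is to count the matching kernel log lines. This is only a rough sketch: the function name is made up, and the pattern covers just the three message types from the log above, so other ATA failures would not be counted.

```shell
#!/usr/bin/env bash
# Sketch: count ATA exception / failed command / link reset lines
# in kernel log text read from stdin.
# count_ata_errors is a hypothetical helper name.
count_ata_errors() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when the count is 0, so "|| true" keeps the function successful.
    grep -cE 'ata[0-9]+(\.[0-9]+)?: (exception Emask|failed command:|hard resetting link)' || true
}

# Usage inside the guest:
#   sudo dmesg | count_ata_errors
#   sudo journalctl -k -b | count_ata_errors
```

A count of 0 after a reboot and a period of heavy disk I/O suggests the errors are not recurring.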
I tried building my own box from the ISO without LVM. This did not help.
Ultimately I changed the controller type from SATA to IDE and this has made the dmesg errors disappear.
So far I've had no crashes.
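For anyone wanting to try the same controller change, one possible way to do it is with VBoxManage on the host while the VM is powered off. This is a sketch under assumptions, not a tested recipe for these boxes: the VM name, SATA controller name, port and disk path are placeholders that have to be read from VBoxManage showvminfo, "IDE Controller" is just a name picked for the new controller, and it assumes the VM does not already have an IDE controller.

```shell
#!/usr/bin/env bash
# Sketch: move one disk from a SATA controller to a new IDE controller.
# Look up the real controller name, port and disk path first with:
#   VBoxManage showvminfo "<vm>" --machinereadable
# Set DRY_RUN=1 to only print the commands instead of running them.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# move_disk_to_ide is a hypothetical helper name.
move_disk_to_ide() {
    local vm="$1" sata_ctl="$2" sata_port="$3" disk="$4"
    local ide_ctl="IDE Controller"   # assumed name for the new controller

    # Detach the disk from its slot on the SATA controller.
    run VBoxManage storageattach "$vm" --storagectl "$sata_ctl" \
        --port "$sata_port" --device 0 --medium none

    # Add an IDE controller and attach the same disk as primary master.
    run VBoxManage storagectl "$vm" --name "$ide_ctl" --add ide
    run VBoxManage storageattach "$vm" --storagectl "$ide_ctl" \
        --port 0 --device 0 --type hdd --medium "$disk"
}
```

Running it once with DRY_RUN=1 prints the three VBoxManage commands so they can be checked against the showvminfo output before anything is changed.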
I would happily contribute a PR to the project that produces the boxes but I cannot find the source.
Hopefully this helps someone else.