Getting hung database after OEL upgrade, BUG: soft lockup - CPU#1 stuck
I ran a yum update on an X2100 system, so it is now running:
Linux version 2.6.18-194.17.4.0.1.el5 (mockbuild@ca-build9.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Tue Oct 26 20:10:33 EDT 2010
The update installed:
oracleasm-support - 2.1.3-1.el5.x86_64
oracleasm-2.6.18-92.el5 - 2.0.5-1.el5.x86_64
Then I updated oracleasm to match the kernel:
oracleasm-2.6.18-194.17.4.0.1.el5-2.0.5-1.el5.x86_64
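For reference, a minimal sketch of how the driver/kernel match can be double-checked. The helper name asm_matches_kernel is hypothetical (not part of oracleasm-support), and it assumes the driver package name embeds the kernel release exactly as in the package names above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if an oracleasm driver
# package name embeds the given kernel release string.
asm_matches_kernel() {
    pkg="$1"      # e.g. oracleasm-2.6.18-92.el5-2.0.5-1.el5.x86_64
    kernel="$2"   # e.g. the output of `uname -r`
    case "$pkg" in
        "oracleasm-$kernel-"*) return 0 ;;   # driver built for this kernel
        *) return 1 ;;                       # driver built for another kernel
    esac
}

# On a live system this could be driven by real data, for example:
#   for p in $(rpm -qa | grep '^oracleasm-2'); do
#       asm_matches_kernel "$p" "$(uname -r)" && echo "match: $p"
#   done
```

With the packages listed above, only the 194.17.4.0.1.el5 driver should match the running kernel; the older 92.el5 driver should not.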
Initially the system boots up OK and the Oracle database runs fine.
But then I get some sort of file corruption, which hangs the database and forces a reboot and a manual fsck.
Here are some messages:
Nov 13 10:48:46 aus-perfdb kernel: BUG: soft lockup - CPU#1 stuck for 65s! [swapper:0]
Nov 13 10:48:46 aus-perfdb kernel: CPU 1:
Nov 13 10:48:46 aus-perfdb kernel: Modules linked in: nfs fscache nfsd exportfs nfs_acl ipv6 xfrm_nalgo oracleasm(U) autofs4 hidp rfcomm l2cap bluetooth rpcsec_gss_krb5 auth_rpcgss testmgr_cipher testmgr aead crypto_blkcipher crypto_algapi crypto_api des lockd sunrpc cpufreq_ondemand powernow_k8 freq_table dm_multipath scsi_dh video backlight sbs power_meter i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac parport_pc lp parport joydev sr_mod i2c_amd756 cdrom k8temp k8_edac i2c_amd8111 i2c_core e1000 hwmon edac_mc serio_raw amd_rng pcspkr sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod usb_storage shpchp sata_mv libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Nov 13 10:48:46 aus-perfdb kernel: Pid: 0, comm: swapper Tainted: G 2.6.18-194.17.4.0.1.el5 #1
Nov 13 10:48:46 aus-perfdb kernel: RIP: 0010:[<ffffffff80064b50>] [<ffffffff80064b50>] _spin_unlock_irqrestore+0x8/0x9
Nov 13 10:48:46 aus-perfdb kernel: RSP: 0018:ffff8101070efd48 EFLAGS: 00000246
Nov 13 10:48:46 aus-perfdb kernel: RAX: 0000000000000000 RBX: ffff8103ffa81000 RCX: 0000000000000001
Nov 13 10:48:46 aus-perfdb kernel: RDX: 0000000000000282 RSI: 0000000000000246 RDI: ffff8103ffa81050
Nov 13 10:48:46 aus-perfdb kernel: RBP: ffff8101070efcc0 R08: ffff8101ff03f3f0 R09: ffff81020726c000
Nov 13 10:48:46 aus-perfdb kernel: R10: ffff8101c20c9288 R11: ffffffff80044ffe R12: ffffffff8005dc8e
Nov 13 10:48:46 aus-perfdb kernel: R13: ffff8101a2faf9c0 R14: ffffffff8007821b R15: ffff8101070efcc0
Nov 13 10:48:46 aus-perfdb kernel: FS: 00002b3bb3b3bc90(0000) GS:ffff81010709a440(0000) knlGS:0000000000000000
Nov 13 10:48:46 aus-perfdb kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Nov 13 10:48:46 aus-perfdb kernel: CR2: 00002addc57ff000 CR3: 00000001d0753000 CR4: 00000000000006e0
Nov 13 10:48:46 aus-perfdb kernel:
Nov 13 10:48:46 aus-perfdb kernel: Call Trace:
Nov 13 10:48:46 aus-perfdb kernel: <IRQ> [<ffffffff88075c70>] :scsi_mod:scsi_dispatch_cmd+0x27d/0x2ff
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff8807b174>] :scsi_mod:scsi_request_fn+0x2c1/0x390
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff8005c1f7>] blk_run_queue+0x41/0x73
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff8807979d>] :scsi_mod:scsi_run_queue+0x155/0x1bf
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff88079efe>] :scsi_mod:scsi_next_command+0x2d/0x39
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff8807a07d>] :scsi_mod:scsi_end_request+0xbf/0xcd
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff8807a1d9>] :scsi_mod:scsi_io_completion+0x14e/0x324
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff880a7802>] :sd_mod:sd_rw_intr+0x252/0x28c
Nov 13 10:48:46 aus-perfdb smartd[7328]: Device: /dev/sdal, failed to read SMART Attribute Data
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff8807a46e>] :scsi_mod:scsi_device_unbusy+0x67/0x81
Nov 13 10:48:46 aus-perfdb smartd[7328]: Sending warning via mail to root ...
Nov 13 10:48:46 aus-perfdb kernel: [<ffffffff80037ca3>] blk_done_softirq+0x5f/0x6d
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff8001244a>] __do_softirq+0x89/0x133
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff8005e2fc>] call_softirq+0x1c/0x28
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff8006cba6>] do_softirq+0x2c/0x85
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff8006ca2e>] do_IRQ+0xec/0xf5
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff8006b35e>] default_idle+0x0/0x50
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff8005d615>] ret_from_intr+0x0/0xa
Nov 13 10:48:47 aus-perfdb kernel: <EOI> [<ffffffff8006b387>] default_idle+0x29/0x50
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff8004920e>] cpu_idle+0x95/0xb8
Nov 13 10:48:47 aus-perfdb kernel: [<ffffffff80077987>] start_secondary+0x498/0x4a7
Nov 13 10:48:47 aus-perfdb kernel:
Nov 13 10:48:47 aus-perfdb kernel: sd 37:0:0:0: timing out command, waited 60s
Are there any patches or tweaks that address this?