Oracle Database migrated to ZFS filesystem crashes server
lrp1, Jan 21 2013 (edited Jan 21 2013)
I've got some alarming results from a recent migration from UFS to ZFS-based filesystems in one of our environments.
Quick hit of our environment:
- two-node cluster of M4000s connected to a StorageTek SAN
- Solaris 10 update 10 SPARC
- Oracle Solaris Cluster 3.3u1 for Solaris 10 sparc
- Oracle Database 11.1.0.7 (we're in the process of upgrading to 11gR2)
We moved the database (oracle_home, datafiles, archive logs, redo logs, tempfiles, controlfiles) from UFS to equivalent ZFS directories. The only settings we changed from the defaults were recordsize=8k and logbias=throughput on the datafiles directory, in accordance with the Oracle Database on ZFS whitepaper (Sep 2012) (http://www.oracle.com/technetwork/server-storage/solaris10/config-solaris-zfs-wp-167894.pdf).
To our dismay, the database hit the following errors, resulting in a system CRASH and failover to the other node. It has happened multiple times under load, and we can't make sense of it. Before anybody asks, I'm opening a ticket with Oracle Support, but I wanted to know if the community has seen these sorts of errors.
Below is a snippet of the alert log showing these errors:
----- Error Stack Dump -----
ORA-01115: IO error reading block from file 72 (block # 2288034)
ORA-27063: number of bytes read/written is incorrect
SVR4 Error: 45: Deadlock situation detected/avoided
Additional information: -1
Additional information: *16384*
ORA-01115: IO error reading block from file 72 (block # 2288034)
ORA-27063: number of bytes read/written is incorrect
SVR4 Error: 45: Deadlock situation detected/avoided
Additional information: -1
Additional information: 16384
...
ORA-01115: IO error reading block from file 74 (block # 2289584)
ORA-27063: number of bytes read/written is incorrect
SVR4 Error: 45: Deadlock situation detected/avoided
Additional information: -1
Additional information: *24576*
The key here is the term SVR4 Error: 45. I know it's an OS error, and I'm currently going through our /var/adm/messages history. However, I don't know what the 16384 and 24576 numbers mean. I assume they are values that I'm running up against (i.e., it tried to write 16384 bytes, or hit a 16384 stack limit or open file descriptor limit).
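One thing that can be checked with plain arithmetic: both numbers are whole multiples of 8192, the 8k value used for recordsize above. The sketch below only does that division. It assumes the database block size is also 8k, which the post does not state, and the helper name `blocks_in_request` is made up for illustration. Reading the two "Additional information" lines as bytes returned (-1) and bytes requested is a common interpretation of ORA-27063, not something confirmed here.

```python
# Assumption: the database block size matches the 8k recordsize set on the
# datafiles dataset. The post does not state db_block_size explicitly.
BLOCK_SIZE = 8 * 1024  # 8192 bytes


def blocks_in_request(nbytes, block_size=BLOCK_SIZE):
    """Return the number of whole blocks in nbytes, or None if not a multiple."""
    quotient, remainder = divmod(nbytes, block_size)
    return quotient if remainder == 0 else None


# The two values reported in the alert log excerpt.
for value in (16384, 24576):
    print(value, "bytes =", blocks_in_request(value), "blocks of", BLOCK_SIZE, "bytes")
```

If that reading holds, the failing calls would be multi-block reads of two and three blocks rather than a limit being hit, but that is worth confirming with Oracle Support.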
The problem is that I don't see any such errors on my UFS filesystems, so I assume this is purely related to the ZFS setup. Has anybody else seen these SVR4 Error: 45 messages in their environment?