RMAN, RAC, NFS, and server lock ups

896971, Dec 20 2013 (edited Jan 7 2014)

Good day. My environment is:

--a 2-node RAC

--Enterprise Edition 11.2.0.3

--RHEL 5.1

The goal is to use RMAN to push backups to a shared NFS mount (on a different server). Both nodes will have access to this location (in the event one node goes down, the other can still run backups). Easy, right?

Wrong.

I've tried every NFS mount option in the book. Most work just fine, some don't. When I use the recommended NFS mount options:

rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,vers=3,timeo=600,actimeo=0

or

rw,bg,hard,nointr,rsize=32768,wsize=32768,proto=tcp,noac,forcedirectio,vers=3,suid

The mount works normally. I can "ls", "mkdir", "touch", "vi", and "cp" files back and forth between the NFS backup location and the RAC node all day long with no problems. However, when I do almost anything in RMAN that requires writing to the NFS backup location, such as the command "backup archivelog all delete input;" (or even something as simple as a crosscheck, or an RMAN configuration change that writes back to the control file autobackup), the node locks up. There are no errors (or if there are, I don't know where to find them), even when I use an RMAN log.

Just to recap: when I run a crosscheck (or any RMAN operation that writes to the NFS backup location), the node locks up. I can let it sit for a day, inaccessible, with CRSCTL on the other node reporting it as offline, and it never comes out of the "frozen" state. It cannot be pinged or connected to.

I think I can safely rule out NFS mount options at this point.
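
For anyone who wants to double-check that the options requested at mount time are the ones actually in effect, here is a minimal sketch. It assumes a Linux node where /proc/mounts lists each mount as "device mountpoint fstype options dump pass". The function name nfs_mount_opts is made up for illustration and is not part of any Oracle or NFS tooling.

```shell
#!/bin/sh
# Print the options the kernel reports for an NFS mount point.
# Usage: nfs_mount_opts MOUNT_POINT [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts (assumed layout:
#   device mountpoint fstype options dump pass).
# Prints nothing if the mount point is absent or is not an NFS mount.
nfs_mount_opts() {
    mount_point=$1
    mounts_file=${2:-/proc/mounts}
    # Match the exact mount point and any nfs* filesystem type (nfs, nfs4).
    awk -v mp="$mount_point" \
        '$2 == mp && $3 ~ /^nfs/ { print $4 }' "$mounts_file"
}
```

Calling it as "nfs_mount_opts /backup/rman" (a hypothetical mount point; substitute the real backup location) on each node prints the effective option string, which can then be compared against the requested one.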

I understand (after extensive reading of MOS docs and testing) that RMAN on RAC can and does suffer from inefficient I/O when writing to an NFS mount. I don't think that's the culprit either. The control file autobackup is not that big, and I cannot see how running a simple crosscheck would lock up an entire node.


I am hoping someone has encountered this before and that it's just a simple misconfiguration somewhere.

Locked on Feb 4 2014
Added on Dec 20 2013