Aborted command 'check for resource' results in ASM failures
If anyone has any ideas, please let me know. I've been struggling with this issue for some time and haven't found anything that has helped. I'll admit upfront that my RAC and Clusterware experience is limited, so if I state anything incorrectly, please keep that in mind.
We are transitioning off a standalone RDBMS OpenVMS system to a Linux RAC/Grid Infrastructure configuration (a small, five-node cluster with two RDBMS servers, two job servers, and a grid control node). Since upgrading from Grid Infrastructure 11.2.0.1 to 11.2.0.2 we have been plagued by periods where the nodes encounter errors like these:
2011-04-08 00:50:44.600
[ora01/11.2.0.2/gridi/bin/orarootagent.bin(11426)]CRS-5818:Aborted command 'check for resource: ora.drivers.acfs 1 1' for resource 'ora.drivers.acfs'. Details at (:CRSAGF00113:) {0:0:2} in /ora01/11.2.0.2/gridi/log/gridc/agent/ohasd/orarootagent_root/orarootagent_root.log.
2011-04-08 00:50:44.702
[ora01/11.2.0.2/gridi/bin/orarootagent.bin(11426)]CRS-5014:Agent "/ora01/11.2.0.2/gridi/bin/orarootagent.bin" timed out starting process "/ora01/11.2.0.2/gridi/bin/acfsload" for action "check": details at "(:CLSN00009:)" in "/ora01/11.2.0.2/gridi/log/gridc/agent/ohasd/orarootagent_root/orarootagent_root.log"
2011-04-08 00:51:24.904
[ora01/11.2.0.2/gridi/bin/orarootagent.bin(11426)]CRS-5832:Agent '/ora01/11.2.0.2/gridi/bin/orarootagent_root' was unable to process commands. Details at (:CRSAGF00128:) {0:0:2} in /ora01/11.2.0.2/gridi/log/gridc/agent/ohasd/orarootagent_root/orarootagent_root.log.
2011-04-08 00:52:33.340
[ora01/11.2.0.2/gridi/bin/oraagent.bin(11381)]CRS-5818:Aborted command 'check for resource: ora.asm 1 1' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:0:2} in /ora01/11.2.0.2/gridi/log/gridc/agent/ohasd/oraagent_oracle/oraagent_oracle.log.
These continue for some time, until eventually we see the following (from /var/log/messages):
Apr 8 01:00:39 gridc kernel: [Oracle ADVM] The ASM instance terminated unexpectedly. All ADVM volumes will be taken offline. You must close all applications using these volumes and unmount the file systems. After restarting the instance, you may need to re-enable the volumes for use.
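For what it's worth, this is roughly what we end up doing to get things back after one of these events; the diskgroup, volume, and device names below are placeholders, not our actual ones.

    # check the ADVM volume state once ASM is back up (names are placeholders)
    asmcmd volinfo -G DATA acfsvol01
    # re-enable the volume if it came back disabled
    asmcmd volenable -G DATA acfsvol01
    # remount the CRS-managed ACFS file system through Clusterware
    srvctl start filesystem -d /dev/asm/acfsvol01-123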
Since it's ASM that fails, the impact goes beyond the one node: all database services are affected. That seems contrary to how a properly configured high-availability system should behave.
We originally thought the problem was that we didn't understand the difference between a CRS-managed ACFS file system and a general-purpose one, and that we were mounting the CRS-managed resource incorrectly. But the problem has persisted even since we switched to mounting it with srvctl.
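In case the details matter, this is essentially how we register and mount the file system now; the diskgroup, volume, device, and mount point names are placeholders for our real ones.

    # register the ACFS file system as a CRS-managed resource (done once)
    srvctl add filesystem -d /dev/asm/acfsvol01-123 -v ACFSVOL01 -g DATA -m /u02/app/shared -u oracle
    # mount it through Clusterware rather than with a manual 'mount'
    srvctl start filesystem -d /dev/asm/acfsvol01-123
    # confirm the resource is online
    srvctl status filesystem -d /dev/asm/acfsvol01-123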
Since the logs indicated that a device timeout was being exceeded, and the value in question was the one used for disk access, we next looked at our RMAN backups to disk. We thought that not specifying a RATE parameter on the channels might be the problem, but the errors persist even after throttling RMAN back. We also made sure no disk-to-tape backups were running at the times the errors occur.
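For reference, the throttling we tried looks roughly like this; the 50M figure is just an example value, not a recommendation.

    rman target / <<EOF
    run {
      # cap channel throughput so the backup cannot saturate the disks
      allocate channel d1 device type disk rate 50M;
      backup database plus archivelog;
      release channel d1;
    }
    EOF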
The errors occurred again this weekend, during the day. There was no scheduled activity underway and no users on the system, so I'm not sure why we are getting these 'check for resource' errors. We've applied all the 11.2.0.2 patches we can, but the problem persists.
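In case it helps someone spot something, this is how I've been looking at the check settings and the agent activity for the resources that time out; the grep pattern is just what I happened to use.

    # dump the attributes of the OHASD-managed resources that keep timing out
    crsctl stat res ora.drivers.acfs -init -p | grep -iE 'interval|timeout'
    crsctl stat res ora.asm -init -p | grep -iE 'interval|timeout'
    # watch the orarootagent log for the CRS-5818 / CLSN00009 messages as they happen
    tail -f /ora01/11.2.0.2/gridi/log/gridc/agent/ohasd/orarootagent_root/orarootagent_root.log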
I searched the forums and found one post similar to this, but with no follow-ups. I haven't found much help through Google either, and Oracle Support hasn't been helpful in the two months I've been trying to puzzle this out. So if anyone has any insight or help they can share, I'd be ecstatically appreciative.