Grid Infrastructure intense IO on local device
I have installed 11gR2 and configured a two-node RAC on some home systems. After a reboot, seemingly at random, one of the nodes begins experiencing intense I/O on the local drive where the software is installed. The shared storage holding the OCR, voting disk, and database files sees almost no activity, even with the database started. The I/O starts the moment CRS is started and disappears as soon as the last CRS process stops. No matter how long I leave the host up, it just keeps hammering away at that local drive. I don't even know where to begin looking.
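The next thing I was planning to try is pinning down which process is actually doing the writes. Something along these lines is what I had in mind (the /u01/app/11.2.0/grid path is just where my GI home happens to be; iotop and pidstat -d apparently need newer per-process I/O accounting than the stock CentOS 5 kernel has, so I listed fallbacks too):

# which files under the GI home were modified in the last two minutes?
find /u01/app/11.2.0/grid -type f -mmin -2

# which processes currently hold files under the GI home open?
lsof +D /u01/app/11.2.0/grid 2>/dev/null | grep -i -e crf -e bdb

# last resort on this old kernel: have the kernel log block writes briefly
# echo 1 > /proc/sys/vm/block_dump ; sleep 10 ; dmesg | tail -50
# echo 0 > /proc/sys/vm/block_dump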
Restoring the entire host from a backup resolves the issue, but inevitably the problem returns on one host or the other. I'm running CentOS 5 with Grid Infrastructure 11.2.0.3, and the host has 4 GB of memory. Out of curiosity, I checked which logs under the GI home were being written to regularly. Many are, of course, constantly updated, so I compared what was being generated before and after the issue presents. The one that stood out was crflogd.log:
2012-03-31 22:25:40.296: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:25:40.296: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:26:37.455: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:26:37.455: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:27:32.262: [CRFLDREP][1092643136]Error inserting record into bdb: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
2012-03-31 22:27:32.262: [CRFLDREP][1092643136]msg :Error inserting record :: error :DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock
These entries are now being generated constantly; they were not before the issue appeared. That might be unrelated, but again, I don't know for sure. What is odd is that I appear to have plenty of memory available:
             total       used       free     shared    buffers     cached
Mem:       4043760    2364212    1679548          0      79588    1153528
-/+ buffers/cache:    1131096    2912664
Swap:      5144568          0    5144568
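My guess is that crflogd and those CRFLDREP messages belong to the Cluster Health Monitor logger, which keeps a Berkeley DB repository under the GI home, so my next step was going to be checking that repository. The commands below are what I found for 11.2, and I haven't confirmed any of this yet (the crf/db path is just where I expect the BDB files to be):

# where is the CHM repository and what is its configured size?
oclumon manage -get reppath
oclumon manage -get repsize

# how big are the BDB files actually on disk?
du -sh /u01/app/11.2.0/grid/crf/db/*

# if CHM really is the culprit, I could try stopping it temporarily as root:
# crsctl stop res ora.crf -init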
Or could it be related to the ASM instance somehow? I hope someone can help me out with this one.