How long can a node be off-line and successfully rejoin the cluster after reboot

OldSchoolDBA, Oct 11 2011 (edited Oct 13 2011)
Hi All,

I am looking for input here. We have a 2-node RAC, database version 10.2.0.4, using ASM, on Linux, on IBM boxes. One of our nodes has been non-functional for 2 months. By non-functional I mean the server does not recognize its own local disks. It is frozen, hung, non-functioning.

When we log onto the non-functional server, it says the cluster process is running. And on the functional node, the CRS log shows both nodes as active.

2011-04-19 10:05:34.611
[crsd(17985)]CRS-1204:Recovering CRS resources for node xxx-xxxxx-2.
[cssd(18546)]CRS-1601:CSSD Reconfiguration complete. Active nodes are xxx-xxxxx-1 xxx-xxxxx-2 .

Nothing after this.
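
To compare what each log reports, the active node list can be pulled out of the CRS-1601 line. This is only a minimal sketch, assuming the message keeps the format shown in the excerpt above; the function name `active_nodes` is made up for illustration and is not part of any Oracle tool.

```python
import re

# Matches the CSSD reconfiguration message as it appears in the excerpt above.
# Assumption: the node list is space-separated and the line ends with a period.
CRS_1601 = re.compile(
    r"CRS-1601:CSSD Reconfiguration complete\. Active nodes are (.*?)\s*\.\s*$"
)

def active_nodes(line):
    """Return the node names listed in a CRS-1601 line, or [] for any other line."""
    match = CRS_1601.search(line)
    return match.group(1).split() if match else []

sample = ("[cssd(18546)]CRS-1601:CSSD Reconfiguration complete. "
          "Active nodes are xxx-xxxxx-1 xxx-xxxxx-2 .")
print(active_nodes(sample))
```

Running this over the CRS log on the working node would show which nodes CSSD last reported as active, which can then be checked against the ASM log.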

However, on the working node, the ASM log shows that node as having been evicted back in August.

List of nodes:
0
Global Resource Directory frozen
* dead instance detected - domain 1 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Aug 15 19:55:03 2011
LMS 0: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Mon Aug 15 19:55:03 2011
NOTE: SMON starting instance recovery for group 1 (mounted)
Mon Aug 15 19:55:03 2011
LMS 0: 5446 GCS shadows traversed, 0 replayed
Mon Aug 15 19:55:03 2011
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Mon Aug 15 19:55:03 2011
NOTE: F1X0 found on disk 0 fcn 0.1332669
NOTE: starting recovery of thread=1 ckpt=63.8866 group=1
Mon Aug 15 19:55:03 2011
NOTE: waiting for instance recovery of group 1
Mon Aug 15 19:55:03 2011
NOTE: advancing ckpt for thread=1 ckpt=63.8866
NOTE: smon did instance recovery for domain 1
Mon Aug 15 19:55:06 2011
NOTE: recovering COD for group 1/0xeaec03f1 (DGROUP1)
SUCCESS: completed COD recovery for group 1/0xeaec03f1 (DGROUP1)
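
To pin down when ASM last reconfigured, the timestamps can be paired with the "Reconfiguration complete" messages. This is a rough sketch, assuming the alert log keeps the layout shown above (a timestamp line followed by the messages it covers) and an English locale for the day and month names; `reconfiguration_times` is a hypothetical helper name.

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Aug 15 19:55:03 2011"

def reconfiguration_times(lines):
    """Return the timestamp in effect for each 'Reconfiguration complete' line."""
    current = None   # most recent timestamp line seen so far
    found = []
    for raw in lines:
        line = raw.strip()
        try:
            current = datetime.strptime(line, STAMP_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, treat it as a message
        if line == "Reconfiguration complete":
            found.append(current)
    return found

excerpt = [
    "Mon Aug 15 19:55:03 2011",
    "Submitted all GCS remote-cache requests",
    "Fix write in gcs resources",
    "Reconfiguration complete",
]
print(reconfiguration_times(excerpt))
```

Fed the whole ASM alert log, this would list every reconfiguration time, so the eviction in August can be lined up against the last entries in the CRS log.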


My question is: if the non-functional node comes back up after a reboot, is it possible the cluster will recognize it and let it resume its work?

There have been no errors in the working node's alert logs.

I understand each case is different; I would like to hear what the experts think will happen.
Locked on Nov 10 2011
Added on Oct 11 2011