node 1 evicted due to ORA-29740
Hi Guys,
We have a two-node cluster running 10.2.0.2 on HP-UX 11.31.
Yesterday node 1 was evicted by the other node due to an ORA-29740 error.
When I checked the alert log files I saw some IPC errors; below are some excerpts from the alert logs of both nodes.
Node 1 Alert log file
Mon Aug 24 22:03:00 2009
Thread 1 advanced to log sequence 10484
Current log# 7 seq# 10484 mem# 0: +DATADG/orcl/onlinelog/group_7.298.670427121
Mon Aug 24 22:03:00 2009
SUCCESS: diskgroup FLASHDG was mounted
SUCCESS: diskgroup FLASHDG was dismounted
Mon Aug 24 22:50:04 2009
IPC Send timeout detected. Receiver ospid 15041
Mon Aug 24 22:51:08 2009
Trace dumping is performing id=[cdmp_20090824225031]
Mon Aug 24 22:52:27 2009
Errors in file /u01/app/oracle/db/admin/orcl/bdump/orcl1_lmon_15039.trc:
ORA-29740: evicted by member 1, group incarnation 10
Mon Aug 24 22:52:27 2009
LMON: terminating instance due to error 29740
Mon Aug 24 22:52:27 2009
Errors in file /u01/app/oracle/db/admin/orcl/bdump/orcl1_lms1_15045.trc:
ORA-29740: evicted by member , group incarnation
Mon Aug 24 22:52:27 2009
Errors in file /u01/app/oracle/db/admin/orcl/bdump/orcl1_lms0_15043.trc:
ORA-29740: evicted by member , group incarnation
Mon Aug 24 22:52:30 2009
Errors in file /u01/app/oracle/db/admin/orcl/bdump/orcl1_rbal_15336.trc:
ORA-29740: evicted by member , group incarnation
Mon Aug 24 22:52:59 2009
Shutting down instance (abort)
License high water mark = 254
Mon Aug 24 22:53:02 2009
Instance terminated by LMON, pid = 15039
Mon Aug 24 22:53:04 2009
Instance terminated by USER, pid = 8745
Mon Aug 24 22:53:13 2009
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
-----------
Node 2 Alert log file
Mon Aug 24 19:55:31 2009
Thread 2 advanced to log sequence 6803
Current log# 10 seq# 6803 mem# 0: +DATADG/orcl/onlinelog/group_10.301.670427207
Mon Aug 24 19:55:31 2009
SUCCESS: diskgroup FLASHDG was mounted
SUCCESS: diskgroup FLASHDG was dismounted
Mon Aug 24 22:50:03 2009
IPC Send timeout detected.Sender: ospid 6382
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:50:04 2009
IPC Send timeout detected.Sender: ospid 25897
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:50:05 2009
IPC Send timeout detected.Sender: ospid 26617
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:50:06 2009
IPC Send timeout detected.Sender: ospid 25678
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:50:07 2009
IPC Send timeout detected.Sender: ospid 21344
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:50:31 2009
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 198
Mon Aug 24 22:50:31 2009
Communications reconfiguration: instance_number 1
Mon Aug 24 22:50:33 2009
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 112
Mon Aug 24 22:50:35 2009
Trace dumping is performing id=[cdmp_20090824225031]
Mon Aug 24 22:50:35 2009
IPC Send timeout detected.Sender: ospid 984
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:50:35 2009
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 15
Mon Aug 24 22:50:49 2009
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 16
Mon Aug 24 22:50:52 2009
IPC Send timeout detected.Sender: ospid 12489
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:50:57 2009
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 84
Mon Aug 24 22:51:00 2009
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 97
Mon Aug 24 22:51:07 2009
IPC Send timeout to 0.0 inc 8 for msg type 12 from opid 75
Mon Aug 24 22:51:08 2009
IPC Send timeout detected.Sender: ospid 8900
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:51:25 2009
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:52:09 2009
Mon Aug 24 22:52:42 2009
Waiting for instances to leave:
1
Mon Aug 24 22:52:57 2009
IPC Send timeout detected.Sender: ospid 6378
Receiver: inst 1 binc 275179919 ospid 15041
Mon Aug 24 22:53:02 2009
Reconfiguration started (old inc 8, new inc 12)
List of nodes:
1
Global Resource Directory frozen
* dead instance detected - domain 0 invalid = TRUE
Communication channels reestablished
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Aug 24 22:53:02 2009
LMS 0: 10 GCS shadows cancelled, 2 closed
Mon Aug 24 22:53:02 2009
LMS 1: 1 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Post SMON to start 1st pass IR
Mon Aug 24 22:53:04 2009
LMS 0: 317502 GCS shadows traversed, 0 replayed
Mon Aug 24 22:53:04 2009
LMS 1: 302589 GCS shadows traversed, 0 replayed
Mon Aug 24 22:53:04 2009
Submitted all GCS remote-cache requests
Post SMON to start 1st pass IR
Fix write in gcs resources
Mon Aug 24 22:53:04 2009
Instance recovery: looking for dead threads
Mon Aug 24 22:53:04 2009
Beginning instance recovery of 1 threads
Reconfiguration complete
Mon Aug 24 22:53:06 2009
parallel recovery started with 3 processes
Mon Aug 24 22:53:07 2009
Started redo scan
Mon Aug 24 22:53:07 2009
Completed redo scan
53 redo blocks read, 30 data blocks need recovery
Mon Aug 24 22:53:07 2009
Started redo application at
Thread 1: logseq 10484, block 40586
Mon Aug 24 22:53:07 2009
Recovery of Online Redo Log: Thread 1 Group 7 Seq 10484 Reading mem 0
Mem# 0 errs 0: +DATADG/orcl/onlinelog/group_7.298.670427121
Mon Aug 24 22:53:08 2009
Completed redo application
Mon Aug 24 22:53:08 2009
Completed instance recovery at
Thread 1: logseq 10484, block 40639, scn 1479311755
30 data blocks read, 32 data blocks written, 53 redo blocks read
Switch log for thread 1 to sequence 10485
Mon Aug 24 22:53:27 2009
Reconfiguration started (old inc 12, new inc 14)
List of nodes:
0 1
Global Resource Directory frozen
Communication channels reestablished
* domain 0 valid = 1 according to instance 0
Mon Aug 24 22:53:27 2009
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Mon Aug 24 22:53:27 2009
LMS 0: 0 GCS shadows cancelled, 0 closed
Mon Aug 24 22:53:27 2009
LMS 1: 0 GCS shadows cancelled, 0 closed
Set master node info
Submitted all remote-enqueue requests
Dwn-cvts replayed, VALBLKs dubious
All grantable enqueues granted
Mon Aug 24 22:53:28 2009
LMS 1: 11913 GCS shadows traversed, 4001 replayed
Mon Aug 24 22:53:28 2009
LMS 0: 11725 GCS shadows traversed, 4001 replayed
Mon Aug 24 22:53:28 2009
LMS 0: 11680 GCS shadows traversed, 4001 replayed
Mon Aug 24 22:53:28 2009
LMS 1: 11945 GCS shadows traversed, 4001 replayed
Mon Aug 24 22:53:28 2009
LMS 1: 11808 GCS shadows traversed, 4001 replayed
LMS 1: 239 GCS shadows traversed, 80 replayed
Mon Aug 24 22:53:28 2009
LMS 0: 8065 GCS shadows traversed, 2737 replayed
Mon Aug 24 22:53:28 2009
Submitted all GCS remote-cache requests
Fix write in gcs resources
Reconfiguration complete
Tue Aug 25 02:11:36 2009
Thread 2 advanced to log sequence 6804
Current log# 12 seq# 6804 mem# 0: +DATADG/orcl/onlinelog/group_12.303.670427257
-------------------------
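In case it is useful, here is a rough sketch (not from the logs above) of how the IPC-timeout and ORA-29740 entries from the two alert logs could be merged into a single timeline, to confirm which instance stopped responding first. The alert log file names are assumptions based on the bdump path shown in the excerpts.

#!/usr/bin/env python
# Sketch: merge "IPC Send timeout" / ORA-29740 lines from both alert logs
# into one timestamp-ordered timeline. File names below are assumptions.
from datetime import datetime

LOGS = {
    1: "/u01/app/oracle/db/admin/orcl/bdump/alert_orcl1.log",  # assumed name
    2: "/u01/app/oracle/db/admin/orcl/bdump/alert_orcl2.log",  # assumed name
}
PATTERNS = ("IPC Send timeout", "ORA-29740", "Communications reconfiguration")
TS_FMT = "%a %b %d %H:%M:%S %Y"  # e.g. "Mon Aug 24 22:50:03 2009"

events = []
for node, path in LOGS.items():
    stamp = None
    with open(path) as f:
        for line in f:
            line = line.rstrip()
            try:
                # alert log timestamps sit on their own line
                stamp = datetime.strptime(line, TS_FMT)
                continue
            except ValueError:
                pass
            if any(p in line for p in PATTERNS):
                events.append((stamp, node, line))

for stamp, node, line in sorted(events, key=lambda e: (e[0] or datetime.min, e[1])):
    print("%s  node%d  %s" % (stamp, node, line))
-------------------------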
I also checked the CPU usage and saw that one Oracle process, SMON, is using about 86% of a CPU:
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
1 ? 6378 oracle 241 20 17060M 18200K run 1951:13 86.48 86.33 ora_smon_orcl
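To see what that SMON process is actually doing inside the instance, I was thinking of looking up its current wait via v$session/v$process for that ospid. This is only a rough sketch; the connection details are placeholders and it assumes the cx_Oracle module is available.

#!/usr/bin/env python
# Sketch: look up the session of the busy SMON process (ospid 6378 in the
# top output above) and show its current wait event.
import cx_Oracle

OSPID = "6378"  # ospid of the ora_smon process shown by top

# Placeholder credentials / service name -- connect to the instance
# on which that SMON process is running.
conn = cx_Oracle.connect("system", "manager", "orcl")
cur = conn.cursor()
cur.execute("""
    SELECT s.sid, s.serial#, s.event, s.state, s.seconds_in_wait
      FROM v$session s, v$process p
     WHERE p.addr = s.paddr
       AND p.spid = :ospid""", ospid=OSPID)
for row in cur:
    print(row)
conn.close()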
Please help me investigate this issue.