# DB ALERT LOG ON NODE 2
Fri May 22 00:17:19 2015
SMON: Parallel transaction recovery tried <<==============
Fri May 22 00:18:25 2015
Dumping diagnostic data in directory=[cdmp_20150522001825], requested by (instance=1, osid=8926), summary=[abnormal process termination].
Fri May 22 00:19:54 2015
LMS1 (ospid: 934) has detected no messaging activity from instance 1
LMS1 (ospid: 934) issues an IMR to resolve the situation
Please check LMS1 trace file for more detail.
Fri May 22 00:19:55 2015
Communications reconfiguration: instance_number 1
Fri May 22 00:20:08 2015
Detected an inconsistent instance membership by instance 2
Fri May 22 00:20:08 2015
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
Fri May 22 00:20:08 2015
System state dump requested by (instance=2, osid=932 (LMS0)), summary=[abnormal instance termination].
LMS0 (ospid: 932): terminating the instance due to error 481
Fri May 22 00:20:09 2015
opiodr aborting process unknown ospid (11156) as a result of ORA-1092
# LMS 1 TRACE ON NODE 2
*** 2015-05-22 00:19:54.995
===== Idle Connection Overview =====
Idle connections [0x64da7]: 1
IdleConn List: 1[r:0.541144340,t:0x64da7]
GSIPC:IKILL: ping to inst 1 start 413095 now 413255 icktm 140 psm 1
: Pending Send Queue:
: OMSG type 65518 dest 1.2 waited 333699063 usec
: OMSG type 65521 dest 1.2 waited 333699063 usec
: OMSG type 65518 dest 1.2 waited 333679066 usec
: OMSG type 34 dest 1.2 waited 333679066 usec
: OMSG type 34 dest 1.2 waited 333679066 usec
: OMSG type 65518 dest 1.2 waited 333679066 usec
: OMSG type 65518 dest 1.2 waited 333659065 usec
: OMSG type 65518 dest 1.2 waited 333639048 usec
...
Reporting Communication error with instance 1
*** 2015-05-22 00:19:55.004
kjctsrcikill: Invoking KST Dump on receiver Kill <<<<<<<<<<<<<<<<<<<<<kjc.h: Kernel Lock Manager Communication layer
-------------------------------------------------------------------------------
Trace Bucket Dump of current process skipped - empty bucket
-------------------------------------------------------------------------------
KSI PGA Bucket: <<<<<<<<<<<<<<<<<<<ksi: Instance locks
-------------------------------------------------------------------------------
Trace Bucket Dump Begin: KSI
TIME(*=approx):SEQ:FILE@LINE:FUNCTION: DATA
-------------------------------------------------------------------------------
-------------------------------------------------------------------------------
Trace Bucket Dump End: KSI
*** 2015-05-22 00:19:55.004
kjctsrcikill: Completed KST Dump on receiver Kill
*** 2015-05-22 00:20:08.960
Received a speical admin message (type 2) from instance 1
Message body: flag 0x1 data 0x92804 0xc0 0x2f493d
Abort the instance
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+544<-kjzdssdmp()+400<-kjzduptcctx()+432<-kjzdicrshnfy()+128<-$cold_ksuitm()+5808<-kjxmrsv()+496<-kjctr_pmsg()+7696<-kjctr_watq()+848<-kjctr_rksxp()+1616<-kjctrcv()+416<-kjcsrmg()+160<-kjmsm()+22656<-ksbrdp()+2736<-opirip()+1296<-opidrv()+1152<-sou2o()+256
<-opimai_real()+352<-ssthrdmain()+576<-main()+336<-main_opd_entry()+80
----- End of Abridged Call Stack Trace -----
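
The `waited ... usec` values in the Pending Send Queue are easier to judge in seconds. The following is a minimal sketch, not an Oracle tool: the regular expression and the helper name `parse_omsg` are assumptions made for illustration, and the input is a few lines copied from the LMS1 trace above.

```python
import re

# A few lines copied verbatim from the LMS1 Pending Send Queue above.
TRACE = """\
: OMSG type 65518 dest 1.2 waited 333699063 usec
: OMSG type 65521 dest 1.2 waited 333699063 usec
: OMSG type 65518 dest 1.2 waited 333679066 usec
: OMSG type 34 dest 1.2 waited 333679066 usec
"""

# Assumed line layout, inferred only from the excerpt above.
OMSG_RE = re.compile(r"OMSG type (\d+) dest (\S+) waited (\d+) usec")


def parse_omsg(text):
    """Return (msg_type, dest, waited_seconds) for each OMSG line found."""
    rows = []
    for m in OMSG_RE.finditer(text):
        waited_seconds = int(m.group(3)) / 1_000_000  # usec -> seconds
        rows.append((int(m.group(1)), m.group(2), waited_seconds))
    return rows


rows = parse_omsg(TRACE)
longest = max(secs for _, _, secs in rows)

for msg_type, dest, secs in rows:
    print(f"type={msg_type:<6} dest={dest} waited={secs:.1f}s")
print(f"longest wait: {longest:.1f}s")
```

The oldest queued message has waited about 333.7 seconds, so nothing sent to instance 1 had been acknowledged for more than five minutes by the time LMS1 issued the IMR.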
# LMON TRACE ON NODE 2
*** 2015-05-22 00:19:55.004
2015-05-22 00:19:55.003731 : kjxgrcomerr: Communications reconfig: instance 1 (20,20) <<<<<<<<<<<<kjxg* : the CGS layer
2015-05-22 00:19:55.121064 : kjxgrrcfg: done - ret = 3 hist 0x1679a (initial rsn: 3)
kjxgrrcfgchk: Initiating reconfig, reason=3
kjxgrrcfgchk: COMM rcfg - Disk Vote Required
kjfmReceiverHealthCB_CheckAll: Recievers are healthy.
2015-05-22 00:19:55.121164 : kjxgrnetchk: start 0xbbd56d04, end 0xbbd6234e
2015-05-22 00:19:55.121232 : kjxgrnetchk: Network Validation wait: 46 sec
kjxgrnetchk: ce-event: from inst 2 to inst 1 ver 0x3534d1a
kjxgrrcfgchk: prev pstate 6 mapsz 512
kjxgrrcfgchk: new bmp: 1 2
kjxgrrcfgchk: work bmp: 1 2
kjxgrrcfgchk: rr bmp: 1 2
# OSWATCHER (OSW) NETWORK STATISTICS, INTERVAL 3S
udp:
23352 incomplete headers
0 bad checksums
23352 socket overflows
ip:
61045476513 total packets received
25311 bad IP headers
2125370576 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets forwarded
0 packets not forwardable
==============================
udp:
23664 incomplete headers
0 bad checksums
23664 socket overflows
ip:
62160103655 total packets received
25846 bad IP headers
2165186814 fragments received
0 fragments dropped (dup or out of space)
0 fragments dropped after timeout
0 packets forwarded
0 packets not forwardable
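
Cumulative counters like these only become meaningful as differences between samples. Below is a rough sketch that subtracts the first sample from the second; the parsing rules and the helper names `parse_counters` and `deltas` are assumptions for illustration, and only the counters that appear in the two samples above are used.

```python
import re

# Counters copied from the two samples above (unchanged zero rows omitted).
SAMPLE_1 = """\
udp:
23352 incomplete headers
0 bad checksums
23352 socket overflows
ip:
61045476513 total packets received
25311 bad IP headers
2125370576 fragments received
"""

SAMPLE_2 = """\
udp:
23664 incomplete headers
0 bad checksums
23664 socket overflows
ip:
62160103655 total packets received
25846 bad IP headers
2165186814 fragments received
"""


def parse_counters(text):
    """Map 'section.counter name' -> integer value for one sample."""
    counters = {}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.endswith(":"):
            section = line[:-1]  # e.g. "udp" or "ip"
            continue
        m = re.match(r"(\d+)\s+(.+)", line)
        if m and section:
            counters[f"{section}.{m.group(2)}"] = int(m.group(1))
    return counters


def deltas(before, after):
    """Return only the counters whose value changed between samples."""
    return {k: after[k] - before[k]
            for k in before
            if k in after and after[k] != before[k]}


changed = deltas(parse_counters(SAMPLE_1), parse_counters(SAMPLE_2))
for name, diff in changed.items():
    print(f"{name}: +{diff}")
```

Between these two samples the UDP socket overflow counter grows by 312, which means datagrams were discarded because a receive buffer was full. Whether the samples are consecutive 3-second snapshots cannot be told from the excerpt, so the rate per second is left uncomputed.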
Environment: 2-node RAC database, version 11.2.0.3.7, on HP-UX 11.31 (Itanium).
Questions:
1. Is the root cause of this instance termination an LMS hang, or a network problem?
2. Which setting controls the roughly 300-second LMS communication timeout, and can it be increased?
3. What does "OMSG type 65518" mean?