Instance evicted 3 times while CPU, memory, and I/O were not busy (OSW shows CPU usage <10%), LMS process commun

anbob, May 28 2015 (edited May 29 2015)

# db alert node 2

Fri May 22 00:17:19 2015

SMON: Parallel transaction recovery tried  <<==============

Fri May 22 00:18:25 2015

Dumping diagnostic data in directory=[cdmp_20150522001825], requested by (instance=1, osid=8926), summary=[abnormal process termination].

Fri May 22 00:19:54 2015

LMS1 (ospid: 934) has detected no messaging activity from instance 1

LMS1 (ospid: 934) issues an IMR to resolve the situation

Please check LMS1 trace file for more detail.

Fri May 22 00:19:55 2015

Communications reconfiguration: instance_number 1

Fri May 22 00:20:08 2015

Detected an inconsistent instance membership by instance 2

Fri May 22 00:20:08 2015

Received an instance abort message from instance 1

Please check instance 1 alert and LMON trace files for detail.

Fri May 22 00:20:08 2015

System state dump requested by (instance=2, osid=932 (LMS0)), summary=[abnormal instance termination].

LMS0 (ospid: 932): terminating the instance due to error 481

Fri May 22 00:20:09 2015

opiodr aborting process unknown ospid (11156) as a result of ORA-1092
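
To make the sequence in the alert log easier to read, the timestamps can be turned into offsets from the first entry. This is only a minimal sketch: the event descriptions are shortened paraphrases of the log lines above, and the helper name `elapsed_seconds` is made up for illustration.

```python
from datetime import datetime

# Timestamps copied from the node 2 alert log above; descriptions are shortened.
EVENTS = [
    ("Fri May 22 00:17:19 2015", "SMON: Parallel transaction recovery tried"),
    ("Fri May 22 00:18:25 2015", "Diagnostic dump requested by instance 1"),
    ("Fri May 22 00:19:54 2015", "LMS1 detected no messaging activity from instance 1"),
    ("Fri May 22 00:19:55 2015", "Communications reconfiguration: instance_number 1"),
    ("Fri May 22 00:20:08 2015", "LMS0 terminating the instance due to error 481"),
    ("Fri May 22 00:20:09 2015", "opiodr aborting process as a result of ORA-1092"),
]

ALERT_LOG_FORMAT = "%a %b %d %H:%M:%S %Y"


def elapsed_seconds(events):
    """Return (seconds since the first event, description) for each event."""
    times = [datetime.strptime(stamp, ALERT_LOG_FORMAT) for stamp, _ in events]
    first = times[0]
    return [
        (int((moment - first).total_seconds()), description)
        for moment, (_, description) in zip(times, events)
    ]


timeline = elapsed_seconds(EVENTS)
for offset, description in timeline:
    print(f"+{offset:4d}s  {description}")
```

Read this way, about 14 seconds pass between LMS1 reporting no messaging activity (00:19:54) and the instance being terminated (00:20:08).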

# LMS 1 TRACE ON NODE 2

*** 2015-05-22 00:19:54.995

===== Idle Connection Overview =====

Idle connections [0x64da7]: 1

IdleConn List: 1[r:0.541144340,t:0x64da7]

GSIPC:IKILL: ping to inst 1 start 413095 now 413255 icktm 140 psm 1

  : Pending Send Queue:

  :  OMSG type 65518 dest 1.2 waited 333699063 usec

  :  OMSG type 65521 dest 1.2 waited 333699063 usec

  :  OMSG type 65518 dest 1.2 waited 333679066 usec

  :  OMSG type 34 dest 1.2 waited 333679066 usec

  :  OMSG type 34 dest 1.2 waited 333679066 usec

  :  OMSG type 65518 dest 1.2 waited 333679066 usec

  :  OMSG type 65518 dest 1.2 waited 333659065 usec

  :  OMSG type 65518 dest 1.2 waited 333639048 usec

...

Reporting Communication error with instance 1

*** 2015-05-22 00:19:55.004

kjctsrcikill: Invoking KST Dump on receiver Kill      <<<<<<<<<<<<<<<<<<<<<kjc.h: Kernel Lock Manager Communication  layer

-------------------------------------------------------------------------------

Trace Bucket Dump of current process skipped - empty bucket

-------------------------------------------------------------------------------

KSI PGA Bucket:   <<<<<<<<<<<<<<<<<<<ksi: Instance locks

-------------------------------------------------------------------------------

Trace Bucket Dump Begin: KSI

TIME(*=approx):SEQ:FILE@LINE:FUNCTION: DATA

-------------------------------------------------------------------------------

-------------------------------------------------------------------------------

Trace Bucket Dump End: KSI

*** 2015-05-22 00:19:55.004

kjctsrcikill: Completed KST Dump on receiver Kill

*** 2015-05-22 00:20:08.960

Received a speical admin message (type 2) from instance 1

Message body: flag 0x1 data 0x92804 0xc0 0x2f493d

Abort the instance

kjzduptcctx: Notifying DIAG for crash event

----- Abridged Call Stack Trace -----

ksedsts()+544<-kjzdssdmp()+400<-kjzduptcctx()+432<-kjzdicrshnfy()+128<-$cold_ksuitm()+5808<-kjxmrsv()+496<-kjctr_pmsg()+7696<-kjctr_watq()+848<-kjctr_rksxp()+1616<-kjctrcv()+416<-kjcsrmg()+160<-kjmsm()+22656<-ksbrdp()+2736<-opirip()+1296<-opidrv()+1152<-sou2o()+256

<-opimai_real()+352<-ssthrdmain()+576<-main()+336<-main_opd_entry()+80

----- End of Abridged Call Stack Trace -----
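
The "waited ... usec" values in the Pending Send Queue are easier to judge in seconds. The sketch below parses the queue lines shown above and, assuming the "waited" figure is the time each message had been queued as of the trace timestamp 00:19:54.995 (an assumption on my part, not something the trace states), estimates when the oldest message was queued. The names `parse_waits` and `queued_at` are illustrative only.

```python
import re
from datetime import datetime, timedelta

# Copied from the LMS1 trace above; the original dump continues after "...".
TRACE_TIMESTAMP = "2015-05-22 00:19:54.995"
PENDING_SEND_QUEUE = """\
  :  OMSG type 65518 dest 1.2 waited 333699063 usec
  :  OMSG type 65521 dest 1.2 waited 333699063 usec
  :  OMSG type 65518 dest 1.2 waited 333679066 usec
  :  OMSG type 34 dest 1.2 waited 333679066 usec
  :  OMSG type 34 dest 1.2 waited 333679066 usec
  :  OMSG type 65518 dest 1.2 waited 333679066 usec
  :  OMSG type 65518 dest 1.2 waited 333659065 usec
  :  OMSG type 65518 dest 1.2 waited 333639048 usec
"""

OMSG_PATTERN = re.compile(r"OMSG type (\d+) dest (\S+) waited (\d+) usec")


def parse_waits(text):
    """Return (message type, destination, wait in microseconds) per queue line."""
    return [
        (int(msg_type), dest, int(wait_us))
        for msg_type, dest, wait_us in OMSG_PATTERN.findall(text)
    ]


waits = parse_waits(PENDING_SEND_QUEUE)
max_wait_us = max(wait_us for _, _, wait_us in waits)

# Assumption: "waited" is measured up to the trace timestamp.
dump_time = datetime.strptime(TRACE_TIMESTAMP, "%Y-%m-%d %H:%M:%S.%f")
queued_at = dump_time - timedelta(microseconds=max_wait_us)

print(f"messages parsed: {len(waits)}")
print(f"longest wait:    {max_wait_us / 1_000_000:.1f} s")
print(f"oldest queued:   {queued_at}")
```

Under that assumption the oldest message was queued at about 00:14:21, roughly three minutes before the first alert log entry shown above (00:17:19).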

# LMON TRACE ON NODE 2

*** 2015-05-22 00:19:55.004

2015-05-22 00:19:55.003731 : kjxgrcomerr: Communications reconfig: instance 1 (20,20)    <<<<<<<<<<<<kjxg* : the CGS layer

2015-05-22 00:19:55.121064 : kjxgrrcfg: done - ret = 3  hist 0x1679a (initial rsn: 3)

kjxgrrcfgchk: Initiating reconfig, reason=3

kjxgrrcfgchk: COMM rcfg - Disk Vote Required

kjfmReceiverHealthCB_CheckAll: Recievers are healthy.

2015-05-22 00:19:55.121164 : kjxgrnetchk: start 0xbbd56d04, end 0xbbd6234e

2015-05-22 00:19:55.121232 : kjxgrnetchk: Network Validation wait: 46 sec

kjxgrnetchk: ce-event: from inst 2 to inst 1 ver 0x3534d1a

kjxgrrcfgchk: prev pstate 6  mapsz 512

kjxgrrcfgchk: new  bmp: 1 2

kjxgrrcfgchk: work bmp: 1 2

kjxgrrcfgchk: rr  bmp: 1 2

# osw interval 3s

udp:

        23352 incomplete headers

        0 bad checksums

        23352 socket overflows

ip:

        61045476513 total packets received

        25311 bad IP headers

        2125370576 fragments received

        0 fragments dropped (dup or out of space)

        0 fragments dropped after timeout

        0 packets forwarded

        0 packets not forwardable

==============================

udp:

        23664 incomplete headers

        0 bad checksums

        23664 socket overflows

ip:

        62160103655 total packets received

        25846 bad IP headers

        2165186814 fragments received

        0 fragments dropped (dup or out of space)

        0 fragments dropped after timeout

        0 packets forwarded

        0 packets not forwardable
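
Since these counters are cumulative, what matters is how much they grew between the two snapshots. The following sketch diffs the two samples above. The post does not say how far apart the two snapshots were taken, so no rate is computed; `parse_counters` and `changed_counters` are illustrative names.

```python
import re

# The two snapshots above, with the unchanged zero counters at the end omitted.
SAMPLE_1 = """\
udp:
        23352 incomplete headers
        0 bad checksums
        23352 socket overflows
ip:
        61045476513 total packets received
        25311 bad IP headers
        2125370576 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
"""

SAMPLE_2 = """\
udp:
        23664 incomplete headers
        0 bad checksums
        23664 socket overflows
ip:
        62160103655 total packets received
        25846 bad IP headers
        2165186814 fragments received
        0 fragments dropped (dup or out of space)
        0 fragments dropped after timeout
"""


def parse_counters(text):
    """Map 'section.counter name' to its integer value."""
    counters = {}
    section = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.endswith(":"):
            section = line[:-1]
            continue
        match = re.match(r"(\d+)\s+(.*)", line)
        if match:
            counters[f"{section}.{match.group(2)}"] = int(match.group(1))
    return counters


def changed_counters(before, after):
    """Return only the counters whose value differs between the snapshots."""
    return {
        name: after[name] - before[name]
        for name in before
        if after[name] != before[name]
    }


growth = changed_counters(parse_counters(SAMPLE_1), parse_counters(SAMPLE_2))
for name, delta in growth.items():
    print(f"{name}: +{delta}")
```

UDP socket overflows grow by 312 and bad IP headers by 535 between the snapshots, while the fragment-drop counters stay at 0. A growing socket overflow count generally means datagrams arrived while a receive buffer was full, which may be worth checking on the interconnect, although this output alone does not show which sockets were affected.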

Environment: 2-node RAC database, version 11.2.0.3.7, on HP-UX 11.31 (IA).

Questions:

  1. Is the root cause of this behavior an LMS hang, or a network problem?

  2. Which setting controls the roughly 300-second LMS communication timeout, and can it be increased?

  3. What does "OMSG type 65518" mean?
