RAC Hang
Hi all!
Two-node Oracle 10gR2 (10.2.0.3) RAC on Windows 2003 EE (9 GB RAM).
My database hung (both nodes at the same time).
At Tue Feb 05 14:36:18 2008 every session hung, and I could not log on as SYS.
The Windows event log is empty and there are no errors in the storage array.
After the nodes hung I waited a few minutes, but nothing changed, so I used srvctl to shut down node1 and then restarted the server (node1).
I could not shut down node2 using srvctl, so I restarted that server too (rebooted Windows).
After I restarted the nodes everything went back to normal (works fine).
Alert log, node1:
Tue Feb 05 14:36:18 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_lmon_4460.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:18 2008
LMON: terminating instance due to error 481
Tue Feb 05 14:36:18 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_lms1_768.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:18 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_lms2_3344.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
WARNING: inbound connection timed out (ORA-3136)
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_936.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LCK0' unexpectedly terminated with error 481
ORA-00481: LMON process terminated with error
ORA-00604: error occurred at recursive SQL level 3
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_lmd0_4960.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_q003_1596.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_pmon_2904.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_lck0_2880.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_lgwr_3744.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_1460.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_4748.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_psp0_4288.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:19 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_lms0_672.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:20 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_j000_5072.trc:
ORA-00604: error occurred at recursive SQL level 1
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:20 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_q002_4316.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:21 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_ckpt_644.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:21 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_4748.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kxsdcbc+696] [PC:0x9B35A6] [ADDR:0x1C] [UNABLE_TO_READ] []
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:22 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_1460.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kxsdcbc+988] [PC:0x9B36CA] [ADDR:0x1C] [UNABLE_TO_READ] []
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:23 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_4748.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kxsdcbc+696] [PC:0x9B35A6] [ADDR:0x1C] [UNABLE_TO_READ] []
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:23 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_1460.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kxsdcbc+988] [PC:0x9B36CA] [ADDR:0x1C] [UNABLE_TO_READ] []
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:24 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_4748.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kxsdcbc+696] [PC:0x9B35A6] [ADDR:0x1C] [UNABLE_TO_READ] []
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:24 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_936.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LGWR' unexpectedly terminated with error 481
ORA-00481: LMON process terminated with error
ORA-00603: ORACLE server session terminated by fatal error
ORA-00449: background process 'LCK0' unexpectedly terminated with error 481
ORA-00481: LMON process terminated with error
ORA-00604: error occurred at recursive SQL level 3
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:25 2008
Errors in file d:\oracle\product\admin\kalobr\udump\kalobr1_ora_1460.trc:
ORA-07445: exception encountered: core dump [ACCESS_VIOLATION] [kxsdcbc+988] [PC:0x9B36CA] [ADDR:0x1C] [UNABLE_TO_READ] []
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:25 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_rbal_4492.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:25 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_mman_568.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:25 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_dbw0_1260.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:36:27 2008
WARNING: inbound connection timed out (ORA-3136)
Tue Feb 05 14:36:35 2008
ORA-449 encountered when generating server alert SMG-3503
Tue Feb 05 14:36:36 2008
Doing block recovery for file 2 block 153
Tue Feb 05 14:36:57 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_o002_4016.trc:
ORA-00481: LMON process terminated with error
Tue Feb 05 14:37:07 2008
Errors in file d:\oracle\product\admin\kalobr\bdump\kalobr1_reco_1448.trc:
ORA-00481: LMON process terminated with error
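For anyone who wants to sift through an alert log like the one above, here is a minimal Python sketch that lists, in order, which process wrote a trace file and which ORA codes it reported. It is only an illustration: the function name `parse_alert_log` and the regular expressions are assumptions based on the layout of the excerpt above (timestamp line, then "Errors in file ...", then ORA lines), not an Oracle tool, and other alert log layouts may need different patterns.

```python
import re

# Assumed layout, taken from the excerpt above: a timestamp line such as
# "Tue Feb 05 14:36:18 2008", then "Errors in file <path>:", then ORA-nnnnn lines.
TIMESTAMP = re.compile(r"^[A-Z][a-z]{2} [A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \d{4}$")
# Trace files are assumed to be named <instance>_<process>_<id>.trc
TRACE = re.compile(r"Errors in file .*[\\/][^\\/_]+_([A-Za-z0-9]+)_\d+\.trc")
ORA_CODE = re.compile(r"ORA-\d+")


def parse_alert_log(text):
    """Return a list of dicts (time, process, codes), one per 'Errors in file' entry."""
    entries = []
    stamp = None      # most recent timestamp line seen
    current = None    # entry currently collecting ORA codes
    for raw in text.splitlines():
        line = raw.strip()
        if TIMESTAMP.match(line):
            # A new timestamp starts a new alert log entry.
            stamp = line
            current = None
            continue
        match = TRACE.search(line)
        if match:
            current = {"time": stamp, "process": match.group(1), "codes": []}
            entries.append(current)
            continue
        if current is not None:
            # ORA lines that follow belong to the trace file named just before them.
            current["codes"].extend(ORA_CODE.findall(line))
    return entries
```

Calling `parse_alert_log(open(path).read())` on a saved copy of the alert log (where `path` is wherever you saved it) and printing the entries in order makes it easier to see which process failed first and which ones only followed.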
Dump file d:\oracle\product\admin\kalobr\bdump\kalobr1_lmon_4460.trc
Sat Jan 26 16:41:09 2008
ORACLE V10.2.0.3.0 - 64bit Production vsnsta=0
vsnsql=14 vsnxtr=3
Oracle Database 10g Release 10.2.0.3.0 - 64bit Production
With the Real Application Clusters option
Windows Server 2003 Version V5.2 Service Pack 2
CPU : 4 - type 8664, 2 Physical Cores
Process Affinity : 0x0000000000000000
Memory (Avail/Total): Ph:2102M/8189M, Ph+PgF:3872M/9795M
Instance name: kalobr1
Redo thread mounted by this instance: 0 <none>
Oracle process number: 5
Windows thread id: 4460, image: ORACLE.EXE (LMON)
*** SERVICE NAME:() 2008-01-26 16:41:09.281
*** SESSION ID:(222.1) 2008-01-26 16:41:09.281
GES resources 5035 pool 4
GES enqueues 7451
GES IPC: Receivers 4 Senders 4
GES IPC: Buffers Receive 1000 Send (i:1150 b:1150) Reserve 401
GES IPC: Msg Size Regular 408 Batch 8192
Batching factor: enqueue replay 201, ack 224
Batching factor: cache replay 126 size per lock 64
kjxggin: receive buffer size = 32768
high load threshold = 280
*** 2008-01-26 16:41:09.546
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 0 0.
*** 2008-01-26 16:41:09.546
Name Service frozen
kjxgmcs: Setting state to 0 1.
kjxgrssvote: reconfig bitmap chksum 0xb1c6 cnt 1 master 0 ret 0
kjfcpiora: published my fusion master weight 420261
kjfcpiora: published my enqueue weight 5035
kjfcpiora: publish my flogb 14
kjfcpiora: publish my cluster_database_instances parameter = 2
kjxggpoll: change poll time to 50 ms
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 2 2.
kjfmuin: bitmap 0
kjfmmhi: received msg from 0 (inc 2)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 2 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 2 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 2 5.
Name Service normal
Name Service recovery done
*** 2008-01-26 16:41:09.718
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 2 6.
*** 2008-01-26 16:41:09.906
kjfcrfg: DRM window size = 0->2048 (min lognb = 14)
*** 2008-01-26 16:41:09.968
Reconfiguration started (old inc 0, new inc 2)
Synchronization timeout interval: 900 sec
List of nodes:
0
*** 2008-01-26 16:41:09.968
kjxggpoll: change poll time to 600 ms
Global Resource Directory frozen
node 0
release 10 2 0 3
number of mastership buckets = 128
domain attach called for domid 0
* kjbdomalc: domain 0 invalid = TRUE
* kjbdomatt: first attach for domain 0
asby init, 0/0/x1
asby returns, 0/0/x1/false
* Domain maps before reconfiguration:
* DOMAIN 0 (valid 0): 0
* End of domain mappings
* Domain maps after recomputation:
* DOMAIN 0 (valid 0): 0
* End of domain mappings
Active Sendback Threshold = 50 %
Communication channels reestablished
sent syncr inc 2 lvl 1 to 0 (2,5/0/0)
sent syncr inc 2 lvl 2 to 0 (2,7/0/0)
Master broadcasted resource hash value bitmaps
Non-local Process blocks cleaned out
Set master node info
sent syncr inc 2 lvl 3 to 0 (2,13/0/0)
Submitted all remote-enqueue requests
sent syncr inc 2 lvl 4 to 0 (2,15/0/0)
Dwn-cvts replayed, VALBLKs dubious
sent syncr inc 2 lvl 5 to 0 (2,18/0/0)
All grantable enqueues granted
sent syncr inc 2 lvl 6 to 0 (2,20/0/0)
*** 2008-01-26 16:41:12.109
Post SMON to start 1st pass IR
Submitted all GCS cache requests
sent syncr inc 2 lvl 7 to 0 (2,22/0/0)
Fix write in gcs resources
sent syncr inc 2 lvl 8 to 0 (2,24/0/0)
*** 2008-01-26 16:41:13.109
Reconfiguration complete
* domain 0 valid?: 0
*** 2008-01-26 16:41:22.406
kjxgrtmc2: mounting member 0 thread 1
*** 2008-01-26 16:49:54.031
kjxgmpoll reconfig bitmap: 0 1
*** 2008-01-26 16:49:54.031
kjxgmrcfg: Reconfiguration started, reason 1
kjxgmcs: Setting state to 2 0.
*** 2008-01-26 16:49:54.031
Name Service frozen
kjxgmcs: Setting state to 2 1.
kjxgrssvote: reconfig bitmap chksum 0x1314e cnt 2 master 0 ret 0
kjxggpoll: change poll time to 50 ms
*** 2008-01-26 16:49:54.359
Obtained RR update lock for sequence 3, RR seq 2
*** 2008-01-26 16:49:54.359
Voting results, upd 0, seq 4, bitmap: 0 1
kjxgmps: proposing substate 2
kjxgmcs: Setting state to 4 2.
kjfmuin: bitmap 0 1
kjfmmhi: received msg from 0 (inc 2)
kjfmmhi: received msg from 1 (inc 4)
Performed the unique instance identification check
kjxgmps: proposing substate 3
kjxgmcs: Setting state to 4 3.
Name Service recovery started
Deleted all dead-instance name entries
kjxgmps: proposing substate 4
kjxgmcs: Setting state to 4 4.
Multicasted all local name entries for publish
Replayed all pending requests
kjxgmps: proposing substate 5
kjxgmcs: Setting state to 4 5.
Name Service normal
Name Service recovery done
*** 2008-01-26 16:49:55.390
kjxgmps: proposing substate 6
kjxgmcs: Setting state to 4 6.
*** 2008-01-26 16:49:55.609
kjfcrfg: DRM window size = 2048->2048 (min lognb = 14)
*** 2008-01-26 16:49:55.609
Reconfiguration started (old inc 2, new inc 4)
Synchronization timeout interval: 900 sec
List of nodes:
0 1
*** 2008-01-26 16:49:55.609
*** 2008-01-26 16:49:55.609
kjfcrfg: query of NESTED_RECONFIGURATION for node 1 failed with 7
kjxggpoll: change poll time to 600 ms
Global Resource Directory frozen
node 0
node 1
release 10 2 0 3
asby init, 0/0/x2
asby returns, 0/0/x2/false
* Domain maps before reconfiguration:
* DOMAIN 0 (valid 1): 0
* End of domain mappings
* Domain maps after recomputation:
* DOMAIN 0 (valid 1): 0 1
* End of domain mappings
Dead inst
Join inst 1
Exist inst 0
Active Sendback Threshold = 50 %
Communication channels reestablished
sent syncr inc 4 lvl 1 to 0 (4,5/0/0)
sent synca inc 4 lvl 1 (4,5/0/0)
received all domreplay (4.6)
sent master 0 (4.6)
*** 2008-01-26 16:49:55.796
KJBDOMHVMAP: BEGINS
*** 2008-01-26 16:49:55.796
KJBDOMHVMAP: ENDS
sent dom info (4.6)
sent hv info (4.6)
sent pt info (4.6)
sent syncr inc 4 lvl 2 to 0 (4,7/0/0)
sent synca inc 4 lvl 2 (4,7/0/0)
Master broadcasted resource hash value bitmaps
* kjfcrfg: domain 0 valid, valid_ver = 4
Non-local Process blocks cleaned out
Set master node info
sent syncr inc 4 lvl 3 to 0 (4,13/0/0)
sent synca inc 4 lvl 3 (4,13/0/0)
Submitted all remote-enqueue requests
kjfcrfg: Number of mesgs sent to node 1 = 4130
sent syncr inc 4 lvl 4 to 0 (4,15/0/0)
sent synca inc 4 lvl 4 (4,15/0/0)
Dwn-cvts replayed, VALBLKs dubious
sent syncr inc 4 lvl 5 to 0 (4,18/0/0)
sent synca inc 4 lvl 5 (4,18/0/0)
All grantable enqueues granted
sent syncr inc 4 lvl 6 to 0 (4,20/0/0)
sent synca inc 4 lvl 6 (4,20/0/0)
Submitted all GCS cache requests
sent syncr inc 4 lvl 7 to 0 (4,22/0/0)
sent synca inc 4 lvl 7 (4,22/0/0)
Post SMON to start 1st pass IR
Fix write in gcs resources
sent syncr inc 4 lvl 8 to 0 (4,24/0/0)
sent synca inc 4 lvl 8 (4,24/0/0)
*** 2008-01-26 16:49:58.843
Reconfiguration complete
* domain 0 valid?: 1
*** 2008-01-26 16:50:18.015
Begin DRM(2)
sent syncr inc 4 lvl 9 to 0 (4,0/31/0)
sent synca inc 4 lvl 9 (4,0/31/0)
sent syncr inc 4 lvl 10 to 0 (4,0/34/0)
sent synca inc 4 lvl 10 (4,0/34/0)
sent syncr inc 4 lvl 11 to 0 (4,0/36/0)
sent synca inc 4 lvl 11 (4,0/36/0)
sent syncr inc 4 lvl 12 to 0 (4,0/38/0)
sent synca inc 4 lvl 12 (4,0/38/0)
sent syncr inc 4 lvl 13 to 0 (4,0/31/0)
sent synca inc 4 lvl 13 (4,0/31/0)
sent syncr inc 4 lvl 14 to 0 (4,0/34/0)
sent synca inc 4 lvl 14
.....................
On node2 there are no error messages in alert.log (it just hung).
Thanks in advance!