I'm using Data Guard 11g on Windows Server 2012 64-bit. I recently configured the broker in order to handle switchovers more comfortably.
Unfortunately the switchover doesn't work. The broker configuration seems fine, since I can enable it and redo apply is working perfectly...
Now, the behaviour is the following: once I issue the switchover via the broker, it shows this:
DGMGRL> switchover to smart_stb
Performing switchover NOW, please wait...
Error: ORA-16665: timeout waiting for the result from a database
Failed.
Unable to switchover, primary database is still "smart"
The switchover itself does happen, but the primary database is never restarted! It doesn't seem to be related to a buggy DGMGRL service entry; actually the database won't even come down. While trying to shut down, the primary suspends every process. The broker of course runs into a timeout, but the primary remains in that suspended state. The primary doesn't seem to be able to communicate with the standby side, or vice versa. That is odd, since redo log shipping and apply are working fine as mentioned, and so does a manual switchover. I have no idea what could cause this kind of behaviour; I've double-checked everything. Once I restart both databases and bring them into the right state, the switchover completes. But needless to say, that isn't the point of using the broker...
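For context, by "manual switchover" I mean the standard SQL*Plus role transition, which completes without any problem. A sketch of the sequence (the exact apply options may differ from what I run, this is just to show the SQL-level switchover itself works):

```sql
-- On the primary: convert it to a physical standby
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN;
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;

-- On the old standby: take over the primary role and open it
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WITH SESSION SHUTDOWN;
ALTER DATABASE OPEN;

-- On the new standby: restart redo apply
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
  USING CURRENT LOGFILE DISCONNECT FROM SESSION;
```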
The drc.log on the primary shows the following:
2014-09-10 09:13:01.378 02000000 746441090 DMON: SWITCHOVER TO smart_stb
2014-09-10 09:13:01.378 02000000 746441090 DMON: start task execution: SWITCHOVER
2014-09-10 09:13:01.378 NSV1: Using RFIUPI_DGCI_CDESC
2014-09-10 09:13:01.409 NSV1: Using RFIUPI_DGCI_CDESC
2014-09-10 09:13:01.409 02001000 746441090 Notifying Oracle Clusterware to teardown primary database for SWITCHOVER
2014-09-10 09:13:01.425 CLSR: CRS not configured, config = 2
.
2014-09-10 09:13:01.425 02001000 746441090 DMON: posting primary instances for SWITCHOVER phase 1
2014-09-10 09:13:01.425 02001000 746441090 DMON: status from rfi_post_instances() for CTL_SWITCH = ORA-00000
2014-09-10 09:13:01.425 INSV: Received message for inter-instance publication
2014-09-10 09:13:01.425 02001000 746441090 DMON: dispersing message to standbys for SWITCHOVER phase BEGIN
2014-09-10 09:13:01.425 req ID 1.1.746441090, opcode CTL_SWITCH, phase BEGIN, flags 5
2014-09-10 09:13:01.425 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:13:01.425 02001000 746441090 DMON: Entered rfmsoexinst() for phase BEGIN
2014-09-10 09:13:01.425 INSV: Reply received for message with
2014-09-10 09:13:01.425 req ID 1.1.746441090, opcode CTL_SWITCH, phase BEGIN
2014-09-10 09:13:01.425 02001000 746441090 DMON: posting primary instances for SWITCHOVER phase 2
2014-09-10 09:13:01.440 02001000 746441090 DMON: status from rfi_post_instances() for CTL_SWITCH = ORA-00000
2014-09-10 09:13:01.440 INSV: Received message for inter-instance publication
2014-09-10 09:13:01.440 02001000 746441090 DMON: dispersing message to standbys for SWITCHOVER phase TEARDOWN
2014-09-10 09:13:01.440 req ID 1.1.746441090, opcode CTL_SWITCH, phase TEARDOWN, flags 5
2014-09-10 09:13:01.440 02001000 746441090 DMON: Entered rfmsoexinst() for phase TEARDOWN
2014-09-10 09:13:01.440 RSM0: Received Set State Request: rid=0x01041001, sid=0, phid=1, econd=2, sitehndl=0x02001000
2014-09-10 09:13:01.440 Log Transport Resource: SetState OFFLINE, phase TEAR-DOWN, External Cond SWITCH-OVER-PHYS_STBY
2014-09-10 09:13:01.440 RSM0: Received Set State Request: rid=0x01011001, sid=4, phid=1, econd=2, sitehndl=0x02001000
2014-09-10 09:13:01.440 Database Resource[IAM=PRIMARY]: SetState PHYSICAL-APPLY-ON, phase TEAR-DOWN, External Cond SWITCH-OVER-PHYS_STBY, Target Site Handle 0x02001000
2014-09-10 09:13:01.440 RSM0: Executing SQL [ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN]
2014-09-10 09:13:06.630 SQL [ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN] Executed successfully
2014-09-10 09:13:06.630 RSM: clearing IncarnationTable internal property of site 0x01010000
2014-09-10 09:13:06.630 02001000 746441090 DMON: Broker determines that instance restart is required: Operation = SWITCHOVER status = ORA-16570
2014-09-10 09:13:06.630 02001000 746441090 Resource: smart (01011001) State: PHYSICAL-APPLY-ON
2014-09-10 09:13:06.630 INSV: Reply received for message with
2014-09-10 09:13:06.630 req ID 1.1.746441090, opcode CTL_SWITCH, phase TEARDOWN
2014-09-10 09:13:06.630 02001000 746441090 DMON: Instance "smart" (ID 1) returned ORA-16570
2014-09-10 09:13:06.630 02001000 746441090 for phase TEARDOWN of operation CTL_SWITCH
2014-09-10 09:13:06.646 NSV1: Using RFIUPI_DGCI_CDESC
2014-09-10 09:13:06.646 02001000 746441090 DMON: posting primary instances for SWITCHOVER phase 2
2014-09-10 09:13:06.646 INSV: Received message for inter-instance publication
2014-09-10 09:13:06.646 02001000 746441090 DMON: status from rfi_post_instances() for CTL_SWITCH = ORA-00000
2014-09-10 09:13:06.646 req ID 1.1.746441090, opcode CTL_SWITCH, phase TEARDOWN, flags 5
2014-09-10 09:13:06.646 02001000 746441090 DMON: dispersing message to standbys for SWITCHOVER phase TEARDOWN
2014-09-10 09:13:06.646 02001000 746441090 DMON: Entered rfmsoexinst() for phase TEARDOWN
2014-09-10 09:13:06.646 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:13:06.646 INSV: Reply received for message with
2014-09-10 09:13:06.646 req ID 1.1.746441090, opcode CTL_SWITCH, phase TEARDOWN
2014-09-10 09:13:06.646 02001000 746441090 DMON: Instance "smart" (ID 1) returned ORA-16570
2014-09-10 09:13:06.646 02001000 746441090 for phase TEARDOWN of operation CTL_SWITCH
2014-09-10 09:13:28.619 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:13:28.619 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:13:43.620 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:13:43.620 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:13:58.639 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:13:58.639 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:14:13.657 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:14:13.657 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:14:28.674 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:14:28.674 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:14:43.676 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:14:43.676 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:14:58.693 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:14:58.693 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:15:13.695 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:15:13.695 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:15:28.707 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:15:28.707 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:15:43.724 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:15:43.724 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:15:58.726 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:15:58.726 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:16:13.727 NSV1: Using RFIUPI_DB_INST_CDESC
2014-09-10 09:16:13.727 NSV1: Site smart_stb returned ORA-16665.
2014-09-10 09:16:28.729 02001000 746441090 DMON: Database smart_stb returned ORA-16665
2014-09-10 09:16:28.729 02001000 746441090 for opcode = CTL_SWITCH, phase = TEARDOWN, req_id = 1.1.746441090
2014-09-10 09:16:28.729 02001000 746441090 Operation CTL_SWITCH canceled during phase 2, error = ORA-16665
2014-09-10 09:16:28.729 02001000 746441090 DMON: Switchover operation failed with status ORA-16665
2014-09-10 09:16:28.729 NSV1: Using RFIUPI_DGCI_CDESC
2014-09-10 09:16:28.744 02001000 746441090 DMON: posting primary instances for SWITCHOVER phase 5
2014-09-10 09:16:28.744 INSV: Received message for inter-instance publication
2014-09-10 09:16:28.744 02001000 746441090 DMON: status from rfi_post_instances() for CTL_SWITCH = ORA-00000
2014-09-10 09:16:28.744 req ID 1.1.746441090, opcode CTL_SWITCH, phase END, flags 5
2014-09-10 09:16:28.744 02001000 746441090 DMON: Switchover Aborted due to errors
2014-09-10 09:16:28.744 02001000 746441090 Site named: smart is still primary
2014-09-10 09:16:28.744 02001000 746441090 error = ORA-16665
The drc.log on the standby shows the following:
2014-09-10 09:13:01.352 02001000 746441090 Notifying Oracle Clusterware to teardown target standby database for SWITCHOVER
2014-09-10 09:13:01.367 CLSR: CRS not configured, config = 2
.
2014-09-10 09:13:01.367 02001000 746441090 DMON: posting standby instances for SWITCHOVER phase 1
2014-09-10 09:13:01.367 INSV: Received message for inter-instance publication
2014-09-10 09:13:01.367 req ID 1.1.746441090, opcode CTL_SWITCH, phase BEGIN, flags 5
2014-09-10 09:13:01.367 02001000 746441090 DMON: Entered rfmsoexinst() for phase BEGIN
2014-09-10 09:13:01.367 INSV: Reply received for message with
2014-09-10 09:13:01.367 req ID 1.1.746441090, opcode CTL_SWITCH, phase BEGIN
2014-09-10 09:13:01.367 02001000 746441090 DMON: Entered rfm_release_chief_lock() for CTL_SWITCH
2014-09-10 09:13:06.571 02001000 746441090 DMON: Entered rfm_get_chief_lock() for CTL_SWITCH, reason 0
2014-09-10 09:13:06.571 02001000 746441090 DMON: start task execution: SWITCHOVER
2014-09-10 09:13:06.586 02001000 746441090 DMON: posting standby instances for SWITCHOVER phase 2
2014-09-10 09:13:06.586 INSV: Received message for inter-instance publication
2014-09-10 09:13:06.586 req ID 1.1.746441090, opcode CTL_SWITCH, phase TEARDOWN, flags 5
2014-09-10 09:13:06.586 02001000 746441090 DMON: Entered rfmsoexinst() for phase TEARDOWN
2014-09-10 09:13:06.586 RSM0: Received Set State Request: rid=0x02031001, sid=0, phid=1, econd=2, sitehndl=0x02001000
2014-09-10 09:13:06.586 Redo Apply Resource: SetState OFFLINE, phase TEAR-DOWN, External Cond SWITCH-OVER-PHYS_STBY
2014-09-10 09:13:06.586 RSM0: Received Set State Request: rid=0x02012001, sid=9, phid=1, econd=2, sitehndl=0x02001000
2014-09-10 09:13:06.586 Database Resource[IAM=PHYSICAL]: SetState READ-WRITE-XPTON, phase TEAR-DOWN, External Cond SWITCH-OVER-PHYS_STBY, Target Site Handle 0x02001000
2014-09-10 09:13:06.586 RSM0: Executing SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL]
2014-09-10 09:13:07.602 SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL] Executed successfully
2014-09-10 09:13:07.711 RSM0: Executing SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH LAST SWITCHOVER NODELAY]
2014-09-10 09:13:09.509 SQL [ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH LAST SWITCHOVER NODELAY] Executed successfully
2014-09-10 09:13:09.509 RSM0: Executing SQL [ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN]
2014-09-10 09:13:11.345 SQL [ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN] Executed successfully
2014-09-10 09:13:11.345 Database Resource SetState succeeded
2014-09-10 09:13:11.345 INSV: Reply received for message with
2014-09-10 09:13:11.345 req ID 1.1.746441090, opcode CTL_SWITCH, phase TEARDOWN
2014-09-10 09:13:11.345 02001000 746441090 DMON: Entered rfm_release_chief_lock() for CTL_SWITCH
2014-09-10 09:13:28.555 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:13:43.555 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:13:58.571 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:14:13.590 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:14:28.605 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:14:43.606 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:14:58.622 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:15:13.625 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:15:28.641 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:15:43.646 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:15:58.652 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:16:13.653 drcx: Found task req_id=1.1.746441090 for PROBE, but the phase is done
2014-09-10 09:16:27.606 02001000 746441090 DMON: Data Guard Broker terminating NSV0, timed out waiting for a response from database smart
2014-09-10 09:16:27.825 02001000 746441090 DMON: Database smart returned ORA-16662
2014-09-10 09:16:27.825 02001000 746441090 for opcode = CTL_SWITCH, phase = RESYNCH, req_id = 1.1.746441090
2014-09-10 09:16:42.840 02001000 746441090 DMON: Data Guard Broker terminating NSV0, timed out waiting for a response from database smart
2014-09-10 09:16:42.840 02001000 746441090 DMON: Database smart returned ORA-16662
2014-09-10 09:16:42.840 02001000 746441090 for opcode = CTL_SWITCH, phase = RESYNCH, req_id = 1.1.746441090
2014-09-10 09:16:57.861 02001000 746441090 DMON: Data Guard Broker terminating NSV0, timed out waiting for a response from database smart
2014-09-10 09:16:57.861 02001000 746441090 DMON: Database smart returned ORA-16662
2014-09-10 09:16:57.861 02001000 746441090 for opcode = CTL_SWITCH, phase = RESYNCH, req_id = 1.1.746441090
2014-09-10 09:16:58.705 00001000 746441091 DMON: Entered rfm_get_chief_lock() for HEALTH_CHECK, reason 0
2014-09-10 09:16:58.705 00001000 746441091 DMON: Freeing orphaned task 1.1.746441090, opcode=CTL_SWITCH.
2014-09-10 09:16:58.705 00001000 746441091 DMON: start task execution: automatic healthcheck
2014-09-10 09:16:58.720 00001000 746441091 DMON: Start health check
2014-09-10 09:16:58.720 INSV: Received message for inter-instance publication
2014-09-10 09:16:58.720 00001000 746441091 DMON: status from rfi_post_instances() = ORA-00000
2014-09-10 09:16:58.720 req ID 1.1.746441091, opcode HEALTH_CHECK, phase BEGIN, flags 5
2014-09-10 09:16:58.720 00000000 746441091 DMON: Entered rfmhcexinst
2014-09-10 09:16:58.720 00000000 746441091 DMON: rfmhcexinst calling RSMs
2014-09-10 09:16:58.720 RSM0: Received Get Status Request: rid=0x02012001, sid=4
2014-09-10 09:16:58.720 RSM0: HEALTH CHECK ERROR: ORA-16816: incorrect database role
2014-09-10 09:16:58.720 Warning: the given scn 281474976710655 with resetlogs_id = 851952614 is less than the resetlogs_change# 744657 of the same incarnation.
2014-09-10 09:16:58.736 Current IncarnationTable value is:
2014-09-10 09:16:58.736 2,744657,851952614,1,*1,1,851351631,0,#
2014-09-10 09:16:58.736 RSM0: HEALTH CHECK ERROR: ORA-16700: the standby database has diverged from the primary database
2014-09-10 09:16:58.970 00000000 746441091 Operation HEALTH_CHECK canceled during phase 1, error = ORA-16810
2014-09-10 09:16:58.970 RSM0: Received Get Status Request: rid=0x02031001, sid=1
2014-09-10 09:16:58.970 00000000 746441091 DMON: Standby Instance completed health check
2014-09-10 09:16:58.970 INSV: Reply received for message with
2014-09-10 09:16:58.970 req ID 1.1.746441091, opcode HEALTH_CHECK, phase BEGIN
2014-09-10 09:16:58.970 DMON: HEALTH CHECK ERROR: ORA-16766: Redo Apply is stopped
2014-09-10 09:16:58.970 DMON: After H/C aggregation, db 0x02001000 has severity=16501, status=16810
2014-09-10 09:16:58.970 00000000 746441091 Operation HEALTH_CHECK canceled during phase 1, error = ORA-16810
2014-09-10 09:16:58.970 INSV: Received message for inter-instance publication
2014-09-10 09:16:58.970 req ID 1.1.746441091, opcode HEALTH_CHECK, phase BEGIN, flags 20005
2014-09-10 09:16:58.970 DMON: After H/C aggregation, db 0x02001000 has severity=16501, status=16810
2014-09-10 09:16:58.970 INSV: Reply received for message with
2014-09-10 09:16:58.970 req ID 1.1.746441091, opcode HEALTH_CHECK, phase BEGIN
2014-09-10 09:17:02.017 00000000 746441091 DMON: Entered rfm_release_chief_lock() for HEALTH_CHECK
2014-09-10 09:17:13.724 drcx: could not find task req_id=1.1.746441091 for PROBE.
2014-09-10 09:17:28.730 drcx: could not find task req_id=1.1.746441091 for PROBE.
The alert.log on the primary shows the following:
LNS: Standby redo logfile selected for thread 1 sequence 12046 for destination LOG_ARCHIVE_DEST_2
Wed Sep 10 09:13:01 2014
ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN
Wed Sep 10 09:13:01 2014
Thread 1 advanced to log sequence 12047 (LGWR switch)
Current log# 2 seq# 12047 mem# 0: X:\SMART\LOG1\SMART_21.LOG
Current log# 2 seq# 12047 mem# 1: Y:\SMART\LOG2\SMART_22.LOG
Stopping background process QMNC
Wed Sep 10 09:13:01 2014
ARC1: Evaluating archive log 1 thread 1 sequence 12046
CLOSE: killing server sessions.
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 15708 user 'SYSTEM' program 'ORACLE.EXE (W000)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
Active process 13372 user 'SYSTEM' program 'ORACLE.EXE (W001)'
CLOSE: all sessions shutdown successfully.
krss_find_arc: Selecting ARC2 to receive message as last resort
Waiting for all non-current ORLs to be archived...
Waiting for the ORL for thread 1 sequence 12046 to be archived...
Wed Sep 10 09:13:02 2014
ARC0: Evaluating archive log 1 thread 1 sequence 12046
Wed Sep 10 09:13:02 2014
ARC2: Evaluating archive log 1 thread 1 sequence 12046
ARC0: Unable to archive thread 1 sequence 12046
Log actively being archived by another process
ARC2: Unable to archive thread 1 sequence 12046
Log actively being archived by another process
Committing creation of archivelog 'W:\SMART\ARC\SMART_12046_1_851952614.ARC'
Archived Log entry 19372 added for thread 1 sequence 12046 ID 0xab72f94c dest 1:
ORL for thread 1 sequence 12046 has been archived...
All non-current ORLs have been archived.
Waiting for all FAL entries to be archived...
All FAL entries have been archived.
Waiting for dest_id 2 to become synchronized...
OCISessionBegin with PasswordVerifier succeeded
Client pid [12320] attached to RFS pid [3648] at remote instance number [1] at dest 'smart_dg_stb'
Active, synchronized Physical Standby switchover target has been identified
Switchover End-Of-Redo Log thread 1 sequence 12047 has been fixed
Switchover: Primary highest seen SCN set to 0x0.0x22345d2
ARCH: Noswitch archival of thread 1, sequence 12047
ARCH: End-Of-Redo Branch archival of thread 1 sequence 12047
ARCH: Evaluating archive log 2 thread 1 sequence 12047
ARCH: LGWR is actively archiving destination LOG_ARCHIVE_DEST_2
ARCH: Transmitting activation ID 0xab72f94c
OCISessionBegin with PasswordVerifier succeeded
Client pid [12244] attached to RFS pid [9972] at remote instance number [1] at dest 'smart_dg_stb'
ARCH: Standby redo logfile selected for thread 1 sequence 12047 for destination LOG_ARCHIVE_DEST_2
Committing creation of archivelog 'W:\SMART\ARC\SMART_12047_1_851952614.ARC'
Archived Log entry 19373 added for thread 1 sequence 12047 ID 0xab72f94c dest 1:
Archived Log entry 19374 added for thread 1 sequence 12047 ID 0xab72f94c dest 2:
ARCH: Archiving is disabled due to current logfile archival
Primary will check for some target standby to have received alls redo
Final check for a synchronized target standby. Check will be made once.
ARCH: Transmitting activation ID 0xab72f94c
LOG_ARCHIVE_DEST_2 is a potential Physical Standby switchover target
Active, synchronized target has been identified
Target has also received all redo
-----------------------------------------------------------
| Target Standby Status |
| LOG_ARCHIVE_DEST_1 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_2 : HAS RECEIVED ALL DATA |
| LOG_ARCHIVE_DEST_3 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_4 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_5 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_6 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_7 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_8 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_9 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_10 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_11 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_12 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_13 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_14 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_15 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_16 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_17 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_18 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_19 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_20 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_21 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_22 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_23 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_24 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_25 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_26 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_27 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_28 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_29 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_30 : NOT ACTIVE |
| LOG_ARCHIVE_DEST_31 : NOT ACTIVE |
------------------------------------------------------------
Backup controlfile written to trace file D:\ORACLE\ORA_DBA\SMART\TRACE\diag\rdbms\smart\smart\trace\smart_rsm0_12244.trc
Clearing standby activation ID 2876438860 (0xab72f94c)
The primary database controlfile was created using the
'MAXLOGFILES 255' clause.
There is space for up to 251 standby redo logfiles
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl5.f' SIZE 52428800;
Archivelog for thread 1 sequence 12047 required for standby recovery
Switchover: Primary controlfile converted to standby controlfile succesfully.
Switchover: Complete - Database shutdown required
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY WITH SESSION SHUTDOWN
Wed Sep 10 09:13:20 2014
Process (ospid 12320) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:16:58 2014
Process (ospid 12244) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:17:08 2014
Process (ospid 3080) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:17:09 2014
Process (ospid 12636) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:17:10 2014
Process (ospid 5292) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:17:43 2014
Process RSM0, PID = 12244, will be killed
Wed Sep 10 09:17:44 2014
RSM0 started with pid=27, OS id=16280
Wed Sep 10 09:20:56 2014
minact-scn: got error during useg scan e:12751 usn:6
minact-scn: useg scan erroring out with error e:12751
Wed Sep 10 09:21:19 2014
Process (ospid 16280) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:22:04 2014
Process RSM0, PID = 16280, will be killed
Wed Sep 10 09:22:05 2014
RSM0 started with pid=27, OS id=10528
Wed Sep 10 09:25:39 2014
Process (ospid 10528) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:26:03 2014
minact-scn: got error during useg scan e:12751 usn:6
minact-scn: useg scan erroring out with error e:12751
Wed Sep 10 09:26:24 2014
Process RSM0, PID = 10528, will be killed
Wed Sep 10 09:26:25 2014
RSM0 started with pid=27, OS id=15752
Wed Sep 10 09:29:59 2014
Process (ospid 15752) is suspended due to switchover to physical standby operation.
Wed Sep 10 09:30:44 2014
Process RSM0, PID = 15752, will be killed
Wed Sep 10 09:30:45 2014
RSM0 started with pid=27, OS id=1972
Wed Sep 10 09:31:09 2014
minact-scn: got error during useg scan e:12751 usn:6
minact-scn: useg scan erroring out with error e:12751
Suspending MMON action 'Block Cleanout Optim, Undo Segment Scan' for 82800 seconds
Wed Sep 10 09:34:19 2014
Process (ospid 1972) is suspended due to switchover to physical standby operation.
The alert.log on the standby shows the following:
Wed Sep 10 09:13:01 2014
ARC3: Evaluating archive log 5 thread 1 sequence 12046
Committing creation of archivelog 'W:\SMART\ARC\SMART_12046_1_851952614.ARC'
Archived Log entry 5932 added for thread 1 sequence 12046 ID 0xab72f94c dest 1:
Wed Sep 10 09:13:01 2014
Media Recovery Waiting for thread 1 sequence 12047
Wed Sep 10 09:13:04 2014
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[6]: Assigned to RFS process 3648
RFS[6]: Identified database type as 'physical standby': Client is ARCH pid 12320
Wed Sep 10 09:13:06 2014
Redo Shipping Client Connected as PUBLIC
-- Connected User is Valid
RFS[7]: Assigned to RFS process 9972
RFS[7]: Identified database type as 'physical standby': Client is Foreground pid 12244
RFS[7]: End-Of-Redo archival of thread 1 sequence 12047
RFS[7]: Successfully opened standby log 5: 'Y:\SMART\LOG2\SMART_S_5.LOG'
RFS[7]: Selected log 5 for thread 1 sequence 12047 dbid -1425016497 branch 851952614
Wed Sep 10 09:13:06 2014
ARC1: Evaluating archive log 5 thread 1 sequence 12047
Committing creation of archivelog 'W:\SMART\ARC\SMART_12047_1_851952614.ARC'
Archived Log entry 5933 added for thread 1 sequence 12047 ID 0xab72f94c dest 1:
Wed Sep 10 09:13:06 2014
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
MRP0: Background Media Recovery cancelled with status 16037
Errors in file D:\ORACLE\ORA_DBA\SMART\TRACE\diag\rdbms\smart_stb\smart\trace\smart_pr00_13804.trc:
ORA-16037: user requested cancel of managed recovery operation
Managed Standby Recovery not using Real Time Apply
Recovery interrupted!
Errors in file D:\ORACLE\ORA_DBA\SMART\TRACE\diag\rdbms\smart_stb\smart\trace\smart_pr00_13804.trc:
ORA-16037: user requested cancel of managed recovery operation
Wed Sep 10 09:13:07 2014
Errors in file D:\ORACLE\ORA_DBA\SMART\TRACE\diag\rdbms\smart_stb\smart\trace\smart_mrp0_11168.trc:
ORA-10877: error signaled in parallel recovery slave
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL
Database not available for switchover
End-Of-REDO archived log file has not been recovered
Incomplete recovery SCN:0:35846057 archive SCN:0:35866066
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH LAST SWITCHOVER NODELAY
started logmerger process
Wed Sep 10 09:13:07 2014
Managed Standby Recovery not using Real Time Apply
Parallel Media Recovery started with 8 slaves
Media Recovery Log W:\SMART\ARC\SMART_12047_1_851952614.ARC
Identified End-Of-Redo (switchover) for thread 1 sequence 12047 at SCN 0x0.22345d2
Resetting standby activation ID 2876438860 (0xab72f94c)
Media Recovery End-Of-Redo indicator encountered
Media Recovery Applied through change 35866066
Attempt to set limbo arscn 0:35866066 irscn 0:35866066
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE THROUGH LAST SWITCHOVER NODELAY
ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
Maximum wait for role transition is 15 minutes.
krsv_proc_kill: Killing 2 processes (all RFS)
Backup controlfile written to trace file D:\ORACLE\ORA_DBA\SMART\TRACE\diag\rdbms\smart_stb\smart\trace\smart_rsm0_14652.trc
SwitchOver after complete recovery through change 35866066
Online log X:\SMART\LOG1\SMART_11.LOG: Thread 1 Group 1 was previously cleared
Online log Y:\SMART\LOG2\SMART_12.LOG: Thread 1 Group 1 was previously cleared
Online log X:\SMART\LOG1\SMART_21.LOG: Thread 1 Group 2 was previously cleared
Online log Y:\SMART\LOG2\SMART_22.LOG: Thread 1 Group 2 was previously cleared
Online log X:\SMART\LOG1\SMART_31.LOG: Thread 1 Group 3 was previously cleared
Online log Y:\SMART\LOG2\SMART_32.LOG: Thread 1 Group 3 was previously cleared
Online log X:\SMART\LOG1\SMART_41.LOG: Thread 1 Group 4 was previously cleared
Online log Y:\SMART\LOG2\SMART_42.LOG: Thread 1 Group 4 was previously cleared
Standby became primary SCN: 35866064
Switchover: Complete - Database mounted as primary
Completed: ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY WAIT WITH SESSION SHUTDOWN
Wed Sep 10 09:13:45 2014
ARC0: Becoming the 'no SRL' ARCH
Wed Sep 10 09:13:46 2014
ARC3: Becoming the 'no SRL' ARCH
Wed Sep 10 09:13:46 2014
ARC1: Becoming the 'no SRL' ARCH
Wed Sep 10 09:16:58 2014
NSV0 started with pid=26, OS id=15900
Wed Sep 10 09:21:20 2014
DMON: NSV0 network call timeout. Killing it now.
The setup is the following:
DGMGRL> show configuration;
Configuration - dg_smart
Protection Mode: MaxPerformance
Databases:
smart - Primary database
smart_stb - Physical standby database
Fast-Start Failover: DISABLED
Configuration Status:
SUCCESS
DGMGRL> show database verbose smart
Database - smart
Role: PRIMARY
Intended State: TRANSPORT-ON
Instance(s):
smart
Properties:
DGConnectIdentifier = 'smart_dg_prim'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '300'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = 'X:\SMART\LOG1, X:\SMART\LOG1, Y:\SMART\LOG2, Y:\SMART\LOG2'
FastStartFailoverTarget = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
SidName = 'smart'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host02)(PORT=1599))(CONNECT_DATA=(SERVICE_NAME=smart_DGMGRL)(INSTANCE_NAME=smart)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'W:\SMART\arc\'
AlternateLocation = ''
LogArchiveTrace = '8191'
LogArchiveFormat = 'smart_%s_%t_%r.arc'
TopWaitEvents = '(monitor)'
Database Status:
SUCCESS
DGMGRL> show database verbose smart_stb
Database - smart_stb
Role: PHYSICAL STANDBY
Intended State: APPLY-ON
Transport Lag: 0 seconds
Apply Lag: 0 seconds
Real Time Query: OFF
Instance(s):
smart
Properties:
DGConnectIdentifier = 'smart_dg_stb'
ObserverConnectIdentifier = ''
LogXptMode = 'ASYNC'
DelayMins = '0'
Binding = 'optional'
MaxFailure = '0'
MaxConnections = '1'
ReopenSecs = '300'
NetTimeout = '30'
RedoCompression = 'DISABLE'
LogShipping = 'ON'
PreferredApplyInstance = ''
ApplyInstanceTimeout = '0'
ApplyParallel = 'AUTO'
StandbyFileManagement = 'AUTO'
ArchiveLagTarget = '300'
LogArchiveMaxProcesses = '4'
LogArchiveMinSucceedDest = '1'
DbFileNameConvert = ''
LogFileNameConvert = 'X:\SMART\LOG1, X:\SMART\LOG1, Y:\SMART\LOG2, Y:\SMART\LOG2'
FastStartFailoverTarget = ''
InconsistentProperties = '(monitor)'
InconsistentLogXptProps = '(monitor)'
SendQEntries = '(monitor)'
LogXptStatus = '(monitor)'
RecvQEntries = '(monitor)'
SidName = 'smart'
StaticConnectIdentifier = '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host01)(PORT=1599))(CONNECT_DATA=(SERVICE_NAME=smart_stb_DGMGRL)(INSTANCE_NAME=smart)(SERVER=DEDICATED)))'
StandbyArchiveLocation = 'W:\SMART\arc\'
AlternateLocation = ''
LogArchiveTrace = '8191'
LogArchiveFormat = 'smart_%s_%t_%r.arc'
TopWaitEvents = '(monitor)'
Database Status:
SUCCESS
tnsnames.ora:
smart_dg_stb=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host01)(PORT=1599))(CONNECT_DATA=(SERVER=DEDICATED)(SID=smart)))
smart_dg_prim=(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host02)(PORT=1599))(CONNECT_DATA=(SERVER=DEDICATED)(SID=smart)))
listener.ora on the primary:
LISTENER_REDO_SMART =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = host02)(PORT = 1599))
)
)
SID_LIST_LISTENER_REDO_SMART =
(SID_LIST =
(SID_DESC =
(SID_NAME = CLRExtProc)
(ORACLE_HOME = D:\oracle\11.2.0)
(PROGRAM = extproc)
(ENVS = "EXTPROC_DLLS=ONLY:D:\oracle\11.2.0\bin\oraclr11.dll")
)
(SID_DESC =
(SID_NAME =smart)
(ORACLE_HOME = D:\oracle\11.2.0)
)
(SID_DESC =
(SID_NAME =smart)
(GLOBAL_DBNAME=smart_DGMGRL)
(ORACLE_HOME = D:\oracle\11.2.0)
)
)
listener.ora on the standby:
LISTENER_REDO_SMART =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS = (PROTOCOL = TCP)(HOST = host01)(PORT = 1599))
)
)
SID_LIST_LISTENER_REDO_SMART =
(SID_LIST =
(SID_DESC =
(SID_NAME = CLRExtProc)
(ORACLE_HOME = D:\oracle\11.2.0)
(PROGRAM = extproc)
(ENVS = "EXTPROC_DLLS=ONLY:D:\oracle\11.2.0\bin\oraclr11.dll")
)
(SID_DESC =
(SID_NAME = smart)
(ORACLE_HOME = D:\oracle\11.2.0)
)
(SID_DESC =
(SID_NAME = smart)
(GLOBAL_DBNAME=smart_stb_DGMGRL)
(ORACLE_HOME = D:\oracle\11.2.0)
)
)
LSNRCTL> status listener_redo_smart
Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=host02)(PORT=1599)))
STATUS of the LISTENER
------------------------
Alias LISTENER_REDO_SMART
Version TNSLSNR for 64-bit Windows: Version 11.2.0.3.0 - Production
Start Date 30-AUG-2014 20:46:12
Uptime 10 days 13 hr. 11 min. 15 sec
Trace Level off
Security ON: Local OS Authentication
SNMP OFF
Listener Parameter File D:\oracle\11.2.0\network\admin\listener.ora
Listener Log File D:\oracle\diag\tnslsnr\host02\listener_redo_smart\alert\log.xml
Listening Endpoints Summary...
(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=host02)(PORT=1599)))
Services Summary...
Service "CLRExtProc" has 1 instance(s).
Instance "CLRExtProc", status UNKNOWN, has 1 handler(s) for this service...
Service "HA_SMART" has 1 instance(s).
Instance "smart", status READY, has 1 handler(s) for this service...
Service "smart" has 2 instance(s).
Instance "smart", status UNKNOWN, has 1 handler(s) for this service...
Instance "smart", status READY, has 1 handler(s) for this service...
Service "smart_DGB" has 1 instance(s).
Instance "smart", status READY, has 1 handler(s) for this service...
Service "smart_DGMGRL" has 1 instance(s).
Instance "smart", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully
I'd be more than grateful if anybody has an idea how to solve this!
Thanks in advance and kind regards.