Hi all,
Recently I came across the below scenario with respect to an 11gR2 RAC to RAC Dataguard configuration problem. I would welcome some input on the issue in order to get a better understanding of why it had the impact it did please?
A colleague and I were performing some testing of RAC to RAC Dataguard services failover in the last few days. We hit an issue whereby a switchover operation from the primary DB to the standby DB would fail intermittently. Fail as in, the corresponding standy instances would not re-start up, and the clusterware role configuration would not properly reverse. We had to manually intervene in order to correct.
The errors in the drc* and alert* logs from both sides were quite sparse but we would note the error - "Listener does not currently know of SID given in connection descriptor".
Our dgmgrl Dataguard broker configuration was configured to connect to the primary RAC database via a TNS alias for ‘DB11PRM’ and connect to the standby RAC database via a TNS alias for ‘DB11SBY’. Connectivity from all nodes on both primary and standby works successfully when manually checked via sqlplus...
DGMGRL> show database 'DB11PRM' DGConnectIdentifier;
DGConnectIdentifier = 'db11prm'
DGMGRL> show database 'DB11SBY' DGConnectIdentifier;
DGConnectIdentifier = 'db11sby'
What we we believe to be the case is that the Broker functionality was not able to work properly (i.e. successfully perform a switchover operation in the direction required) when connecting to the DB service name with the defined TNS string alias(for either primary/standby) if it included some of the failover/load balancing features of Oracle Net.
i.e. The first connection string below looks good for usage but the second connection string resolved is the issue.
DB11PRM1:dbrdlprd01:oracle$ tnsping DB11PRM | grep Attempt
Attempting to contact (DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=dbrdlprd-scan)(PORT=1522)))(CONNECT_DATA=(SERVICE_NAME=DB11PRM)))
DB11PRM1:dbrdlprd01:oracle$ tnsping DB11SBY | grep Attempt
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = dbrdlsby-scan)(PORT = 1522)) (LOAD_BALANCE = YES) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = DB11SBY) (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 180)(DELAY = 5))))
By making both connection string resolve to the first listed format above - all switchover operations succeeded (and were completely clean of errors).
My question is - How are the load balance/failover parameters in the second connect string negatively impacting the broker and its work? And has anyway else encountered such a scenario?
Regards