Failover - How to achieve a transparent failover using SQLPlus
760996Jul 8 2010 — edited Jul 8 2010 AIX 5.3 Oracle Clusterware 10.2.0.4.0 Oracle Enterprise Edition 10.2.0.4.0
This is the behavior I see from an Oracle Client session which is to be expected if I read RAC: Frequently Asked Questions [ID 220970.1] see below
(1) SQLPlus session connected to NodeA
(2) NodeA - Clusterware services stopped
(3) NodeA-vip has failed over to Node B
(4) SQLPlus session receives an error
(5) SQLPlus establish new connection to NodeA-vip
My question is how is a transparent SQLPLus session failover achieved as illustrated in [ID 339107.1] see below
*** Dedicated Connections to a Migrated VIP Can Lose their Connection after the VIP is Switched Back [ID 339107.1] ***
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
rac11g1
$ crsctl stop crs
SQL> /
INSTANCE_NAME
----------------
rac11g2
RAC: Frequently Asked Questions [ID 220970.1]
*** Why do we have a Virtual IP (VIP) in Oracle RAC 10g or 11g? Why does it just return a dead connection when its primary node fails? ***
The goal is application availability.
When a node fails, the VIP associated with it is automatically failed over to some other node. When this occurs, the following things happen.
(1) VIP detects public network failure which generates a FAN event.
(2) the new node re-arps the world indicating a new MAC address for the IP.
(3) connected clients subscribing to FAN immediately receive ORA-3113 error or equivalent. Those not subscribing to FAN will eventually time out.
(4) New connection requests rapidly traverse the tnsnames.ora address list skipping over the dead nodes, instead of having to wait on TCP-IP timeouts
Without using VIPs or FAN, clients connected to a node that died will often wait for a TCP timeout period (which can be up to 10 min) before getting an error.
As a result, you don't really have a good HA solution without using VIPs and FAN. The easiest way to use FAN is to use an integrated client with Fast Connection Failover (FCF) such as JDBC, OCI, or ODP.NET.
*** What do the VIP resources do once they detect a node has failed/gone down? Are the VIPs automatically acquired, and published, or is manual intervention required? Are VIPs mandatory? ***
With Oracle RAC 10g or higher, each node requires a VIP. With Oracle RAC 11g Release 2, 3 additional SCAN vips are required for the cluster. When a node fails, the VIP associated with the failed node is automatically failed over to one of the other nodes in the cluster. When this occurs, two things happen:
1. The new node re-arps the world indicating a new MAC address for this IP address. For directly connected clients, this usually causes them to see errors on their connections to the old address;
2. Subsequent packets sent to the VIP go to the new node, which will send error RST packets back to the clients. This results in the clients getting errors immediately.
In the case of existing SQL conenctions, errors will typically be in the form of ORA-3113 errors, while a new connection using an address list will select the next entry in the list. Without using VIPs, clients connected to a node that died will often wait for a TCP/IP timeout period before getting an error. This can be as long as 10 minutes or more. As a result, you don't really have a good HA solution without using VIPs.
With Oracle RAC 11g Release 2, you can delegate the management of the VIPs to the cluster. If you do this, the Grid Naming Service (part of the Oracle Clusterware) will automatically allocated and manage all VIPs in the cluster. This requires a DHCP service on the public network.
Thank you
Steve