Thread: Nics and Switch - single point of failure ?


Replies: 14 - Last Post: Nov 6, 2007 10:56 AM by gorbyx
krethp

Posts: 10
Registered: 03/06/00
Nics and Switch - single point of failure ?
Posted: Sep 11, 2007 8:12 AM
Hi all,
we have installed our RAC on two Dell PowerEdge 1855 blades.
These blades are limited to two NICs each. The blades sit in a chassis with two embedded switches.
Our problem is this: is it possible to configure the NICs and switches so that neither switch is a single point of failure?
We have tried different configurations but have not found a solution.
My opinion is that three NICs are needed to achieve intracluster redundancy with two switches, so this hardware is not a good solution.
What do you think?
Regards
Paolo
Dan_Norris

Posts: 159
Registered: 08/15/07
Re: Nics and Switch - single point of failure ?
Posted: Sep 11, 2007 10:25 AM   in response to: krethp
I've done a few RAC installs on blade chassis, and in the cases where we only had 2 NICs, we simply accepted that a NIC failure is essentially a node failure. Everything will work fine, but you have to be comfortable that a network failure will basically disable that node. I'm definitely a fan of lots of redundancy, and RAC environments are usually critical enough to require a lot of it. However, I can't remember the last time I saw a NIC fail...

Dan
S Ashok

Posts: 381
Registered: 12/13/98
Re: Nics and Switch - single point of failure ?
Posted: Sep 11, 2007 6:58 PM   in response to: Dan_Norris
Yes. Oracle recommends redundancy for both the NICs and the switches, with two addresses per NIC. It should be possible to configure; your network administrator can help with this.

Ashok
krethp

Posts: 10
Registered: 03/06/00
Re: Nics and Switch - single point of failure ?
Posted: Sep 11, 2007 11:30 PM   in response to: S Ashok
Hi all,
thank you for your replies.
Actually, I am not so much worried about a NIC failure as about a switch failure.
Let me explain.
Recently we had to update the drivers of the switches in the chassis. We had to ask for downtime, because isolating a switch would have meant stopping either the public network or the private interconnect.
Therefore I think that a configuration with two switches and two NICs is not fault tolerant.
We are having a big discussion about this problem here.
Our system and network engineers are trying different configurations, but I don't think they will manage to avoid this problem.
I hope we will migrate to hardware with three NICs; that way the problem should be solved.
Paolo
goranbg

Posts: 129
Registered: 08/11/03
Re: Nics and Switch - single point of failure ?
Posted: Sep 12, 2007 12:52 AM   in response to: krethp
As you already concluded, for interconnect redundancy you would need two NICs dedicated to the interconnect and bonded.
And as Dan already mentioned, I can't remember when I last saw a NIC fail... but always keep Murphy's law in mind :-)
If you plan to migrate to new hardware as you mentioned, it would be wise to configure bonding for both the public and the private network.
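
The reasoning about NICs, switches and bonding can be checked with a small model. What follows is a minimal sketch in Python, not a description of any real blade chassis: the NIC and switch names (nic0, swA and so on) and the three wiring layouts are hypothetical. It treats a network as up when at least one of its NICs is cabled to a working switch, and reports which switches take a network down when they fail.

```python
def surviving_networks(layout, failed_switch):
    """Return the set of networks that still have a path after one switch fails.

    layout maps a network name to a list of (nic, switch) pairs; several
    pairs for one network stand for a bonded group of NICs.
    """
    return {
        network
        for network, links in layout.items()
        if any(switch != failed_switch for _nic, switch in links)
    }


def single_points_of_failure(layout):
    """Return the switches whose failure takes down at least one network."""
    switches = {switch for links in layout.values() for _nic, switch in links}
    return sorted(
        switch
        for switch in switches
        if surviving_networks(layout, switch) != set(layout)
    )


# Two NICs, two switches: one NIC per network (hypothetical names).
two_nics = {
    "public": [("nic0", "swA")],
    "private": [("nic1", "swB")],
}

# Three NICs: only the interconnect is bonded across both switches.
three_nics = {
    "public": [("nic0", "swA")],
    "private": [("nic1", "swA"), ("nic2", "swB")],
}

# Four NICs: both networks are bonded pairs split across the switches.
four_nics_bonded = {
    "public": [("nic0", "swA"), ("nic1", "swB")],
    "private": [("nic2", "swA"), ("nic3", "swB")],
}

for name, layout in [
    ("2 NICs", two_nics),
    ("3 NICs", three_nics),
    ("4 NICs, both bonded", four_nics_bonded),
]:
    print(name, "-> single points of failure:", single_points_of_failure(layout))
```

In this model the two-NIC layout loses one network whichever switch fails, the three-NIC layout protects the interconnect but still depends on one switch for the public network, and only bonding both networks across the two switches leaves no switch as a single point of failure.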
krethp

Posts: 10
Registered: 03/06/00
Re: Nics and Switch - single point of failure ?
Posted: Sep 12, 2007 12:55 AM   in response to: goranbg
Yes, bonding on the interconnect switches is exactly the configuration I had in mind.
OK, thank you all for the good advice.
Paolo
user602772

Posts: 3
Registered: 10/29/07
Re: Nics and Switch - single point of failure ?
Posted: Oct 29, 2007 10:01 AM   in response to: goranbg
We have a two-node Oracle RAC cluster with a single NIC (i.e. a single point of failure) for the interconnect. We would like more redundancy.

As you mentioned, for interconnect redundancy you would need two NICs dedicated to the interconnect and bonded.

Our network person does not know how to set up two NICs and two switches for bonding the public and private networks. Could you give me a call to discuss?
Oliver
(818) 266-6444 cell
ChandraP

Posts: 1,315
Registered: 06/04/06
Re: Nics and Switch - single point of failure ?
Posted: Oct 29, 2007 11:03 AM   in response to: user602772
Oliver,

Bonding is OS-specific. For example, see ML Note 291958.1 for setting up bonding on SLES.

Thanks
Chandra
user602772

Posts: 3
Registered: 10/29/07
Re: Nics and Switch - single point of failure ?
Posted: Oct 29, 2007 11:21 AM   in response to: ChandraP
We have HP-UX 11.23 with only Oracle Clusterware:
1 NIC for public
1 NIC for private
ChandraP

Posts: 1,315
Registered: 06/04/06
Re: Nics and Switch - single point of failure ?
Posted: Oct 29, 2007 11:49 AM   in response to: user602772
Since it is HP-UX, the feature is called APA (Auto Port Aggregation). You may want to have your sys admin go over the following docs for more information:

http://docs.hp.com/en/J4240-90035/ch01.html
http://docs.hp.com/en/J4240-90037/J4240-90037.pdf

HTH

Thanks
Chandra Pabba

jefcen

Posts: 46
Registered: 12/16/98
Re: Nics and Switch - single point of failure ?
Posted: Nov 2, 2007 12:52 PM   in response to: ChandraP
Has anyone actually tried to pull the cable(s) on their public net to test if the VIP fails over as documented? I have had an SR on this open for months.

Jeff
Chris Slattery

Posts: 1,571
Registered: 08/12/04
Re: Nics and Switch - single point of failure ?
Posted: Nov 3, 2007 6:48 AM   in response to: jefcen
I did a power-off (I'm using HP blades at the moment, so I can't pull cables) and it fails over after 90 seconds to 2 minutes, as advertised. No real issues: clients figure it out eventually, except in some cases.

I, too, have a TAR open in this area (for those cases, which involve only TAF SELECT types).
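
To put a number on the failover time during a power-off or port-disable test, a small polling script can help. The sketch below is a minimal Python example under stated assumptions: it assumes that once the VIP has moved to a surviving node, connection attempts to it are answered promptly (accepted or refused) rather than timing out, which is not something confirmed in this thread. The function names are hypothetical, and the host name and port in the usage note are placeholders.

```python
import socket
import time


def address_answers(host, port, timeout=2.0):
    """Return True if host:port answers at all (connection accepted or refused)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except ConnectionRefusedError:
        # Something owns the address and actively rejected the connection.
        return True
    except OSError:
        # Timeout, no route, name resolution failure and similar errors.
        return False


def measure_outage(probe, interval=1.0, max_wait=300.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll probe() and return the seconds from first failure to recovery.

    Returns None if no complete outage (a failure followed by a recovery)
    is observed within max_wait seconds.
    """
    start = clock()
    down_since = None
    while clock() - start < max_wait:
        ok = probe()
        now = clock()
        if not ok and down_since is None:
            down_since = now           # first failed probe: outage begins
        elif ok and down_since is not None:
            return now - down_since    # first good probe after the outage
        sleep(interval)
    return None
```

A test run could start `measure_outage(lambda: address_answers("node1-vip", 1521))` from a client machine just before powering off the node; `node1-vip` and port 1521 are assumptions to be replaced with the real VIP name and listener port. The result is only as precise as the polling interval plus the probe timeout.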

gorbyx

Posts: 226
Registered: 10/03/00
Re: Nics and Switch - single point of failure ?
Posted: Nov 3, 2007 11:21 PM   in response to: jefcen
We also test that on every install, but usually by disabling ports on a switch.

What exactly is the problem you are experiencing?
jefcen

Posts: 46
Registered: 12/16/98
Re: Nics and Switch - single point of failure ?
Posted: Nov 6, 2007 7:34 AM   in response to: gorbyx
We pull the cable on the public NIC, and the VIP never fails over. This is on Solaris 10.

Jeff
gorbyx

Posts: 226
Registered: 10/03/00
Re: Nics and Switch - single point of failure ?
Posted: Nov 6, 2007 10:56 AM   in response to: jefcen
> We pull the cable on the public NIC, and the VIP never fails over. This is on Solaris 10.

When you pull the interface cable on node1, can you still ping the node itself (node1) and its VIP (node1-vip) from that same node?

Could you paste the output of "crs_stat -p <vip_resource_name>"?
(vip_resource_name is something like "ora.node1.vip")

I want to check that the VIP monitoring interval is not 0 (if it is 0, the VIP is not monitored).

If it is not 0, then I would check the log at $ORA_CRS_HOME/logs/<nodename>/racg/ora.<nodename>.vip.log

I doubt you will see much detail there, but you can enable debug tracing by setting USR_ORA_DEBUG=1 (what you see in crs_stat -p is USR_ORA_DEBUG=0 by default).
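
The two checks described above can be scripted once the profile has been captured. The following is a minimal Python sketch under stated assumptions: it assumes that "crs_stat -p" prints one KEY=VALUE pair per line and that the monitoring interval appears under an attribute named CHECK_INTERVAL. Only USR_ORA_DEBUG is named in this thread; the sample text is hypothetical and was not captured from a real cluster.

```python
def parse_crs_profile(text):
    """Parse KEY=VALUE lines (the assumed `crs_stat -p` format) into a dict."""
    profile = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep and key:
            profile[key] = value
    return profile


def vip_findings(profile, interval_key="CHECK_INTERVAL"):
    """Return short remarks about the monitoring and debug settings."""
    findings = []
    interval = profile.get(interval_key)
    if interval is None:
        findings.append(f"{interval_key} not found in profile")
    elif interval.strip() == "0":
        findings.append("monitoring interval is 0: the VIP is not monitored")
    else:
        findings.append(f"monitoring interval is {interval}")
    if profile.get("USR_ORA_DEBUG", "0") == "0":
        findings.append("debug tracing is off (USR_ORA_DEBUG=0)")
    else:
        findings.append("debug tracing is on")
    return findings


# Hypothetical sample profile, for illustration only.
SAMPLE = """\
NAME=ora.node1.vip
CHECK_INTERVAL=0
USR_ORA_DEBUG=0
"""

for remark in vip_findings(parse_crs_profile(SAMPLE)):
    print(remark)
```

To use it on a real system, replace SAMPLE with the saved output of "crs_stat -p" for the VIP resource, and pass a different interval_key if the attribute name in that output differs.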