Thread: Single point of failure


Replies: 4 - Last Post: Sep 5, 2007 8:25 PM - Last Post By: S Ashok
user588484

Posts: 3
Registered: 08/04/07
Single point of failure
Posted: Aug 4, 2007 5:54 AM
Hi

Does ASM really eliminate the single point of failure?


Regards.

Keyur Patel

Posts: 265
Registered: 08/09/03
Re: Single point of failure
Posted: Aug 4, 2007 3:28 PM   in response to: user588484
What do you mean by single point of failure in your context? The point of failure depends on how you configure ASM and your disk groups. Please explain your thoughts in more detail.

Thanks

~Keyur
Christopher Cur...

Posts: 8
Registered: 08/27/07
Re: Single point of failure
Posted: Aug 27, 2007 9:17 AM   in response to: Keyur Patel
I have a question, but rather than start a new thread, I thought my question might be most closely related to this thread, because I'm trying to relate Oracle's "single point of failure" guarantee to disk arrays.

I have been involved with Oracle since version 6 and am very comfortable with the architecture (the relationship of control files, redo log files and data files) which guarantees the recovery of all committed transactions after any single point of failure.

Here's the question:

We are building a data warehouse (est. 2TB in size), a genuinely read-only application except for the nightly loads. This is single-instance Oracle. The storage array consists of 16 disks, each 500 GB, giving 8TB total physical space.

We have not configured the disk array yet. I'm picking up this task from the previous DBA and reviewing his design, which reserves 2 disks for hot spares, mirrors the remainder as RAID 1+0, and then gathers these into a single logical volume, giving 3.5TB of usable space (excluding the hot spares). His design is:

Disks 1 thru 14 (500 GB each x 14, RAID 1+0 = 3.5TB usable space), single logical volume holding:
control01.ctl
control02.ctl
control03.ctl
redo logs
all tablespaces (SYSTEM, SYSAUX, USERS, TEMP, etc.)
Disk 15: hot spare
Disk 16: hot spare

My concern is about the single logical volume holding ALL the control files and ALL the redo log files. Being trained with Oracle 6, I learned to make sure that the three control files were on physically separate drives, and that the redo logs were mirrored across two separate disk drives. But with a single huge logical volume, how can you tell that this has been achieved?

The problem that I foresee in the current storage design is that the logical volume becomes the single point of failure and that the data warehouse will be unusable and unrecoverable if ANY of the 14 disks in the single logical volume fails. Yes, the hot spares are supposed to kick in, but this still means that I am relying completely on the disk array to keep the data warehouse running and that Oracle itself will no longer have any control.
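As a rough check on the "ANY of the 14 disks" worry, here is a minimal Python sketch of how RAID 1+0 behaves under disk failures. It assumes the array mirrors consecutive disks as pairs (1+2, 3+4, and so on); the actual pairing depends on the controller, and the function names are made up for illustration. Hot spares and rebuild time are ignored.

```python
from itertools import combinations

def mirror_pairs(disks):
    """Pair consecutive disks into RAID 1 mirrors (assumed pairing)."""
    return [(disks[i], disks[i + 1]) for i in range(0, len(disks), 2)]

def volume_survives(pairs, failed):
    """The striped volume survives while every mirror pair keeps one disk."""
    failed = set(failed)
    return all(not (a in failed and b in failed) for a, b in pairs)

disks = list(range(1, 15))      # disks 1 thru 14 in the logical volume
pairs = mirror_pairs(disks)     # 7 mirrored pairs, striped together

# Does the volume survive every possible single-disk failure?
single_ok = all(volume_survives(pairs, [d]) for d in disks)

# How many two-disk failure combinations take the volume down?
double = list(combinations(disks, 2))
fatal_double = [f for f in double if not volume_survives(pairs, f)]

print("survives any single failure:", single_ok)
print("fatal double failures:", len(fatal_double), "of", len(double))
```

Under that pairing assumption, a single failed disk does not take the volume down; the volume is lost only when both disks of the same mirror fail before a spare has been rebuilt.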

I would consider something like this instead:

Disk 1 (500GB), not mirrored:
control01.ctl, redo logs copy 1
Disk 2 (500GB), not mirrored:
control02.ctl, redo logs copy 2
Disks 3 thru 14 (500 GB each x 12, RAID 1+0 = 3TB usable space), single logical volume holding:
control03.ctl
all tablespaces (SYSTEM, SYSAUX, USERS, TEMP, etc.)
Disk 15: hot spare
Disk 16: hot spare
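The usable-space figures in the two layouts can be checked with a short Python sketch. It assumes plain RAID 1+0 arithmetic (half of the raw capacity is usable) and ignores any formatting or controller overhead; the helper name is made up for illustration.

```python
DISK_GB = 500  # each disk in the array, per the description above

def raid10_usable_gb(n_disks, disk_gb=DISK_GB):
    """Usable space of a RAID 1+0 set: every disk is mirrored once."""
    if n_disks % 2:
        raise ValueError("RAID 1+0 needs an even number of disks")
    return (n_disks // 2) * disk_gb

raw_total_gb = 16 * DISK_GB            # whole array, spares included
previous_gb = raid10_usable_gb(14)     # disks 1 thru 14 in one volume
proposed_gb = raid10_usable_gb(12)     # disks 3 thru 14 in one volume

print(raw_total_gb, previous_gb, proposed_gb)
```

So the proposed layout gives up 500 GB of mirrored space in exchange for two standalone disks that hold the multiplexed control files and redo log copies.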

Please comment on what has happened to Oracle's recommended disk configurations and guarantee of recovery in this day and age where, it seems, we literally put all of our data eggs into one large data basket. Is it now standard practice to do this? Or would my proposed disk layout be safer? Is there a MetaLink doc I should read?

-- Chris Curzon

Marc Musette

Posts: 461
Registered: 09/18/01
Re: Single point of failure
Posted: Sep 5, 2007 4:45 AM   in response to: Christopher Cur...
A diskgroup is a SPOF if you leave these out:
- ASM provides logical redundancy through FAILURE GROUPS
- the storage array provides RAID

IMHO, the major problem is that you implement storage capabilities through a piece of software, so bugs and/or limitations are never far away.

And bear in mind that if the ASM instance crashes, all the databases storing data through that ASM instance crash as well.

Another example: the fact that you cannot change the redundancy of a diskgroup online could be a problem if you have all your eggs in the same basket.

I think ASM is just the most affordable way to do RAC, not the best (as my sysadmin colleagues would say of CRS for the clusterware). For sure, you would not use ASM for a single instance instead of a regular file system, unless you do not have the $$$ to buy a RAID controller.

Marc
S Ashok

Posts: 381
Registered: 12/13/98
Re: Single point of failure
Posted: Sep 5, 2007 8:25 PM   in response to: Marc Musette
I prefer to have ASM use external redundancy and let the database enjoy ASM's striping, rebalancing, and disk add/remove features.

Ashok