When do bad blocks make it onto the grown defect list?

807559, Jul 25 2006 (edited Aug 2 2006)
We have one Ultra320 Maxtor disk in a 4-disk stripe set that has started throwing hard read errors on 2 consecutive blocks. The disk is apparently not remapping these to spare blocks or putting them on the "grown" defect list that format -> defects shows. Should it? I'm not clear about when Solaris does this, or whether the disk should do it all by itself.

This is on a Sun dual Ultra320 SCSI controller. There are only disks on this channel.

I ran fsck on the mounted stripe set; it didn't pick up this bad block and didn't cause it to appear on the grown list.

I ran smartctl -t long on the underlying disk, and it picked up the problem block. A subsequent smartctl -a -d scsi reported that the test had failed with this entry:
seg#=2, LBA_first_err=10cfbbd0, sk=0x3, asc=0x11, asq=0,0
But, you guessed it, the block still didn't go onto the grown list.
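For reference, the fields in that self-test entry can be decoded by hand: in the SCSI sense tables, sense key 0x3 is MEDIUM ERROR and ASC/ASCQ 0x11/0x00 is "Unrecovered read error". The sketch below uses a hypothetical helper, decode_selftest, that parses the line as transcribed above and converts the hex LBA to decimal and to a byte offset. It assumes 512-byte sectors and reads the "asq=0,0" field as ASCQ 0x00; the offset is on the physical disk, not within the stripe volume or the file.

```python
# Hypothetical helper: decode the smartctl self-test entry quoted above.
# Only the few codes relevant here; the full tables are in the SCSI (SPC) standard.
SENSE_KEYS = {0x1: "RECOVERED ERROR", 0x3: "MEDIUM ERROR", 0x4: "HARDWARE ERROR"}
ASC_ASCQ = {(0x11, 0x00): "Unrecovered read error"}
SECTOR_SIZE = 512  # assumption: 512-byte logical blocks


def decode_selftest(line):
    """Parse 'seg#=..., LBA_first_err=..., sk=..., asc=..., asq=...' into readable values."""
    fields = {}
    for part in line.split(", "):
        key, _, value = part.partition("=")
        fields[key.strip()] = value.strip()
    lba = int(fields["LBA_first_err"], 16)       # the LBA is printed in hex
    sk = int(fields["sk"], 16)
    asc = int(fields["asc"], 16)
    ascq = int(fields["asq"].split(",")[0], 16)  # assumption: "0,0" means ASCQ 0x00
    return {
        "segment": int(fields["seg#"]),
        "lba": lba,
        "byte_offset": lba * SECTOR_SIZE,
        "sense_key": SENSE_KEYS.get(sk, hex(sk)),
        "asc_ascq": ASC_ASCQ.get((asc, ascq), f"asc=0x{asc:02x} ascq=0x{ascq:02x}"),
    }


info = decode_selftest("seg#=2, LBA_first_err=10cfbbd0, sk=0x3, asc=0x11, asq=0,0")
print(info["lba"], info["byte_offset"], info["sense_key"], info["asc_ascq"])
# prints: 282049488 144409337856 MEDIUM ERROR Unrecovered read error
```

The decimal LBA (282049488) is the form that format's analyze and repair prompts deal in, which is why the conversion is worth doing before touching the defect list.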

Currently I'm running format -> analyze with READ, with the blocks per transfer set to 1 (not the default 126; more about that below). Hopefully that will finally map this block out. If not, I guess I'll have to add it manually, but that seems so, um, primitive.

Anyway, I figured out which file was affected by running:

sum 'filename'

on every filename emitted by find /vol02. Then I tried copying the affected file to another location. Sure, it's got two bad blocks, but it's 2^32 + 8096 bytes, and I'd like to keep the rest of it. Supposedly this should have done it:

dd if=infile of=outfile conv=noerror,sync

Unfortunately, no. It started throwing hard read errors on reads that began many blocks upstream of the bad block but all reported the same bad block; these show up in /var/adm/messages with the initial block read and the failed block. I think the SCSI driver is doing some sort of clustered block read for performance reasons. After it logged a bunch of those, dd stopped. It never got past the bad block. I tried adding bs=512 (which should have been the default), but no joy: it still failed to transfer the data.
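The copy that the dd command was meant to perform can be sketched as a hypothetical helper, salvage_copy: read the file in 512-byte blocks at explicit offsets with os.pread (Unix only), and write zeros in place of any block whose read fails, which is roughly what conv=noerror,sync is supposed to do but with an explicit seek past each failure. It still reads through the filesystem, so it may run into the same clustered-read behaviour described above; treat it as a sketch, not a guaranteed fix.

```python
import os
import sys

BLOCK = 512  # read granularity; assumption: matches the disk's sector size


def salvage_copy(src, dst, block=BLOCK):
    """Copy src to dst block by block, zero-filling blocks that fail to read.

    Returns the list of byte offsets whose reads raised an error.
    """
    bad = []
    size = os.path.getsize(src)
    in_fd = os.open(src, os.O_RDONLY)
    try:
        with open(dst, "wb") as out:
            offset = 0
            while offset < size:
                want = min(block, size - offset)
                try:
                    data = os.pread(in_fd, want, offset)  # read at an explicit offset
                except OSError:
                    data = b""                            # EIO etc.: record and move on
                    bad.append(offset)
                if len(data) < want:                      # failed or short read: pad with zeros
                    data += b"\0" * (want - len(data))
                out.write(data)
                offset += want                            # always advance past this block
    finally:
        os.close(in_fd)
    return bad


if __name__ == "__main__" and len(sys.argv) == 3:
    failed = salvage_copy(sys.argv[1], sys.argv[2])
    print(f"{len(failed)} unreadable block(s) at offsets: {failed}")
```

Usage would be something like python3 salvage.py infile outfile; the returned offsets show exactly which 512-byte blocks in the copy are zero-filled.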

So that's why I'm running analyze with a transfer size of 1 block. Hopefully that size gets passed all the way down to the SCSI driver and the block eventually lands on the grown list, but we'll see.
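The file-identification step (running sum on every file that find /vol02 emits and seeing which one fails) can also be done in a single pass. The sketch below uses a hypothetical helper, unreadable_files, that walks a directory tree, reads each regular file to the end, and reports any file whose read raises an I/O error. Like sum, it only detects errors that actually surface as failed reads; it does not locate the bad block within the file.

```python
import os
import sys


def unreadable_files(root, chunk=1 << 20):
    """Walk root and yield (path, error) for each regular file that can't be read to the end."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, devices, FIFOs etc.; only plain files are read.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while f.read(chunk):  # read sequentially and discard the data
                        pass
            except OSError as exc:
                yield path, exc


if __name__ == "__main__" and len(sys.argv) == 2:
    for path, exc in unreadable_files(sys.argv[1]):
        print(f"{path}: {exc}")
```

For this case it would be run as python3 scan.py /vol02.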

Thanks
Locked on Aug 30 2006
Added on Jul 25 2006