Very high amount of chksum failures?

Scott S., May 15 2016 (edited May 23 2016)

This is a new one for me. Most of these vdevs are new (some brand new). All are hot-swappable, sitting in Icy Dock 4-bay enclosures (x2); the rpool has 2 SSDs, also hot-swappable. For some reason, during some disk operations, 2 vdevs (2x HDD) failed in the data pool, and one of the rpool SSDs failed too.

# fmdump -e | wc -l

   75914

# fmdump -e | grep "ereport.fs.zfs." | wc -l

   75277

# fmdump -e | grep "ereport.fs.zfs.checksum" | wc -l

   74718
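The three counts above can also be gathered in a single pass. The following is only a rough sketch: it assumes each line of `fmdump -e` output ends with the ereport class name, and `tally_ereport_classes` is a hypothetical helper name, not an existing tool.

```shell
# Tally error reports by class. Reads `fmdump -e` style text on stdin.
# Assumption: the class name is the last whitespace-separated field on each line.
tally_ereport_classes() {
    awk '$NF ~ /^ereport\./ { n[$NF]++ }
         END { for (c in n) printf "%d %s\n", n[c], c }' | sort -rn
}

# Usage on the affected host (not run here):
#   fmdump -e | tally_ereport_classes
```

The header line is skipped automatically because its last field does not start with `ereport.`.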

   NAME                         STATE     READ WRITE CKSUM
   data                         DEGRADED     0     0     0
     raidz2-0                   DEGRADED     0     0     0
       c0t50004CF210AD1C22d0    ONLINE       0     0     0
       spare-1                  DEGRADED     0     0   249
         c0t50004CF210BE51F1d0  DEGRADED  00070
         c4t0d0                 ONLINE       0     0     0
       spare-2                  DEGRADED     1     0     2
         c0t50004CF210BE51F3d0  UNAVAIL      0     0     0
         c4t1d0                 ONLINE       0     0     0
       c0t50004CF210BE5214d0    ONLINE       0     0     0
       c5t3d0                   ONLINE       0     0     0
       c4t3d0                   ONLINE       0     0     0
   spares
     c4t1d0                     INUSE
     c4t0d0                     INUSE

   NAME                         STATE     READ WRITE CKSUM
   rpool                        DEGRADED     0     0     0
     mirror-0                   DEGRADED     0     0     0
       c0t500A0751F0096E9Ed0    DEGRADED  0001.00K
       c0t500A0751F0097DA7d0    ONLINE    0000

The codes, in the order listed, were:

ZFS-8000-D3

ZFS-8000-GH

ZFS-8000-GH

The question is: how much more information can I extract to find out why these new devices failed with so many checksum errors, and how can I diagnose where the problem lies? The drives could be physically fine, with some bug causing the issue instead. Perhaps dtrace could assist. I'll change my pool setup in the meantime. Also, for some reason, after detaching a vdev from the pool and removing it, commands produce no output and ssh times out.
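One way to start narrowing this down might be to group the checksum reports by device, to see whether the errors cluster on one disk, one enclosure, or one controller path. This is a minimal sketch, assuming the verbose `fmdump -eV` output carries `class = ...` and `vdev_path = ...` lines for each ZFS report; `checksum_errors_by_vdev` is a hypothetical helper name.

```shell
# Count checksum ereports per device. Reads `fmdump -eV` style text on stdin.
# Assumption: each report contains "class = ..." followed by "vdev_path = ..." lines.
checksum_errors_by_vdev() {
    awk '$1 == "class" && $2 == "=" { cls = $3 }
         $1 == "vdev_path" && $2 == "=" && cls == "ereport.fs.zfs.checksum" { n[$3]++ }
         END { for (p in n) printf "%d %s\n", n[p], p }' | sort -rn
}

# Usage on the affected host (not run here):
#   fmdump -eV | checksum_errors_by_vdev
```

If the counts pile up on devices that share a controller or a dock, that would point at cabling, the backplane, or the HBA rather than at the drives themselves.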

# format

Searching for disks...

^C

It hangs, and other commands do as well. Why would that be? I can still ping the host and the reply is instant, but ssh with verbose output (-vvv) gets stuck at "debug1: Local version string ...". Am I missing something obviously wrong or silly? (The vdev names vary because 4 disks are on the LSI SAS 3008 controller and the other 4 are connected directly to the motherboard. I'll plug them all into the LSI SAS controller afterwards.)

Post Details
Locked on Jun 20 2016
Added on May 15 2016