Solaris crash and ZFS problem

807557, Apr 12 2008 (edited Apr 13 2008)

Hi,

Recently my Solaris 10 box panicked and crashed a couple of times. After the crash I found a strange error on my ZFS pool. Unfortunately, I don't have enough space for a crash dump to be generated. However, here is what I got in messages after boot:

Apr 13 04:31:16 core0 savecore: [ID 570001 auth.error] reboot after panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe800087ea20 addr=10000018 occurred in module "zfs" due to an illegal access to a user address
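
For anyone who wants to compare this against later panics, the fields in a line like the one above can be pulled apart with a small script. This is only a sketch: the regular expression is written against this single message, other panic messages may be laid out differently, and the name `parse_panic` is made up for illustration (it is not part of any Solaris tool).

```python
import re

# Pattern written against the single savecore line quoted in this post;
# it is an assumption that other BAD TRAP lines follow the same layout.
PANIC_RE = re.compile(
    r'BAD TRAP: type=(?P<type>\w+) \((?P<desc>[^)]*)\) '
    r'rp=(?P<rp>[0-9a-f]+) addr=(?P<addr>[0-9a-f]+) '
    r'occurred in module "(?P<module>[^"]+)"'
)

def parse_panic(line):
    """Return the trap fields as a dict, or None if the line does not match."""
    m = PANIC_RE.search(line)
    return m.groupdict() if m else None

line = ('Apr 13 04:31:16 core0 savecore: [ID 570001 auth.error] reboot after '
        'panic: BAD TRAP: type=e (#pf Page fault) rp=fffffe800087ea20 '
        'addr=10000018 occurred in module "zfs" due to an illegal access '
        'to a user address')

info = parse_panic(line)
# Print the faulting module and the fault address as a hex number.
print(info["module"], hex(int(info["addr"], 16)))
```

Keeping the module name and fault address from each panic makes it easier to tell whether repeated crashes hit the same spot.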

Then I checked my ZFS and:

# zpool status -vx
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: none requested
config:
 
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c1d0    ONLINE       0     0     0
            c2d0    ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c2d1    ONLINE       0     0     0
            c1d1    ONLINE       0     0     0
 
errors: Permanent errors have been detected in the following files:
 
        box5:<0x0>

I decided to run a scrub to get rid of this message, which didn't make any sense to me. However, the scrub revealed even stranger results:

# zpool status -xv
  pool: box5
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress, 1.20% done, 176h37m to go
config:
 
        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     4
          mirror    ONLINE       0     0     2
            c1d0    ONLINE       0     0     4
            c2d0    ONLINE       0     0     4
          mirror    ONLINE       0     0     2
            c2d1    ONLINE       0     0     4
            c1d1    ONLINE       0     0     4
 
errors: Permanent errors have been detected in the following files:
 
        box5:<0x0>

As you can see, it's quite a large pool and the scrub will take another week to complete. However, the identical checksum error counts on all disks are really confusing.
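
To make the two observations above concrete, here is a rough sketch that reads the scrub estimate and the per-disk CKSUM column out of the status text. It assumes the layout shown in this post, and that disk names look like `c1d0` (controller/disk); the helper names `cksum_by_disk` and `scrub_eta_days` are hypothetical, not part of ZFS.

```python
import re

# Excerpt of the second `zpool status -xv` output from this post.
STATUS = """\
 scrub: scrub in progress, 1.20% done, 176h37m to go
config:

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     4
          mirror    ONLINE       0     0     2
            c1d0    ONLINE       0     0     4
            c2d0    ONLINE       0     0     4
          mirror    ONLINE       0     0     2
            c2d1    ONLINE       0     0     4
            c1d1    ONLINE       0     0     4
"""

def cksum_by_disk(text):
    """Map each disk to its CKSUM count; pool and mirror rows are skipped.

    Assumption: disk rows are the ones whose name matches cNdN.
    """
    rows = re.findall(
        r'^[ \t]+(c\d+d\d+)[ \t]+\S+[ \t]+\d+[ \t]+\d+[ \t]+(\d+)[ \t]*$',
        text, re.M)
    return {name: int(cksum) for name, cksum in rows}

def scrub_eta_days(text):
    """Convert the 'NNNhMMm to go' estimate into days, or None if absent."""
    m = re.search(r'(\d+)h(\d+)m to go', text)
    if m is None:
        return None
    hours = int(m.group(1)) + int(m.group(2)) / 60
    return hours / 24

counts = cksum_by_disk(STATUS)
days = scrub_eta_days(STATUS)
same_on_all = len(set(counts.values())) == 1

print(counts)
print("identical CKSUM count on every disk:", same_on_all)
print("scrub estimate in days:", round(days, 2))
```

176h37m works out to a little over seven days, which is where the "another week" figure comes from, and the check confirms that all four disks report the same count.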

I'm running Solaris 10U4 (Generic_127112-07) with the 127729-07 patch applied, which basically addresses some ZFS/NFS panic problems. I'll post more crash dump analysis as soon as it crashes again (I hope it won't, though).

Any ideas? What is 0x0? Why do I get the same checksum errors on all disks? How do I fix this mess?

--
Rustam

Locked on May 11 2008

Added on Apr 12 2008