
zpool import hangs

mikemnc, Mar 2 2024

I suffered a complete pool failure: the fan in the array chassis stopped working and all of the drives overheated. After cooling down, all of the drives checked out OK using Seagate's SeaTools test utility, and smartctl showed them healthy. I was also able to read directly from each device, and the system sees each of them.
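For what it's worth, the raw-read check was along these lines, repeated for each drive (a sketch from memory; it assumes the usual /dev/rdsk raw device path and slice for these disks):

# read ~1 GB off the start of the raw device and discard it;
# all four drives completed this without errors
dd if=/dev/rdsk/c0t5000C500A24AD833d0s0 of=/dev/null bs=1024k count=1024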

But… when I tried to bring the pool back online, its status was UNAVAIL. This is a raidz1 pool with four drives in total. The error I was getting stated:

zpool cannot import pool one or more devices is currently unavailable

and zpool status showed:

NAME                         STATE     READ WRITE CKSUM
pool_01                      UNAVAIL      0     0     1
  raidz1-0                   DEGRADED     0     0     4
    c0t5000C500A24AD833d0    ONLINE       0     0     4
    c0t5000C500A232AFA6d0    ONLINE       0     0     4
    3743539469189005045      UNAVAIL      0     0     0
    c0t5000C500A243C8DEd0    ONLINE       0     0     4
logs
  mirror-1                   ONLINE       0     0     0
    c0t5F8DB4C095690612d0s0  ONLINE       0     0     0
    c0t5F8DB4C095691282d0s0  ONLINE       0     0     0

zpool clear would reset the checksum counters but not make the pool available. I'd seen similar behavior before, and exporting the pool and re-importing it had always fixed the problem. Not this time: now the system refused to import the pool at all. I could import it with -F, but that brought it back in the same unusable state.
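For reference, the recovery sequence that had worked in the past was roughly this (from memory, so treat it as a sketch):

zpool clear pool_01     # reset the error counters
zpool export pool_01    # cleanly detach the pool
zpool import pool_01    # re-import; this is the step that now refuses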

One odd thing… if you look at the output above, there is a device called “3743539469189005045” – I have no idea what this is or why it would be associated with the pool. The actual device that should be listed there is “c0t5000C500A22D9330d0”, which is available and can be directly read from.

Ignoring that for a minute, I figured this is a raidz1 pool, so I should be able to import it with three ONLINE drives. Still no. So, trying everything I could think of, I ran this command:

zpool import -FX pool_01

I'm not sure where I got the "-X" from; it was in my notes, but I can't find any reference to it in the man pages. And Google isn't much help, because every search about ZFS now brings up Linux or FreeBSD instead of Solaris.

The pool consisted of 4x 6TB drives. So the import should only be reading from the three available drives, i.e. 3 × 6TB = 18TB of data at most, even if it had to scan every block.

The import has now been running for 14 days. It's not producing any errors, and I can see read/write activity on each of the drives. In general, iostat looks like this:

                    extended device statistics
    r/s    w/s  Mr/s  Mw/s wait actv wsvc_t asvc_t  %w  %b device
  210.3   70.1   6.7   0.1  0.0  0.2    0.0    0.6   0  15 c0t5000C500A232AFA6d0
  338.5  129.2   8.1   0.1  0.0  0.4    0.0    0.8   0  28 c0t5000C500A24AD833d0
  210.3   70.1   6.7   0.1  0.0  0.2    0.0    0.6   0  13 c0t5000C500A243C8DEd0

One drive is consistently busier than the other two, but all of the stats change every second, which leads me to believe it's doing something.
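(For reference, the stats above are Solaris extended device statistics with throughput in MB/s, i.e. from something like:

iostat -xnM 5

with a 5-second sampling interval.)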

If I try to truss the import PID, truss just hangs. The zpool-pool_01 process respawns every second, and I can't tell what it's doing because it dies too fast to even get a trace on it; I also have no way of knowing what its PID will be ahead of time.
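The closest I can think of to catching it is a poll loop like the one below (a sketch only; it assumes the process is something truss can actually attach to, which may not be the case if it's a kernel thread):

# poll for the short-lived zpool-pool_01 process and attach truss
# the moment it appears; the loop exits after a successful attach
while :; do
    pid=$(pgrep -x zpool-pool_01) && truss -p "$pid" 2>/tmp/truss.out && break
done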

My question at this point is: what does the -X option do? Before I used it, the import would either fail immediately or import in an UNAVAIL state; now it runs indefinitely. I don't mind letting it run if it's actually doing something and there's a chance of a successful import.

Can I stop it without doing more damage? Is there another way to get this pool online? I'm still confused as to how a 4-disk pool with 3 available disks can fail to come online.

Is there a way to remap the 3743539469189005045 entry to the actual device (c0t5000C500A22D9330d0), and would that help the pool correct itself?
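One check I can think of is reading the ZFS labels directly off that disk to see whether its on-disk GUID matches the number shown in the status output (a sketch; zdb prints a guid field in each label):

# dump the vdev labels from the disk that should be in the pool;
# the labels should show guid: 3743539469189005045 if it's the right device
zdb -l /dev/rdsk/c0t5000C500A22D9330d0s0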

The system is running Solaris 11.4.27.0.1.82.1 (x64).

I could really use some help here. This array held a lot of data that, through my own failure, wasn't backed up. The critical data was, but losing so much non-critical data is still going to be painful. 18TB is a lot to back up, and my backup resources were limited. Anyway, lesson learned.

Any help would be greatly appreciated.
