Maxtor OneTouch goes offline after a week
807557, Mar 15 2009 (edited Mar 17 2009)
Hi,
I am using a Maxtor OneTouch 750GB USB disk on an Ultra 60 as a backup device; it is the destination for ZFS snapshots sent from another system.
The Ultra 60 is running Solaris 10 10/08.
I am using a Belkin PCI/USB card (F5U508).
The system has been fine for the past 2 weeks, and each day for the last week I have been sending incremental snapshots, e.g.
ptime zfs send -R -i @daily-13-03-2009 rpool/export/home@daily-14-03-2009|ssh root@host2 zfs recv -F -d backup/new
Then today I sent another and it just seemed to hang, so I killed it after seeing a problem with the destination pool:-
host1{root}68: ptime zfs send -R -I @090306 rpool/export/projects@daily-13-03-2009|ssh root@host2 zfs recv -F -d backup
^CKilled by signal 2.
real 18:56.687
user 0.007
sys 0.019
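A minimal sketch of a pre-flight check that could run before each send, assuming the destination prints the usual `pool 'NAME' is healthy` or `all pools are healthy` text from `zpool status -x`. The function name `pool_looks_healthy` is hypothetical and not part of ZFS. Note that if the USB device has already dropped off the bus, `zpool status` itself may hang (as it does later in this post), so this would not catch every failure.

```shell
#!/bin/sh
# Hypothetical helper (not a ZFS command): decide from captured text whether
# a pool looks healthy.
#   $1 = pool name
#   $2 = output previously captured from `zpool status -x <pool>`
pool_looks_healthy() {
    case "$2" in
        *"pool '$1' is healthy"*|*"all pools are healthy"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (host and pool names taken from the post; the send arguments
# are left elided):
#   status=$(ssh root@host2 zpool status -x backup)
#   pool_looks_healthy backup "$status" && zfs send ... | ssh root@host2 zfs recv -F -d backup
```

Keeping the text check separate from the `ssh` call makes it easy to try against saved output before wiring it into a nightly job.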
Looking at the messages file on the destination server host2, I can see the drive has gone offline:-
host2{rich}44: cat /var/adm/messages
Mar 15 16:28:38 host2 scsi: [ID 107833 kern.warning] WARNING: /pci@1f,2000/usb@1,3/storage@3/disk@0,0 (sd30):
Mar 15 16:28:38 host2 Command failed to complete...Device is gone
Mar 15 16:34:09 host2 fmd: [ID 441519 daemon.error] SUNW-MSG-ID: ZFS-8000-FD, TYPE: Fault, VER: 1, SEVERITY: Major
Mar 15 16:34:09 host2 EVENT-TIME: Sun Mar 15 16:34:07 GMT 2009
Mar 15 16:34:09 host2 PLATFORM: sun4u, CSN: -, HOSTNAME: host2
Mar 15 16:34:09 host2 SOURCE: zfs-diagnosis, REV: 1.0
Mar 15 16:34:09 host2 EVENT-ID: c8d7d6b2-0627-4353-f65e-83d992d26b2b
Mar 15 16:34:09 host2 DESC: The number of I/O errors associated with a ZFS device exceeded
Mar 15 16:34:09 host2 acceptable levels. Refer to http://sun.com/msg/ZFS-8000-FD for more information.
Mar 15 16:34:09 host2 AUTO-RESPONSE: The device has been offlined and marked as faulted. An attempt
Mar 15 16:34:09 host2 will be made to activate a hot spare if available.
Mar 15 16:34:09 host2 IMPACT: Fault tolerance of the pool may be compromised.
Mar 15 16:34:09 host2 REC-ACTION: Run 'zpool status -x' and replace the bad device.
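To pick these events out of a long messages file, something like the following sketch may help. It only matches the two strings seen in the log above ("Device is gone" and the `SUNW-MSG-ID: ZFS-` prefix); the function name `find_device_drops` is hypothetical, and other failure modes may log different text.

```shell
#!/bin/sh
# Hypothetical helper: print syslog lines suggesting that a disk dropped off
# the bus or that FMA raised a ZFS fault.
#   $1 = path to a messages file (defaults to /var/adm/messages)
find_device_drops() {
    awk '/Device is gone/ || /SUNW-MSG-ID: ZFS-/' "${1:-/var/adm/messages}"
}

# Example use:
#   find_device_drops /var/adm/messages
```

Comparing the timestamps of the matched lines with the time each `zfs send` started should show whether the device disappeared before or during the transfer.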
host2{rich}43: zpool status
pool: backup
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: http://www.sun.com/msg/ZFS-8000-8A
scrub: none requested
config:
NAME STATE READ WRITE CKSUM
backup ONLINE 80 894K 0
c2t0d0 ONLINE 80 894K 0
errors: 281216 data errors, use '-v' for a list
The zpool status now hangs.
rmformat now takes ages to return and reports no devices :-(
Last week, when the server was rebooted, the zpool came online OK:-
Mar 11 16:53:04 host2 pcipsy: [ID 370704 kern.info] PCI-device: usb@2,3, ehci0
Mar 11 16:53:04 host2 genunix: [ID 936769 kern.info] ehci0 is /pci@1f,4000/usb@2,3
Mar 11 16:53:04 host2 rootnex: [ID 349649 kern.info] pcipsy1 at root: UPA 0x1f 0x2000
Mar 11 16:53:04 host2 genunix: [ID 936769 kern.info] pcipsy1 is /pci@1f,2000
Mar 11 16:53:05 host2 pcipsy: [ID 370704 kern.info] PCI-device: usb@1,3, ehci1
Mar 11 16:53:05 host2 genunix: [ID 936769 kern.info] ehci1 is /pci@1f,2000/usb@1,3
Mar 11 16:53:06 host2 pcipsy: [ID 370704 kern.info] PCI-device: usb@2, ohci0
Mar 11 16:53:06 host2 genunix: [ID 936769 kern.info] ohci0 is /pci@1f,4000/usb@2
Mar 11 16:53:06 host2 pcipsy: [ID 370704 kern.info] PCI-device: usb@2,1, ohci1
Mar 11 16:53:06 host2 genunix: [ID 936769 kern.info] ohci1 is /pci@1f,4000/usb@2,1
Mar 11 16:53:06 host2 usba: [ID 912658 kern.info] USB 2.0 device (usbd49,7310) operating at hi speed (USB 2.x) on USB 2.0 root hub: storage@3, scsa2usb0 at bus address 2
Mar 11 16:53:06 host2 usba: [ID 349649 kern.info] Maxtor OneTouch 2HA1NDTV
Mar 11 16:53:06 host2 genunix: [ID 936769 kern.info] scsa2usb0 is /pci@1f,2000/usb@1,3/storage@3
Mar 11 16:53:06 host2 genunix: [ID 408114 kern.info] /pci@1f,2000/usb@1,3/storage@3 (scsa2usb0) online
Mar 11 16:53:06 host2 scsi: [ID 193665 kern.info] sd30 at scsa2usb0: target 0 lun 0
Mar 11 16:53:06 host2 genunix: [ID 936769 kern.info] sd30 is /pci@1f,2000/usb@1,3/storage@3/disk@0,0
Mar 11 16:53:06 host2 genunix: [ID 340201 kern.warning] WARNING: Page83 data not standards compliant Maxtor OneTouch 0125
Mar 11 16:53:06 host2 genunix: [ID 408114 kern.info] /pci@1f,2000/usb@1,3/storage@3/disk@0,0 (sd30) online
Mar 11 16:53:06 host2 pcipsy: [ID 370704 kern.info] PCI-device: usb@1, ohci2
Am I missing a patch for USB or ZFS which will fix this problem?
I am assuming the drive is OK, but the system is remote and I will have to wait until the morning to call the office.
I could reboot the Ultra 60 and see if the zpool comes back to life, but I wanted to see if anyone else had seen this problem first.
Cheers
Richard