Recently I replaced an HDD in a backup server at a customer site with a bigger one. It's a FreeNAS box running on an HP SE326M1*.
While ZFS was replacing the drive, a major power outage occurred.
That was no big problem: the box started again, and so did the resilver.
Today I found something interesting: another drive is now shown as resilvering too, but zpool status -v reports no read/write/cksum errors:
  pool: Tank
 state: ONLINE
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Sat Mar 4 15:37:53 2017
        3.41T scanned out of 4.51T at 24.7M/s, 13h3m to go
        302G resilvered, 75.43% done
config:

        NAME                                            STATE   READ WRITE CKSUM
        Tank                                            ONLINE     0     0     0
          raidz2-0                                      ONLINE     0     0     0
            gptid/ad81fb87-96b5-11e6-af9c-d8d385e6539a  ONLINE     0     0     0
            gptid/b30f664d-96b5-11e6-af9c-d8d385e6539a  ONLINE     0     0     0
            gptid/09b6b651-ff29-11e6-9a6a-d8d385e6539a  ONLINE     0     0     0  (resilvering)  <-- drive was replaced
            gptid/beb62f74-96b5-11e6-af9c-d8d385e6539a  ONLINE     0     0     0
            gptid/c2c97ff3-96b5-11e6-af9c-d8d385e6539a  ONLINE     0     0     0
          raidz2-1                                      ONLINE     0     0     0
            gptid/41b1a629-9785-11e6-894c-d8d385e6539a  ONLINE     0     0     0
            gptid/4756cd8e-9785-11e6-894c-d8d385e6539a  ONLINE     0     0     0
            gptid/508e74c7-9785-11e6-894c-d8d385e6539a  ONLINE     0     0     0
            gptid/575939b3-9785-11e6-894c-d8d385e6539a  ONLINE     0     0     0  (resilvering)  <-- drive in question
            gptid/5d914967-9785-11e6-894c-d8d385e6539a  ONLINE     0     0     0

errors: No known data errors

The status "(resilvering)" appeared in the last few hours, while the resilver of the first drive was still in progress.
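
To spot such a change sooner, the status output can be checked by script. The following is only a minimal sketch in Python: it assumes the text of zpool status has already been captured into a string, and that the READ/WRITE/CKSUM counters are plain integers as in the output above. The names ROW, resilvering_devices and SAMPLE are made up for this illustration.

```python
import re

# One vdev row of `zpool status`: name, state, READ/WRITE/CKSUM counters,
# optionally followed by the "(resilvering)" marker.
# Assumption: the counters are plain integers, as in the output above.
ROW = re.compile(
    r"^\s*(?P<name>\S+)\s+(?P<state>[A-Z]+)"
    r"\s+(?P<read>\d+)\s+(?P<write>\d+)\s+(?P<cksum>\d+)"
    r"(?P<mark>\s+\(resilvering\))?"
)


def resilvering_devices(status_text):
    """Return (name, read, write, cksum) for each row marked as resilvering."""
    found = []
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m and m.group("mark"):
            found.append((m.group("name"), int(m.group("read")),
                          int(m.group("write")), int(m.group("cksum"))))
    return found


# Three rows copied from the status output, used as sample input.
SAMPLE = """\
    raidz2-1                                      ONLINE  0 0 0
      gptid/508e74c7-9785-11e6-894c-d8d385e6539a  ONLINE  0 0 0
      gptid/575939b3-9785-11e6-894c-d8d385e6539a  ONLINE  0 0 0  (resilvering)
"""

for name, read, write, cksum in resilvering_devices(SAMPLE):
    print(name, "errors:", read, write, cksum)
```

A device that shows up here with all counters at zero is the situation described in this question, so the counters alone would not explain it.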
Is there any way to determine why the second drive is being resilvered too? Why is ZFS showing the resilver status on gptid/575939b3-9785-11e6-894c-d8d385e6539a even though zpool status shows no errors?
--
*The HP server has a P410 RAID controller in write-through mode. Every drive is configured as its own RAID 0.
1 Answer
It happened again, and this time I have the logs. It turns out that user121391 was totally right: the drive in question had an "unretryable" error.
Mar 12 08:22:14 freenas ciss0: *** Fatal drive error, Port=1I Box=1 Bay=14
Mar 12 08:22:14 freenas ciss0: (da9:ciss0:0:9:0): READ(10). CDB: 28 00 1b 50 ff 98 00 00 08 00
Mar 12 08:22:14 freenas FATAL I/O ERROR on logical drive 9 (), SCSI port 0 ID 21
Mar 12 08:22:14 freenas (da9:ciss0:0:9:0): CAM status: SCSI Status Error
Mar 12 08:22:14 freenas (da9:ciss0:0:9:0): SCSI status: Check Condition
Mar 12 08:22:14 freenas (da9:ciss0:0:9:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Mar 12 08:22:14 freenas (da9:ciss0:0:9:0): Error 5, Unretryable error

It seems that the RAID controller reattached the drive after this error, and this triggered the resilver.
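
For anyone hunting for the same thing, here is a rough sketch in Python of how log lines like the ones above could be grouped by device. It assumes the messages carry the "(da9:ciss0:0:9:0):" style prefix seen in this log; the names CAM_LINE, cam_errors_by_device and SAMPLE_LOG are made up for this example, and where the log file lives on a given system is left open.

```python
import re
from collections import defaultdict

# Matches the "(da9:ciss0:0:9:0): message" part of a kernel log line.
# Assumption: device and controller names look like letters followed by digits.
CAM_LINE = re.compile(
    r"\((?P<dev>[a-z]+\d+):(?P<ctrl>[a-z]+\d+):\d+:\d+:\d+\):\s*(?P<msg>.*)$"
)


def cam_errors_by_device(log_lines):
    """Group the message text of matching lines by device name (e.g. 'da9')."""
    errors = defaultdict(list)
    for line in log_lines:
        m = CAM_LINE.search(line)
        if m:
            errors[m.group("dev")].append(m.group("msg").strip())
    return dict(errors)


# Lines copied from the log above, used as sample input.
SAMPLE_LOG = """\
Mar 12 08:22:14 freenas ciss0: *** Fatal drive error, Port=1I Box=1 Bay=14
Mar 12 08:22:14 freenas ciss0: (da9:ciss0:0:9:0): READ(10). CDB: 28 00 1b 50 ff 98 00 00 08 00
Mar 12 08:22:14 freenas (da9:ciss0:0:9:0): SCSI sense: MEDIUM ERROR asc:11,0 (Unrecovered read error)
Mar 12 08:22:14 freenas (da9:ciss0:0:9:0): Error 5, Unretryable error
"""

for dev, messages in cam_errors_by_device(SAMPLE_LOG.splitlines()).items():
    print(dev)
    for msg in messages:
        print("   ", msg)
```

Lines without a device prefix, such as the "Fatal drive error" line from the controller itself, are skipped by this sketch, so the Port/Box/Bay hint still has to be read from the raw log.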