The resilver went as planned, replacing one drive after another serially, taking care to do only one drive of a mirrored pair at a time. Near the end, I got greedy. With one resilver 98% done, I detached a drive in another mirrored pair on the same volume, planning at least to seat the new drive in the chassis so I could start the final resilver remotely. To my surprise, the in-progress resilver restarted from scratch, so I had another 24 hours of delay to go. So: any ZFS drive removal will reset an in-progress scrub or resilver!
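For reference, the per-pair replacement flow looked roughly like the sketch below. The pool and device names match the status output further down, but treat it as a sketch rather than the exact commands I ran:

    # Replace one half of a mirror; the new disk sits in the old slot,
    # so the single-device form of "zpool replace" applies.
    zpool replace data c2t2d0

    # Wait for this resilver to finish completely before touching the
    # other half of the pair, or (as I learned) any other pair at all.
    zpool status data    # proceed only once "resilver completed" appears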
I then decided to just go ahead with the second resilver, and this is where it got really strange. Both mirrored pairs started to resilver, and the speed seemed faster. After 12 hours, each pair had about 400GB resilvered, and the volume status indicated it was 100% done and finishing. Hours later it was still at 100%, yet the per-drive resilver counters kept climbing. Finally, after the more typical 24 hours or so, it reported completion.
      pool: data
     state: ONLINE
     scrub: resilver completed after 26h39m with 0 errors on Tue Nov 24 22:33:46 2009
    config:

            NAME        STATE     READ WRITE CKSUM
            data        ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c2t1d0  ONLINE       0     0     0
                c2t0d0  ONLINE       0     0     0
              mirror    ONLINE       0     0     0
                c2t3d0  ONLINE       0     0     0
                c2t2d0  ONLINE       0     0     0  783G resilvered
              mirror    ONLINE       0     0     0
                c2t5d0  ONLINE       0     0     0
                c2t4d0  ONLINE       0     0     0  781G resilvered
Yes, it looks like, at least with this B104+ kernel in NexentaStor, the resilver counters lie. When you have two resilvers going at once, each drive's counter nominally reports the total data resilvered across the whole pool, so you'll need to wait for roughly double the expected amount of data before it completes. It's therefore very important not to reset the system until the 100% figure turns into a "resilver completed..." line in the status report.
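If you're managing this remotely, a simple poll of zpool status makes a safer gate than eyeballing percentages. A rough sketch, not part of my original procedure (adjust the pool name and interval to taste):

    #!/bin/sh
    # Block until the pool stops reporting an in-progress resilver.
    while zpool status data | grep -q "resilver in progress"; do
        sleep 600    # re-check every 10 minutes
    done
    # Confirm the terminal state before doing anything destructive;
    # you want the "scrub: resilver completed after ..." line.
    zpool status data | grep "scrub:"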
1 comment:
I just had an incident where a resilver of a mirror was at 97% after 7 hours. Then I zfs destroyed a snapshot and behold - resilvering restarted at 0%.