Here’s a scenario most ZFS users have run into at least once. You reboot your server; maybe a drive didn’t spin up in time, or mdadm grabbed a partition before ZFS could. Then zpool import hits you with this:
Action: Destroy and re-create the pool.
Your stomach drops. Corrupted? You start mentally cataloging your backups. Maybe you even reach for zpool destroy.
Except… the metadata isn’t corrupted. ZFS just couldn’t see all the disks. The data is fine. The pool is fine. The error message is the problem.
I’ve hit this myself on my TrueNAS box when a drive temporarily disappeared after a reboot. The first time I saw it I genuinely panicked. After digging into the source code, I realized that ZPOOL_STATUS_CORRUPT_POOL is basically a catch-all. Anytime the root vdev gets tagged with VDEV_AUX_CORRUPT_DATA — whether from actual corruption or simply missing devices — you get the same scary message. No distinction whatsoever.
This has been a known issue since 2018. Seven years. Plenty of people complained about it, but nobody got around to fixing it.
So I did. The PR is pretty straightforward — it touches four user-facing strings across the import and status display code paths. The core change:
Before:
“The pool metadata is corrupted.”
→ Destroy and re-create the pool.

After:
“The pool metadata is incomplete or corrupted.”
→ Check that all devices are present first.
The recovery message also changed. Instead of jumping straight to “destroy the pool”, it now tells you to make sure your devices aren’t claimed by another subsystem (mdadm, LVM, etc.) and try the import again. You know, the thing you should actually try first before nuking your data.
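In practice, that first-resort check looks something like this. A hedged sketch: the pool name tank and the md device number are placeholders, and whether mdadm is the culprit on your system is an assumption; adapt to whatever subsystem actually claimed the disk.

```shell
# Check whether mdadm has assembled an array on top of a pool member.
cat /proc/mdstat

# If so, stop the offending array (device name is an example):
#   mdadm --stop /dev/md127

# Then retry the import, scanning by stable device IDs:
zpool import -d /dev/disk/by-id tank
```

Only if the import still fails with every device visible should you start thinking about actual metadata damage.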
Brian Behlendorf reviewed it, said it should’ve been cleaned up ages ago, and merged it into master today. Not a glamorous contribution — no new features, no performance gains, just four strings. But if it saves even one person from destroying a perfectly healthy pool because of a misleading error message, that’s a win.
PR: openzfs/zfs#18251 — closes #8236