How a TrueNAS Nightly Update Bug Left My Server Unbootable

How a failed nightly update left my TrueNAS server booting into an empty filesystem — and the two bugs responsible.

I run TrueNAS Scale on an Aoostar WTR Max as my homelab server, with dozens of Docker containers for everything from Immich to Jellyfin. I like to stay on the nightly builds to get early access to new features and contribute bug reports when things go wrong. Today, things went very wrong.

The Update Failure

It started innocently enough. I kicked off the nightly update from the TrueNAS UI, updating from 26.04.0-MASTER-20260210-020233 to the latest 20260213 build. Instead of a smooth update, I got this:

error[EFAULT] Error: Command ['zfs', 'destroy', '-r',
  'boot-pool/ROOT/26.04.0-MASTER-20260213-020146-1']
  failed with exit code 1:
  cannot unmount '/tmp/tmpo8dbr91e': pool or dataset is busy

The update process was trying to clean up a previous boot environment but couldn’t unmount a temporary directory it had created. No big deal, I thought — I’ll just clean it up manually.

Down the Rabbit Hole

I checked what was holding the mount open:

$ fuser -m /tmp/tmpo8dbr91e    # nothing
$ lsof +D /tmp/tmpo8dbr91e     # nothing (just Docker overlay warnings)

Nothing was using it. A force unmount also failed:

$ sudo umount -f /tmp/tmpo8dbr91e
umount: /tmp/tmpo8dbr91e: target is busy.

Only a lazy unmount worked:

$ sudo umount -l /tmp/tmpo8dbr91e

So I unmounted it and destroyed the stale boot environment manually. Then I retried the update. Same error, different temp path. Unmount, destroy, retry. Same error again. On each attempt, the updater would mount a new temporary directory, fail to unmount it, and bail out.
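The manual cleanup loop can be mechanized. Here is a small sketch (the helper names are mine, not part of TrueNAS) that parses /proc/mounts-style text for boot-pool/ROOT datasets stuck under /tmp and emits the same lazy-unmount-then-destroy commands I was typing by hand, as strings rather than running them:

```python
# Hypothetical helper to spot stale boot-environment temp mounts.
# It parses /proc/mounts-style lines and prints the cleanup commands
# (lazy unmount + recursive zfs destroy) instead of executing them.

def stale_be_mounts(mounts_text):
    """Return (dataset, mountpoint) pairs for boot-pool/ROOT datasets
    mounted under /tmp -- the leftovers each failed update creates."""
    stale = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        source, mountpoint, fstype = fields[0], fields[1], fields[2]
        if (fstype == "zfs"
                and source.startswith("boot-pool/ROOT/")
                and mountpoint.startswith("/tmp/")):
            stale.append((source, mountpoint))
    return stale

def cleanup_commands(mounts_text):
    """Emit the manual recovery steps as shell command strings."""
    cmds = []
    for dataset, mountpoint in stale_be_mounts(mounts_text):
        cmds.append(f"umount -l {mountpoint}")
        cmds.append(f"zfs destroy -r {dataset}")
    return cmds

if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for cmd in cleanup_commands(f.read()):
            print(cmd)
```

Note it deliberately only prints commands for review; blindly destroying boot environments is exactly how you end up in the next section.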

I even tried stopping Docker before the update, thinking the overlay mounts might be interfering. No luck.

The Real Problem

Frustrated, I rebooted the server thinking a clean slate might help. The server didn’t come back. After 10 minutes of pinging with no response, I plugged in a monitor and saw this:

Mounting 'boot-pool/ROOT/26.04.0-MASTER-20260213-020146' on '/root/' ... done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/nfs-bottom ... done.
run-init: can't execute '/sbin/init': No such file or directory
Target filesystem doesn't have requested /sbin/init.
run-init: can't execute '/etc/init': No such file or directory
run-init: can't execute '/bin/init': No such file or directory
run-init: can't execute '/bin/sh': No such file or directory
No init found. Try passing init= bootarg.

BusyBox v1.37.0 (Debian 1:1.37.0-6+b3) built-in shell (ash)
Enter 'help' for a list of built-in commands.

(initramfs)

The system had booted into the incomplete boot environment from the failed update — an empty shell with no operating system in it. The update process had set this as the default boot environment before it was fully built.

The Recovery

Fortunately, ZFS boot environments make this recoverable. I rebooted again, caught the GRUB menu, and selected my previous working boot environment (20260210-020233). After booting successfully, I locked in the correct boot environment as the default:

$ sudo zpool set bootfs=boot-pool/ROOT/26.04.0-MASTER-20260210-020233 boot-pool

Then cleaned up the broken environment:

$ sudo zfs destroy -r boot-pool/ROOT/26.04.0-MASTER-20260213-020146

Server back to normal.

Two Bugs, One Update

There are actually two separate bugs here:

Bug 1 — Stale Mount Cleanup

The update process mounts the boot environment into a temp directory but can’t clean it up when something fails. umount -f doesn’t work; only umount -l does. And since each retry creates a new temp mount, the problem is self-perpetuating.

Bug 2 — Premature Bootfs Switch (Critical)

This is the dangerous one. The updater sets the new boot environment as the GRUB default before it’s fully populated. When the update fails mid-way, you’re left with a system that will boot into an empty filesystem on the next reboot. If you don’t have physical console access and a keyboard handy, you could be in serious trouble.

What Happens During a Failed Update

  1. The update starts
  2. The updater sets the new boot environment as bootfs
  3. The build fails partway through
  4. The next reboot drops to the initramfs shell

The Fix Should Be Simple

The updater should only set the new boot environment as the default after the update is verified complete. And it should use umount -l as a fallback when umount -f fails, since the standard force unmount clearly isn’t sufficient here.
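To make the ordering concrete, here is a minimal sketch of that sequence. This is not the TrueNAS middleware API — build_be, set_bootfs, and the verification checks are all hypothetical stand-ins — but it captures the invariant: bootfs is only flipped after the new environment passes a sanity check, and the lazy unmount is the fallback, not the first resort.

```python
import os
import subprocess

def verify_boot_environment(mountpoint):
    """Sanity-check that the new BE actually contains an OS before
    GRUB is ever pointed at it. /sbin/init and /bin/sh are exactly
    the files the initramfs complained were missing."""
    required = ["sbin/init", "bin/sh"]
    return all(
        os.path.exists(os.path.join(mountpoint, path)) for path in required
    )

def unmount(mountpoint, run=subprocess.run):
    """Try a force unmount first, then fall back to a lazy unmount --
    the only thing that worked against these stale BE mounts."""
    result = run(["umount", "-f", mountpoint])
    if result.returncode != 0:
        result = run(["umount", "-l", mountpoint])
    return result.returncode == 0

def safe_update(build_be, mountpoint, set_bootfs, run=subprocess.run):
    """Build and verify the new boot environment first; switch the
    bootfs default last, so a failed update can never become the
    default boot target."""
    build_be()                                   # populate the new BE
    if not verify_boot_environment(mountpoint):  # catch half-built BEs
        raise RuntimeError("new boot environment failed verification")
    if not unmount(mountpoint, run=run):
        raise RuntimeError("could not unmount new boot environment")
    set_bootfs()                                 # only now flip the default
```

With this ordering, the failure I hit would have aborted at the verification step and the machine would still have rebooted into the old, working environment.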

I’ve filed this as NAS-139794 on the TrueNAS Jira. If you’re running nightly builds, be aware of this issue — and make sure you have console access to your server in case you need to select a different boot environment from GRUB.

Lessons Learned

Running nightly builds is inherently risky, and I accept that. But an update failure should never leave a system unbootable. The whole point of ZFS boot environments is to provide a safety net — but that net has a hole when the updater switches the default before the new environment is ready.

In the meantime, keep a monitor and keyboard accessible for your TrueNAS box, and remember: if you ever drop to an initramfs shell after an update, your data is fine. Just reboot into GRUB and pick the previous boot environment.

  • TrueNAS
  • ZFS
  • Homelab
  • Boot Environments
  • Bug Report
