/root

Tag: OpenZFS

unit/btree

June 20th, 2026

Just added a new unit test suite that covers basic B-Tree operations, including empty trees, add_find(), remove() and the iteration functions.

			
> make unit T=btree
  CC       tests/unit/test_btree-test_btree.o
  CCLD     tests/unit/test_btree
  UNITTEST tests/unit/test_btree
Running test suite with seed 0xd9494352...
btree.empty                          [ OK    ] [ 0.00000157 / 0.00000060 CPU ]
btree.add_find                       [ OK    ] [ 0.00000719 / 0.00000690 CPU ]
btree.remove                         [ OK    ] [ 0.00000606 / 0.00000575 CPU ]
btree.walk                           [ OK    ] [ 0.00000379 / 0.00000380 CPU ]
4 of 4 (100%) tests successful, 0 (0%) test skipped.

		

https://github.com/openzfs/zfs/pull/18690

OpenZFS 2.4.3

June 13th, 2026

“OpenZFS 2.4.3 is out today as the newest stable point release to this open-source ZFS file-system implementation as well as point releases for the OpenZFS 2.3 and 2.2 series too.

One month after OpenZFS 2.4.2, OpenZFS 2.4.3 is now available with additional fixes. On the Linux side the kernel support still extends up through the Linux 7.0 stable kernel even with Linux v7.1 expected for release this coming Sunday. Hopefully another OpenZFS point release will be out shortly thereafter with blessed Linux 7.1 kernel support.

OpenZFS 2.4.3 adds an encryption key check for block cloning in ZVOL, some FreeBSD-specific work like being able to build the kernel module with sanitizers, fixing some double free conditions, fixing a possible panic, some Linux compatibility updates, a number of continuous integration (CI) updates, and various other minor fixes throughout.

Details on the OpenZFS 2.4.3 changes in full and downloads via GitHub.

In addition to the OpenZFS 2.4.3 point release, OpenZFS 2.3.8 and OpenZFS 2.2.10 are available with many of the same bug fixes back-ported to those prior series plus other relevant fixes.”

https://www.phoronix.com/news/OpenZFS-2.4.3-Released
OpenZFS 2.4.3 is out!!!

June 12th, 2026

Along with zfs-2.2.10 and zfs-2.3.8.

unit/namecheck

June 11th, 2026

Unit tests are deterministic tests that complement the ZTS testing infrastructure. They were first implemented by robn in test_zap.c to cover the ZAP API including microzap and fatzap.

My commit introduces a new namecheck validity testing framework for the names of ZFS pools, datasets, snapshots, etc. It covers all of the functions in namecheck.c.

To run the test, execute make unit T=namecheck:

			
> make unit T=namecheck
  UNITTEST tests/unit/test_namecheck
Running test suite with seed 0x5842ef3e...
namecheck.pool                       [ OK    ] [ 0.00000954 / 0.00000873 CPU ]
namecheck.dataset                    [ OK    ] [ 0.00001064 / 0.00000984 CPU ]
namecheck.snapshot                   [ OK    ] [ 0.00000589 / 0.00000590 CPU ]
namecheck.bookmark                   [ OK    ] [ 0.00000654 / 0.00000605 CPU ]
namecheck.component                  [ OK    ] [ 0.00000508 / 0.00000508 CPU ]
namecheck.permset                    [ OK    ] [ 0.00000608 / 0.00000561 CPU ]
namecheck.mountpoint                 [ OK    ] [ 0.00000337 / 0.00000329 CPU ]
namecheck.depth                      [ OK    ] [ 0.00000204 / 0.00000159 CPU ]
8 of 8 (100%) tests successful, 0 (0%) test skipped.

		

https://github.com/chrislongros/zfs/commit/7e054b2e7ea80c7c838f7fd44b7d517eea5c9d18

One step away from OpenZFS 2.4.3

June 11th, 2026

The patch set with 64 commits has been created!

https://github.com/openzfs/zfs/pull/18651#event-26637090181
FreeBSD June ZFS vendor import

June 8th, 2026

Happy to see my ZFS commits make it upstream into FreeBSD 🙂

https://cgit.freebsd.org/src/log/?h=vendor/openzfs/master&showmsg=1

unit/zap: uint64 keys

June 6th, 2026

robn introduced a test suite for the ZFS Attribute Processor (ZAP) with this commit: https://github.com/robn/zfs/commit/1d601eb83b1b849edba047feae5137f0adb93ee2

Since then, several ZAP API functions have been covered by unit tests. My new commit introduces a uint64 keys test that covers binary uint64-array keys, which are also used by the dedup table (DDT) and the block reference table (BRT).

The test runs dnode operations including add, lookup, length, lookup_length, update and remove, as they are implemented in module/zfs/zap.c.

To build and run the test use the instructions here: https://github.com/openzfs/zfs/blob/master/tests/unit/README.md

The results:

			
> make unit
  UNITTEST tests/unit/test_zap
Running test suite with seed 0x8f27c767...
zap.mock_microzap_sanity             [ OK    ] [ 0.00001072 / 0.00001014 CPU ]
zap.mock_fatzap_sanity               [ OK    ] [ 0.00002379 / 0.00002299 CPU ]
zap.zap_basic
  type=micro                         [ OK    ] [ 0.00002169 / 0.00002172 CPU ]
  type=fat                           [ OK    ] [ 0.00002119 / 0.00002120 CPU ]
zap.zap_add
  type=micro                         [ OK    ] [ 0.00000964 / 0.00000965 CPU ]
  type=fat                           [ OK    ] [ 0.00002727 / 0.00002665 CPU ]
zap.zap_update
  type=micro                         [ OK    ] [ 0.00001184 / 0.00001185 CPU ]
  type=fat                           [ OK    ] [ 0.00001811 / 0.00001806 CPU ]
zap.zap_remove
  type=micro                         [ OK    ] [ 0.00001190 / 0.00001192 CPU ]
  type=fat                           [ OK    ] [ 0.00002998 / 0.00002978 CPU ]
zap.zap_count
  type=micro                         [ OK    ] [ 0.00001882 / 0.00001843 CPU ]
  type=fat                           [ OK    ] [ 0.00001707 / 0.00001706 CPU ]
zap.zap_contains
  type=micro                         [ OK    ] [ 0.00001400 / 0.00001359 CPU ]
  type=fat                           [ OK    ] [ 0.00001650 / 0.00001651 CPU ]
zap.zap_length
  type=micro                         [ OK    ] [ 0.00001039 / 0.00001038 CPU ]
  type=fat                           [ OK    ] [ 0.00002450 / 0.00002413 CPU ]
zap.zap_increment
  type=micro                         [ OK    ] [ 0.00001459 / 0.00001457 CPU ]
  type=fat                           [ OK    ] [ 0.00002773 / 0.00002737 CPU ]
zap.zap_int
  type=micro                         [ OK    ] [ 0.00002339 / 0.00002288 CPU ]
  type=fat                           [ OK    ] [ 0.00003203 / 0.00003164 CPU ]
zap.zap_int_keys
  type=micro                         [ OK    ] [ 0.00001585 / 0.00001579 CPU ]
  type=fat                           [ OK    ] [ 0.00004489 / 0.00004479 CPU ]
zap.microzap_stats                   [ OK    ] [ 0.00001210 / 0.00001212 CPU ]
zap.fatzap_stats                     [ OK    ] [ 0.00001901 / 0.00001899 CPU ]
zap.uint64_keys                      [ OK    ] [ 0.00001770 / 0.00001764 CPU ]
zap.cursor
  type=micro                         [ OK    ] [ 0.00002167 / 0.00002166 CPU ]
  type=fat                           [ OK    ] [ 0.00003274 / 0.00003226 CPU ]
zap.cursor_serialize
  type=micro                         [ OK    ] [ 0.00002177 / 0.00002173 CPU ]
  type=fat                           [ OK    ] [ 0.00002936 / 0.00002924 CPU ]
zap.cursor_release_unused
  type=micro                         [ OK    ] [ 0.00001119 / 0.00001115 CPU ]
  type=fat                           [ OK    ] [ 0.00001867 / 0.00001856 CPU ]
zap.cursor_release_advance
  type=micro                         [ OK    ] [ 0.00001058 / 0.00001056 CPU ]
  type=fat                           [ OK    ] [ 0.00001544 / 0.00001544 CPU ]
zap.cursor_release_empty
  type=micro                         [ OK    ] [ 0.00000918 / 0.00000907 CPU ]
  type=fat                           [ OK    ] [ 0.00001564 / 0.00001563 CPU ]
zap.cursor_release_one
  type=micro                         [ OK    ] [ 0.00001696 / 0.00001662 CPU ]
  type=fat                           [ OK    ] [ 0.00001843 / 0.00001841 CPU ]
zap.zap_value_search
  type=micro                         [ OK    ] [ 0.00001555 / 0.00001551 CPU ]
  type=fat                           [ OK    ] [ 0.00002377 / 0.00002379 CPU ]
zap.zap_value_search_mask
  type=micro                         [ OK    ] [ 0.00001525 / 0.00001524 CPU ]
  type=fat                           [ OK    ] [ 0.00002293 / 0.00002288 CPU ]
41 of 41 (100%) tests successful, 0 (0%) test skipped.

		

n8n CI failure automation

June 4th, 2026

This workflow automates the detection of OpenZFS CI failures and reports them via Gotify notifications!
ARC analysis on my TrueNAS: 98% Hit Ratio!!

June 1st, 2026
My home server configuration:

The server runs TrueNAS SCALE (release 26.0.0-BETA.1) on an AMD Ryzen 7 PRO 8845HS with 32 GiB of memory and no swap configured. Alongside ZFS it hosts a substantial collection of services, including Portainer-managed containers, Immich, Forgejo, FreshRSS, Jellyfin, an automation platform, several PostgreSQL databases, an identity provider, and a number of smaller tools. At the moment of measurement, the operating system reported the following:
```
            total   used   free   buff/cache   available
Mem:         30Gi   27Gi   1.2Gi      3.8Gi        3.3Gi
```
Uptime stats:

The following figures come from /proc/spl/kstat/zfs/arcstats, accumulated over an uptime of 53 days:

| Metric | Value |
|---|---|
| ARC size | 5.04 GiB |
| ARC target (c) | 5.09 GiB |
| Maximum (c_max) | 29.68 GiB |
| Minimum (c_min) | 0.96 GiB |
| L2ARC | none configured |

The hit and miss rates, derived from the raw counters, break down as follows:

| Access class | Hit rate | Miss rate |
|---|---|---|
| Overall | 96.4% | 3.6% |
| Demand data | 98.05% | 1.95% |
| Demand metadata | 97.91% | 2.09% |
| Prefetch data | 6.0% | 94.0% |
| Prefetch metadata | 66.9% | 33.1% |
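
As a rough illustration of how such percentages can be derived from the raw counters, here is a minimal Python sketch. It assumes the usual kstat layout of "name type data" rows and the counter names hits, misses, demand_data_hits and demand_data_misses. The numbers in SAMPLE are made up for the example and are not the server's real counters.

```python
def parse_arcstats(text):
    """Return {name: int} from kstat text made of 'name type data' rows."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly three fields and an integer value;
        # header lines fail one of these two checks and are skipped.
        if len(parts) == 3 and parts[2].isdigit():
            stats[parts[0]] = int(parts[2])
    return stats


def hit_rate(stats, prefix=""):
    """Hit percentage for one access class, e.g. prefix='demand_data_'."""
    hits = stats[prefix + "hits"]
    misses = stats[prefix + "misses"]
    total = hits + misses
    return 100.0 * hits / total if total else 0.0


# Made-up counters, for illustration only.
SAMPLE = """\
name                            type data
hits                            4    9640
misses                          4    360
demand_data_hits                4    4900
demand_data_misses              4    100
"""

stats = parse_arcstats(SAMPLE)
print(f"overall:     {hit_rate(stats):.1f}%")
print(f"demand data: {hit_rate(stats, 'demand_data_'):.1f}%")
```

On a live Linux system the same functions could be fed the contents of /proc/spl/kstat/zfs/arcstats instead of SAMPLE.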

Live view of hit/miss performance:

Because the counters above are cumulative since boot, they represent an average spanning nearly two months and cannot, on their own, describe the system’s present behaviour. To capture that, I sampled the cache once per second under active load using arcstat:
```
    time  read  ddread  ddh%  dmread  dmh%  pread  ph%   size      c  avail
22:23:43   949     867  98.8      77  98.7      5    0   5.3G   5.3G   118M
22:23:44   668     384  97.7     284   100      0    0   5.3G   5.3G   425M
22:23:45  1.5K    1.1K  98.5     432   100     39  2.6   5.3G   5.3G   367M
22:23:46   686     421  98.6     248   100     17    0   5.3G   5.3G   344M
```
With reads peaking at roughly 1,500 per second, the demand-data hit rate held steady at approximately 98%, matching the long-term figure. This confirms that the cumulative average is not concealing a recent decline in performance: the cache is presently serving requests just as effectively as it has, on average, throughout its uptime.

ddh%: demand-data hit rate

dmh%: demand-metadata hit rate
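
To put a single number on the live samples, one option is a read-weighted average of ddh%. The sketch below is only an approximation: the (ddread, ddh%) pairs are copied from the four arcstat rows above, arcstat has already rounded them (1.1K in particular), and the K suffix is assumed to mean a factor of 1000 for count columns.

```python
# (ddread, ddh%) pairs copied from the four arcstat samples.
SAMPLES = [("867", "98.8"), ("384", "97.7"), ("1.1K", "98.5"), ("421", "98.6")]


def to_number(token):
    """Convert arcstat tokens such as '949' or '1.5K' to a float.

    Assumption: for count columns the suffixes are powers of 1000.
    """
    scale = {"K": 1e3, "M": 1e6, "G": 1e9}
    if token[-1] in scale:
        return float(token[:-1]) * scale[token[-1]]
    return float(token)


def weighted_hit_rate(samples):
    """Average the hit percentages, weighting each row by its read count."""
    reads = [to_number(r) for r, _ in samples]
    hits = [to_number(r) * float(h) / 100.0 for r, h in samples]
    return 100.0 * sum(hits) / sum(reads)


print(f"weighted demand-data hit rate: {weighted_hit_rate(SAMPLES):.1f}%")
```

Weighting by reads keeps a quiet second with few accesses from counting as much as a busy one. For these four rows the result comes out at roughly 98.5%.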

Stats were obtained from /proc/spl/kstat/zfs/arcstats, with live sampling via arcstat.

The equivalent figures are available on FreeBSD under kstat.zfs.misc.arcstats.
When for God’s sake !!!!

May 31st, 2026
Updating package index before package removal in QEMU VM hosts

May 31st, 2026

One line of code can fix a bunch of failed CI matrix runs!!
In the top 2 OpenZFS contributors over the last 3 months

May 25th, 2026
Special VDEVs be like :)

May 25th, 2026
Adding label, object, delay and panic support to the zinject ZTS test

May 25th, 2026

This PR adds test coverage for the label, object, delay and panic error injection modes to the zinject tests in the ZTS. It also contains negative tests that verify argument validation. A new zinject_counter helper function is used to check whether the delay and panic modes were actually triggered during the test.

https://github.com/openzfs/zfs/pull/18579/changes
ZFS checksum self-heal

May 24th, 2026
The setup

In April I expanded my main pool from four-wide RAIDZ2 to five-wide by adding a single 10 TB Seagate IronWolf to four existing 4 TB WD Red Plus drives. OpenZFS 2.3+ supports RAIDZ expansion: the new column gets added, the existing data keeps its old parity layout until rewritten, and the pool stays online throughout. The expansion completed normally.

About five weeks later, a scrub showed the following CKSUM error:
    $ sudo zpool status zfs_tank
      pool: zfs_tank
     state: ONLINE
    status: One or more devices has experienced an unrecoverable error.  An
            attempt was made to correct the error.  Applications are unaffected.
    action: Determine if the device needs to be replaced, and clear the errors
            using 'zpool clear' or replace the device with 'zpool replace'.
    config:

            NAME                                      STATE     READ WRITE CKSUM
            zfs_tank                                  ONLINE       0     0     0
              raidz2-0                                ONLINE       0     0     0
                6a169351-6031-41d5-ad2a-9681142190c5  ONLINE       0     0     0
                a006e053-c865-4330-8861-e21e4a3e37a6  ONLINE       0     0     0
                b7d78b79-cb70-4afe-b9d0-8e4b2282fb18  ONLINE       0     0     0
                707c32af-4e4e-4fc7-b000-dd5b52f75158  ONLINE       0     0     1
                ff3c3a00-9f71-4b6b-87e6-c56deb4c6854  ONLINE       0     0     1

    errors: No known data errors

One checksum error on each of two disks. The pool-wide summary still reads "errors: No known data errors", so no data was lost.

RAIDZ2 carries two parity columns: the scrub detected the bad blocks and ZFS reconstructed them from parity, so the pool status remains healthy. But zpool status only tells you that an error happened, not whether it was actually corrected.

So where is the healing actually recorded?

What zpool status shows, and what it doesn’t

The four columns in zpool status map directly to four fields in the kernel's vdev_stat_t structure (include/sys/fs/zfs.h):
- vs_read_errors
- vs_write_errors
- vs_checksum_errors
- vs_state (shown as STATE)
zpool status parses each leaf vdev's stats and prints those four values. It does not print any of the other ~30 fields in the structure, including this one:
uint64_t vs_self_healed; /* total bytes self-healed */
vs_self_healed is incremented in vdev_stat_update() whenever ZFS issues a write with the ZIO_FLAG_SELF_HEAL flag set, which happens after a successful parity reconstruction. The kernel knows exactly how many bytes were healed on each disk. It just doesn’t tell you via the standard zpool status output.

Three places the heal counter does surface

1. Raw kstats (Linux only)

The OpenZFS Linux module exposes every leaf vdev's full vdev_stat_t under /proc/spl/kstat/zfs/<pool>/. The filenames use the leaf vdev GUID. Listing the directory shows them:

    $ sudo ls /proc/spl/kstat/zfs/zfs_tank/ | head
    io
    state
    txgs
    vdev_395717205876781294
    vdev_4003307236673040230
    vdev_7306733904703790705
    vdev_803393823450321549
    vdev_9021081546382363770

The 9021... and 4003... files are the two disks with errors. Inside:

    $ sudo cat /proc/spl/kstat/zfs/zfs_tank/vdev_9021081546382363770
    ...
    name                 type data
    vdev_state           3    7
    vdev_guid            4    9021081546382363770
    read_errors          4    0
    write_errors         4    0
    checksum_errors      4    1
    self_healed          4    4096
    ...

self_healed 4096 — four kilobytes. The pool’s ashift is 12, so one block. Exactly one block was reconstructed and rewritten on this disk. Same value on the other affected disk.
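
The arithmetic behind "one block" can be sketched in a few lines of Python. This is only an illustration: it assumes the "name type data" row layout shown above and treats 2^ashift bytes as the size of one block, as the paragraph does. The function names are hypothetical.

```python
# Rows copied from the kstat excerpt above.
KSTAT = """\
vdev_guid            4    9021081546382363770
read_errors          4    0
write_errors         4    0
checksum_errors      4    1
self_healed          4    4096
"""


def kstat_value(text, name):
    """Return the integer in the 'data' column of the row called name."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == name:
            return int(parts[2])
    raise KeyError(name)


def healed_blocks(self_healed_bytes, ashift):
    """Number of 2**ashift-byte blocks covered by the healed byte count."""
    block_size = 1 << ashift  # ashift=12 gives 4096 bytes
    return self_healed_bytes // block_size


healed = kstat_value(KSTAT, "self_healed")
print(healed, "bytes healed =", healed_blocks(healed, ashift=12), "block(s)")
```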

2. The TrueNAS middleware API

If you’re on TrueNAS, the same field comes back as JSON from the pool.query endpoint:
    {
      "name": "707c32af-4e4e-4fc7-b000-dd5b52f75158",
      "stats": {
        "checksum_errors": 1,
        "self_healed": 4096,
        "read_errors": 0,
        "write_errors": 0
      }
    }
This is how I first saw the number. The middleware just unpacks vdev_stat_t into JSON.

3. zpool events — the actual heal log

Counters tell you how much. To see when and where, look at the ZFS event ring buffer:
    $ sudo zpool events -v zfs_tank | grep -A 30 'ereport.fs.zfs.checksum'
    May 23 2026 05:42:14.823145112 ereport.fs.zfs.checksum
            class = "ereport.fs.zfs.checksum"
            ena = 0x...
            detector = (embedded nvlist)
                    version = 0x0
                    scheme = "zfs"
                    pool = 0x6794d4c...
                    vdev = 0x7d24a0...
            (end detector)
            pool = "zfs_tank"
            pool_guid = 0x6794d4cc9d3a8916
            vdev_guid = 0x7d24a0aaa18cb6ba
            vdev_type = "disk"
            vdev_path = "/dev/disk/by-partuuid/707c32af-4e4e-4fc7-b000-dd5b52f75158"
            zio_err = 0
            zio_offset = 0x...
            zio_size = 0x1000
            zio_objset = 0x...
            zio_object = 0x...
            zio_blkid = 0x...
            cksum_expected = ...
            cksum_actual = ...
Each event names the affected disk, the byte offset on that disk, the size (here 0x1000 = 4 KiB), the dataset and object, and both checksums. With this you can compute exactly which file (if any) the block belonged to. The buffer holds ~1000 events by default (zfs_zevent_len_max), so old events roll out unless ZED has persisted them to /var/log/zfs/zed.log.

This is the closest thing ZFS has to a “self-heal log.”
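
The event fields are plain "key = value" pairs with hexadecimal numbers, so they are easy to post-process. The Python sketch below is a minimal example under that assumption. It only uses fields whose values appear in full in the event above and skips the nested detector block, and the function name is hypothetical.

```python
# Top-level fields copied from the checksum event above (elided ones omitted).
EVENT = '''\
        class = "ereport.fs.zfs.checksum"
        pool = "zfs_tank"
        vdev_guid = 0x7d24a0aaa18cb6ba
        vdev_path = "/dev/disk/by-partuuid/707c32af-4e4e-4fc7-b000-dd5b52f75158"
        zio_size = 0x1000
'''


def parse_event(text):
    """Split 'key = value' lines into a dict, dropping surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # ignore lines without a 'key = value' shape
            fields[key.strip()] = value.strip().strip('"')
    return fields


event = parse_event(EVENT)
size_bytes = int(event["zio_size"], 16)   # int() accepts the 0x prefix in base 16
guid_decimal = int(event["vdev_guid"], 16)

print(event["vdev_path"])
print("size:", size_bytes, "bytes")
# The kstat file names shown earlier use the decimal form of the GUID.
print("decimal vdev GUID:", guid_decimal)
```

Converting vdev_guid to decimal is what lets you match an event to the corresponding vdev_<GUID> kstat file.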

What I actually had

Two disks, one block each, both healed. Different manufacturers (WD Red Plus 4 TB / Seagate IronWolf 10 TB), different uptime (6124 h / 3724 h), so a shared hardware fault was unlikely. SMART on both was clean:
    $ sudo smartctl -a /dev/sde | grep -E '^( 5|197|198|199) '
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
Same story on the Seagate. No reallocated sectors, no pending sectors, no UDMA CRC errors.

UDMA_CRC_Error_Count is the SATA-link error counter. If a cable, backplane, or HBA channel is marginal, this is where it shows up. With both drives at zero, the data path between disk and controller is an unlikely culprit.
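
The same check can be scripted. The Python sketch below is a hedged example that assumes the ten-column attribute layout shown in the smartctl output above, with the raw value as a plain integer in the last column. That holds for these four attributes but not necessarily for every attribute or drive. The function names are hypothetical.

```python
# Attribute rows copied from the smartctl output above.
SMART = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
"""

WATCHED = {5, 197, 198, 199}


def raw_values(text):
    """Map attribute name -> raw value for the watched attribute IDs."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        # Expect: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(parts) >= 10 and parts[0].isdigit() and int(parts[0]) in WATCHED:
            values[parts[1]] = int(parts[9])
    return values


def nonzero(values):
    """Names of watched attributes whose raw value is not zero."""
    return sorted(name for name, raw in values.items() if raw != 0)


values = raw_values(SMART)
print("suspicious attributes:", nonzero(values) or "none")
```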

What it doesn’t rule out is RAM. This system runs 32 GB of non-ECC DDR5. A single bit-flip in a write buffer leaves a permanently bad block on disk that scrubs will keep detecting and healing on every pass. The block stays wrong because the heal write reads the (correct) reconstructed buffer from the same RAM that may flip again. Without ECC, you can neither fully exclude this nor measure it.

zpool clear vs zpool scrub

The counters in zpool status and vs_self_healed are cumulative since the last zpool clear (or since pool creation). A scrub does not reset them.

So when I ran a scrub after the original event, the 1s in the CKSUM column did not go away — they were the same 1s from before.
    $ sudo zpool clear zfs_tank
    $ sudo zpool status zfs_tank
      pool: zfs_tank
     state: ONLINE
    config:

            NAME      STATE     READ WRITE CKSUM
            zfs_tank  ONLINE       0     0     0
            ...

    errors: No known data errors
The status line and the per-disk counters reset together. vs_self_healed resets too. After that, the next scrub starts from zero — if the same blocks show up healed again, you know the corruption is persistent on-disk (and the suspicion shifts toward RAM); if they don’t, the original event was probably a one-shot.
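
One way to make the "same blocks again?" comparison concrete is to record (disk, offset) pairs from the checksum events of each scrub and intersect them. The Python sketch below only illustrates that idea. The disk labels and offsets are hypothetical placeholders, not values from my pool, and the function name is made up.

```python
def compare_scrubs(before, after):
    """Compare sets of (disk, zio_offset) pairs seen in two scrubs.

    'repeated' blocks were flagged in both scrubs, which points at
    persistent on-disk corruption. 'new' blocks appeared only in the
    second scrub.
    """
    return {
        "repeated": sorted(before & after),
        "new": sorted(after - before),
    }


# Hypothetical data: labels and offsets are placeholders for illustration.
first_scrub = {("disk-a", 0x2000), ("disk-b", 0x9000)}
second_scrub = {("disk-a", 0x2000)}

result = compare_scrubs(first_scrub, second_scrub)
if result["repeated"]:
    print("healed again:", result["repeated"])
else:
    print("no repeats, the original event was probably a one-shot")
```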

After the zpool clear