Major features or significant feature enhancements by kernel version. For more information look below.
The version states at which version a feature has been merged into the mainline kernel. It does not say anything about the kernel version at which it is considered mature enough for production use. For an estimate of the stability of features see the Status page.
Send protocol update that adds new commands and extends existing functionality to write large data chunks. Compressed (and encrypted) extents can be optionally emitted and transferred as-is without the need to re-compress (or re-encrypt) on the receiving side.
The file /sys/fs/btrfs/FSID/commit_stats shows number of commits and various time related statistics.
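A minimal sketch of reading these statistics from a script. It assumes the file contains one name and one integer value per line, separated by whitespace; that layout, the helper names and the sample field names in the usage note are assumptions of this example and are not specified on this page.

```python
def parse_commit_stats(text):
    """Parse 'name value' lines into a dict of ints (assumed file layout)."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank lines and anything not shaped like 'name value'
        name, value = parts
        try:
            stats[name] = int(value)
        except ValueError:
            continue  # value is not an integer, ignore the line
    return stats


def read_commit_stats(fsid, sysfs_root="/sys/fs/btrfs"):
    """Read and parse commit_stats of the filesystem with the given FSID."""
    with open(f"{sysfs_root}/{fsid}/commit_stats") as f:
        return parse_commit_stats(f.read())
```

For example, parse_commit_stats("commits 10\nmax_commit_ms 25\n") yields a dictionary with the two counters as integers.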
Chunk size value can be read from /sys/fs/btrfs/FSID/allocation/PROFILE/chunk_size.
The zoned mode has been supported since 5.10 and has been gaining functionality since then. Now it’s advertised among features.
When a filesystem is mounted the implementation backing the checksums is logged. The information is also accessible in /sys/fs/btrfs/FSID/checksum.
Allow user override of qgroup accounting and make it temporarily out of date, e.g. when several subvolumes are deleted and the qgroup numbers need to be updated at some cost. A single update after that can amortize the costs.
An improvement to scrub: if the superblock is detected to be corrupted, the repair happens immediately. Previously it was delayed, for performance reasons, until the next transaction commit, which would eventually store an updated and correct copy.
An incompatible change that has to be enabled at mkfs time. Add a new b-tree item that stores information about block groups in a compact way. This significantly improves mount time, which is otherwise long due to fragmentation and scattered b-tree items tracking the individual block groups. Requires and also enables the free-space-tree and no-holes features.
The directory /sys/fs/btrfs/FSID/discard exports statistics and tunables related to discard.
The overall status of qgroups is exported in /sys/fs/btrfs/FSID/qgroups/.
Do full check of super block once a filesystem is thawed. This namely happens when system resumes from suspend or hibernation. Accidental change by other operating systems will be detected.
Devices that support trim/discard will enable the asynchronous discard for the whole filesystem.
The default IOPS limit has changed from 100 to 1000, and writing the value 0 to /sys/fs/btrfs/FSID/discard/iops_limit now means that no throttling is done.
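The meaning of the values can be sketched as follows. The helper names and the sysfs_root parameter are hypothetical and only serve the illustration; writing to the real file needs appropriate privileges and a mounted filesystem.

```python
import os

DEFAULT_IOPS_LIMIT = 1000  # default stated above (previously 100)


def describe_iops_limit(value):
    """Explain how a value written to iops_limit is interpreted."""
    value = int(value)
    if value < 0:
        raise ValueError("the limit cannot be negative")
    if value == 0:
        return "no throttling"
    return f"throttled to {value} IOPS"


def set_iops_limit(fsid, value, sysfs_root="/sys/fs/btrfs"):
    """Write a new limit to FSID/discard/iops_limit below sysfs_root."""
    path = os.path.join(sysfs_root, fsid, "discard", "iops_limit")
    with open(path, "w") as f:
        f.write(str(int(value)))
```

The sysfs_root parameter exists only so the helper can be pointed at a scratch directory when trying it out.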
Pack files by size (up to 128k, up to 8M, more) to avoid fragmentation in block groups, assuming that file size and lifetime are correlated; in particular this may help during balance. The stats about the number of used classes per block group type are exported in /sys/fs/btrfs/FSID/allocation/*/size_classes.
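The three classes can be illustrated with a small classifier. This is only a sketch: the class labels, the inclusive boundaries and the reading of 128k and 8M as binary units (KiB, MiB) are assumptions of this example.

```python
KIB = 1024
MIB = 1024 * KIB

SMALL_LIMIT = 128 * KIB   # "up to 128k"
MEDIUM_LIMIT = 8 * MIB    # "up to 8M"


def size_class(size):
    """Return the (hypothetical) class label for a size in bytes."""
    if size < 0:
        raise ValueError("size cannot be negative")
    if size <= SMALL_LIMIT:
        return "small"
    if size <= MEDIUM_LIMIT:
        return "medium"
    return "large"
```

With these assumptions a 4 KiB file lands in the first class, a 1 MiB file in the second and a 1 GiB file in the third.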
A seeding device could have a different FSID, available in sysfs and now available via DEV_INFO ioctl.
Utimes for directories are emitted into the send stream only when finalizing the directory, the cache also gains significant speedups (up to 10x).
New tree for logical mapping, allows some RAID modes for zoned mode.
A simplified mode of qgroups accounting
Mount of cloned devices is now possible, the filesystem will get a new randomly generated UUID on mount
Use new mount API (https://lwn.net/Articles/753473/)
The extendable syscall statx also returns the subvolume id and sets the result_mask bit STATX_SUBVOL.
An optimization for concurrent access to a range that is reflinked and read at the same time, the read latency is decreased due to reduced locking.
This applies to 0 level qgroups (the one automatically created for a subvolume), once the subvolume is deleted the respective qgroup is also deleted. This may take some time until the qgroup accounting is correct and consistent again as the subvolume deletion is delayed.
This is also affected by presence of the subvolume qgroup in higher level qgroups or the sysfs setting of drop_subtree_threshold that will need a quota rescan.
A per-filesystem report of background reclaim status, file names matching reclaim_ in the space info directory.
Run background block group reclaim (using the relocation/balance mechanism) if the used size is above the configured value and the dynamic reclaim is enabled (not by default). When enabled, there’s a heuristic that tries to avoid increasing system load if there’s enough unallocated space but will try hard (but cannot be perfect) to avoid a situation when there’s only the last chunk remaining to make the relocation possible.
When enabled, any metadata checksum mismatch is ignored (in read-only mount), this may be useful in an interrupted checksum type conversion (btrfstune(8)).
An option to ignore unknown super block flags, at this point applies only to the interrupted checksum conversion, but can be useful for similar operations in the future.
Properly verify all types of directory items and reject unknown ones. Do relevant device item checks.
Check if the last inode extent (not a full block length) can be cloned and do it, this fixes a problem in send/receive.
Mandated by POSIX (https://pubs.opengroup.org/onlinepubs/9699919799/functions/unlink.html), the link count is changed.
Avoid indirection when the BTRFS_IOC_SYNC ioctl is called and wake up the cleaner thread which is among other things responsible to clean deleted subvolumes.
Improve concurrency by reducing scope of locking around buffered reads. The direct io is still locked but this should not be mixed with buffered writes.
Add more points where the discard can be interrupted by signals before it finishes the whole operation.
Add separate config option to distinguish purely debugging features (like extended safety checks) and features that still need some refinements (and were hidden under the debugging config option not to expose them to users). When enabled this namely covers extent tree v2, raid stripe tree, send protocol version 3 and checksum offloading strategy.
The io_uring subsystem understands a command that is directed to Btrfs encoded read ioctl.
Add specialized ioctl to wait for deleted (and maybe not yet cleaned) subvolumes, available to any user. The related command btrfs subvolume sync uses the privileged SEARCH_TREE ioctl otherwise.
The sprout device (the writable one added to the seeding device) does not touch the superblock read-only status, preventing removal of accumulated deleted snapshots to be cleaned.
Update tree-checker to detect more wrong inline extent references.
Add support for FS_IOC_READ_VERITY_METADATA to directly query the Merkle tree, descriptor and signature blocks for fs-verity enabled files.
Add more read balancing policies, configurable in /sys/fs/btrfs/FSID/read_policy or as module parameter read_policy. Newly added: round-robin, devid:N (select a specific mirror number N).
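A sketch of handling the policy names from a script. It assumes that reading the sysfs file returns the policy names separated by whitespace with the active one in square brackets, a convention used by other sysfs files; that output format and the helper names are assumptions, not something this page specifies.

```python
def parse_read_policy(text):
    """Split the (assumed) sysfs output into the active policy and all names."""
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]  # strip the brackets marking the active policy
            active = token
        available.append(token)
    return active, available


def devid_policy(n):
    """Build the devid:N policy string described above."""
    n = int(n)
    if n < 0:
        raise ValueError("N cannot be negative")
    return f"devid:{n}"
```

The string built by devid_policy would then be written to the read_policy file or passed as the module parameter.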
The io_uring subsystem understands a command that is directed to Btrfs encoded write ioctl.
Direct IO may lead to a mismatch between data and their checksums. Use the direct to buffered fallback in case the file has checksums. This has a negative performance impact.
Mount options for zstd compression accept negative values -1..-15, which match the negative zstd levels. They provide faster compression at the cost of a worse ratio.
For testing the subpage block size feature, the size of 2K is accepted on x86_64 which has 4K pages.
The defrag ioctl also accepts the negative zstd levels that can be set as mount option.
Add entry to commit_stats to detect commit stalls, for debugging or monitoring purposes.
Large folios abstract contiguous page ranges representing some filesystem data or metadata as one structure instead of several ones. This simplifies code and has a positive impact on performance. As it touches the core data structure it is not enabled by default.
Any btrfs mounted device cannot be opened for writes.
The defrag ioctl was not able to uncompress a given range, now it’s possible.
File holes, ranges not representing data, were emulated by zero-filled data. This is less efficient than punching holes.
With some limitations where COW design does not work well with the swap implementation (nodatacow file, no compression, cannot be snapshotted, not possible on multiple devices, …), as this is the most restricted but working setup, we’ll try to improve that in the future
An optional incompat feature to assign a new filesystem UUID without overwriting all metadata blocks, stored only in superblock, unlike what btrfstune -u does.
Unregister devices previously added by the scan ioctl, same effect as if the kernel module is reloaded.
Allow to set the ZSTD compression level via mount option, e.g. like compress=zstd:9. The levels match the default ZSTD compression levels. The default is 3, maximum is 15.
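A small helper that builds and validates such an option string. The function name and the allow_negative switch are hypothetical; the switch covers the negative levels -1..-15 accepted by newer kernels, and the sketch assumes they use the same syntax as the positive ones. Level 0 is rejected here rather than guessing at its meaning.

```python
ZSTD_DEFAULT_LEVEL = 3     # used when no level is given
ZSTD_MAX_LEVEL = 15
ZSTD_MIN_NEGATIVE_LEVEL = -15


def zstd_mount_option(level=None, allow_negative=False):
    """Return a compress=zstd[:LEVEL] mount option string."""
    if level is None:
        return "compress=zstd"  # filesystem falls back to the default level
    level = int(level)
    lowest = ZSTD_MIN_NEGATIVE_LEVEL if allow_negative else 1
    if level == 0 or not (lowest <= level <= ZSTD_MAX_LEVEL):
        raise ValueError(f"unsupported zstd level: {level}")
    return f"compress=zstd:{level}"
```

For example, zstd_mount_option(9) returns compress=zstd:9, the form shown above.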
Verify metadata blocks before submitting them to the devices. This can catch consistency problems or bitflips.
New checksum algorithms: xxhash (64b), SHA256 (256b), BLAKE2b (256b).
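To illustrate the digest widths only, the two algorithms that ship with the Python standard library can be computed over a block of data as below. Which bytes of a block the filesystem actually feeds to the hash is outside the scope of this sketch, xxhash is left out because it is not in the standard library, and the helper names are made up for the example.

```python
import hashlib

# Digest widths in bits, as listed above.
CHECKSUM_BITS = {"xxhash": 64, "sha256": 256, "blake2b": 256}


def digest_sha256(block):
    """SHA256 digest (32 bytes) of a block of bytes."""
    return hashlib.sha256(block).digest()


def digest_blake2b_256(block):
    """BLAKE2b digest truncated to 256 bits (32 bytes) via digest_size."""
    return hashlib.blake2b(block, digest_size=32).digest()
```

Both helpers return 32 bytes for any input, matching the 256b width given in the list.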
RAID1 with 3- and 4- copies (over all devices).
Mode of discard (mount -o discard=async) that merges freed extents to larger chunks and submits them for discard in a less intrusive way
More information about device state can be found in per-filesystem sysfs directory.
Inline files can be reflinked to the tail extent of other files
More cancellation points in balance that will shorten the time to stop processing once btrfs balance cancel is called.
Remove support of flag BTRFS_SUBVOL_CREATE_ASYNC from subvolume creation ioctl.
New ioctl BTRFS_IOC_SNAP_DESTROY_V2, deletion by subvolume id is now possible.
Unified mount option for actions that may help to access a damaged filesystem. Now supports: nologreplay, usebackuproot
The information about qgroup status and relations is exported in /sys/fs/btrfs/UUID/qgroups
Export more information: checksum type, checksum size, generation, metadata_uuid
Export which filesystem exclusive operation is running (balance, resize, device add/delete/replace, …)
Remove inode number caching feature (mount -o inode_cache)
Additional modes for mount option rescue=: ignorebadroots/ibadroots, ignoredatacsums/idatacsums. All are exported in /sys/fs/btrfs/features/supported_rescue_options.
Support for zoned devices with special allocation/write mode to fixed-size zones. See Zoned.
List supported sector sizes in sysfs file /sys/fs/btrfs/features/supported_sectorsizes.
Tunable bandwidth limit /sys/fs/btrfs/FSID/devinfo/DEVID/scrub_speed_max for scrub (and device replace) for a given device.
The device stats can also be found in /sys/fs/btrfs/FSID/devinfo/DEVID/error_stats.
The filesystem resize and device delete operations can be cancelled by specifying cancel as the device name.
Change how empty value is interpreted. New behaviour will delete the value and reset it to default. This affects btrfs.compression where value no sets NOCOMPRESS bit while empty value resets all compression settings (either compression or NOCOMPRESS bit).
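The new interpretation of btrfs.compression values can be modelled as below. This is only a model of the semantics described above, not kernel code; the function name and the shape of the returned dictionary are invented for the illustration.

```python
def interpret_compression_property(value):
    """Model what setting btrfs.compression to the given string does."""
    if value == "":
        # Empty value: reset everything, neither an algorithm nor NOCOMPRESS.
        return {"compression": None, "nocompress": False}
    if value == "no":
        # The value "no" sets the NOCOMPRESS bit.
        return {"compression": None, "nocompress": True}
    # Any other value is taken as the name of a compression algorithm.
    return {"compression": value, "nocompress": False}
```

The difference that matters is between the first two branches: "no" leaves the NOCOMPRESS bit set, while the empty value clears it together with any compression setting.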
The fs-verity is a support layer that filesystems can hook into to support transparent integrity and authenticity protection of read-only files. https://www.kernel.org/doc/html/latest/filesystems/fsverity.html
Support mount with UID/GID mapped according to another namespace. https://lwn.net/Articles/837566/
Zoned namespaces. https://zonedstorage.io/docs/introduction/zns , https://lwn.net/Articles/865988/
Send and relocation (balance, device remove, shrink, block group reclaim) can now work in parallel.
It is possible to add a device with paused balance.
Note
Since kernel 5.17.7 and btrfs-progs 5.17.1
Mounting with -o flushoncommit does not trigger the (harmless) warning at each transaction commit.
Note
Also backported to 5.15.27 and 5.16.13
DUP metadata works with zoned mode.
New ioctls to read and write pre-encoded data (i.e. no transformation and directly written as extents), now works for compressed data.
The support for ioctl BTRFS_IOC_BALANCE has been removed, superseded by BTRFS_IOC_BALANCE_V2 long time ago.
The VFS limitation to reflink files on separate subvolume mounts of the same filesystem has been removed.
Messages are printed with a one letter tag (“state: X”) that denotes in which state the filesystem was at this point:
A - transaction aborted (permanent)
E - filesystem error (permanent)
M - remount in progress (transient)
R - device replace in progress (transient)
C - checksum checks disabled by mount option (rescue=ignoredatacsums)
L - log tree replay did not complete due to some error
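A log-processing script could decode the tags with a lookup table like the one below. The regular expression and the sample message in the usage note are assumptions about how the tag appears in a log line; only the letter meanings come from the list above.

```python
import re

STATE_TAGS = {
    "A": "transaction aborted (permanent)",
    "E": "filesystem error (permanent)",
    "M": "remount in progress (transient)",
    "R": "device replace in progress (transient)",
    "C": "checksum checks disabled by mount option (rescue=ignoredatacsums)",
    "L": "log tree replay did not complete due to some error",
}

# Assumed form: the word "state", an optional colon, then one or more letters.
_STATE_RE = re.compile(r"state:?\s+([A-Z]+)\b")


def decode_state(message):
    """Return (letter, meaning) pairs for the state tag found in a message."""
    match = _STATE_RE.search(message)
    if match is None:
        return []
    return [(c, STATE_TAGS.get(c, "unknown")) for c in match.group(1)]
```

For a hypothetical message such as "BTRFS error (device sdb: state EA): example", decode_state returns the entries for E and A in that order.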
Metadata buffer to be written gets an extra check if the stored transaction number matches the current state of the filesystem.
Metadata node size is supported regardless of the CPU page size (minimum size is 4KiB), data sector size is supported <= page size. Additionally subpage also supports RAID56.
Add sysfs tunable for background reclaim threshold for all block group types (data, metadata, system).
Device information is stored in two places, the number in the super block and items in the device tree. When this goes out of sync, e.g. by device removal shortly before unmount, the next mount could fail. The b-tree is the authoritative information and can be used to override the stale value in the superblock.
The logic has been changed so that inline files are considered for defragmentation even if the mount option max_inline would prevent that. No defragmentation might happen but the inlined files are not skipped.
Set the minimum limit of zone size on zoned devices to 4MiB. Zones of real devices are much larger, this is for emulated devices.
Add possibility to set a threshold to automatically reclaim block groups also in non-zoned mode. By default completely empty block groups are reclaimed automatically but the threshold can be tuned in /sys/fs/btrfs/FSID/allocation/PROFILE/bg_reclaim_threshold.
Additional check done by tree-checker to verify relationship between a tree block and its tree root owner.
Save creation time (otime) for all new files and directories. For future use, current tool cannot read it directly.
The INO_LOOKUP will return root id (id of the containing subvolume), unrestricted and to all users if the treeid is 0.
The EXTENT_SAME ioctl will accept the same inode as source and destination (ranges must not overlap).
Trim will be performed also on the space that’s not allocated by the chunks, not only free space within the allocated chunks.
Enhanced syntax and new balance filters:
limit=min..max
usage=min..max
stripes=min..max
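The range syntax is simple enough to generate from a script. The helper below is a sketch with a made-up name; it only formats and sanity-checks the three filters listed above and does not know about any other balance filter.

```python
RANGE_FILTERS = ("limit", "usage", "stripes")


def range_filter(name, low, high):
    """Format a balance filter in the NAME=min..max syntax."""
    if name not in RANGE_FILTERS:
        raise ValueError(f"not a range filter covered by this sketch: {name}")
    low, high = int(low), int(high)
    if low < 0 or high < low:
        raise ValueError("expected 0 <= min <= max")
    return f"{name}={low}..{high}"
```

For example, range_filter("usage", 10, 50) returns usage=10..50.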
Improved implementation of free space cache (aka v2), using b-trees.
Note
Default since btrfs-progs 5.15. Kernel 4.9 fixes endianness bugs on big-endian machines, x86* is ok.
Conversion to data/DUP profile possible through balance filters -- on single-device filesystem.
Note
mkfs.btrfs allows creating DUP on single device in the non-mixed mode since 4.4
The default value of max_inline changed to 2048.
The existing ioctl GET_SUPPORTED_FEATURES can now be used on the control device (/dev/btrfs-control) and returns the supported features without any mounted filesystem.
Add new ioctl RM_DEV_V2, pass device to be deleted by its ID.
Add support for RENAME_EXCHANGE and RENAME_WHITEOUT to renameat2 syscall. This also means that overlayfs is now supported on top of btrfs.
Conversion to data/DUP profile possible through balance filters -- on multiple-device filesystems.
Note
mkfs.btrfs allows creating DUP on multiple devices since 4.5.1
Scrub will attempt auto-repair (similar to raid1/raid10)
Support for the enhanced statx syscall; file creation timestamp
qgroups: new sysfs control file to allow temporary quota override with CAP_SYS_RESOURCE
That was a debugging helper, not used and not supposed to be used nowadays.
New compression algorithm ZSTD, supposedly better ratio/speed performance.
Allow degraded mount based on the chunk constraints, not device number constraints. E.g. when one device is missing but the remaining one holds all single chunks.
BTRFS_IOC_TRANS_START and BTRFS_IOC_TRANS_END, no known users, tricky to use; scheduled to be removed in 4.17
The mount option ssd does not make any assumptions about block layout or management by the device anymore, leaving only the speedups based on low seek cost active. This could avoid some corner cases leading to excessive fragmentation. https://git.kernel.org/linus/583b723151794e2ff1691f1510b4e43710293875 The story so far.
Overlayfs can now use btrfs as the lower filesystem.
Debugging functionality to verify extent references. New mount option ref-verify, must be built with CONFIG_BTRFS_FS_REF_VERIFY.
Allow to set the ZLIB compression level via mount option, e.g. like compress=zlib:9. The levels match the default ZLIB compression levels. The default is 3.
An enhanced version of the ioctl that can translate a logical extent offset to inode numbers, “who owns this block”. For certain use cases the V1 performs badly and this is addressed by V2. For more see https://git.kernel.org/linus/d24a67b2d997c860a42516076f3315c2ad2d2884 .
Apply a few heuristics to the data before they’re compressed to decide if it’s likely to gain any space savings. The methods: frequency sampling, repeated pattern detection, Shannon entropy calculation.
Mode of the fallocate syscall to zero file range.
Deprecated in 4.14, see above.
Allow rmdir to delete an empty subvolume.
Add support for ioctl FS_IOC_FSSETXATTR/FS_IOC_FSGETXATTR, successor of FS_IOC_SETFLAGS/FS_IOC_GETFLAGS ioctl. Currently supports: APPEND, IMMUTABLE, NOATIME, NODUMP, SYNC. Note that the naming is very confusing, though it’s named xattr, it does not mean the extended attributes. It should be referenced as extended inode flags or xflags.
The range for out-of-band deduplication implemented by the EXTENT_SAME ioctl will split the range into 16MiB chunks. Up to now this was the overall limit and effectively only the first 16MiB was deduplicated.
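The splitting can be pictured with a few lines of code. This is an illustration of the idea only, not the kernel implementation, and the function name is made up.

```python
CHUNK = 16 * 1024 * 1024  # 16MiB, the per-chunk size mentioned above


def split_range(offset, length, chunk=CHUNK):
    """Split [offset, offset + length) into (offset, length) pieces of at most chunk bytes."""
    pieces = []
    end = offset + length
    while offset < end:
        step = min(chunk, end - offset)  # the last piece may be shorter
        pieces.append((offset, step))
        offset += step
    return pieces
```

A 40MiB range starting at offset 0 is split into two 16MiB pieces followed by one 8MiB piece, so the whole range is covered rather than only the first 16MiB.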
New ioctl to read subvolume information (id, directory name, generation, flags, UUIDs, time). This does not require root permissions, only the regular access to the subvolume.
New ioctl to enumerate subvolume references of a given subvolume. This does not require root permissions, only the regular access to the subvolume.
New ioctl to lookup path by inode number. This does not require root permissions, only the regular access to the subvolume, unlike the INO_LOOKUP ioctl.
Allow to run defrag on files that are normally accessible for read-write, but are currently opened in read-only mode.
Read all data and verify checksums, repair if possible.
Automatic repair of broken data from a good copy
Save a few previous versions of the most important tree roots at commit time, used by -o recovery
Optional infrastructure to verify integrity of written metadata blocks
Groundwork to allow tracking owner of blocks, used via inspect-internal
RAID profiles can be changed on-line, balance filters
Support for metadata blocks larger than page size
Note
Default nodesize is 16KiB since btrfs-progs 3.12
Generic infrastructure for graceful error handling (EIO)
Persistent statistics about device errors
Noticeable improvements in fsync() implementation
Subvolume-aware quotas
Ability to transfer one filesystem via a data stream (full or incremental) and apply the changes on a remote filesystem.
Hardlink count limit is lifted to 65536.
Note
Default since btrfs-progs 3.12
Implement the FALLOC_FL_PUNCH_HOLE mode of fallocate.
Efficient replacement of existing device (add/remove in one go).
Basic support for RAID5/6 profiles, no crash resiliency, replace and scrub support.
Defrag does not break links between shared extents (snapshots, reflinked files).
Note
Disabled since 3.14 (and backported to some stable kernel versions) due to problems. Has been completely removed in 5.6.
A mode of send that does not add the actual file data to the stream.
Label editable on mounted filesystems.
Reduced metadata size (format change) of extents.
Note
Default since btrfs-progs 3.18
Sync qgroups with existing filesystem data.
A map of subvolume/UUID that vastly speeds up send/receive.
Support for deduplicating extents on a given set of files.
No extent representation for file holes (format change), may reduce overall metadata consumption
/sys/fs/btrfs exports various bits about filesystem capabilities and feature support
Mode of open() to safely create a temporary file
The extended SEARCH_TREE ioctl is able to get more than 4k of data
Automatically remove block groups (aka. chunks) that become completely empty.
Scrub and device replace work on RAID56 filesystems.