Difference between revisions of "Features"

From OpenZFS
(Created page with "== Feature Comparison: Open ZFS vs Oracle ZFS == This is where the table goes! I'm not sure what that table will look like, but here are a couple of possibilities... {| borde...")
 
m (CLI Usability)
 
(114 intermediate revisions by 12 users not shown)
Line 1: Line 1:
== Feature Comparison: Open ZFS vs Oracle ZFS ==
+
This page describes some of the more important features and performance improvements that are part of OpenZFS.
This is where the table goes! I'm not sure what that table will look like, but here are a couple of possibilities...
+
  
{| border=1
+
Help would be appreciated in porting features between platforms whose status is "not yet".
!Open ZFS
+
 
!Oracle ZFS
+
== Feature Flags ==
 +
 
 +
See the [[Feature_Flags|Feature Flags]] wiki page.
 +
 
 +
== libzfs_core ==
 +
 
 +
See this [http://blog.delphix.com/matt/2012/01/17/the-future-of-libzfs/ blog post (Jan 2012)] and associated [http://blog.delphix.com/matt/files/2012/01/The_Future_of_LibZFS.pdf slides] and [http://www.youtube.com/watch?feature=player_embedded&list=PL1A94C8EECCAF7340&v=iJ0S91ygErE video] for more details.
 +
 
 +
First introduced in:
 +
{| class="wikitable"
 
|-
 
|-
|Feature A<br>Feature B<br>Feature C<br>
+
|'''illumos'''
|Feature A
+
|[https://github.com/illumos/illumos-gate/commit/4445fffbbb1ea25fd0e9ea68b9380dd7a6709025 June 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/f1985e5cd725ff1816cd4afbdee2d95b661883f0 March 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/6f1ffb06655008c9b519108ed29fbf03acd6e5de August 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/50b0b2d9ea604d29ed729be8fa61bb77ae3ff4e9 October 2013]
 
|}
 
|}
  
<br>
+
== CLI Usability ==
  
{| border=1
+
These are improvements to the command line interface. While the end result is a generally more friendly user interface, getting the desired behavior often required modifications to the core of ZFS.
!Feature
+
 
!Open ZFS
+
''Listed in chronological order (oldest first).''
!Oracle ZFS
+
 
 +
==== Pool Comment ====
 +
 
 +
OpenZFS has a per-pool comment property which can be set with the <tt>zpool set</tt> command and can be read even if the pool is not imported, so it is accessible even if pool import fails.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/8704186e373c9ed74daa395ff3f7fd745396df9e Nov 2011]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/73cab7197174153a718dedc97ff344341bcf6098 Nov 2011]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/d96eb2b1538db13ee7a716ec0e1162f5735edc12 Aug 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/d96eb2b1538db13ee7a716ec0e1162f5735edc12 Aug 2012]
 +
|}
 +
 
 +
==== Size Estimates for <tt>zfs send</tt> and <tt>zfs destroy</tt> ====
 +
 
 +
This feature enhances OpenZFS's internal space accounting information. This new accounting information is used to provide a <tt>-n</tt> (dry-run) option for <tt>zfs send</tt> which can instantly calculate the amount of send stream data a specific <tt>zfs send</tt> command would generate. It is also used for a <tt>-n</tt> option for <tt>zfs destroy</tt> which can instantly calculate the amount of space that would be reclaimed by a specific <tt>zfs destroy</tt> command.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/e5351341b58845eee9d722bd71543d5a7c26b6cc Nov 2011]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/73cab7197174153a718dedc97ff344341bcf6098 Nov 2011]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 +
|}
 +
 
 +
==== vdev Information in <tt>zpool list</tt> ====
 +
 
 +
OpenZFS adds a <tt>-v</tt> option to the <tt>zpool list</tt> command which shows detailed sizing information about the vdevs in the pool:
 +
 
 +
$ zpool list -v
 +
NAME          SIZE  ALLOC  FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
 +
dcenter      5.24T  3.85T  1.39T        -    73%  1.00x  ONLINE  -
 +
  mirror      556G  469G  86.7G        -
 +
    c2t1d0      -      -      -        -
 +
    c2t0d0      -      -      -        -
 +
  mirror      556G  493G  63.0G        -
 +
    c2t3d0      -      -      -        -
 +
    c2t2d0      -      -      -        -
 +
  mirror      556G  493G  62.7G        -
 +
    c2t5d0      -      -      -        -
 +
    c2t4d0      -      -      -        -
 +
  mirror      556G  494G  62.5G        -
 +
    c2t8d0      -      -      -        -
 +
    c2t6d0      -      -      -        -
 +
  mirror      556G  494G  62.2G        -
 +
    c2t10d0      -      -      -        -
 +
    c2t9d0      -      -      -        -
 +
  mirror      556G  494G  61.9G        -
 +
    c2t12d0      -      -      -        -
 +
    c2t11d0      -      -      -        -
 +
  mirror    1016G  507G  509G        -
 +
    c1t1d0      -      -      -        -
 +
    c1t5d0      -      -      -        -
 +
  mirror    1016G  496G  520G        -
 +
    c1t3d0      -      -      -        -
 +
    c1t4d0      -      -      -        -
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/4263d13f00c9691fa14620eff82abef795be0693 Jan 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/38ef930079cbb1df5de9df3c1064426dba3976b1 May 2012]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/1bd201e70d57464fd26bf9089ea4b44fd49e4f2d Sept 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/1bd201e70d57464fd26bf9089ea4b44fd49e4f2d Sept 2012]
 +
|}
 +
 
 +
==== ZFS list snapshot property alias ====
 +
 
 +
Functionally identical to the Solaris 11 extension <code>zfs list -t snap</code>.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|not yet
 +
|-
 +
|'''FreeBSD'''
 +
|[http://svnweb.freebsd.org/changeset/base/256999 Oct 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/cf81b00a73fe47fdb21586ac1cc179b734540973 Apr 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/cf81b00a73fe47fdb21586ac1cc179b734540973 Apr 2012]
 +
|}
 +
 
 +
==== ZFS snapshot alias ====
 +
 
 +
Functionally identical to the Solaris 11 extension <code>zfs snap</code>.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|not yet
 +
|-
 +
|'''FreeBSD'''
 +
|[http://svnweb.freebsd.org/changeset/base/256999 Oct 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/10b75496bb0cb7a7b8146c263164adc37f1d176a Apr 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/10b75496bb0cb7a7b8146c263164adc37f1d176a Apr 2012]
 +
|}
 +
 
 +
==== <tt>zfs send</tt> Progress Reporting ====
 +
 
 +
OpenZFS introduces a <tt>-v</tt> option to <tt>zfs send</tt> which reports per-second information on how much data has been sent, how long it has taken, and how much data remains to be sent.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/4e3c9f4489a18514e5e8caeb91d4e6db07c98415 May 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/046ff8962602e8d65b6b3fae48573513ab7e433f May 2012]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/37abac6d559a1da8ab8e5379442f491b73998f6a Sept 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/37abac6d559a1da8ab8e5379442f491b73998f6a Sept 2012]
 +
|}
 +
 
 +
==== Arbitrary Snapshot Arguments to <tt>zfs snapshot</tt> ====
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/4445fffbbb1ea25fd0e9ea68b9380dd7a6709025 June 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/f1985e5cd725ff1816cd4afbdee2d95b661883f0 March 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/6f1ffb06655008c9b519108ed29fbf03acd6e5de August 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/6f1ffb06655008c9b519108ed29fbf03acd6e5de September 2013]
 +
|}
 +
 
 +
==== Native data and metadata encryption for zfs ====
 +
 
 +
 
 +
Provides the ability to encrypt, decrypt, and authenticate protected datasets. This feature also adds the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/eb633035c80613ec93d62f90482837adaaf21a0a Jun 2019]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/openzfs/zfs/commit/b52563034230b35f0562b6f40ad1a00f02bd9a05 OpenZFS v2]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/openzfs/zfs/commit/b52563034230b35f0562b6f40ad1a00f02bd9a05 Aug 2017]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/4644f6879971c34a47d4f8e8417c760640716125 Aug 2017]
 +
|}
 +
 
 +
==== ZFS Channel Programs ====
 +
 
 +
The ZFS channel program interface allows ZFS administrative operations to be run programmatically as a Lua script. The entire script is executed atomically, with no other administrative operations taking effect concurrently. A library of ZFS calls is made available to channel program scripts. Channel programs may only be run with root privileges.
 +
 
 +
See also the [http://www.slideshare.net/MatthewAhrens/openzfs-channel-programs slides] and [https://www.youtube.com/watch?v=EGKek5sZ2Xw&list=PLaUVvul17xSdWMBt5tAC8Hu7bbeWskD_q video] from the talk at the [[OpenZFS Developer Summit 2013]], and the [http://open-zfs.org/w/images/d/db/Channel_Programs-Chris_Siden.pdf slides] and [http://www.youtube.com/watch?v=RMTxyqcomPA&list=PLaUVvul17xSdOhJ-wDugoCAIPJZHloVoq&index=14 video] from the [[OpenZFS Developer Summit 2014]].
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/dfc115332c94a2f62058ac7f2bce7631fbd20b3d Jun 2017]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/openzfs/zfs/commit/d99a015343425a1c856c900aa8223016400ac2dc OpenZFS v2]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/openzfs/zfs/commit/d99a015343425a1c856c900aa8223016400ac2dc Feb 2018]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/ec9c12a0ff736c8e2dea1b9f482a07d0b974aa94 Oct 2018]
 +
|}
 +
 
 +
== Performance ==
 +
 
 +
These are significant performance improvements, often requiring substantial restructuring of the source code.
 +
 
 +
''Listed in chronological order (oldest first).''
 +
 
 +
==== SA based xattrs ====
 +
 
 +
Improves performance of Linux-style (short) xattrs by storing them in the dnode_phys_t's bonus block. (Not to be confused with [http://en.wikipedia.org/wiki/Extended_file_attributes#Solaris Solaris-style Extended Attributes], which are full-fledged files or "forks", like NTFS streams. This work could be extended to also improve the performance on illumos of small Extended Attributes whose permissions are the same as the containing file.)
 +
 
 +
Requires a disk format change and is off by default until Filesystem (ZPL) Feature Flags are implemented (not to be confused with [[Features#Feature_Flags_Overview | zpool Feature Flags]]).
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|not yet (needs additional functionality)
 +
|-
 +
|'''FreeBSD'''
 +
|??
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/82a37189aac955c81a59a5ecc3400475adb56355 Oct 2011]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/6a06ef26abc87c6ede9bec8246713dc94c98fa78 May 2015]
 +
|}
 +
 
 +
Note that SA based xattrs are [https://github.com/zfsonlinux/zfs/commit/6a7c0ccca44ad02c476a111d8f7911fc8b12fff7 no longer used on symlinks] as of Aug 2013 until an issue is resolved.
 +
 
 +
==== Use the slog even with logbias=throughput ====
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|??
 +
|-
 +
|'''FreeBSD'''
 +
|??
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/5d7a86d114c2706a8d14d94b71f81ad5cdf066c5 Oct 2011]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/5d7a86d114c2706a8d14d94b71f81ad5cdf066c5 Oct 2011]
 +
|}
 +
 
 +
==== Asynchronous Filesystem and Volume Destruction ====
 +
 
 +
Destroying a filesystem requires traversing all of its data in order to return its used blocks to the pool's free list. Before this feature, the filesystem was not fully removed until all blocks had been reclaimed. If the destroy operation was interrupted by a reboot or power outage, the next attempt to import the pool (probably during boot) would need to complete the destroy operation synchronously, possibly delaying a boot for long periods of time.
 +
 
 +
With asynchronous destruction the filesystem's data is immediately moved to a "to be freed" list, allowing the destroy operation to complete without traversing any of the filesystem's data. A background process reclaims blocks from this "to be freed" list and is capable of resuming this process after reboots without slowing the pool import process.
 +
 
 +
The new freeing algorithm also has a significant performance improvement when destroying clones. The old algorithm took time proportional to the number of blocks ''referenced'' by the clone, even if most of those blocks could not be reclaimed because they were still referenced by the clone's origin. The new algorithm only takes time proportional to the number of blocks unique to the clone.
 +
 
 +
See this [http://blog.delphix.com/matt/2012/07/11/performance-of-zfs-destroy/ blog post] for more detailed performance analysis.
 +
 
 +
Note: The <tt>async_destroy</tt> feature flag must be enabled to take advantage of this.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/ad135b5d644628e791c3188a6ecbd9c257961ef8 May 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/cc61ab2f133566dca51970d44cc49a4355039b5d June 2012]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/9ae529ec5dbdc828ff8326beae58062971d74b2e Jan 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/9ae529ec5dbdc828ff8326beae58062971d74b2e Jan 2013]
 +
|}
 +
 
 +
==== Reduce Number of Empty bpobjs ====
 +
 
 +
Every time OpenZFS takes a snapshot it creates on-disk block pointer objects (bpobjs) to track blocks associated with that snapshot. In common use cases most of these bpobjs are empty, but the number of bpobjs per snapshot is proportional to the number of snapshots already taken of the same filesystem or volume. When a single filesystem or volume has many (tens of thousands of) snapshots, these unnecessary empty bpobjs can waste space and cause performance problems. OpenZFS waits to create each bpobj until the first entry is added to it, thus eliminating the empty bpobjs.
 +
 
 +
Note: The <tt>empty_bpobj</tt> feature flag must be enabled to take advantage of this.
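The lazy creation described above can be sketched in a few lines of Python. This is a toy model only: the <tt>Snapshot</tt> class, its slot numbering, and the string block pointers are hypothetical names chosen for illustration and do not correspond to the on-disk structures or to any OpenZFS API.

```python
class Snapshot:
    """Toy model: one slot per potential bpobj, allocated lazily."""

    def __init__(self, slots):
        # None marks a bpobj that has not been created yet.
        self._bpobjs = [None] * slots

    def add_entry(self, slot, block_pointer):
        # Create the bpobj only when its first entry arrives.
        if self._bpobjs[slot] is None:
            self._bpobjs[slot] = []
        self._bpobjs[slot].append(block_pointer)

    def allocated(self):
        # Count the bpobjs that actually exist.
        return sum(1 for b in self._bpobjs if b is not None)


snap = Snapshot(slots=10_000)
snap.add_entry(42, "bp-1")
snap.add_entry(42, "bp-2")
print(snap.allocated())  # only the one slot that received entries exists
```

With eager creation all 10,000 objects would exist from the start. In the sketch, the cost scales with the number of slots that are actually used.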
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/f17457368189aa911f774c38c1f21875a568bdca Aug 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/d9fa2f486ee98b2557e1d5ad5f1af418c663cfc8 Aug 2012]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/753c38392ddff9d3cf140bb4d28f3bfba52c92d2 Dec 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/753c38392ddff9d3cf140bb4d28f3bfba52c92d2 Dec 2012]
 +
|}
 +
 
 +
==== Single Copy ARC ====
 +
 
 +
OpenZFS caches disk blocks in-memory in the adaptive replacement cache (ARC). Originally, when the same disk block was accessed from different clones it was cached multiple times (once for each clone accessing the block) in case a clone planned to modify the block. With these changes OpenZFS caches at most one copy of every block unless a clone is actually modifying the block.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/9253d63df408bb48584e0b1abfcc24ef2472382e Sep 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/8e1fdec2c76a948b16ebf8e4abe2cb73a60d3477 Nov 2012]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/1eb5bfa3dcdaecb19543d9df13131374a7a42947 Dec 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/1eb5bfa3dcdaecb19543d9df13131374a7a42947 Dec 2012]
 +
|}
 +
 
 +
==== TRIM Support ====
 +
 
 +
TRIM support passes deletes and frees through to the underlying vdevs. This helps devices such as SSDs, which rely on receiving TRIM / UNMAP requests for sectors that are no longer needed, maintain optimal performance.
 +
 
 +
Two modes of TRIM/UNMAP were added: manual and automatic. Manual TRIM, run through the <tt>zpool trim</tt> command, trims on demand. Automatic TRIM can be enabled to perform a periodic background TRIM.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|not yet ported
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/openzfs/zfs/commit/1b939560be5c51deecf875af9dada9d094633bf7 OpenZFS v2]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/openzfs/zfs/commit/1b939560be5c51deecf875af9dada9d094633bf7 Mar 2019]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/38a6bed7c2cb9a59a9bd01119e1cf1f0a4bed2ec Mar 2019]
 +
|}
 +
 
 +
==== FASTWRITE Algorithm ====
 +
 
 +
Improves synchronous IO performance.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|not yet ported
 +
|-
 +
|'''FreeBSD'''
 +
|not yet ported
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/920dd524fb2997225d4b1ac180bcbc14b045fda6 Oct 2012]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/920dd524fb2997225d4b1ac180bcbc14b045fda6 Oct 2012]
 +
|}
 +
 
 +
Note that a [https://github.com/ryao/zfs/commit/858822a04b4563657b2267131e90d9687d67e31b locking enhancement] is being reviewed.
 +
 
 +
==== Block Freeing Performance Improvements ====
 +
 
 +
Performance analysis of OpenZFS revealed that the algorithms used when freeing blocks could cause significant performance problems when freeing a large number of blocks in a single transaction or when dealing with fragmented pools. Several performance improvements were made in this area.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/01f55e48fb4d524eaf70687728aa51b7762e2e97 Nov 2012]
 +
|[https://github.com/illumos/illumos-gate/commit/16a4a8074274d2d7cc408589cf6359f4a378c861 Feb 2013]
 +
|[https://github.com/illumos/illumos-gate/commit/9eb57f7f3fbb970d4b9b89dcd5ecf543fe2414d5 Feb 2013]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/5d9b3f284b13ac492326e05f6ba4c00e98adf05c Nov 2012]
 +
|[https://github.com/freebsd/freebsd/commit/18e9a0422b52091035dae9d69bde9dd571a8ff7e Feb 2013]
 +
|[https://github.com/freebsd/freebsd/commit/18e9a0422b52091035dae9d69bde9dd571a8ff7e Feb 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/55d85d5a8c45c4559a4a0e675c37b0c3afb19c2f May 2013]
 +
|[https://github.com/zfsonlinux/zfs/commit/e51be06697762215dc3b679f8668987034a5a048 June 2013]
 +
|[https://github.com/zfsonlinux/zfs/commit/c2e42f9d53bec422abb71efade2c004383345038 Oct 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/55d85d5a8c45c4559a4a0e675c37b0c3afb19c2f May 2013]
 +
|[https://github.com/openzfsonosx/zfs/commit/e51be06697762215dc3b679f8668987034a5a048 June 2013]
 +
|[https://github.com/openzfsonosx/zfs/commit/c2e42f9d53bec422abb71efade2c004383345038 Oct 2013]
 +
|}
 +
 
 +
==== nop-write ====
 +
 
 +
ZFS supports end-to-end checksumming of every data block. When a cryptographically secure checksum is being used (and compression is enabled), OpenZFS will compare the checksums of incoming writes to the checksum of the existing on-disk data and avoid issuing any write i/o for data that has not changed. This can help performance and snapshot space usage in situations where the same files are regularly overwritten with almost-identical data (e.g. regular full backups of large random-access files).
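As a rough illustration of the checksum comparison, the Python sketch below keeps one checksum per block and skips the write when the incoming data hashes to the same value. The <tt>BlockStore</tt> class and its fields are hypothetical, SHA-256 merely stands in for "a cryptographically secure checksum", and the compression requirement is not modeled.

```python
import hashlib


class BlockStore:
    """Toy block store that skips writes whose checksum is unchanged."""

    def __init__(self):
        self.checksums = {}     # block id -> checksum of the data on "disk"
        self.writes_issued = 0  # number of writes that reached the "disk"

    def write(self, block_id, data):
        new_sum = hashlib.sha256(data).digest()
        if self.checksums.get(block_id) == new_sum:
            return False        # identical data: no write i/o is issued
        self.checksums[block_id] = new_sum
        self.writes_issued += 1
        return True


store = BlockStore()
store.write(0, b"backup contents")
store.write(0, b"backup contents")   # skipped, data is unchanged
store.write(0, b"changed contents")
print(store.writes_issued)
```

Because the second write is skipped, the block is never reallocated, which is why an unchanged block would not consume additional snapshot space.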
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/80901aea8e78a2c20751f61f01bebd1d5b5c2ba5 Nov 2012]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/3a0bfecf052237768517963f169d0797e2978f59 Nov 2012]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/03c6040bee6c87a9413b7da41d9f580f79a8ab62 Nov 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/03c6040bee6c87a9413b7da41d9f580f79a8ab62 Nov 2013]
 +
|}
 +
 
 +
==== lz4 compression ====
 +
 
 +
OpenZFS supports on-the-fly compression of all user data with a variety of compression algorithms. This feature adds support for the lz4 compression algorithm. lz4 is usually faster and compresses data better than lzjb, the old default OpenZFS compression algorithm.
 +
 
 +
Note: The <tt>lz4_compress</tt> feature flag must be enabled to take advantage of this.
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/a6f561b4aee75d0d028e7b36b151c8ed8a86bc76 Jan 2013]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/c6d9dc1ad2d2e36220845b84a2d180bd97354797 Feb 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/9759c60f1a1503e48dc5c45a209c3edd5758319f Jan 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/9759c60f1a1503e48dc5c45a209c3edd5758319f Jan 2013]
 +
|}
 +
 
 +
==== synctask rewrite ====
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/3b2aab18808792cbd248a12f1edf139b89833c13 Feb 2013]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/f1985e5cd725ff1816cd4afbdee2d95b661883f0 March 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/13fe019870c8779bf2f5b3ff731b512cf89133ef Sept 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/13fe019870c8779bf2f5b3ff731b512cf89133ef Sept 2013]
 +
|}
 +
 
 +
==== l2arc compression ====
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/aad02571bc59671aa3103bb070ae365f531b0b62 Jun 2013]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/b20363fe596a7e3a55d5b62f4d3fdb482c65c47a Jun 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/3a17a7a99a1a6332d0999f9be68e2b8dc3933de1 Aug 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/3a17a7a99a1a6332d0999f9be68e2b8dc3933de1 Aug 2013]
 +
|}
 +
 
 +
==== ARC Shouldn't Cache Freed Blocks ====
 +
 
 +
Originally, cached blocks in the ARC remained cached until they were evicted due to memory pressure, even if the underlying disk block was freed. In some workloads these freed blocks were so frequently accessed before they were freed that the ARC continued to cache them while evicting blocks which had not been freed yet. Since freed blocks could never be accessed again, continuing to cache them was unnecessary. In OpenZFS, ARC blocks are evicted immediately when their underlying data blocks are freed.
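The effect can be shown with a small Python sketch. A plain LRU cache is used here as a stand-in for the ARC, which is considerably more elaborate. The <tt>LruCache</tt> class and its <tt>free</tt> method are hypothetical names for illustration.

```python
from collections import OrderedDict


class LruCache:
    """Tiny LRU cache standing in for the (far more elaborate) ARC."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # oldest entry first

    def access(self, addr, data=None):
        if addr in self.blocks:
            self.blocks.move_to_end(addr)        # mark as most recently used
        else:
            self.blocks[addr] = data
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict least recently used

    def free(self, addr):
        # The on-disk block was freed, so drop the cached copy at once.
        self.blocks.pop(addr, None)


cache = LruCache(capacity=2)
cache.access("hot", b"h")
cache.access("live", b"l")
cache.access("hot")          # "hot" is now the most recently used entry
cache.free("hot")            # underlying block freed: evicted immediately
cache.access("new", b"n")    # fits without pushing out "live"
print(list(cache.blocks))
```

Without the call to <tt>free</tt>, inserting <tt>"new"</tt> would have evicted <tt>"live"</tt> and kept the useless <tt>"hot"</tt> entry.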
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/6e6d5868f52089b9026785bd90257a3d3f6e5ee2 Jun 2013]
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/freebsd/freebsd/commit/13d822743426263ee50bcf047ab41a1e386156a8 Jun 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/df4474f92d0b1b8d54e1914fdd56be2b75f1ff5e Jun 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/df4474f92d0b1b8d54e1914fdd56be2b75f1ff5e Jun 2013]
 +
|}
 +
 
 +
==== Improve N-way mirror read performance ====
 +
 
 +
Queues read requests to the least busy leaf vdev in mirrors.
 +
 
 +
In addition to the vdev load biasing first implemented by ZFS on Linux in July 2013, the FreeBSD October 2013 version added I/O locality and device rotational information to further enhance the performance.
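A minimal Python sketch of load-based selection follows. It assumes that "least busy" means the fewest pending I/Os; the function name <tt>pick_child</tt> and the list of queue depths are hypothetical. The I/O locality and rotational information used by the FreeBSD version are not modeled.

```python
def pick_child(pending):
    """Return the index of the mirror child with the fewest pending I/Os."""
    return min(range(len(pending)), key=lambda i: pending[i])


# Queue depths of a hypothetical 3-way mirror; child 1 is the least busy.
print(pick_child([5, 1, 3]))

# Issuing six reads to an idle mirror spreads them evenly over the children.
pending = [0, 0, 0]
for _ in range(6):
    pending[pick_child(pending)] += 1
print(pending)
```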
 +
 
 +
{| class="wikitable"
 +
!OS
 +
!Load
 +
!Load + I/O Locality & Rotational Information
 +
|-
 +
|'''illumos'''
 +
|not yet ported
 +
|not yet ported
 +
|-
 +
|'''FreeBSD'''
 +
|N/A
 +
|[http://svnweb.freebsd.org/changeset/base/256956 23rd October 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/556011dbec2d10579819078559a77630fc559112 Jul 2013]
 +
|[https://github.com/zfsonlinux/zfs/commit/9f500936c82137ef3a57c53013894f622dcec14e Feb 26, 2016]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/556011dbec2d10579819078559a77630fc559112 Jul 2013]
 +
|not yet ported
 +
|}
 +
 
 +
==== Smoother Write Throttle ====
 +
 
 +
The write throttle (dsl_pool_tempreserve_space() and txg_constrain_throughput()) is rewritten to produce much more consistent delays when under constant load. The new write throttle is based on the amount of dirty data, rather than guesses about future performance of the system. When there is a lot of dirty data, each transaction (e.g. write() syscall) will be delayed by the same small amount. This eliminates the "brick wall of wait" that the old write throttle could hit, causing all transactions to wait several seconds until the next txg opens. One of the keys to the new write throttle is decrementing the amount of dirty data as i/o completes, rather than at the end of spa_sync(). Note that the write throttle is only applied once the i/o scheduler is issuing the maximum number of outstanding async writes. See the block comments in dsl_pool.c and above dmu_tx_delay() for more details.
 +
 
 +
The ZFS i/o scheduler (vdev_queue.c) now divides i/os into 5 classes: sync read, sync write, async read, async write, and scrub/resilver. The scheduler issues a number of concurrent i/os from each class to the device. Once a class has been selected, an i/o is selected from this class using either an elevator algorithm (async, scrub classes) or FIFO (sync classes). The number of concurrent async write i/os is tuned dynamically based on i/o load, to achieve good sync i/o latency when there is not a high load of writes, and good write throughput when there is. See the block comment in vdev_queue.c for more details.
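To make the idea of a delay driven by the amount of dirty data concrete, here is a small Python sketch. The shape of the curve, the function name <tt>tx_delay</tt>, and the parameters <tt>min_percent</tt> and <tt>scale</tt> are assumptions made for illustration; this page does not give the formula, and the block comment above dmu_tx_delay() remains the authoritative description.

```python
def tx_delay(dirty, dirty_max, min_percent=60, scale=500_000):
    """Return an illustrative per-transaction delay for `dirty` bytes.

    Assumed model: no delay below a threshold, then a delay that grows
    smoothly and becomes very large as dirty data approaches dirty_max.
    """
    dirty_min = dirty_max * min_percent / 100
    if dirty <= dirty_min:
        return 0.0               # little dirty data: no throttling
    if dirty >= dirty_max:
        return float("inf")      # hypothetical: wait until i/o completes
    return scale * (dirty - dirty_min) / (dirty_max - dirty)


for dirty in (50, 80, 90, 99):
    print(dirty, tx_delay(dirty, dirty_max=100))
```

Every transaction arriving at the same level of dirty data receives the same delay, and the delay rises gradually instead of jumping from zero to several seconds.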
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|[https://github.com/illumos/illumos-gate/commit/69962b5647e4a8b9b14998733b765925381b727e Aug 2013]
 +
|-
 +
|'''FreeBSD'''
 +
|[http://svnweb.freebsd.org/changeset/base/258632 Nov 2013]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/e8b96c6007bf97cdf34869c1ffbd0ce753873a3d Dec 2013]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/e8b96c6007bf97cdf34869c1ffbd0ce753873a3d Mar 2014]
 +
|}
 +
 
 +
==== Disable LBA Weighting on files and SSDs ====
 +
 
 +
On rotational media, the bandwidth of the outermost tracks is approximately twice that of innermost tracks. A heuristic called LBA weighting was put into the metaslab allocator to account for this by favoring the outermost tracks over the innermost tracks. This has the consequence that metaslabs tend to fill at different rates depending on their location. This causes the metaslabs corresponding to outermost tracks to enter the best-fit allocation strategy.
 +
 
 +
The best-fit allocation strategy is more CPU intensive than the typical first-fit because it looks for the smallest region of free space able to fulfill an allocation rather than picking the next available one. The CPU time is fairly excessive and is known to harm IOPS, but it exists to minimize use of gang blocks as a metaslab becomes excessively full. Gaining a bandwidth improvement from LBA weighting at the expense of an earlier switch to the best-fit allocation behavior on the weighted metaslabs is reasonable on rotational disks. However, it makes no sense on files, where the underlying filesystem is free to place things however it sees fit, and on SSDs, where there is no bandwidth difference based on LBA.
 +
 
 +
With this change, we will more evenly fill metaslabs on pools whose vdevs consist of only files and SSDs, which will minimize the metaslabs that enter the best-fit allocation strategy when a pool is mostly full, but still below 96% full. This is particularly important on SSDs, where drops in IOPS are more pronounced.
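The difference between the two allocation strategies can be sketched in Python as follows. The free-space representation (a list of <tt>(offset, length)</tt> tuples) and the function names are hypothetical; the metaslab allocator uses its own data structures, which are not modeled here.

```python
def first_fit(free, size):
    """Return the index of the first free region that can hold `size`."""
    for i, (_offset, length) in enumerate(free):
        if length >= size:
            return i
    return None


def best_fit(free, size):
    """Return the index of the smallest free region that can hold `size`."""
    candidates = [(length, i)
                  for i, (_offset, length) in enumerate(free)
                  if length >= size]
    return min(candidates)[1] if candidates else None


free = [(0, 64), (100, 8), (200, 16)]   # (offset, length) pairs
print(first_fit(free, 8), best_fit(free, 8))
```

First-fit can stop at the first match, while best-fit has to consider every candidate region, which is the extra CPU cost described above.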
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|not yet
 +
|-
 +
|'''FreeBSD'''
 +
|not yet
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/zfsonlinux/zfs/commit/fb40095f5f0853946f8150481ca22602d1334dfe Aug 2015]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/fb40095f5f0853946f8150481ca22602d1334dfe Sep 2015]
 +
|}
 +
 
 +
==== Sequential scrub and resilvers ====
 +
 
 +
Improves performance by splitting scrubs and resilvers into a metadata scanning phase and an IO issuing phase. The metadata scan reads through the structure of the pool and gathers an in-memory queue of I/Os, sorted by size and offset on disk. The issuing phase will then issue the scrub I/Os as sequentially as possible, greatly improving performance.
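The two phases can be illustrated with a short Python sketch. It is a simplification: the queue is sorted by offset only (the text above mentions size as well), and the names <tt>scan_phase</tt> and <tt>total_seek</tt> and the tuple layout are hypothetical.

```python
def scan_phase(block_pointers):
    """Gather all I/Os in memory and sort them by on-disk offset."""
    return sorted(block_pointers, key=lambda bp: bp[0])


def total_seek(ios):
    """Sum of the distances between the offsets of consecutive I/Os."""
    offsets = [offset for offset, _size in ios]
    return sum(abs(b - a) for a, b in zip(offsets, offsets[1:]))


# (offset, size) pairs in the order a metadata traversal might find them.
found = [(900, 8), (100, 8), (500, 8), (108, 8)]
queue = scan_phase(found)

print(total_seek(found))  # issuing in traversal order
print(total_seek(queue))  # issuing in sorted order
```

In this toy example the sorted queue halves the total seek distance. On rotational disks, less head movement is what makes the issuing phase faster.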
 +
 
 +
Saso Kiselkov of Nexenta gave a talk on [[Scrub/Resilver Performance]] at the [[OpenZFS Developer Summit 2016]] (September 2016): [https://youtu.be/SZFwv8BdBj4 Video], [https://drive.google.com/file/d/0B5hUzsxe4cdmVU91cml1N0pKYTQ/view?usp=sharing Slides]
 +
 
 +
{| class="wikitable"
 +
|-
 +
|'''illumos'''
 +
|not yet
 +
|-
 +
|'''FreeBSD'''
 +
|[https://github.com/openzfs/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd OpenZFS v2]
 +
|-
 +
|'''ZFS on Linux'''
 +
|[https://github.com/openzfs/zfs/commit/d4a72f23863382bdf6d0ae33196f5b5decbc48fd Nov 2017]
 +
|-
 +
|'''OpenZFS on OS X'''
 +
|[https://github.com/openzfsonosx/zfs/commit/a32f4595232540b80a8069cc7e0b6f5cdb7e6ef7 Dec 2018]
 +
|}
 +
 
 +
== Dataset Properties ==
 +
 
 +
These are new filesystem, volume, and snapshot properties which can be accessed with the <tt>zfs(1)</tt> command's <tt>get</tt> subcommand. See the <tt>zfs(1)</tt> manpage for your distribution for more details on each of these properties.
 +
 
 +
{| class="wikitable"
 +
!Property
 +
!Description
 +
!illumos
 +
!FreeBSD
 +
!ZFS on Linux
 +
!OpenZFS on OS X
 +
|-
 +
|<tt>refcompressratio</tt>
 +
|The compression ratio achieved for all data referenced by (but not necessarily unique to) a snapshot, filesystem, or volume, expressed as a multiplier.
 +
|[https://github.com/illumos/illumos-gate/commit/187d6ac08adc31ea6868bde0cfbbb288826254e8 Jun 2011]
 +
|[https://github.com/freebsd/freebsd/commit/333e34b938a2b5e7b036e02b36408880b415109c Jun 2011]
 +
|[https://github.com/zfsonlinux/zfs/commit/77999e804fff35782ab4b578d2cecf064c54a841 Aug 2012]
 +
|[https://github.com/openzfsonosx/zfs/commit/77999e804fff35782ab4b578d2cecf064c54a841 Aug 2012]
 +
|-
 +
|<tt>clones</tt>
 +
|For snapshots, this property is a comma-separated list of filesystems or volumes which are clones of this snapshot.
 +
|[https://github.com/illumos/illumos-gate/commit/e5351341b58845eee9d722bd71543d5a7c26b6cc Nov 2011]
 +
|[https://github.com/freebsd/freebsd/commit/73cab7197174153a718dedc97ff344341bcf6098 Nov 2011]
 +
|[https://github.com/zfsonlinux/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 +
|[https://github.com/openzfsonosx/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 +
|-
 +
|<tt>written</tt>
 +
|The amount of referenced space written to this dataset since the previous snapshot.
 +
|[https://github.com/illumos/illumos-gate/commit/e5351341b58845eee9d722bd71543d5a7c26b6cc Nov 2011]
 +
|[https://github.com/freebsd/freebsd/commit/73cab7197174153a718dedc97ff344341bcf6098 Nov 2011]
 +
|[https://github.com/zfsonlinux/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 +
|[https://github.com/openzfsonosx/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 
|-
 
|-
|Feature A
+
|<tt>written@''<snap>''</tt>
|Yes
+
|The amount of referenced space written to this dataset since the specified snapshot. This is the space referenced by this dataset, but not referenced by the specified snapshot.
|Yes
+
|[https://github.com/illumos/illumos-gate/commit/e5351341b58845eee9d722bd71543d5a7c26b6cc Nov 2011]
 +
|[https://github.com/freebsd/freebsd/commit/73cab7197174153a718dedc97ff344341bcf6098 Nov 2011]
 +
|[https://github.com/zfsonlinux/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 +
|[https://github.com/openzfsonosx/zfs/commit/330d06f90d143b41b276796526a66a1c1fff046d Jul 2012]
 
|-
 
|-
|Feature B
+
|<tt>logicalused</tt>, <tt>logicalreferenced</tt>
|Yes
+
|The amount of space used or referenced, before taking into account compression.
|No
+
|[https://github.com/illumos/illumos-gate/commit/77372cb0f35e8d3615ca2e16044f033397e88e21 Feb 2013]
 +
|[https://github.com/freebsd/freebsd/commit/2bed8f5691b7572d6f31bfc68e9f762a938af863 Mar 2013]
 +
|[https://github.com/zfsonlinux/zfs/commit/24a64651b4163d47b1187821152d762e9a263d5a Oct 2013]
 +
|[https://github.com/openzfsonosx/zfs/commit/24a64651b4163d47b1187821152d762e9a263d5a Nov 2013]
 
|-
 
|-
|Feature C
 
|Yes
 
|No
 
 
|}
 
|}

Latest revision as of 21:43, 7 October 2020

This page describes some of the more important features and performance improvements that are part of OpenZFS.

Help would be appreciated in porting features between platforms whose status is "not yet".

Feature Flags

See the Feature Flags wiki page.

libzfs_core

See this blog post (Jan 2012) and associated slides and video for more details.

First introduced in:

illumos June 2012
FreeBSD March 2013
ZFS on Linux August 2013
OpenZFS on OS X October 2013

CLI Usability

These are improvements to the command line interface. While the end result is a generally more friendly user interface, getting the desired behavior often required modifications to the core of ZFS.

Listed in chronological order (oldest first).

Pool Comment

OpenZFS has a per-pool comment property which can be set with the zpool set command and can be read even if the pool is not imported, so it is accessible even if pool import fails.

illumos Nov 2011
FreeBSD Nov 2011
ZFS on Linux Aug 2012
OpenZFS on OS X Aug 2012

Size Estimates for zfs send and zfs destroy

This feature enhances OpenZFS's internal space accounting information. This new accounting information is used to provide a -n (dry-run) option for zfs send which can instantly calculate the amount of send stream data a specific zfs send command would generate. It is also used for a -n option for zfs destroy which can instantly calculate the amount of space that would be reclaimed by a specific zfs destroy command.

illumos Nov 2011
FreeBSD Nov 2011
ZFS on Linux Jul 2012
OpenZFS on OS X Jul 2012

vdev Information in zpool list

OpenZFS adds a -v option to the zpool list command which shows detailed sizing information about the vdevs in the pool:

$ zpool list -v
NAME          SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
dcenter      5.24T  3.85T  1.39T         -    73%  1.00x  ONLINE  -
  mirror      556G   469G  86.7G         -
    c2t1d0       -      -      -         -
    c2t0d0       -      -      -         -
  mirror      556G   493G  63.0G         -
    c2t3d0       -      -      -         -
    c2t2d0       -      -      -         -
  mirror      556G   493G  62.7G         -
    c2t5d0       -      -      -         -
    c2t4d0       -      -      -         -
  mirror      556G   494G  62.5G         -
    c2t8d0       -      -      -         -
    c2t6d0       -      -      -         -
  mirror      556G   494G  62.2G         -
    c2t10d0      -      -      -         -
    c2t9d0       -      -      -         -
  mirror      556G   494G  61.9G         -
    c2t12d0      -      -      -         -
    c2t11d0      -      -      -         -
  mirror     1016G   507G   509G         -
    c1t1d0       -      -      -         -
    c1t5d0       -      -      -         -
  mirror     1016G   496G   520G         -
    c1t3d0       -      -      -         -
    c1t4d0       -      -      -         -
illumos Jan 2012
FreeBSD May 2012
ZFS on Linux Sept 2012
OpenZFS on OS X Sept 2012

ZFS list snapshot property alias

Functionally identical to Solaris 11 extension zfs list -t snap.

illumos not yet
FreeBSD Oct 2013
ZFS on Linux Apr 2012
OpenZFS on OS X Apr 2012

ZFS snapshot alias

Functionally identical to Solaris 11 extension zfs snap.

illumos not yet
FreeBSD Oct 2013
ZFS on Linux Apr 2012
OpenZFS on OS X Apr 2012

zfs send Progress Reporting

OpenZFS introduces a -v option to zfs send which reports per-second information on how much data has been sent, how long it has taken, and how much data remains to be sent.

illumos May 2012
FreeBSD May 2012
ZFS on Linux Sept 2012
OpenZFS on OS X Sept 2012

Arbitrary Snapshot Arguments to zfs snapshot

illumos June 2012
FreeBSD March 2013
ZFS on Linux August 2013
OpenZFS on OS X September 2013

Native data and metadata encryption for zfs

Provides the ability to encrypt, decrypt, and authenticate protected datasets. This feature also adds the ability to do raw, encrypted sends and receives. The idea here is to send raw encrypted and compressed data and receive it exactly as is on a backup system. This means that the dataset on the receiving system is protected using the same user key that is in use on the sending side. By doing so, datasets can be efficiently backed up to an untrusted system without fear of data being compromised.

illumos Jun 2019
FreeBSD OpenZFS v2
ZFS on Linux Aug 2017
OpenZFS on OS X Aug 2017

ZFS Channel Programs

The ZFS channel program interface allows ZFS administrative operations to be run programmatically as a Lua script. The entire script is executed atomically, with no other administrative operations taking effect concurrently. A library of ZFS calls is made available to channel program scripts. Channel programs may only be run with root privileges.

See also slides and video from the talk at the OpenZFS Developer Summit 2013, and slides and video from the OpenZFS Developer Summit 2014.

illumos Jun 2017
FreeBSD OpenZFS v2
ZFS on Linux Feb 2018
OpenZFS on OS X Oct 2018

Performance

These are significant performance improvements, often requiring substantial restructuring of the source code.

Listed in chronological order (oldest first).

SA based xattrs

Improves performance of Linux-style (short) xattrs by storing them in the dnode_phys_t's bonus block. (Not to be confused with Solaris-style Extended Attributes, which are full-fledged files or "forks", like NTFS streams. This work could be extended to also improve the performance on illumos of small Extended Attributes whose permissions are the same as the containing file.)

Requires a disk format change and is off by default until Filesystem (ZPL) Feature Flags are implemented (not to be confused with zpool Feature Flags).

illumos not yet (needs additional functionality)
FreeBSD ??
ZFS on Linux Oct 2011
OpenZFS on OS X May 2015

Note that SA based xattrs are no longer used on symlinks as of Aug 2013 until an issue is resolved.

Use the slog even with logbias=throughput

illumos ??
FreeBSD ??
ZFS on Linux Oct 2011
OpenZFS on OS X Oct 2011

Asynchronous Filesystem and Volume Destruction

Destroying a filesystem requires traversing all of its data in order to return its used blocks to the pool's free list. Before this feature the filesystem was not fully removed until all blocks had been reclaimed. If the destroy operation was interrupted by a reboot or power outage the next attempt to import the pool (probably during boot) would need to complete the destroy operation synchronously, possibly delaying a boot for long periods of time.

With asynchronous destruction the filesystem's data is immediately moved to a "to be freed" list, allowing the destroy operation to complete without traversing any of the filesystem's data. A background process reclaims blocks from this "to be freed" list and is capable of resuming this process after reboots without slowing the pool import process.

The new freeing algorithm also has a significant performance improvement when destroying clones. The old algorithm took time proportional to the number of blocks referenced by the clone, even if most of those blocks could not be reclaimed because they were still referenced by the clone's origin. The new algorithm only takes time proportional to the number of blocks unique to the clone.
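
The hand-off to a "to be freed" list and the resumable background reclaim can be sketched as follows. This is a toy model only: the `Pool` class, its method names, and the `budget` parameter are hypothetical and do not correspond to OpenZFS internals.

```python
from collections import deque


class Pool:
    """Toy pool: a set of free blocks plus a queue of block lists awaiting reclaim."""

    def __init__(self):
        self.free_blocks = set()
        # In this sketch the queue stands in for on-disk state that survives a reboot.
        self.to_be_freed = deque()

    def destroy(self, dataset_blocks):
        # Hand the dataset's block list over without walking it; destroy returns at once.
        # The pool takes ownership of the list and consumes it during reclaim.
        self.to_be_freed.append(dataset_blocks)

    def reclaim(self, budget):
        """Background step: free at most `budget` blocks and return how many were freed."""
        freed = 0
        while self.to_be_freed and freed < budget:
            pending = self.to_be_freed[0]
            while pending and freed < budget:
                self.free_blocks.add(pending.pop())
                freed += 1
            if not pending:
                self.to_be_freed.popleft()
        return freed
```

Calling `reclaim` repeatedly with a small budget models a background process that can stop at any point and pick up where it left off, since the remaining work is always whatever is still on the queue.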

See this blog post for more detailed performance analysis.

Note: The async_destroy feature flag must be enabled to take advantage of this.

illumos May 2012
FreeBSD June 2012
ZFS on Linux Jan 2013
OpenZFS on OS X Jan 2013

Reduce Number of Empty bpobjs

Every time OpenZFS takes a snapshot it creates on-disk block pointer objects (bpobjs) to track blocks associated with that snapshot. In common use cases most of these bpobjs are empty, but the number of bpobjs per snapshot is proportional to the number of snapshots already taken of the same filesystem or volume. When a single filesystem or volume has many (tens of thousands of) snapshots, these unnecessary empty bpobjs can waste space and cause performance problems. OpenZFS waits to create each bpobj until the first entry is added to it, thus eliminating the empty bpobjs.
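
The deferred creation described above is a form of lazy allocation. A minimal sketch, assuming a hypothetical `Snapshot` class that holds one slot per potential bpobj (none of these names come from the OpenZFS source):

```python
class Snapshot:
    """Toy snapshot that allocates a block pointer list only when it is first used."""

    def __init__(self, n_lists):
        # One slot per potential bpobj; nothing is allocated up front.
        self._bpobjs = [None] * n_lists

    def add_block(self, index, block_pointer):
        # Create the list on first use, then append the entry.
        if self._bpobjs[index] is None:
            self._bpobjs[index] = []
        self._bpobjs[index].append(block_pointer)

    def allocated(self):
        """Number of lists that actually exist."""
        return sum(1 for b in self._bpobjs if b is not None)
```

With this shape, a snapshot with thousands of potential lists pays only for the ones that ever receive an entry.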

Note: The empty_bpobj feature flag must be enabled to take advantage of this.

illumos Aug 2012
FreeBSD Aug 2012
ZFS on Linux Dec 2012
OpenZFS on OS X Dec 2012

Single Copy ARC

OpenZFS caches disk blocks in-memory in the adaptive replacement cache (ARC). Originally when the same disk block was accessed from different clones it was cached multiple times (one for each clone accessing the block) in case a clone planned to modify the block. With these changes OpenZFS caches at most one copy of every block unless a clone is actually modifying the block.
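
One way to picture the change is a cache keyed by block address rather than by (clone, block) pair, with a private copy created only on modification. The sketch below is illustrative; `SingleCopyCache` and its methods are hypothetical names and not the ARC's actual data structures.

```python
class SingleCopyCache:
    """Toy cache: one shared copy per block, plus private copies for modified blocks."""

    def __init__(self):
        self.shared = {}   # block address -> cached data, shared by all clones
        self.private = {}  # (clone, block address) -> data modified by that clone

    def read(self, clone, addr, load):
        # A clone that modified the block sees its own copy.
        if (clone, addr) in self.private:
            return self.private[(clone, addr)]
        # Otherwise every clone shares a single cached copy, loaded at most once.
        if addr not in self.shared:
            self.shared[addr] = load(addr)
        return self.shared[addr]

    def modify(self, clone, addr, data):
        # Only an actual modification creates a second copy.
        self.private[(clone, addr)] = data

    def copies(self):
        return len(self.shared) + len(self.private)
```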

illumos Sep 2012
FreeBSD Nov 2012
ZFS on Linux Dec 2012
OpenZFS on OS X Dec 2012

TRIM Support

TRIM support passes deletes and frees through to the underlying vdevs. This helps devices such as SSDs, which rely on TRIM / UNMAP requests to learn which sectors are no longer needed, maintain optimal performance.

Two modes of TRIM/UNMAP were added: manual and automatic. Manual TRIM, through the zpool trim command, trims on demand. Automatic TRIM can be enabled to perform a periodic background TRIM.

illumos not yet ported
FreeBSD OpenZFS v2
ZFS on Linux Mar 2019
OpenZFS on OS X Mar 2019

FASTWRITE Algorithm

Improves synchronous IO performance.

illumos not yet ported
FreeBSD not yet ported
ZFS on Linux Oct 2012
OpenZFS on OS X Oct 2012

Note that a locking enhancement is being reviewed.

Block Freeing Performance Improvements

Performance analysis of OpenZFS revealed that the algorithms used when freeing blocks could cause significant performance problems when freeing a large amount of blocks in a single transaction or when dealing with fragmented pools. Several performance improvements were made in this area.

illumos Nov 2012, Feb 2013, Feb 2013
FreeBSD Nov 2012, Feb 2013, Feb 2013
ZFS on Linux May 2013, June 2013, Oct 2013
OpenZFS on OS X May 2013, June 2013, Oct 2013

nop-write

ZFS supports end-to-end checksumming of every data block. When a cryptographically secure checksum is being used (and compression is enabled), OpenZFS will compare the checksums of incoming writes to the checksum of the existing on-disk data and avoid issuing any write i/o for data that has not changed. This can help performance and snapshot space usage in situations where the same files are regularly overwritten with almost-identical data (e.g. regular full backups of large random-access files).
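
The checksum comparison can be sketched as below. This is an illustration only: SHA-256 from Python's `hashlib` stands in for a cryptographically secure checksum, the `store` dictionary stands in for on-disk blocks, and `write_block` is a hypothetical helper, not an OpenZFS function.

```python
import hashlib


def checksum(data: bytes) -> bytes:
    """Cryptographically secure checksum of a block (SHA-256 in this sketch)."""
    return hashlib.sha256(data).digest()


def write_block(store: dict, block_id: int, data: bytes) -> bool:
    """Write `data` unless it matches what is stored.

    Returns True if a write was issued, False if it was skipped as a no-op.
    """
    new_sum = checksum(data)
    existing = store.get(block_id)
    if existing is not None and existing[0] == new_sum:
        # Same checksum as the stored block: skip the write entirely.
        return False
    store[block_id] = (new_sum, data)
    return True
```

Skipping the write only makes sense with a checksum strong enough that equal checksums can be treated as equal data, which is why the feature is tied to cryptographically secure checksums.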

illumos Nov 2012
FreeBSD Nov 2012
ZFS on Linux Nov 2013
OpenZFS on OS X Nov 2013

lz4 compression

OpenZFS supports on-the-fly compression of all user data with a variety of compression algorithms. This feature adds support for the lz4 compression algorithm. lz4 is usually faster and compresses data better than lzjb, the old default OpenZFS compression algorithm.

Note: The lz4_compress feature flag must be enabled to take advantage of this.

illumos Jan 2013
FreeBSD Feb 2013
ZFS on Linux Jan 2013
OpenZFS on OS X Jan 2013

synctask rewrite

illumos Feb 2013
FreeBSD March 2013
ZFS on Linux Sept 2013
OpenZFS on OS X Sept 2013

l2arc compression

illumos Jun 2013
FreeBSD Jun 2013
ZFS on Linux Aug 2013
OpenZFS on OS X Aug 2013

ARC Shouldn't Cache Freed Blocks

Originally, cached blocks in the ARC remained cached until they were evicted due to memory pressure, even if the underlying disk block was freed. In some workloads these freed blocks had been accessed so frequently before they were freed that the ARC continued to cache them while evicting blocks which had not been freed yet. Since freed blocks could never be accessed again, continuing to cache them was unnecessary. In OpenZFS, ARC blocks are evicted immediately when their underlying data blocks are freed.

illumos Jun 2013
FreeBSD Jun 2013
ZFS on Linux Jun 2013
OpenZFS on OS X Jun 2013

Improve N-way mirror read performance

Queues read requests to least busy leaf vdev in mirrors.

In addition to the vdev load biasing first implemented by ZFS on Linux in July 2013, the FreeBSD October 2013 version added I/O locality and device rotational information to further enhance the performance.

{| class="wikitable"
!OS
!Load
!Load + I/O Locality & Rotational Information
|-
|'''illumos'''
|not yet ported
|not yet ported
|-
|'''FreeBSD'''
|N/A
|23rd October 2013
|-
|'''ZFS on Linux'''
|Jul 2013
|Feb 26, 2016
|-
|'''OpenZFS on OS X'''
|Jul 2013
|not yet ported
|}
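
The load-based part of this selection (the "Load" column) amounts to choosing the mirror child with the fewest outstanding I/Os. A minimal sketch, assuming a hypothetical `pending` mapping from leaf vdev name to outstanding I/O count; the locality and rotational refinements are not modelled:

```python
def pick_leaf(pending):
    """Return the leaf vdev with the fewest outstanding I/Os.

    `pending` maps leaf vdev name -> outstanding I/O count.
    Ties go to whichever leaf appears first in the mapping.
    """
    return min(pending, key=pending.get)


def issue_read(pending):
    """Queue one read on the least busy leaf and return its name."""
    leaf = pick_leaf(pending)
    pending[leaf] += 1
    return leaf
```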

Smoother Write Throttle

The write throttle (dsl_pool_tempreserve_space() and txg_constrain_throughput()) is rewritten to produce much more consistent delays when under constant load. The new write throttle is based on the amount of dirty data, rather than guesses about future performance of the system. When there is a lot of dirty data, each transaction (e.g. write() syscall) will be delayed by the same small amount. This eliminates the "brick wall of wait" that the old write throttle could hit, causing all transactions to wait several seconds until the next txg opens. One of the keys to the new write throttle is decrementing the amount of dirty data as i/o completes, rather than at the end of spa_sync(). Note that the write throttle is only applied once the i/o scheduler is issuing the maximum number of outstanding async writes. See the block comments in dsl_pool.c and above dmu_tx_delay() for more details.
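
The idea of a delay that depends on the amount of dirty data, with dirty data decremented as each i/o completes, can be sketched as follows. The linear ramp, the `DirtyTracker` class, and its parameters are assumptions made for illustration; the actual delay curve is described in the block comment above dmu_tx_delay().

```python
class DirtyTracker:
    """Toy write throttle: delay grows with the amount of outstanding dirty data."""

    def __init__(self, start, limit, max_delay_ms):
        self.start = start                # dirty bytes below which no delay applies
        self.limit = limit                # dirty bytes at which the delay reaches its cap
        self.max_delay_ms = max_delay_ms
        self.dirty = 0

    def write(self, nbytes):
        """Account for a new transaction and return the delay it should take."""
        self.dirty += nbytes
        return self.delay_ms()

    def io_done(self, nbytes):
        # Dirty data shrinks as each i/o completes, not at the end of the sync.
        self.dirty = max(0, self.dirty - nbytes)

    def delay_ms(self):
        if self.dirty <= self.start:
            return 0.0
        # Hypothetical linear ramp between `start` and `limit`, capped at max_delay_ms.
        frac = (self.dirty - self.start) / (self.limit - self.start)
        return self.max_delay_ms * min(frac, 1.0)
```

Because the delay is a smooth function of current dirty data, each transaction under sustained load sees a similar small delay instead of an occasional multi-second stall.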

The ZFS i/o scheduler (vdev_queue.c) now divides i/os into 5 classes: sync read, sync write, async read, async write, and scrub/resilver. The scheduler issues a number of concurrent i/os from each class to the device. Once a class has been selected, an i/o is selected from this class using either an elevator algorithm (async, scrub classes) or FIFO (sync classes). The number of concurrent async write i/os is tuned dynamically based on i/o load, to achieve good sync i/o latency when there is not a high load of writes, and good write throughput when there is. See the block comment in vdev_queue.c for more details.
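
The per-class selection rule (FIFO for sync classes, elevator order for async and scrub classes) can be sketched as below. The `ClassQueue` class and the class name strings are hypothetical, and class selection and concurrency limits are left out.

```python
import bisect
from collections import deque

# Classes served in arrival order; all other classes use elevator order.
FIFO_CLASSES = {"sync_read", "sync_write"}


class ClassQueue:
    """Toy queue for one i/o class, holding i/os identified by their disk offset."""

    def __init__(self, name):
        self.name = name
        self.fifo = deque()    # used by sync classes
        self.by_offset = []    # sorted offsets, used by async and scrub classes
        self.last_offset = 0

    def add(self, offset):
        if self.name in FIFO_CLASSES:
            self.fifo.append(offset)
        else:
            bisect.insort(self.by_offset, offset)

    def next_io(self):
        """Remove and return the next i/o to issue (raises IndexError if empty)."""
        if self.name in FIFO_CLASSES:
            return self.fifo.popleft()
        # Elevator: take the next offset at or after the last one issued, wrapping around.
        i = bisect.bisect_left(self.by_offset, self.last_offset)
        if i == len(self.by_offset):
            i = 0
        self.last_offset = self.by_offset.pop(i)
        return self.last_offset
```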

illumos Aug 2013
FreeBSD Nov 2013
ZFS on Linux Dec 2013
OpenZFS on OS X Mar 2014

Disable LBA Weighting on files and SSDs

On rotational media, the bandwidth of the outermost tracks is approximately twice that of innermost tracks. A heuristic called LBA weighting was put into the metaslab allocator to account for this by favoring the outermost tracks over the innermost tracks. This has the consequence that metaslabs tend to fill at different rates depending on their location. This causes the metaslabs corresponding to outermost tracks to enter the best-fit allocation strategy.

The best-fit allocation strategy is more CPU intensive than the typical first-fit because it looks for the smallest region of free space able to fulfill an allocation rather than picking the next available one. The CPU time is fairly excessive and is known to harm IOPS, but it exists to minimize use of gang blocks as a metaslab becomes excessively full. Gaining a bandwidth improvement from LBA weighting at the expense of an earlier switch to the best-fit allocation behavior on the weighted metaslabs is reasonable on rotational disks. However, it makes no sense on files, where the underlying filesystem is free to place things however it sees fit, and on SSDs, where there is no bandwidth difference based on LBA.
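
The difference between the two strategies can be sketched over a list of free regions. This is a simplification: the functions below are hypothetical, free space is a plain list of (start, length) pairs, and first-fit scans from the beginning rather than from an allocation cursor.

```python
def first_fit(free_regions, size):
    """Return the start of the first region large enough, or None."""
    for start, length in free_regions:
        if length >= size:
            return start  # stops at the first match
    return None


def best_fit(free_regions, size):
    """Return the start of the smallest region large enough, or None."""
    # Must examine every region to find the tightest fit.
    candidates = [(length, start) for start, length in free_regions if length >= size]
    if not candidates:
        return None
    return min(candidates)[1]
```

First-fit can stop at the first match, while best-fit has to consider every candidate, which is where the extra CPU cost comes from; in exchange, best-fit leaves large regions intact for large allocations.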

With this change, we will more evenly fill metaslabs on pools whose vdevs consist of only files and SSDs, which will minimize the metaslabs that enter the best fit allocation strategy when a pool is mostly full, but still below 96% full. This is particularly important on SSDs, where drops in IOPS are more pronounced.

illumos not yet
FreeBSD not yet
ZFS on Linux Aug 2015
OpenZFS on OS X Sep 2015

Sequential scrub and resilvers

Improves performance by splitting scrubs and resilvers into a metadata scanning phase and an IO issuing phase. The metadata scan reads through the structure of the pool and gathers an in-memory queue of I/Os, sorted by size and offset on disk. The issuing phase will then issue the scrub I/Os as sequentially as possible, greatly improving performance.
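
The two phases can be sketched as follows. This is a toy model: the function names and the dictionary-based block pointers are hypothetical, the queue is sorted by offset only (the real queue also takes size into account), and `seek_distance` is just a rough measure of how sequential an issue order is.

```python
def scan_metadata(block_pointers):
    """Phase 1: walk the pool structure and collect (offset, size) pairs in logical order."""
    return [(bp["offset"], bp["size"]) for bp in block_pointers]


def issue_order(queue):
    """Phase 2: issue the gathered i/os in ascending on-disk offset."""
    return sorted(queue)


def seek_distance(ios):
    """Total distance the head would travel between consecutive i/os."""
    pos, total = 0, 0
    for offset, size in ios:
        total += abs(offset - pos)
        pos = offset + size
    return total
```

Comparing `seek_distance(queue)` with `seek_distance(issue_order(queue))` on a scattered set of blocks shows why issuing in offset order helps on rotational media.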

Saso Kiselkov of Nexenta gave a talk on Scrub/Resilver Performance at the OpenZFS Developer Summit 2016 (September 2016): Video, Slides

illumos not yet
FreeBSD OpenZFS v2
ZFS on Linux Nov 2017
OpenZFS on OS X Dec 2018

Dataset Properties

These are new filesystem, volume, and snapshot properties which can be accessed with the zfs(1) command's get subcommand. See the zfs(1) manpage for your distribution for more details on each of these properties.

{| class="wikitable"
!Property
!Description
!illumos
!FreeBSD
!ZFS on Linux
!OpenZFS on OS X
|-
|<tt>refcompressratio</tt>
|The compression ratio achieved for all data referenced by (but not necessarily unique to) a snapshot, filesystem, or volume, expressed as a multiplier.
|Jun 2011
|Jun 2011
|Aug 2012
|Aug 2012
|-
|<tt>clones</tt>
|For snapshots, this property is a comma-separated list of filesystems or volumes which are clones of this snapshot.
|Nov 2011
|Nov 2011
|Jul 2012
|Jul 2012
|-
|<tt>written</tt>
|The amount of referenced space written to this dataset since the previous snapshot.
|Nov 2011
|Nov 2011
|Jul 2012
|Jul 2012
|-
|<tt>written@''<snap>''</tt>
|The amount of referenced space written to this dataset since the specified snapshot. This is the space referenced by this dataset, but not referenced by the specified snapshot.
|Nov 2011
|Nov 2011
|Jul 2012
|Jul 2012
|-
|<tt>logicalused</tt>, <tt>logicalreferenced</tt>
|The amount of space used or referenced, before taking into account compression.
|Feb 2013
|Mar 2013
|Oct 2013
|Nov 2013
|}
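
The definition of written@<snap> (space referenced by the dataset but not by the specified snapshot) is a set difference over referenced blocks. A minimal sketch, assuming hypothetical mappings from block identifier to size in bytes; this models the definition only, not how OpenZFS computes the value:

```python
def written_since(dataset_blocks, snapshot_blocks):
    """Bytes referenced by the dataset but not by the snapshot.

    Both arguments map a block identifier to its size in bytes.
    """
    return sum(
        size
        for block, size in dataset_blocks.items()
        if block not in snapshot_blocks
    )
```

For example, a dataset referencing blocks a, b, and c, compared against a snapshot that references only a, reports the combined size of b and c.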