Difference between revisions of "Features"

Revision as of 07:52, 15 November 2013

This page describes some of the more important features and performance improvements that are part of OpenZFS.

Help porting features to platforms whose status is "not yet" would be appreciated.

Feature Flags

Originally the ZFS on-disk format was versioned with a single number which was increased whenever a new on-disk format change was introduced. This worked well when a single entity controlled the development of ZFS; however, in the more distributed development model of OpenZFS a single version number is not ideal. Every OpenZFS implementation would need to agree on every change to the on-disk format.

One of the first OpenZFS projects was a new versioning system called "feature flags" which tags on-disk format changes with unique names. The system supports both completely independent format changes, as well as format changes that depend on each other. A pool's on-disk format is portable between OpenZFS implementations as long as all of the feature flags in use by the pool are supported by both implementations.
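
The portability rule above amounts to a set comparison. The following is a minimal sketch of that rule, not the OpenZFS implementation: the feature names come from the table on this page, while `can_import` and the two "supported" sets are hypothetical and exist only for illustration.

```python
# Illustrative model only: can_import() and the sets below are hypothetical
# and are not part of any OpenZFS API.

def can_import(features_in_use, supported):
    """Return (ok, missing): missing lists features in use but unsupported."""
    missing = sorted(set(features_in_use) - set(supported))
    return (not missing, missing)

# Assumed capability sets for two imaginary implementations.
supported_a = {"async_destroy", "empty_bpobj", "lz4_compress",
               "spacemap_histogram", "extensible_dataset"}
supported_b = {"async_destroy", "empty_bpobj", "lz4_compress"}

pool = {"async_destroy", "lz4_compress"}
print(can_import(pool, supported_b))        # (True, [])

pool_newer = pool | {"spacemap_histogram"}
print(can_import(pool_newer, supported_b))  # (False, ['spacemap_histogram'])
```

A pool moves cleanly between two implementations only when the check passes for both of them.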

All OpenZFS implementations now support feature flags and regularly port features between them to remain compatible.

	illumos	FreeBSD	ZFS on Linux	ZFS-OSX
async_destroy	Y	Y	Y	Y
empty_bpobj	Y	Y	Y	Y
lz4_compress	Y	Y	Y	Y
spacemap_histogram	Y
extensible_dataset	Y

For more details see these slides (Jan 2012) and zpool-features(5).

libzfs_core

See this blog post (Jan 2012) and associated slides and video for more details.

First introduced in:

illumos	June 2012
FreeBSD	March 2013
ZFS on Linux	August 2013
ZFS-OSX	October 2013

CLI Usability

These are improvements to the command line interface. While the end result is a generally more friendly user interface, getting the desired behavior often required modifications to the core of ZFS.

Listed in chronological order (oldest first).

Pool Comment

OpenZFS has a per-pool comment property which can be set with the zpool set command and can be read even if the pool is not imported, so it is accessible even if pool import fails.

illumos	Nov 2011
FreeBSD	Nov 2011
ZFS on Linux	Aug 2012
ZFS-OSX	Aug 2012

Size Estimates for `zfs send` and `zfs destroy`

This feature enhances OpenZFS's internal space accounting information. This new accounting information is used to provide a -n (dry-run) option for zfs send which can instantly calculate the amount of send stream data a specific zfs send command would generate. It is also used for a -n option for zfs destroy which can instantly calculate the amount of space that would be reclaimed by a specific zfs destroy command.

illumos	Nov 2011
FreeBSD	Nov 2011
ZFS on Linux	Jul 2012
ZFS-OSX	Jul 2012

vdev Information in `zpool list`

OpenZFS adds a -v option to the zpool list command which shows detailed sizing information about the vdevs in the pool:

$ zpool list -v
NAME          SIZE  ALLOC   FREE  EXPANDSZ    CAP  DEDUP  HEALTH  ALTROOT
dcenter      5.24T  3.85T  1.39T         -    73%  1.00x  ONLINE  -
  mirror      556G   469G  86.7G         -
    c2t1d0       -      -      -         -
    c2t0d0       -      -      -         -
  mirror      556G   493G  63.0G         -
    c2t3d0       -      -      -         -
    c2t2d0       -      -      -         -
  mirror      556G   493G  62.7G         -
    c2t5d0       -      -      -         -
    c2t4d0       -      -      -         -
  mirror      556G   494G  62.5G         -
    c2t8d0       -      -      -         -
    c2t6d0       -      -      -         -
  mirror      556G   494G  62.2G         -
    c2t10d0      -      -      -         -
    c2t9d0       -      -      -         -
  mirror      556G   494G  61.9G         -
    c2t12d0      -      -      -         -
    c2t11d0      -      -      -         -
  mirror     1016G   507G   509G         -
    c1t1d0       -      -      -         -
    c1t5d0       -      -      -         -
  mirror     1016G   496G   520G         -
    c1t3d0       -      -      -         -
    c1t4d0       -      -      -         -

illumos	Jan 2012
FreeBSD	May 2012
ZFS on Linux	Sept 2012
ZFS-OSX	Sept 2012

ZFS list snapshot property alias

Functionally identical to the Solaris 11 extension zfs list -t snap.

illumos	not yet
FreeBSD	Oct 2013
ZFS on Linux	Apr 2012
ZFS-OSX	Apr 2012

ZFS snapshot alias

Functionally identical to the Solaris 11 extension zfs snap.

illumos	not yet
FreeBSD	Oct 2013
ZFS on Linux	Apr 2012
ZFS-OSX	Apr 2012

`zfs send` Progress Reporting

OpenZFS introduces a -v option to zfs send which reports per-second information on how much data has been sent, how long it has taken, and how much data remains to be sent.

illumos	May 2012
FreeBSD	May 2012
ZFS on Linux	Sept 2012
ZFS-OSX	Sept 2012

Arbitrary Snapshot Arguments to `zfs snapshot`

illumos	June 2012
FreeBSD	March 2013
ZFS on Linux	August 2013
ZFS-OSX	not yet

Performance

These are significant performance improvements, often requiring substantial restructuring of the source code.

Listed in chronological order (oldest first).

SA based xattrs

Improves performance of Linux-style (short) xattrs by storing them in the dnode_phys_t's bonus block. (Not to be confused with Solaris-style Extended Attributes, which are full-fledged files or "forks", like NTFS streams. This work could be extended to also improve the performance on illumos of small Extended Attributes whose permissions are the same as the containing file.)

Requires a disk format change and is off by default until Filesystem (ZPL) Feature Flags are implemented (not to be confused with zpool Feature Flags).

illumos	not yet (needs additional functionality)
FreeBSD	??
ZFS on Linux	Oct 2011
ZFS-OSX	??

Note that SA based xattrs are no longer used on symlinks as of Aug 2013 until an issue is resolved.

Use the slog even with logbias=throughput

illumos	??
FreeBSD	??
ZFS on Linux	Oct 2011
ZFS-OSX	Oct 2011

Asynchronous Filesystem and Volume Destruction

Destroying a filesystem requires traversing all of its data in order to return its used blocks to the pool's free list. Before this feature the filesystem was not fully removed until all blocks had been reclaimed. If the destroy operation was interrupted by a reboot or power outage the next attempt to import the pool (probably during boot) would need to complete the destroy operation synchronously, possibly delaying a boot for long periods of time.

With asynchronous destruction the filesystem's data is immediately moved to a "to be freed" list, allowing the destroy operation to complete without traversing any of the filesystem's data. A background process reclaims blocks from this "to be freed" list and is capable of resuming this process after reboots without slowing the pool import process.

The new freeing algorithm also has a significant performance improvement when destroying clones. The old algorithm took time proportional to the number of blocks referenced by the clone, even if most of those blocks could not be reclaimed because they were still referenced by the clone's origin. The new algorithm only takes time proportional to the number of blocks unique to the clone.
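
The "to be freed" list described above can be modeled roughly as follows. This is a toy sketch under stated assumptions: the `Pool` class, its fields, and the `budget` parameter are hypothetical, and real pools track block pointers on disk rather than Python lists.

```python
from collections import deque

class Pool:
    """Toy model of asynchronous destroy; all names here are hypothetical."""

    def __init__(self):
        self.datasets = {}          # dataset name -> list of block ids
        self.to_be_freed = deque()  # stands in for the persistent "to be freed" list
        self.free_blocks = set()    # blocks returned to the pool

    def destroy(self, name):
        # Constant time: hand over the whole block list without traversing it.
        self.to_be_freed.append(self.datasets.pop(name))

    def reclaim(self, budget):
        """One background step: free up to `budget` blocks, return the count."""
        freed = 0
        while self.to_be_freed and freed < budget:
            blocks = self.to_be_freed[0]
            while blocks and freed < budget:
                self.free_blocks.add(blocks.pop())
                freed += 1
            if not blocks:
                self.to_be_freed.popleft()
        return freed

pool = Pool()
pool.datasets["tank/fs"] = list(range(10))
pool.destroy("tank/fs")   # returns immediately; nothing is reclaimed yet
pool.reclaim(4)           # the background work can stop and resume later
pool.reclaim(100)
```

Because the pending work lives in the list rather than in the destroy call itself, an interrupted reclaim simply continues from whatever remains on the list.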

See this blog post for more detailed performance analysis.

Note: The async_destroy feature flag must be enabled to take advantage of this.

illumos	May 2012
FreeBSD	June 2012
ZFS on Linux	Jan 2013
ZFS-OSX	Jan 2013

Reduce Number of Empty bpobjs

Every time OpenZFS takes a snapshot it creates on-disk block pointer objects (bpobjs) to track blocks associated with that snapshot. In common use cases most of these bpobjs are empty, but the number of bpobjs per snapshot is proportional to the number of snapshots already taken of the same filesystem or volume. When a single filesystem or volume has many (tens of thousands of) snapshots, these unnecessary empty bpobjs can waste space and cause performance problems. OpenZFS waits to create each bpobj until the first entry is added to it, thus eliminating the empty bpobjs.
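
Deferring creation until the first entry arrives is a lazy-allocation pattern. The sketch below illustrates only that pattern; `LazyBpobjSlot` and its `allocated` counter are hypothetical names, not OpenZFS structures.

```python
class LazyBpobjSlot:
    """Hypothetical stand-in for a bpobj that is created only when needed."""

    allocated = 0  # number of backing objects actually created (illustration)

    def __init__(self):
        self._entries = None  # nothing allocated yet

    def add(self, block_pointer):
        if self._entries is None:
            # First entry: only now is the backing object created.
            LazyBpobjSlot.allocated += 1
            self._entries = []
        self._entries.append(block_pointer)

    def entries(self):
        return list(self._entries or [])

# A thousand slots, but only one of them ever receives an entry.
slots = [LazyBpobjSlot() for _ in range(1000)]
slots[3].add("bp-1")
print(LazyBpobjSlot.allocated)  # 1
```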

Note: The empty_bpobj feature flag must be enabled to take advantage of this.

illumos	Aug 2012
FreeBSD	Aug 2012
ZFS on Linux	Dec 2012
ZFS-OSX	Dec 2012

Single Copy ARC

OpenZFS caches disk blocks in-memory in the adaptive replacement cache (ARC). Originally when the same disk block was accessed from different clones it was cached multiple times (one for each clone accessing the block) in case a clone planned to modify the block. With these changes OpenZFS caches at most one copy of every block unless a clone is actually modifying the block.

illumos	Sep 2012
FreeBSD	Nov 2012
ZFS on Linux	Dec 2012
ZFS-OSX	Dec 2012

TRIM Support

TRIM support passes deletes and frees through to the underlying vdevs. This helps devices such as SSDs, which rely on receiving TRIM / UNMAP requests for sectors that are no longer needed, maintain optimal performance.

The current FreeBSD implementation builds a map of regions that were freed. On every write the code consults the map and removes ranges that were freed before, but are now overwritten.

Freed blocks are not TRIMed immediately; a low-priority thread TRIMs the ranges at a later time.
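
The freed-region map described above can be pictured as a list of ranges that grows on free, shrinks on overwrite, and is drained later. This is a simplified sketch, not the FreeBSD code: `TrimMap`, its method names, and the half-open `(start, end)` convention are assumptions made for illustration.

```python
class TrimMap:
    """Toy map of freed regions; ranges are half-open (start, end) offsets."""

    def __init__(self):
        self.ranges = []  # kept sorted and non-overlapping

    def record_free(self, start, end):
        # Merge the new range with any existing range it touches or overlaps.
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))
            else:
                start, end = min(s, start), max(e, end)
        merged.append((start, end))
        self.ranges = sorted(merged)

    def record_write(self, start, end):
        # A write makes the region live again, so cut it out of the map.
        kept = []
        for s, e in self.ranges:
            if e <= start or s >= end:
                kept.append((s, e))
                continue
            if s < start:
                kept.append((s, start))
            if e > end:
                kept.append((end, e))
        self.ranges = kept

    def drain(self):
        # What a deferred, low-priority TRIM pass would pick up.
        pending, self.ranges = self.ranges, []
        return pending

trim_map = TrimMap()
trim_map.record_free(0, 100)
trim_map.record_write(40, 60)
print(trim_map.drain())  # [(0, 40), (60, 100)]
```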

TRIM support has been shown to significantly improve the general performance of SSDs in the field, eliminating the need for regular secure erase cycles on busy hosts.

An alternative method, which is arguably better, works by tracking the metaslab allocator. It is also in progress and can be found here: ZFS TRIM Support for illumos

There is a pull request for ZFS on Linux which implements FreeBSD's (Sep 2012) ZFS TRIM support.

illumos	not yet ported
FreeBSD	Sep 2012
ZFS on Linux	not yet ported
ZFS-OSX	not yet ported

FASTWRITE Algorithm

Improves synchronous IO performance.

illumos	not yet ported
FreeBSD	??
ZFS on Linux	Oct 2012
ZFS-OSX	Oct 2012

Note that a locking enhancement is being reviewed.

Block Freeing Performance Improvements

Performance analysis of OpenZFS revealed that the algorithms used when freeing blocks could cause significant performance problems when freeing a large number of blocks in a single transaction or when dealing with fragmented pools. Several performance improvements were made in this area.

illumos	Nov 2012	Feb 2013	Feb 2013
FreeBSD	Nov 2012	Feb 2013	Feb 2013
ZFS on Linux	May 2013	June 2013	not yet
ZFS-OSX	May 2013	June 2013	not yet

nop-write

ZFS supports end-to-end checksumming of every data block. When a cryptographically secure checksum is being used, OpenZFS will compare the checksums of incoming writes to the checksum of the existing on-disk data and avoid issuing any write i/o for data that has not changed. This can help performance and snapshot space usage in situations where the same files are regularly overwritten with almost-identical data (e.g. regular full-backups of large random-access files).
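
The checksum comparison can be illustrated with a small in-memory model. This is a sketch under assumptions: `BlockStore` and its fields are hypothetical, and SHA-256 from Python's `hashlib` stands in for whichever cryptographically secure checksum the dataset uses.

```python
import hashlib

class BlockStore:
    """Toy block store that skips writes whose checksum is unchanged."""

    def __init__(self):
        self.blocks = {}        # block id -> (checksum, data)
        self.writes_issued = 0  # counts writes that actually reach "disk"

    def write(self, block_id, data):
        checksum = hashlib.sha256(data).digest()
        existing = self.blocks.get(block_id)
        if existing is not None and existing[0] == checksum:
            return False  # nop-write: identical content, no i/o issued
        self.blocks[block_id] = (checksum, data)
        self.writes_issued += 1
        return True

store = BlockStore()
store.write(7, b"backup contents")  # first write is issued
store.write(7, b"backup contents")  # identical data: skipped
print(store.writes_issued)          # 1
```

Skipping the write also means the existing block stays shared with any snapshot that references it, which is where the space saving comes from.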

illumos	Nov 2012
FreeBSD	Nov 2012
ZFS on Linux	Not yet (as of Sept. 13, 2013)
ZFS-OSX	not yet

lz4 compression

OpenZFS supports on-the-fly compression of all user data with a variety of compression algorithms. This feature adds support for the lz4 compression algorithm. lz4 is usually faster and compresses data better than lzjb, the old default OpenZFS compression algorithm.

Note: The lz4_compress feature flag must be enabled to take advantage of this.

illumos	Jan 2013
FreeBSD	Feb 2013
ZFS on Linux	Jan 2013
ZFS-OSX	Jan 2013

synctask rewrite

illumos	Feb 2013
FreeBSD	March 2013
ZFS on Linux	Sept 2013
ZFS-OSX	not yet

l2arc compression

illumos	Jun 2013
FreeBSD	Jun 2013
ZFS on Linux	Aug 2013
ZFS-OSX	Aug 2013

ARC Shouldn't Cache Freed Blocks

Originally, cached blocks in the ARC remained cached until they were evicted due to memory pressure, even if the underlying disk block was freed. In some workloads these freed blocks were so frequently accessed before they were freed that the ARC continued to cache them while evicting blocks which had not been freed yet. Since freed blocks could never be accessed again, continuing to cache them was unnecessary. In OpenZFS, ARC blocks are evicted immediately when their underlying data blocks are freed.

illumos	Jun 2013
FreeBSD	Jun 2013
ZFS on Linux	Jun 2013
ZFS-OSX	Jun 2013

Improve N-way mirror read performance

Queues read requests to the least busy leaf vdev in a mirror.

In addition to the vdev load biasing first implemented by ZFS on Linux in July 2013, the FreeBSD October 2013 version added I/O locality and device rotational information to further enhance the performance.
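
Load biasing on its own reduces to choosing the mirror child with the fewest outstanding requests. The sketch below shows just that selection step; `pick_mirror_child` and the pending-I/O counts are hypothetical, and the I/O locality and rotational information mentioned above are deliberately left out.

```python
def pick_mirror_child(pending_by_child):
    """Return the leaf vdev with the fewest outstanding I/Os.

    Ties are broken by device name so the choice is deterministic.
    """
    return min(pending_by_child,
               key=lambda name: (pending_by_child[name], name))

# Assumed queue depths for the two sides of one mirror.
pending = {"c2t0d0": 4, "c2t1d0": 1}
print(pick_mirror_child(pending))  # c2t1d0
```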

OS	Load	Load + I/O Locality & Rotational Information
illumos	not yet ported	not yet ported
FreeBSD	N/A	23rd October 2013
ZFS on Linux	Jul 2013	not yet ported
ZFS-OSX	Jul 2013	not yet ported

Smoother Write Throttle

The write throttle (dsl_pool_tempreserve_space() and txg_constrain_throughput()) is rewritten to produce much more consistent delays when under constant load. The new write throttle is based on the amount of dirty data, rather than guesses about future performance of the system. When there is a lot of dirty data, each transaction (e.g. write() syscall) will be delayed by the same small amount. This eliminates the "brick wall of wait" that the old write throttle could hit, causing all transactions to wait several seconds until the next txg opens. One of the keys to the new write throttle is decrementing the amount of dirty data as i/o completes, rather than at the end of spa_sync(). Note that the write throttle is only applied once the i/o scheduler is issuing the maximum number of outstanding async writes. See the block comments in dsl_pool.c and above dmu_tx_delay() for more details.
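
The two ideas in this paragraph, delaying by dirty data and crediting completions as they happen, can be sketched in a few lines. This is an illustrative model only and does not reproduce the actual delay calculation in dmu_tx_delay(): `DirtyDataThrottle`, `delay_threshold`, and `delay_ms` are hypothetical names and values.

```python
class DirtyDataThrottle:
    """Toy throttle: a small fixed delay once dirty data passes a threshold."""

    def __init__(self, delay_threshold, delay_ms):
        self.delay_threshold = delay_threshold  # bytes of dirty data (assumed)
        self.delay_ms = delay_ms                # per-transaction delay (assumed)
        self.dirty = 0

    def begin_tx(self, nbytes):
        # Every transaction pays the same small delay while dirty data is high,
        # instead of all transactions stalling until the next txg opens.
        delay = self.delay_ms if self.dirty > self.delay_threshold else 0
        self.dirty += nbytes
        return delay

    def io_done(self, nbytes):
        # Dirty data is decremented as each i/o completes, not at end of sync.
        self.dirty = max(0, self.dirty - nbytes)

throttle = DirtyDataThrottle(delay_threshold=100, delay_ms=2)
throttle.begin_tx(80)   # below the threshold: no delay
throttle.begin_tx(80)   # still admitted without delay; dirty is now 160
throttle.begin_tx(10)   # above the threshold: delayed by delay_ms
throttle.io_done(120)   # completions lower the dirty total right away
```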

The ZFS i/o scheduler (vdev_queue.c) now divides i/os into 5 classes: sync read, sync write, async read, async write, and scrub/resilver. The scheduler issues a number of concurrent i/os from each class to the device. Once a class has been selected, an i/o is selected from this class using either an elevator algorithm (async, scrub classes) or FIFO (sync classes). The number of concurrent async write i/os is tuned dynamically based on i/o load, to achieve good sync i/o latency when there is not a high load of writes, and good write throughput when there is. See the block comment in vdev_queue.c for more details.

illumos	Aug 2013
FreeBSD	not yet
ZFS on Linux	not yet
ZFS-OSX	not yet

Dataset Properties

These are new filesystem, volume, and snapshot properties which can be accessed with the zfs(1) command's get subcommand. See the zfs(1) manpage for your distribution for more details on each of these properties.

Property	Description	illumos	FreeBSD	ZFS on Linux	ZFS-OSX
`refcompressratio`	The compression ratio achieved for all data referenced by (but not necessarily unique to) a snapshot, filesystem, or volume, expressed as a multiplier.	Jun 2011	Jun 2011	Aug 2012	Aug 2012
`clones`	For snapshots, this property is a comma-separated list of filesystems or volumes which are clones of this snapshot.	Nov 2011	Nov 2011	Jul 2012	Jul 2012
`written`	The amount of referenced space written to this dataset since the previous snapshot.	Nov 2011	Nov 2011	Jul 2012	Jul 2012
`written@<snap>`	The amount of referenced space written to this dataset since the specified snapshot. This is the space referenced by this dataset, but not referenced by the specified snapshot.	Nov 2011	Nov 2011	Jul 2012	Jul 2012
`logicalused`, `logicalreferenced`	The amount of space used or referenced, before taking into account compression.	Feb 2013	Mar 2013	not yet	not yet
