Documentation/Administrative Commands

This webpage describes the code flow when running zfs administrative commands (/sbin/zfs subcommands that change state). We will look at the example of zfs snapshot and examine what each layer of code is responsible for. This is intended as an introduction to the many layers of ZFS, so we won't go into detail on how snapshots are implemented. You can read more about snapshots in an old blog post.

/sbin/zfs infrastructure
The generic (subcommand agnostic) infrastructure of the zfs command does the following:
 * Create a libzfs handle (libzfs_init).
 * Determine which subcommand should be executed and run it.
 * Each zfs subcommand has a callback, typically named zfs_do_<subcommand>.
 * Call a libzfs function (zpool_log_history) to log the command (see below for details)
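The subcommand dispatch described above can be sketched as a simple table lookup. This is an illustrative mock, not the actual /sbin/zfs source; the struct layout and function names here are hypothetical:

```c
#include <assert.h>
#include <string.h>

/* Each subcommand maps to a callback; the real code uses functions
 * named like zfs_do_snapshot. Everything below is a simplified sketch. */
typedef int (*zfs_cmd_func_t)(int argc, char **argv);

static int do_snapshot(int argc, char **argv) { (void)argc; (void)argv; return (0); }
static int do_list(int argc, char **argv) { (void)argc; (void)argv; return (0); }

typedef struct {
	const char *name;
	zfs_cmd_func_t func;
} cmd_entry_t;

static const cmd_entry_t cmd_table[] = {
	{ "snapshot", do_snapshot },
	{ "list", do_list },
};

/* Find the subcommand's callback and run it; return -1 if unknown. */
int dispatch(const char *subcmd, int argc, char **argv)
{
	for (size_t i = 0; i < sizeof (cmd_table) / sizeof (cmd_table[0]); i++) {
		if (strcmp(cmd_table[i].name, subcmd) == 0)
			return (cmd_table[i].func(argc, argv));
	}
	return (-1);
}
```

The real table also carries usage strings and option parsing, but the shape, a name-to-callback table scanned once per invocation, is the same idea.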

snapshot subcommand
The snapshot subcommand's callback is zfs_do_snapshot. It does the following:
 * Parse the command line arguments.
 * Create a list of the snapshots that need to be created.
 * Call a libzfs function (zfs_iter_filesystems) to iterate over the descendent filesystems, adding a snapshot of each filesystem to the list.
 * Call a libzfs function (zfs_snapshot_nvl) to create the snapshots and handle any errors.

libzfs
We saw two uses of libzfs: iterating over the descendent filesystems, and creating the snapshots.

filesystem iteration
libzfs provides "handles" to zfs datasets, represented by a zfs_handle_t. The handle is created by getting stats on a dataset from the kernel. The handle then caches these stats (e.g. property values) in userland. Note that the handle is a purely userland (libzfs) concept; the kernel doesn't know about them, and the handle doesn't prevent any concurrent activity (e.g. destroying the dataset, changing properties, etc).

To iterate over a filesystem's children, libzfs uses the ZFS_IOC_DATASET_LIST_NEXT ioctl to the kernel. Each call to this ioctl returns the next child of the specified dataset, along with the stats (e.g. properties) of that dataset. libzfs uses this information to make a zfs_handle_t, and passes the handle to a callback provided by the caller (zfs_do_snapshot in this case).
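The "each call returns the next child" pattern can be illustrated with a small mock: a cursor (cookie) that persists across calls, and an iterator that invokes a caller-supplied callback per child, mirroring how libzfs drives ZFS_IOC_DATASET_LIST_NEXT. All names and data here are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Fake child list standing in for the kernel's dataset tree. */
static const char *children[] = { "pool/fs/a", "pool/fs/b", "pool/fs/c" };

/* Returns 0 and fills *name with the next child, or -1 when no children
 * remain. *cookie holds the iteration position across calls, playing the
 * role of the cursor the real ioctl carries between invocations. */
int dataset_list_next(size_t *cookie, const char **name)
{
	if (*cookie >= sizeof (children) / sizeof (children[0]))
		return (-1);
	*name = children[(*cookie)++];
	return (0);
}

typedef int (*iter_cb_t)(const char *name, void *arg);

/* libzfs-style iterator: call the "ioctl" repeatedly and hand each
 * child to the caller's callback, stopping early on error. */
int iterate_children(iter_cb_t cb, void *arg)
{
	size_t cookie = 0;
	const char *name;
	while (dataset_list_next(&cookie, &name) == 0) {
		int err = cb(name, arg);
		if (err != 0)
			return (err);
	}
	return (0);
}
```

In the real code the per-child data also includes the dataset's stats, which libzfs wraps in a zfs_handle_t before invoking the callback.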

snapshot creation
/sbin/zfs provides the list of snapshots to create, so this is a relatively thin layer in libzfs. Other subcommands have substantially more of their logic implemented in libzfs. The one interesting part of what libzfs does here is error handling: it prints a human-readable error message based on the error code returned by the kernel.

libzfs calls into libzfs_core to do the actual ioctl to the kernel.

libzfs_core
libzfs calls into libzfs_core, which is a very thin layer that basically just marshals the arguments and issues the ioctl to the kernel. In this case the ioctl is ZFS_IOC_SNAPSHOT.
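The marshal-and-ioctl pattern can be sketched as follows. This is a mock, assuming a hypothetical command struct and a fake kernel entry point in place of ioctl(2) on /dev/zfs; the real code packs richer name/value arguments:

```c
#include <assert.h>
#include <string.h>

#define FAKE_IOC_SNAPSHOT 0x5a23  /* illustrative value, not the real ZFS_IOC_SNAPSHOT */

/* Hypothetical command struct crossing the user/kernel boundary. */
typedef struct {
	unsigned long ioc;
	char name[256];
} fake_cmd_t;

/* Stand-in for the kernel side of ioctl(fd, cmd, &zc). */
static int fake_kernel_ioctl(unsigned long ioc, fake_cmd_t *zc)
{
	zc->ioc = ioc;
	return (ioc == FAKE_IOC_SNAPSHOT ? 0 : -1);
}

/* libzfs_core-style wrapper: copy the snapshot name into the command
 * struct and hand it to the "kernel" in a single call. */
int snapshot_ioctl_sketch(const char *snapname)
{
	fake_cmd_t zc;
	memset(&zc, 0, sizeof (zc));
	strncpy(zc.name, snapname, sizeof (zc.name) - 1);
	return (fake_kernel_ioctl(FAKE_IOC_SNAPSHOT, &zc));
}
```

The thinness is the point: libzfs_core aims to be a stable, scriptable 1:1 mapping onto the kernel's ioctl interface, leaving policy and error reporting to libzfs.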

ioctl infrastructure
When userland calls ioctl(2) on /dev/zfs, the kernel's device infrastructure invokes ZFS's ioctl handler. This handler contains code which applies to all zfs ioctls. It does the following:
 * marshals the arguments, copying them in from the user address space
 * determines which ioctl function should be called
 * each ioctl has its own handler function
 * the handler functions are stored in a table, which is populated when the ZFS module is initialized
 * Call the ioctl-specific permission checking function.
 * Call the ioctl-specific function.
 * If this is a new-style ioctl (which ZFS_IOC_SNAPSHOT is), and it was successful, we log the ioctl and its arguments on disk in the pool's history.
 * This history log can be printed by running zpool history -i
 * If this ioctl allows it (which ZFS_IOC_SNAPSHOT does), and the ioctl was successful, we remember that this thread is allowed to log the CLI history, which will be done as a separate ioctl (see below).
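The steps above, a per-ioctl table entry holding a permission check and a handler, with the permission check run first, can be sketched like this. The struct layout and names are illustrative, not ZFS's actual definitions:

```c
#include <assert.h>
#include <stddef.h>

typedef int (*secpolicy_func_t)(void *arg);
typedef int (*ioctl_func_t)(void *arg);

/* One entry per ioctl: its number, its permission check, its handler,
 * and whether a successful call is logged in the pool's history. */
typedef struct {
	int ioc;
	secpolicy_func_t secpolicy;
	ioctl_func_t func;
	int log_history;  /* new-style ioctls log on disk; see zpool history -i */
} ioc_vec_t;

static int secpolicy_allow(void *arg) { (void)arg; return (0); }
static int ioc_snapshot(void *arg) { (void)arg; return (0); }

static const ioc_vec_t ioc_vec[] = {
	{ 1 /* stand-in for ZFS_IOC_SNAPSHOT */, secpolicy_allow, ioc_snapshot, 1 },
};

/* Common dispatch path: find the entry, run the permission check,
 * then run the handler; -1 for an unknown ioctl. */
int zfsdev_dispatch_sketch(int ioc, void *arg)
{
	for (size_t i = 0; i < sizeof (ioc_vec) / sizeof (ioc_vec[0]); i++) {
		if (ioc_vec[i].ioc != ioc)
			continue;
		int err = ioc_vec[i].secpolicy(arg);
		if (err != 0)
			return (err);
		return (ioc_vec[i].func(arg));
	}
	return (-1);
}
```

Keeping the permission check in the table, rather than inside each handler, means the common dispatch code can guarantee it is never skipped.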

snapshot ioctl
This is a relatively thin layer which typically checks that the arguments are well-formed, e.g. that all of the snapshots are in the same pool.

DSL
This layer is also relatively thin: it marshals its arguments into structs and creates a synctask to execute a callback in syncing context. For snapshots, there is also some code to suspend the ZIL on old-version pools.

synctask infrastructure
The synctask infrastructure allows a thread executing in open context (i.e. from an ioctl) to execute a callback in syncing context (i.e. from the pool's sync thread). The MOS (Meta Object Set), which contains all the pool-wide metadata, can only be modified from syncing context.
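The idea can be shown with a deliberately single-threaded sketch: open-context code queues a callback, and the (simulated) sync phase of the next TXG runs every queued callback. Real synctasks block the caller until the TXG syncs and run under the pool's sync thread; the names below are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

typedef void (*synctask_func_t)(void *arg);

typedef struct {
	synctask_func_t func;
	void *arg;
} synctask_t;

/* Tasks queued for the current TXG (fixed-size for the sketch). */
static synctask_t txg_tasks[16];
static size_t txg_ntasks;

/* Open context: schedule a callback to run in syncing context. */
void sync_task_sketch(synctask_func_t func, void *arg)
{
	txg_tasks[txg_ntasks].func = func;
	txg_tasks[txg_ntasks].arg = arg;
	txg_ntasks++;
}

/* Syncing context: the sync thread runs the queued tasks; this is the
 * only place MOS-modifying work like snapshot creation may happen. */
void txg_sync_sketch(void)
{
	for (size_t i = 0; i < txg_ntasks; i++)
		txg_tasks[i].func(txg_tasks[i].arg);
	txg_ntasks = 0;
}
```

Funneling all MOS modifications through one thread per pool is what lets the callback see a stable, consistent view of the pool-wide metadata without fine-grained locking.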

snapshot synctask
This code creates each of the snapshots by modifying the MOS: specifically, by creating a new object in the MOS to represent each snapshot. Each snapshot's object stores that snapshot's metadata, which will be filled in by this code. Related datasets will also be modified. Note that in this phase, we are only modifying and dirtying the in-memory copy of the data; it will be written to disk when the MOS is synced.

DMU layer: MOS sync
In the previous phase, we only modified the in-memory copy of data which is represented on disk. Subsequently (in the same TXG), we write out all the dirty data in the MOS. The MOS is an object set (objset) like any other; the primary difference is that it is only dirtied (modified) in syncing context.