Projects/ZFS Channel Programs
A ZFS Channel Program (ZCP) permits the execution of user programs inside the kernel. A ZFS Channel Program manipulate ZFS internals in a single, atomically-visible operation. For instance, to delete all snapshots of a filesystem a channel program could be written which 1) generates the list of snapshots, 2) traverses that list, and 3) destroys each snapshot unconditionally. Because each of these statements would be evaluated from within the kernel, channel programs can guarantee safety from interference with other concurrent ZFS modifications. Executing from inside the kernel allows us to guarantee atomic visibility of these operations (correctness) and allows them to be performed in a single transaction group (performance).
A successful implementation of ZCP will:
- Support equivalent functionality for all of the current ZFS commands with improved performance and correctness from the point of view of the user of ZFS.
- Facilitate the quick addition of new and useful commands as ZCP enables the implementation of more powerful operations which previously would have been unsafe to implement in user programs, or would require modifications to the kernel. Since the ZCP layer guarantees the atomicity of each channel program, we no longer need to write elaborate new sync_tasks for each new IOCTL.
- Allow ZFS users to safely implement their own ZFS operations without breaking ZFS or performing operations they don’t have the privileges for.
- Improve the performance and correctness of existing applications built on ZFS operations.
ZCP will have three main components: the frontend programming language in which ZCP programs can be written, the intermediate representation (IR) that ZCP programs are translated to and which is passed to the kernel, and the modifications to the kernel which execute the ZFS operations specified by the IR program.
The bulk of the work for supporting ZCP will be in the addition of a ZCP interpreter to the kernel. Current support for executing ZFS commands is provided by dsl_sync_task objects and the check and sync functions they contain. A dsl_sync_task object represents an operation on ZFS metadata (e.g. creating a snapshot, destroying several snapshots, or setting a property). A sync task can fail if either 1) its check function fails in open context, or 2) its check function fails in syncing context. As an example, consider the operation of setting a property on a dataset. In this case, the check function performs some basic checks such as checking that the name of the property has a valid length or that the current version of ZFS supports properties for this dataset’s type. The sync function performs the actual setting of the property on the specified dataset.
The end product should evaluate ZCP programs as trees of sync_task objects (or some other analogous new object type), enabling organic and arbitrary combinations of ZFS operations within the restrictions of the syntax of the ZCP language, syntax of the IR, and checks built into the interpreter. The steps to reach this end goal have been designed such that the implementation of ZCP can be incremental. Adding modular items individually will simplify performance testing and correctness checking of each piece of ZCP.
- Modifying the existing sync_task objects to match up with the planned ZCP operations, and modifying the implementation of each check and sync function to work with the new object definitions while still taking the same zfs_cmd_t input from user space.
- Add any new sync_task objects necessary for ZCP, plugging them in to replace any existing ZCP operations where necessary while retaining all or most of the existing sync_task implementation. After this step, the backend/kernel implementation should be stable and should support most of the elemental operations planned for ZCP.
- Write implementations of each zfs operation as channel programs. This task bridges the gap between the ZCP frontend and backend, as we will need some kind of language standard to support this step.
- Modify the existing dsl_sync_task API to take as input a ZCP IR program and translate that IR program into calls to check and sync calls for the new sync_task objects.
- Create a library for manually generating ZCP IR programs from user space to enable testing.
- Support permissions checking on all objects touched by a ZCP.
Proposals for the ZCP programming language have included a procedural (Python) or a functional frontend supporting a number of primitives. Details of the core operations proposed are at in the ZFS channel program design doc.
The current design uses nvlists as the IR for ZCP programs. Translating a functional or lisp-like language to this would probably be simpler, though Python may be more familiar for users. For the frontend, the main challenges are:
- Clearly defining the operations and semantics of each statement and operation in the ZCP programming language and IR, for both successful and failed execution.
- Translating the high-level ZCP programming language to the ZCP IR. However, this would be a final step done at the very end of ZCP implementation.
To further illustrate this, it is useful to walk through the pathway an example channel program takes from its construction in user space to its execution in the kernel. Let’s consider the task of destroying all snapshots of a filesystem. Given a filesystem pool/fs with snapshots pool/fs@snap1, pool/fs@snap2, and pool/fs@snap3, the current system would require two ZFS operations: first a zfs list to discover all snapshots of pool/fs, followed by a zfs destroy to destroy all the snapshots listed. This allows another agent to interfere with our goal of deleting all snaphots of pool/fs by deleting or creating snapshots or clones off of pool/fs between our list and our destroy.
Examples of how a channel program would be written by a person in either a procedural or functional language are below:
|Procedural (Python-esque)||Functional (Scheme-esque)|
|fs = argv||("""iterate_snapshots""" fs snap||for snap in fs."""snapshots"""():||("""destroy_snapshot""" snap))||snap."""destroy"""()|