Documentation/ZfsSend

$ zfs send -i rpool/send-test@before rpool/send-test@after > send.log
</pre>
The last command in this list essentially produces the modifications necessary to bring a ZFS filesystem whose state is identical to rpool/send-test@before to the state snapshotted at rpool/send-test@after. The contents of that send stream can then be inspected using ‘zstreamdump -v’:
<pre>
BEGIN record
        hdrtype = 1
        features = 4
        magic = 2f5bacbac
        creation_time = 521f9995
        type = 2
        flags = 0x0
        toguid = 3cb6074c7a9c9294
        fromguid = fcdfcdcd9ca829c5
        toname = rpool/send-test@after
FREEOBJECTS firstobj = 0 numobjs = 1
OBJECT object = 1 type = 21 bonustype = 0 blksz = 1024 bonuslen = 0
FREE object = 1 offset = 1024 length = -1
...
OBJECT object = 8 type = 19 bonustype = 44 blksz = 512 bonuslen = 168
FREE object = 8 offset = 512 length = -1
FREEOBJECTS firstobj = 9 numobjs = 23
WRITE object = 4 type = 20 checksum type = 7
offset = 0 length = 512 props = 200000000
WRITE object = 8 type = 19 checksum type = 7
offset = 0 length = 512 props = 200000000
END checksum = 30b54c49a2/d58e900baf32/249b6af78cff8e7/b72483930c1bdec2
SUMMARY:
        Total DRR_BEGIN records = 1
        Total DRR_END records = 1
        Total DRR_OBJECT records = 8
        Total DRR_FREEOBJECTS records = 2
        Total DRR_WRITE records = 2
        Total DRR_FREE records = 8
        Total DRR_SPILL records = 0
        Total records = 22
        Total write size = 1024 (0x400)
        Total stream length = 8392 (0x20c8)
</pre>
This output is only marginally more readable than the original binary file, but note the two lines starting with WRITE, each indicating a ZFS block that was modified between the two snapshots we are analyzing. Other lines beginning with FREEOBJECTS, OBJECT, and FREE represent other records in the ZFS send stream. If we add the -d flag to zstreamdump we get a little more information about the second WRITE to object 8:
<pre>
WRITE object = 8 type = 19 checksum type = 7
offset = 0 length = 512 props = 200000000
31 32 33 0a  00 00 00 00  00 00 00 00  00 00 00 00  123. .... .... ....
00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00  .... .... .... ....
</pre>
Great! This shows that the block at offset 0 in object 8 (which corresponds to the tmp file in this example) was modified with the ASCII characters “123\n”.
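Each of the capitalized line types in the zstreamdump output (BEGIN, OBJECT, FREEOBJECTS, WRITE, FREE, SPILL, END) corresponds to a record type in the send stream, represented in the ZFS code by dmu_replay_record_t. The C sketch below is only an illustration of that idea; the field names and layout are simplified for clarity and do not match the real structure definitions.
<pre>
/*
 * Illustrative sketch only: a simplified view of the record types that make
 * up a ZFS send stream.  Loosely modeled on dmu_replay_record_t, but the
 * fields shown here are abbreviated and do not match the real layout.
 */
#include <stdint.h>

enum drr_type {
    DRR_BEGIN,        /* stream header: identifies source/target snapshots */
    DRR_OBJECT,       /* (re)allocate an object (dnode) */
    DRR_FREEOBJECTS,  /* free a range of object numbers */
    DRR_WRITE,        /* new contents for one block of an object */
    DRR_FREE,         /* free a byte range within an object */
    DRR_SPILL,        /* new contents for an object's spill block */
    DRR_END           /* trailer: checksum of the whole stream */
};

struct send_record {              /* simplified dmu_replay_record_t */
    enum drr_type type;
    union {
        struct {                  /* DRR_BEGIN */
            uint64_t magic;
            uint64_t creation_time;
            uint64_t toguid;      /* GUID of the target snapshot */
            uint64_t fromguid;    /* GUID of the source, 0 for a full send */
            char toname[256];     /* e.g. "rpool/send-test@after" */
        } begin;
        struct {                  /* DRR_WRITE: block data follows the record */
            uint64_t object;      /* object (dnode) number */
            uint64_t offset;      /* byte offset within that object */
            uint64_t length;      /* number of data bytes that follow */
        } write;
        struct {                  /* DRR_END */
            uint64_t checksum[4]; /* running checksum of the stream */
        } end;
        /* DRR_OBJECT, DRR_FREEOBJECTS, DRR_FREE, DRR_SPILL omitted */
    } u;
};
</pre>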
So we’ve seen that ZFS send works as expected, in that it can transmit the modified contents of a ZFS filesystem when doing an incremental send. You can also inspect the contents of a full ZFS send (generated with ‘zfs send rpool/send-test@after’) using zstreamdump, but this dumps a much larger stream and takes much longer to process. For example, with our small test the incremental send from @before to @after was ~8KB, but the full send of @after was ~43KB. The size of an incremental send depends almost entirely on how rapidly a filesystem is changing (though the total size of the filesystem sets the upper bound on incremental send lengths). Experimenting with full sends is left as an exercise for the reader.
But how does ZFS send actually determine the information to be transmitted? How does it construct the records that are actually sent? And how does ZFS recv use those records to reconstruct the original state but on the target pool? Covering these questions in detail and side-by-side with the code will be the focus of the remaining sections. This study will assume some pre-existing knowledge of the ZFS architecture and access to the ZFS code base. ZFS send also provides a number of more advanced options (such as -R or -D), but this walkthrough will focus on the code path taken when doing an incremental send of a single filesystem.
The most common way for a user to interact with ZFS send is through the zfs command line tool and its send subcommand. Going this route, send-specific code starts at zfs_do_send in zfs_main.c. However, for the simple case we are considering, the actual logic begins a little deeper in the call stack, at dump_filesystem in libzfs_sendrecv.c.
dump_filesystem passes control down to zfs_iter_snapshots_sorted, whose main responsibility is to sort the snapshots of the filesystem being sent and iterate over them from earliest to latest, calling a callback function on each snapshot. For the case of ZFS send, this callback is dump_snapshot.
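As a rough illustration of that iteration, here is a minimal sketch of sorting snapshots by creation transaction group and invoking a callback on each one, oldest first. The snapshot_t and snap_cb_t types and the helper below are invented for illustration; they are not the real libzfs interfaces.
<pre>
/*
 * Minimal sketch of a sorted snapshot iteration, in the spirit of
 * zfs_iter_snapshots_sorted.  All types and helpers here are invented.
 */
#include <stdint.h>
#include <stdlib.h>

typedef struct snapshot {
    const char *name;     /* e.g. "rpool/send-test@before" */
    uint64_t createtxg;   /* txg in which the snapshot was created */
} snapshot_t;

typedef int (*snap_cb_t)(snapshot_t *snap, void *arg);

static int
compare_createtxg(const void *a, const void *b)
{
    const snapshot_t *sa = a, *sb = b;

    if (sa->createtxg < sb->createtxg)
        return (-1);
    if (sa->createtxg > sb->createtxg)
        return (1);
    return (0);
}

/* Call cb on every snapshot, oldest first; stop on the first error. */
static int
iter_snapshots_sorted(snapshot_t *snaps, size_t nsnaps, snap_cb_t cb, void *arg)
{
    qsort(snaps, nsnaps, sizeof (snapshot_t), compare_createtxg);
    for (size_t i = 0; i < nsnaps; i++) {
        int err = cb(&snaps[i], arg);   /* dump_snapshot, in ZFS send's case */
        if (err != 0)
            return (err);
    }
    return (0);
}
</pre>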
dump_snapshot filters out any snapshots which are not in the range of snapshots the current ZFS send needs to transmit. For our simplified case of performing an incremental send of a single filesystem, dump_snapshot iterates to the source snapshot (@before in the original example), saves the name and object ID of that snapshot object, places a hold on that snapshot to ensure it cannot be destroyed while we operate on it, and then iterates to the target snapshot (@after) where it calls dump_ioctl.
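A hypothetical sketch of that filtering, with invented types and a stub standing in for dump_ioctl, might look like the following; it is not the real libzfs code.
<pre>
/*
 * Hypothetical sketch of the filtering dump_snapshot performs for a plain
 * incremental send ("zfs send -i @before fs@after").  send_ctx_t and
 * issue_send_ioctl() are invented stand-ins.
 */
#include <stdint.h>
#include <string.h>

typedef struct send_ctx {
    const char *fromsnap;   /* short name of the source snapshot, "before" */
    const char *tosnap;     /* short name of the target snapshot, "after" */
    uint64_t fromobj;       /* object number of the source snapshot */
    int seenfrom;           /* have we reached the source snapshot yet? */
} send_ctx_t;

/* Stand-in for dump_ioctl(), which asks the kernel to emit the stream. */
static int
issue_send_ioctl(const char *tosnap, uint64_t fromobj)
{
    (void) tosnap;
    (void) fromobj;
    return (0);
}

/* Called once per snapshot, oldest first. */
static int
dump_one_snapshot(const char *snapname, uint64_t objnum, send_ctx_t *ctx)
{
    if (!ctx->seenfrom) {
        /* Ignore everything earlier than the source snapshot. */
        if (strcmp(snapname, ctx->fromsnap) != 0)
            return (0);
        ctx->fromobj = objnum;  /* remember it (the real code also holds it) */
        ctx->seenfrom = 1;
        return (0);
    }
    /* For a plain -i send, intermediate snapshots are skipped as well. */
    if (strcmp(snapname, ctx->tosnap) != 0)
        return (0);
    /* Reached the target snapshot: hand the work to the kernel. */
    return (issue_send_ioctl(snapname, ctx->fromobj));
}
</pre>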
dump_ioctl is where we transition from executing in user space to executing inside the ZFS kernel module. Between dump_ioctl and the next piece of interesting logic there are several intermediate calls which perform error checking and data retrieval (zfs_ioctl, zfs_ioc_send, dmu_send_obj), but let’s focus a little farther down the stack at dmu_send_impl in dmu_send.c, where it really gets interesting.
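Conceptually, the hand-off looks something like the sketch below: fill in a command structure naming the target snapshot, the source snapshot’s object number, and the file descriptor the stream should be written to, then issue an ioctl on the ZFS control device. The struct, ioctl number, and helper here are invented placeholders; the real interface uses zfs_cmd_t and ZFS_IOC_SEND.
<pre>
/*
 * Hypothetical sketch of the user-to-kernel hand-off performed by
 * dump_ioctl.  struct send_cmd and MY_ZFS_IOC_SEND are placeholders.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>

#define MY_ZFS_IOC_SEND 0x5a17          /* placeholder ioctl number */

struct send_cmd {                       /* stand-in for zfs_cmd_t */
    char name[256];                     /* target snapshot, e.g. "...@after" */
    uint64_t fromobj;                   /* object number of the source snapshot */
    int outfd;                          /* fd the send stream is written to */
};

static int
send_to_fd(int zfs_devfd, const char *tosnap, uint64_t fromobj, int outfd)
{
    struct send_cmd cmd;

    memset(&cmd, 0, sizeof (cmd));
    (void) snprintf(cmd.name, sizeof (cmd.name), "%s", tosnap);
    cmd.fromobj = fromobj;
    cmd.outfd = outfd;

    /*
     * Cross into the kernel.  The stream is written to outfd as a side
     * effect before the ioctl returns.
     */
    return (ioctl(zfs_devfd, MY_ZFS_IOC_SEND, &cmd));
}
</pre>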
dmu_send_impl is the first place where we begin writing to the actual ZFS send stream. For instance, dmu_send_impl passes the in-memory representation of the BEGIN and END records into dump_bytes for output to the ZFS send stream. The BEGIN record includes identifying info on the source and target snapshots in an incremental send, the name of the dataset being sent, and the creation timestamp of the target snapshot. The END record includes a checksum of the entire send stream and identifying info on the target snapshot. More importantly, dmu_send_impl also performs the traversal of the current dataset by combining traverse_dataset with a callback, backup_cb.
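The overall shape of that logic could be sketched as follows; the stream_t type and the emit/walk helpers are invented stubs standing in for dump_bytes and traverse_dataset, not the real dmu_send_impl.
<pre>
/*
 * Hypothetical outline of the top-level send logic: emit a BEGIN record,
 * walk the target snapshot emitting records for everything born after the
 * source snapshot, then emit an END record carrying the stream checksum.
 * stream_t and the helpers below are invented stubs.
 */
#include <stdint.h>

typedef struct stream {
    int fd;                             /* where the send stream is written */
} stream_t;

static int
emit_begin(stream_t *s, uint64_t fromguid, uint64_t toguid, const char *toname)
{
    /* Real code: fill in a BEGIN record and pass it to dump_bytes(). */
    (void) s; (void) fromguid; (void) toguid; (void) toname;
    return (0);
}

static int
walk_dataset_since(stream_t *s, uint64_t from_txg)
{
    /* Real code: traverse_dataset() with backup_cb as the callback. */
    (void) s; (void) from_txg;
    return (0);
}

static int
emit_end(stream_t *s)
{
    /* Real code: an END record containing the running checksum. */
    (void) s;
    return (0);
}

/* Sketch of the flow of dmu_send_impl for a simple incremental send. */
static int
send_impl(stream_t *ss, uint64_t fromguid, uint64_t from_txg,
    uint64_t toguid, const char *toname)
{
    int err;

    /* BEGIN: lets the receiver verify it has the matching source snapshot. */
    if ((err = emit_begin(ss, fromguid, toguid, toname)) != 0)
        return (err);

    /* Blocks written at or before from_txg are skipped: the incremental part. */
    if ((err = walk_dataset_since(ss, from_txg)) != 0)
        return (err);

    /* END: checksum over everything written so far. */
    return (emit_end(ss));
}
</pre>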
traverse_dataset’s core functionality is implemented in traverse_visitbp. traverse_visitbp recursively visits all objects and blocks in the ZFS object it is passed (in this case the target snapshot of the ZFS send) and calls the callback function it is passed on each block. traverse_visitbp has special filtering on the blocks it visits that allows it to skip any ZFS blocks which were not modified after a certain transaction group (i.e. snapshot), which is useful for incremental sends.
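That pruning falls out of ZFS’s copy-on-write design: an indirect block is rewritten whenever any block beneath it changes, so a subtree whose root was born at or before the source snapshot’s transaction group cannot contain anything newer and can be skipped wholesale. A toy sketch of that pruned recursion follows; blkref_t is an invented stand-in for a block pointer, not the real blkptr_t, and this is not the real traverse_visitbp.
<pre>
/*
 * Toy sketch of a pruned block-pointer traversal in the spirit of
 * traverse_visitbp.  blkref_t and blk_cb_t are invented for illustration.
 */
#include <stdint.h>
#include <stddef.h>

typedef struct blkref {
    uint64_t birth_txg;          /* txg in which this block was written */
    int is_indirect;             /* does it point at more block pointers? */
    struct blkref *children;     /* child block pointers, if indirect */
    size_t nchildren;
} blkref_t;

typedef int (*blk_cb_t)(const blkref_t *bp, void *arg);

static int
visit_bp(const blkref_t *bp, uint64_t from_txg, blk_cb_t cb, void *arg)
{
    int err;

    /*
     * Copy-on-write guarantees a parent's birth txg is at least as new as
     * anything below it, so an unchanged subtree is skipped with one test.
     */
    if (bp->birth_txg <= from_txg)
        return (0);

    if ((err = cb(bp, arg)) != 0)   /* backup_cb, in ZFS send's case */
        return (err);

    if (bp->is_indirect) {
        for (size_t i = 0; i < bp->nchildren; i++) {
            err = visit_bp(&bp->children[i], from_txg, cb, arg);
            if (err != 0)
                return (err);
        }
    }
    return (0);
}
</pre>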
backup_cb, called by traverse_visitbp, handles writing ZFS send records for each ZFS block passed to it. backup_cb performs different actions for a number of different cases, including: