Documentation/ZfsSend

dump_snapshot filters out any snapshots which are not in the range of snapshots the current ZFS send needs to transmit. For our simplified case of performing an incremental send of a single filesystem, dump_snapshot iterates to the source snapshot (@before in the original example), saves the name and object ID of that snapshot object, places a hold on that snapshot to ensure it cannot be destroyed while we operate on it, and then iterates to the target snapshot (@after) where it calls dump_ioctl.
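
For orientation, here is a minimal, hypothetical sketch of that filtering logic; snapshot_t, hold_snapshot, and send_ioctl below are illustrative stand-ins rather than the real libzfs interfaces:

<pre>
/*
 * Hypothetical, simplified sketch of dump_snapshot-style filtering for an
 * incremental send of a single filesystem.  The types and helpers are
 * stand-ins, not the real ZFS interfaces.
 */
#include <string.h>
#include <stdint.h>

typedef struct snapshot {
	const char *name;	/* e.g. "before", "after" */
	uint64_t objnum;	/* object ID of the snapshot dataset */
} snapshot_t;

struct send_ctx {
	const char *fromsnap;	/* source snapshot name ("before") */
	const char *tosnap;	/* target snapshot name ("after") */
	char prevsnap[256];
	uint64_t prevsnap_obj;
};

/* Stand-ins: the real code holds the snapshot and issues the send ioctl. */
static void hold_snapshot(snapshot_t *s) { (void) s; }
static void send_ioctl(struct send_ctx *c, snapshot_t *to) { (void) c; (void) to; }

/* Called once per snapshot of the filesystem, in creation order. */
static void
visit_snapshot(struct send_ctx *ctx, snapshot_t *snap)
{
	if (strcmp(snap->name, ctx->fromsnap) == 0) {
		/* Source snapshot: remember its name and object ID, and hold it. */
		strncpy(ctx->prevsnap, snap->name, sizeof (ctx->prevsnap) - 1);
		ctx->prevsnap_obj = snap->objnum;
		hold_snapshot(snap);
	} else if (strcmp(snap->name, ctx->tosnap) == 0) {
		/* Target snapshot: kick off the actual send ioctl. */
		send_ioctl(ctx, snap);
	}
	/* Snapshots outside the from..to range are simply skipped. */
}
</pre>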


dump_ioctl is where we transition from executing in user space to executing inside the ZFS kernel module. Between dump_ioctl and the next piece of interesting logic there are several intermediate calls which perform error checking and data retrieval (zfs_ioctl, zfs_ioc_send, ''dmu_send_obj''), but let’s focus a little farther down the stack at ''dmu_send_impl'' in ''dmu_send.c'', where it really gets interesting.


''dmu_send_impl'' is the first place where we begin writing to the actual ZFS send stream. For instance, ''dmu_send_impl'' passes the in-memory representation of the BEGIN and END records into dump_bytes for output to the ZFS send stream. The BEGIN record includes identifying info on the source and target snapshots in an incremental send, the name of the dataset being sent, and the creation timestamp of the target snapshot. The END record includes a checksum of the entire send stream and identifying info on the target snapshot. More importantly, ''dmu_send_impl'' also performs traversal of the current dataset by combining traverse_dataset with a callback, backup_cb.
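
In rough outline, and using hypothetical stand-in types and signatures rather than the real kernel interfaces, the flow of ''dmu_send_impl'' looks something like this:

<pre>
/*
 * Hypothetical outline of the dmu_send_impl flow.  record_t, dump_bytes(),
 * traverse_dataset() and backup_cb() are simplified stand-ins for the real
 * kernel types and functions.
 */
#include <stdint.h>

typedef struct record { int type; } record_t;
enum { REC_BEGIN, REC_END };

/* Stand-ins: the real functions write to the stream while maintaining a
 * running checksum, and walk the snapshot's block tree. */
static int dump_bytes(void *stream, const void *buf, int len)
	{ (void) stream; (void) buf; (void) len; return (0); }
static int backup_cb(void *arg, const void *bp)
	{ (void) arg; (void) bp; return (0); }
static int traverse_dataset(void *ds, uint64_t fromtxg,
    int (*cb)(void *, const void *), void *arg)
	{ (void) ds; (void) fromtxg; return (cb(arg, 0)); }

static int
send_impl(void *stream, void *to_ds, uint64_t fromtxg)
{
	record_t rec = { 0 };
	int err;

	/* 1. BEGIN record: identifies the source/target snapshots, the dataset
	 *    name, and the target snapshot's creation time. */
	rec.type = REC_BEGIN;
	if ((err = dump_bytes(stream, &rec, sizeof (rec))) != 0)
		return (err);

	/* 2. Walk every block of the target snapshot that was born after the
	 *    source snapshot's txg, emitting one record per block via backup_cb. */
	if ((err = traverse_dataset(to_ds, fromtxg, backup_cb, stream)) != 0)
		return (err);

	/* 3. END record: carries the checksum of the whole stream. */
	rec.type = REC_END;
	return (dump_bytes(stream, &rec, sizeof (rec)));
}
</pre>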


traverse_dataset’s core functionality is implemented in traverse_visitbp. traverse_visitbp recursively visits all objects and blocks in the ZFS object it is passed (in this case the target snapshot of the ZFS send) and calls the callback function it is passed on each block. traverse_visitbp has special filtering on the blocks it visits that allows it to skip any ZFS blocks which were not modified after a certain transaction group (i.e. snapshot), which is useful for incremental sends.
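
A simplified, hypothetical sketch of that pruning might look like the following; the blkptr_t here is a stand-in that only carries the fields the sketch needs:

<pre>
/*
 * Hypothetical sketch of the pruning traverse_visitbp performs.  In the real
 * code the birth txg is read from the on-disk block pointer.
 */
#include <stdint.h>

typedef struct blkptr {
	uint64_t birth_txg;		/* txg in which this block was written */
	int nchildren;
	struct blkptr *child;		/* indirect blocks point at more blkptrs */
} blkptr_t;

typedef int (*visit_cb_t)(const blkptr_t *bp, void *arg);

static int
visit_blkptr(const blkptr_t *bp, uint64_t min_txg, visit_cb_t cb, void *arg)
{
	int err, i;

	/*
	 * The key optimization for incremental sends: a block (and everything
	 * beneath it, because birth times only increase toward the root) can
	 * be skipped entirely if it was last written at or before the source
	 * snapshot's txg.
	 */
	if (bp->birth_txg <= min_txg)
		return (0);

	if ((err = cb(bp, arg)) != 0)
		return (err);

	/* Recurse into indirect blocks so every modified leaf is visited. */
	for (i = 0; i < bp->nchildren; i++) {
		if ((err = visit_blkptr(&bp->child[i], min_txg, cb, arg)) != 0)
			return (err);
	}
	return (0);
}
</pre>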
|}


Each of the dump_* functions listed above eventually calls into dump_bytes with its specially formatted payload for whichever record it is writing (FREE, FREEOBJECTS, OBJECT, etc.). A list of all record types can be found in the type ''dmu_replay_record''.
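
As a rough illustration only, a send-stream record can be modeled like this; the real ''dmu_replay_record'' carries considerably more per-record-type information, and the field names below are invented for the example:

<pre>
/*
 * Hypothetical, heavily simplified model of a send-stream record: a record
 * type plus a type-specific payload, which dump_bytes appends to the stream
 * while folding it into the running checksum.
 */
#include <stdint.h>

typedef enum record_type {
	DRR_BEGIN, DRR_OBJECT, DRR_FREEOBJECTS, DRR_WRITE,
	DRR_FREE, DRR_END, DRR_WRITE_BYREF, DRR_SPILL
} record_type_t;

typedef struct replay_record {
	record_type_t type;
	union {
		struct { uint64_t object; uint64_t blksz; } object;
		struct { uint64_t first_object; uint64_t num_objects; } freeobjects;
		struct { uint64_t object; uint64_t offset; uint64_t length; } write;
		struct { uint64_t object; uint64_t offset; uint64_t length; } free;
		struct { uint64_t checksum[4]; } end;
	} u;
} replay_record_t;
</pre>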


So far we’ve covered the full stack trace from requesting a send using the zfs command line tool down to the dump_bytes function which does the actual write of ZFS send records from the kernel. However, ZFS send normally (though not necessarily) has a matching ZFS recv which takes the send stream and applies the changes defined by the contained records to some other ZFS dataset. How does that work?


Similar to ZFS send, the most common interface to ZFS recv is through the zfs command line utility with the recv/receive subcommand. The core logic of ZFS receive is located in the kernel down a stack trace such as: zfs_do_receive => zfs_receive => zfs_receive_impl => zfs_receive_one => zfs_ioctl(ZFS_IOC_RECV) => zfs_ioc_recv. Of course, these intermediate functions are necessary to perform setup and error handling, but they aren’t particularly interesting for this overview of the core functionality of ZFS receive. (TODO: this section focuses on the interesting code in ''dmu_recv_begin'', ''dmu_recv_stream'', and ''dmu_recv_end''; the functions higher in the stack deserve a few notes as well, and receive in general needs more fleshing out.)


The implementation of ZFS receive centers around three functions: ''dmu_recv_begin'', ''dmu_recv_stream'', and ''dmu_recv_end''. ''dmu_recv_begin'' performs setup for receiving the stream’s records based on the information contained in the BEGIN record. This includes either creating a new ZFS dataset for the operations in the stream to be applied to, or creating a clone of an existing filesystem to apply those operations to (in the case of an incremental receive). This work is performed inside a DSL sync task.
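
A hypothetical sketch of that decision, with stand-in names for the BEGIN record fields and the dataset helpers (the real work happens inside the sync task):

<pre>
/*
 * Hypothetical sketch of the dmu_recv_begin decision.  begin_record_t and the
 * dataset helpers below are illustrative stand-ins, not the real interfaces.
 */
#include <stdint.h>

typedef struct begin_record {
	uint64_t fromguid;	/* nonzero => incremental stream */
	char toname[256];	/* dataset the stream should be applied to */
} begin_record_t;

/* Stand-ins for the dataset operations performed by the sync task. */
static int clone_origin_snapshot(const char *toname) { (void) toname; return (0); }
static int create_new_dataset(const char *toname) { (void) toname; return (0); }

static int
recv_begin(const begin_record_t *drrb)
{
	/*
	 * Incremental streams are applied to a clone of the snapshot the
	 * stream is based on; full streams get a brand new dataset.
	 */
	if (drrb->fromguid != 0)
		return (clone_origin_snapshot(drrb->toname));
	return (create_new_dataset(drrb->toname));
}
</pre>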


''dmu_recv_stream'' processes the FREE, FREEOBJECTS, OBJECT, etc. records that follow the BEGIN record. Using ''restore_read'' to read individual records, ''dmu_recv_stream'' loops until it reaches the END record or an error occurs. For each record type it calls a different function, which applies that record’s operations to the dataset the receive has been given; a sketch of this dispatch loop follows the table below.


<center>
{|
| '''Record Type''' || '''Method Called'''
|-
| OBJECT || ''restore_object''
|-
| FREEOBJECTS || ''restore_freeobjects''
|-
| WRITE || ''restore_write''
|-
| WRITE_BYREF || ''restore_write_byref''
|-
| FREE || ''restore_free''
|-
| END || ''restore_end''
|-
| SPILL || ''restore_spill''
|-
| Anything else || Exit with EINVAL
|}
</center>
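
In sketch form, the dispatch loop described above looks roughly like this; ''restore_read'' and the restore_* handlers are simplified stand-ins with invented signatures:

<pre>
/*
 * Hypothetical sketch of the dmu_recv_stream record-dispatch loop.  The
 * record types and restore_* names mirror the table above, but the types and
 * signatures are simplified stand-ins.
 */
#include <errno.h>

enum { DRR_OBJECT, DRR_FREEOBJECTS, DRR_WRITE, DRR_WRITE_BYREF,
	DRR_FREE, DRR_END, DRR_SPILL };

struct record { int type; };

/* Stand-in: the real restore_read pulls the next record off the stream. */
static int restore_read(void *stream, struct record *rec)
	{ (void) stream; rec->type = DRR_END; return (0); }

static int restore_object(struct record *r)      { (void) r; return (0); }
static int restore_freeobjects(struct record *r) { (void) r; return (0); }
static int restore_write(struct record *r)       { (void) r; return (0); }
static int restore_write_byref(struct record *r) { (void) r; return (0); }
static int restore_free(struct record *r)        { (void) r; return (0); }
static int restore_spill(struct record *r)       { (void) r; return (0); }

static int
recv_stream(void *stream)
{
	struct record rec;
	int err;

	/* Loop until the END record, an unknown record, or an error. */
	while ((err = restore_read(stream, &rec)) == 0) {
		switch (rec.type) {
		case DRR_OBJECT:      err = restore_object(&rec);      break;
		case DRR_FREEOBJECTS: err = restore_freeobjects(&rec); break;
		case DRR_WRITE:       err = restore_write(&rec);       break;
		case DRR_WRITE_BYREF: err = restore_write_byref(&rec); break;
		case DRR_FREE:        err = restore_free(&rec);        break;
		case DRR_SPILL:       err = restore_spill(&rec);       break;
		case DRR_END:         return (0);	/* normal termination */
		default:              return (EINVAL);	/* unknown record type */
		}
		if (err != 0)
			return (err);
	}
	return (err);
}
</pre>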


We won’t go into detail on each of these functions, but it is useful to consider at least one, so let’s take a look at ''restore_write''.


''restore_write'' first retrieves the actual modified data block from the send stream using ''restore_read''. It then creates a transaction, assigns that transaction to an open transaction group (using TXG_WAIT, so the assignment waits if necessary rather than failing), and associates a write to the specified object at the specified offset and length with that new transaction. Finally, it commits that transaction.
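
A hypothetical sketch of that sequence, with simplified stand-in declarations for the DMU interfaces involved (the real signatures and TXG_WAIT definition differ):

<pre>
/*
 * Hypothetical sketch of the restore_write pattern: create a transaction,
 * declare the write, assign with TXG_WAIT, perform the write, commit.
 * The declarations below are simplified stand-ins, not the real DMU headers.
 */
#include <stdint.h>

typedef struct objset objset_t;
typedef struct dmu_tx dmu_tx_t;
#define	TXG_WAIT	1	/* stand-in: "wait for an open txg" */

dmu_tx_t *dmu_tx_create(objset_t *os);
void dmu_tx_hold_write(dmu_tx_t *tx, uint64_t obj, uint64_t off, int len);
int dmu_tx_assign(dmu_tx_t *tx, int how);
void dmu_tx_abort(dmu_tx_t *tx);
void dmu_write(objset_t *os, uint64_t obj, uint64_t off, uint64_t len,
    const void *buf, dmu_tx_t *tx);
void dmu_tx_commit(dmu_tx_t *tx);

static int
apply_write_record(objset_t *os, uint64_t object, uint64_t offset,
    uint64_t length, const void *data /* block read via restore_read */)
{
	dmu_tx_t *tx = dmu_tx_create(os);
	int err;

	/* Declare the write so the transaction can reserve space for it. */
	dmu_tx_hold_write(tx, object, offset, (int)length);

	/* TXG_WAIT: block until the tx can be assigned to an open txg. */
	if ((err = dmu_tx_assign(tx, TXG_WAIT)) != 0) {
		dmu_tx_abort(tx);
		return (err);
	}

	/* Perform the write inside the transaction, then commit it. */
	dmu_write(os, object, offset, length, data, tx);
	dmu_tx_commit(tx);
	return (0);
}
</pre>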


''dmu_recv_end'' then performs cleanup of the transformed dataset following successful completion of ''dmu_recv_stream''. Like ''dmu_recv_begin'', ''dmu_recv_end'' performs its work inside of a sync task.