This guide assumes familiarity with common ZFS commands and configuration steps. At a minimum, you should understand how ZFS categorizes I/O and how to use zpool iostat -r, -q and -w. There is no magic list of parameters to drop in; rather, this is a procedure to follow so that you can match ZFS to your device. The same process can be used on local disks to identify bottlenecks and problems with data flow, but the gains may be much less significant.
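As a quick refresher, these are the observation commands used throughout this guide (the pool name tank is a placeholder):

```
# Request-size histograms: individual vs. aggregated I/O sizes, per class
zpool iostat -r tank 5

# Queue occupancy: pending and active I/Os in each class queue
zpool iostat -q tank 5

# Latency histograms: total and per-stage I/O wait times
zpool iostat -w tank 5
```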
The target is very high-latency storage: for example, a block device being tunneled across a PPP link, or a Ceph server providing an RBD from a continent over. Obviously there are limits to what we can do with that kind of latency, but ZFS can make working within these limits much easier by refactoring our data into larger blocks and efficiently merging reads and writes. This guide describes a method for optimizing that; very high performance and large I/O sizes are possible.
There are a few requirements, though:
* Larger blocks are better (see the sketch after this list):
* * 64K: only suitable as a write-once or receive-only pool
* * 128K: a reasonable choice for a receive-only pool
* * 256K: a very good choice for a receive-only pool, and probably the minimum size for a pool taking TxG commit writes
* * 512K and up: the best choice for a TxG commit pool
Larger blocks are easier to merge during I/O processing but, more importantly, fragmentation on a pool based on high-latency storage may be much more painful.
* Reads are usually not a problem. Writes must be done carefully for the best results.
The optimal case is a pool that only receives enough writes to fill it once. Do not try a pool with logbias=throughput; the increased fragmentation will destroy read performance.
* Lots of ARC is a good thing. Lots of dirty data space can also be a good thing, provided that dirty data stabilizes without hitting the per-pool maximum or the ARC limit.
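A minimal sketch of the dataset-side settings above; tank/backup and tank/recv are hypothetical datasets, and the dirty data figure is purely illustrative and must fit your RAM:

```
# Large blocks: pick a recordsize from the list above.
zfs set recordsize=512K tank/backup   # pool taking TxG commit writes
zfs set recordsize=256K tank/recv     # receive-only pool

# Keep the default logbias; throughput mode fragments a high-latency pool.
zfs set logbias=latency tank/backup

# Generous dirty data headroom (4 GiB here, as an example).
echo $((4 * 1024 * 1024 * 1024)) > /sys/module/zfs/parameters/zfs_dirty_data_max
```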
Async writes in ZFS flow very roughly as follows:
* Dirty data for pool (must be stable, at about 80% of zfs_dirty_data_max)
* TxG commit
* * zfs_sync_taskq_batch_pct (traverses data structures to generate IO)
* * zio_taskq_batch_pct (for compression and checksumming)
* * zio_dva_throttle_enabled (ZIO throttle)
* VDEV thread limits
* * zfs_vdev_async_write_min_active
* * zfs_vdev_async_write_max_active
* Aggregation (set this first)
* * zfs_vdev_aggregation_limit (maximum I/O size)
* * zfs_vdev_write_gap_limit (I/O gaps)
* * zfs_vdev_read_gap_limit
* block device scheduler (set this first; should be noop or none)

You must work through this flow to determine where the bottlenecks are; a sketch for inspecting these settings follows.
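On Linux, the module parameters above can be read and written under /sys/module/zfs/parameters; sdX below is a placeholder for each backing device:

```
# Inspect the current write-flow settings.
cd /sys/module/zfs/parameters
grep . zfs_sync_taskq_batch_pct zio_taskq_batch_pct zio_dva_throttle_enabled \
       zfs_vdev_async_write_min_active zfs_vdev_async_write_max_active \
       zfs_vdev_aggregation_limit zfs_vdev_write_gap_limit zfs_vdev_read_gap_limit

# The taskq percentages are generally consumed at module load / pool import,
# so set those in /etc/modprobe.d/zfs.conf and re-import the pool.

# Block device scheduler, per backing device:
cat /sys/block/sdX/queue/scheduler
echo none > /sys/block/sdX/queue/scheduler
```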
K is a factor that estimates the likely size of the free space regions on your pool after extended use. K = 2.5 for TxG commit pools with no indirect writes. Your numbers may be different, but this is a good starting point.
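For example, with a 512K recordsize on a TxG commit pool and K = 2.5, the aggregation limits used in the procedure below work out as:

```
# K * blocksize     = 2.5 * 524288 = 1310720 bytes   (initial setting)
# K * blocksize * 3 = 3932160 bytes                  (raised later)
echo 1310720 > /sys/module/zfs/parameters/zfs_vdev_aggregation_limit
```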
The approach taken works like this:
1. Open up the batch taskqs, aggregation limits, write threads, and ZIO throttle (see the sketches after this list). TxG commit should now drive writes without throttling for latency.
2. Turn zfs_sync_taskq_batch_pct down until speed reduces by 10%. This sets the pace of the initial flow within the TxG commit.
3. Verify that dirty data is stable and roughly at the midpoint of the dirty data throttle when under high-throughput workloads.
4. Decrease the aggregation limits to K * blocksize.
5. Decrease write threads until speed starts to reduce.
6. Verify I/O merging.
7. Decrease async read threads until speed reduces by 20%.
8. Decrease sync read threads until speed starts to reduce.
9. Raise the aggregation limit to K * blocksize * 3.
10. Check the aggregation size.
11. Optionally: set and enable the ZIO throttle.
12. Check the aggregation size and throughput.
13. Test and verify dirty data.
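Step 1 might look like the following; the values are deliberately generous placeholders, not recommendations:

```
# Open everything up so TxG commit, not the throttle, drives the writes.
cd /sys/module/zfs/parameters
echo 0        > zio_dva_throttle_enabled          # disable the ZIO throttle
echo 10       > zfs_vdev_async_write_min_active   # plenty of write threads
echo 64       > zfs_vdev_async_write_max_active
echo 16777216 > zfs_vdev_aggregation_limit        # 16M
# The batch taskq percentages go in /etc/modprobe.d/zfs.conf, as noted above.
```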
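To verify dirty data (steps 3 and 13), the per-pool TxG kstat is useful; tank is again a placeholder:

```
# ndirty shows how much dirty data each recent TxG carried.
tail -n 5 /proc/spl/kstat/zfs/tank/txgs

# Compare against the ceiling; stable ndirty near the midpoint of the
# dirty data throttle is the goal.
cat /sys/module/zfs/parameters/zfs_dirty_data_max
```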
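For the thread-reduction and merge-verification steps, a sketch (the counts are illustrative):

```
# Walk write threads down while watching throughput.
echo 3 > /sys/module/zfs/parameters/zfs_vdev_async_write_min_active
echo 5 > /sys/module/zfs/parameters/zfs_vdev_async_write_max_active

# Then the read threads, same idea.
echo 2 > /sys/module/zfs/parameters/zfs_vdev_async_read_max_active
echo 6 > /sys/module/zfs/parameters/zfs_vdev_sync_read_max_active

# Verify merging: most writes should land in the agg columns near the
# aggregation limit, not as individual recordsize-sized I/Os.
zpool iostat -r tank 5
```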
I/O prioritization: assume SRmax is the highest of the per-class maximums (it usually will be). If not, find a compromise value for it so that sync reads still take priority over the other I/O classes.
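Reading SRmax as zfs_vdev_sync_read_max_active (an assumption based on the parameter names above), a sketch; note that every per-class maximum is also capped by zfs_vdev_max_active:

```
# Hypothetical compromise: keep sync reads at least as deep as any other class.
echo 10 > /sys/module/zfs/parameters/zfs_vdev_sync_read_max_active
cat /sys/module/zfs/parameters/zfs_vdev_max_active
```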