Difference between revisions of "Performance tuning"

189 bytes added, 17:03, 9 January 2016
m (Fix deduplication description to clarify that we do not keep the DDT entries in dedicated kmem buffers long enough to actually double cache, contrary to my mistaken previous belief.)
m (link to manual page for gnop(8); illumos not Illumos; FreeBSD not freeBSD; and working around error 404 for http://ivoras.sharanet.org/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html)
Line 28: Line 28:
Reporting the correct sector sizes is the responsibility of the block device layer. Unfortunately, this means that drives that misreport their sector sizes must be handled differently on each platform. The respective methods are as follows:


* [http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks#ZFSandAdvancedFormatdisks-OverridingthePhysicalBlockSize sd.conf] on Illumos
* [http://wiki.illumos.org/display/illumos/ZFS+and+Advanced+Format+disks#ZFSandAdvancedFormatdisks-OverridingthePhysicalBlockSize sd.conf] on illumos
* [http://ivoras.sharanet.org/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html gnop] on freeBSD
* [https://www.freebsd.org/cgi/man.cgi?query=gnop&sektion=8&manpath=FreeBSD+10.2-RELEASE gnop(8)] on FreeBSD; see for example [http://web.archive.org/web/20151022020605/http://ivoras.sharanet.org/blog/tree/2011-01-01.freebsd-on-4k-sector-drives.html FreeBSD on 4K sector drives] (2011-01-01)
* [http://zfsonlinux.org/faq.html#HowDoesZFSonLinuxHandlesAdvacedFormatDrives -o ashift=] on ZFS on Linux
* -o ashift= also works with both MacZFS (pool version 8) and ZFS-OSX (pool version 5000).
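
The value passed to -o ashift= is the base-2 logarithm of the sector size in bytes, so a 4096-byte sector corresponds to ashift=12. As a minimal sketch of that relationship (the function name is an illustrative assumption, not part of any ZFS tool):

```python
def ashift_for_sector_size(sector_size):
    """Return the ashift value (log2 of the sector size in bytes).

    Hypothetical helper for illustration only; it is not part of ZFS.
    """
    # A valid sector size is a positive power of two.
    if sector_size <= 0 or sector_size & (sector_size - 1):
        raise ValueError("sector size must be a positive power of two")
    # For a power of two, bit_length() - 1 is exactly log2.
    return sector_size.bit_length() - 1


if __name__ == "__main__":
    for size in (512, 4096, 8192):
        print(size, "->", "ashift=%d" % ashift_for_sector_size(size))
```

A drive that really has 4096-byte sectors but reports 512 would otherwise end up with ashift=9, which is the case the overrides above are meant to prevent.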


-o ashift= is convenient, but it is flawed in that creating pools whose top-level vdevs have different optimal sector sizes requires the use of multiple commands. [http://www.listbox.com/member/archive/182191/2013/07/search/YXNoaWZ0/sort/time_rev/page/2/entry/16:58/20130709002459:82E21654-E84F-11E2-A0FF-F6B47351D2F5/ A newer syntax] that will rely on the actual sector sizes has been discussed as a cross-platform replacement and will likely be implemented in the future.


In addition, [[User:Ryao | Richard Yao]] has contributed a [https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c#L108 database of drives known to misreport sector sizes] to the ZFS on Linux project. It is used to automatically adjust ashift without the assistance of the system administrator. This approach is unable to fully compensate for misreported sector sizes whenever drive identifiers are used ambiguously (e.g. virtual machines, iSCSI LUNs, some rare SSDs), but it does a great amount of good. The format is roughly compatible with Illumos' sd.conf and it is expected that other implementations will integrate the database in future releases. Strictly speaking, this database does not belong in ZFS, but the difficulty of patching the Linux kernel (especially older ones) necessitated that this be implemented in ZFS itself for Linux. The same is true for MacZFS. However, FreeBSD and Illumos are both able to implement this in the correct layer.
In addition, [[User:Ryao | Richard Yao]] has contributed a [https://github.com/zfsonlinux/zfs/blob/master/cmd/zpool/zpool_vdev.c#L108 database of drives known to misreport sector sizes] to the ZFS on Linux project. It is used to automatically adjust ashift without the assistance of the system administrator. This approach is unable to fully compensate for misreported sector sizes whenever drive identifiers are used ambiguously (e.g. virtual machines, iSCSI LUNs, some rare SSDs), but it does a great amount of good. The format is roughly compatible with illumos' sd.conf and it is expected that other implementations will integrate the database in future releases. Strictly speaking, this database does not belong in ZFS, but the difficulty of patching the Linux kernel (especially older ones) necessitated that this be implemented in ZFS itself for Linux. The same is true for MacZFS. However, FreeBSD and illumos are both able to implement this in the correct layer.
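
The idea behind such a database can be sketched roughly as follows. This is not the code in zpool_vdev.c: the table entries, the function name, and the rule of never going below the reported size are all assumptions made for illustration.

```python
# Hypothetical entries: model string -> true physical sector size in bytes.
# The real table lives in zpool_vdev.c and has different contents.
KNOWN_SECTOR_SIZES = {
    "EXAMPLE SSD MODEL A": 8192,
    "EXAMPLE HDD MODEL B": 4096,
}


def effective_sector_size(model, reported_size):
    """Pick the sector size to use for a drive (illustrative sketch only)."""
    known = KNOWN_SECTOR_SIZES.get(model.strip())
    if known is None:
        # Unknown or ambiguous identifier (e.g. a virtual disk or iSCSI LUN):
        # all we can do is trust what the device reports.
        return reported_size
    # Assumption: never go below what the device itself reports.
    return max(known, reported_size)


if __name__ == "__main__":
    print(effective_sector_size("EXAMPLE SSD MODEL A", 512))
    print(effective_sector_size("GENERIC VIRTUAL DISK", 512))
```

The second call shows the limitation described above: a generic identifier says nothing about the underlying hardware, so the reported size is used unchanged.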


=== Compression ===
Line 96: Line 96:
ZFS will behave differently on different platforms when given a whole disk.


On Illumos, ZFS will enable the write cache when given a whole disk, but will leave the write cache setting unchanged when given a partition. The Illumos UFS driver cannot ensure integrity with the write cache enabled, so ZFS leaves the setting alone on partitions to avoid potentially causing corruption should a UFS filesystem occupy another partition on the disk.
On illumos, ZFS will enable the write cache when given a whole disk, but will leave the write cache setting unchanged when given a partition. The illumos UFS driver cannot ensure integrity with the write cache enabled, so ZFS leaves the setting alone on partitions to avoid potentially causing corruption should a UFS filesystem occupy another partition on the disk.


On Linux, the Linux IO elevator is largely redundant given that ZFS has its own IO elevator, so ZFS will set the IO elevator to noop to avoid unnecessary CPU overhead.


ZFS will also create a GPT partition table with its own partitions when given a whole disk under Illumos on x86/amd64 and on Linux. This is mainly to make booting through UEFI possible, because UEFI requires a small FAT partition to boot the system. The ZFS driver can tell whether the pool was given the entire disk via the whole_disk field in the label.
ZFS will also create a GPT partition table with its own partitions when given a whole disk under illumos on x86/amd64 and on Linux. This is mainly to make booting through UEFI possible, because UEFI requires a small FAT partition to boot the system. The ZFS driver can tell whether the pool was given the entire disk via the whole_disk field in the label.


This is not done on FreeBSD. Pools created by FreeBSD will always have the whole_disk field set to true, so a pool created on FreeBSD and imported on another platform will always be treated as if whole disks were given to ZFS.
Line 139: Line 139:
Alternatively, some devices allow you to change the sizes that they report. This would also work, although a secure erase should be done before changing the reported size to ensure that the SSD recognizes the additional spare area. On drives that support it, the reported size can be changed with `hdparm -N <sectors>` on systems that have laptop-mode-tools.


The choice of 4GB is somewhat arbitrary. Most systems do not write anything close to 4GB to ZIL between transaction group commits, so overprovisioning all storage beyond the 4GB partition should be alright. If a workload needs more, then make it no more than the maximum ARC size. Even under extreme workloads, ZFS will not benefit from more SLOG storage than the maximum ARC size. That is half of system memory on Linux and 3/4 of system memory on Illumos.
The choice of 4GB is somewhat arbitrary. Most systems do not write anything close to 4GB to ZIL between transaction group commits, so overprovisioning all storage beyond the 4GB partition should be alright. If a workload needs more, then make it no more than the maximum ARC size. Even under extreme workloads, ZFS will not benefit from more SLOG storage than the maximum ARC size. That is half of system memory on Linux and 3/4 of system memory on illumos.
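
The sizing rule above can be written out as a small calculation. This is a sketch under stated assumptions: the function and constant names are made up for illustration, "4GB" is read as 4 GiB, and the ARC fractions are the ones quoted above rather than values read from a running system.

```python
from fractions import Fraction

GIB = 1024 ** 3

# Maximum ARC size as a fraction of system memory, as stated in the text.
ARC_MAX_FRACTION = {
    "linux": Fraction(1, 2),
    "illumos": Fraction(3, 4),
}


def slog_size_bytes(system_memory_bytes, platform, wanted_bytes=4 * GIB):
    """Cap the desired SLOG partition size at the maximum ARC size.

    Hypothetical helper; 4 GiB is the somewhat arbitrary default from the text.
    """
    max_arc = int(system_memory_bytes * ARC_MAX_FRACTION[platform])
    return min(wanted_bytes, max_arc)


if __name__ == "__main__":
    # 64 GiB of RAM on Linux: the 4 GiB default is well under the 32 GiB cap.
    print(slog_size_bytes(64 * GIB, "linux") // GIB, "GiB")
    # Asking for 32 GiB with only 16 GiB of RAM is capped per platform.
    print(slog_size_bytes(16 * GIB, "linux", 32 * GIB) // GIB, "GiB")
    print(slog_size_bytes(16 * GIB, "illumos", 32 * GIB) // GIB, "GiB")
```

The rest of the device would then be left unpartitioned as overprovisioned spare area.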


=== Whole disks ===