* property(?) to charge quota usage by before-compression-dedup size.
=== Periodic Data Validation ===
Problem: ZFS does a great job of detecting data errors due to lost writes, media errors, and storage bugs, but only when the user actually accesses the data. Scrub in its current form can take a very long time and can have a highly deleterious impact on overall performance.
Data validation in ZFS should be specified according to data or business needs. Kicking off a scrub every day, week, or month doesn’t directly express that need. More likely, the user wants to express their requirements like this:
* “Check all old data at least once per month”
* “Make sure all new writes are verified within 1 day”
* “Don’t consume more than 50% of my IOPS capacity”
Note that constraints like these may overlap, but that's fine; the user just needs to indicate priority, and the system must alert the user of violations.
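As an illustration only, here is a minimal sketch of how such constraints might be represented: a small per-pool table of limits and priorities that a scrub scheduler could consult. All of the names are hypothetical; nothing like this exists in ZFS today.

<syntaxhighlight lang="c">
/*
 * Illustrative sketch only: hypothetical per-pool validation constraints,
 * roughly matching the example requirements above. These could conceivably
 * be persisted with the pool, but none of these names exist in ZFS.
 */
#include <stdio.h>
#include <stdint.h>

typedef struct validation_constraint {
	const char	*vc_desc;	/* human-readable description */
	uint64_t	vc_limit;	/* limit, in vc_unit */
	const char	*vc_unit;	/* "seconds", "percent", ... */
	int		vc_priority;	/* lower number = higher priority */
} validation_constraint_t;

static validation_constraint_t pool_constraints[] = {
	{ "max age before old data is re-verified", 30 * 24 * 3600, "seconds", 1 },
	{ "max delay before new writes are verified", 24 * 3600, "seconds", 2 },
	{ "max share of IOPS capacity used by scrub", 50, "percent", 0 },
};

int
main(void)
{
	/* A scheduler would walk these in priority order and report violations. */
	for (size_t i = 0; i < sizeof (pool_constraints) /
	    sizeof (pool_constraints[0]); i++) {
		validation_constraint_t *vc = &pool_constraints[i];
		printf("priority %d: %s <= %llu %s\n", vc->vc_priority,
		    vc->vc_desc, (unsigned long long)vc->vc_limit, vc->vc_unit);
	}
	return (0);
}
</syntaxhighlight>

Whether these would live as pool properties, a dedicated on-disk object, or something else entirely is an open design question.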
I suggest a new type of scrub. Constraints should be expressed and persisted with the pool. Execution of the scrub should tie into the ZFS IO scheduler, which is ideally situated to identify a relatively idle system. Further, we should order scrub IOs so that they are minimally impactful. That may mean keeping only a small queue of outstanding scrub IOs that we send to the device, or it may mean organizing large, dense, contiguous scrub reads by sorting by LBA.
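To illustrate the sort-by-LBA idea, the sketch below orders a small batch of pending scrub reads by device offset so that they could be issued as dense, mostly sequential reads. The types are hypothetical; this is not the actual ZFS IO scheduler code.

<syntaxhighlight lang="c">
/*
 * Illustrative sketch only: sort pending scrub reads by device offset
 * (LBA) so they can be issued as dense, mostly sequential IO.
 */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct scrub_io {
	uint64_t	si_offset;	/* device offset of the block */
	uint64_t	si_size;	/* block size in bytes */
} scrub_io_t;

static int
scrub_io_compare(const void *a, const void *b)
{
	const scrub_io_t *sa = a;
	const scrub_io_t *sb = b;

	if (sa->si_offset < sb->si_offset)
		return (-1);
	return (sa->si_offset > sb->si_offset);
}

int
main(void)
{
	scrub_io_t queue[] = {
		{ 9437184, 131072 },
		{ 1048576, 131072 },
		{ 5242880, 131072 },
	};
	size_t n = sizeof (queue) / sizeof (queue[0]);

	/* Issue the reads in ascending offset order. */
	qsort(queue, n, sizeof (scrub_io_t), scrub_io_compare);
	for (size_t i = 0; i < n; i++)
		printf("read %llu bytes at offset %llu\n",
		    (unsigned long long)queue[i].si_size,
		    (unsigned long long)queue[i].si_offset);
	return (0);
}
</syntaxhighlight>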
Further, after writing data to disk, there's a window for repair while the data is still in the ARC. If ZFS could read that data back and verify it, it could not only detect a corrupted or lost write, but also correct it, even on a system without redundant on-disk data.
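A rough sketch of what that read-back verification might look like, with a trivial checksum standing in for ZFS's real block checksums and all names hypothetical:

<syntaxhighlight lang="c">
/*
 * Illustrative sketch only: after a write, read the block back from disk
 * and compare it against the copy still held in memory (the ARC in ZFS);
 * on mismatch, rewrite from the good in-memory copy. The checksum here is
 * a trivial stand-in for ZFS's real checksums.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint64_t
simple_checksum(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31 + buf[i];
	return (sum);
}

int
main(void)
{
	uint8_t arc_copy[16] = "hello, scrub!";	/* copy still in the ARC */
	uint8_t disk_copy[16];

	/* Simulate the on-disk copy being damaged by a lost or torn write. */
	memcpy(disk_copy, arc_copy, sizeof (disk_copy));
	disk_copy[3] ^= 0xff;

	if (simple_checksum(disk_copy, sizeof (disk_copy)) !=
	    simple_checksum(arc_copy, sizeof (arc_copy))) {
		/* Detected: repair by rewriting from the in-memory copy. */
		memcpy(disk_copy, arc_copy, sizeof (disk_copy));
		printf("mismatch detected; block rewritten from ARC copy\n");
	} else {
		printf("read-back verified\n");
	}
	return (0);
}
</syntaxhighlight>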
- ahl


== Lustre feature ideas ==