Delphix Brainstorming

September 18, 2013

In the lead-up to Delphix's semi-annual Engineering Kick Off (EKO), a ZFS brainstorming meeting was held at Delphix HQ. The ideas below came out of that meeting and range from immediately actionable items to longer-term, strategic thoughts.

Bold indicates high-priority items. (?) indicates more investigation needed before defining a project.

  • ZFS self-tuning
  • Estimate performance, consumption after destroys
    • Predict performance improvement of freeing up space
  • Ping-pong write, re-use blocks for the same file as it gets re-written
  • Investigate effectiveness of pre-fetch (?)
    • lock contention
  • Pure Storage collaboration
  • Performance tests with compression
    • Using compression histograms gathered from customers
  • Missing metrics on ZFS performance
  • DMU changes for async read, larger block sizes
  • TRIM (issuing TRIM commands to the storage)
  • Write performance
    • multi-block ZIL writer (for RAC) (?)
    • How many metaslabs (?)
  • DTrace provider
  • Compressed ARC - George Wilson
  • ARC observability
    • ARC sizing stats
      • Hit rate, theoretical optimal hit rate based on ghost lists, projection of hit rate given more memory
  • Channel programs
  • LUN removal
  • Cross-pool cloning / distributed DSL
    • Shadow replication, shadow blocks
    • Streaming replication, send out blocks in syncing context
    • Lightweight replication, remove some responsibility from app stack
  • one pool/different vdev “classes”?
  • Resumable send - Max Grossman
  • Compressed send (?)
  • Data masking / block differencing (?)
    • Efficiently store transformed data using a bit function from original data to transformed data
  • Pool fragmentation analytics
    • Provide feedback on when to add storage
    • Provide “% fragmented” metric
  • Data rebalancing/redistribution/defrag/placement
    • Do we care about defrag on SSD?
  • Testing
    • Adding coverage for new features
    • Realistic compression ratios in testing
    • Automated tests for every change
    • Better userland test coverage (full stack from IOCTL down)
    • Better performance tests
  • Scrub should be better (some kind of SLA)
    • Some kind of guarantee on data corruption, how quickly it will be caught?
    • Better error reporting
    • zfs scrub (on specific filesystems)
    • LBA ordered traversals
    • pause/resume scrub (already have stop/restart)
  • Fix broken blkptrs (automated?)
    • Include estimates on reconstruction time
  • Dataset property that sets owner of all contained files in constant time
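
As a rough illustration of the "ARC sizing stats" item above, the sketch below computes an observed hit rate and a projected hit rate from ghost-list hits. It is only a sketch under stated assumptions: the `ArcCounters` type and its field names are hypothetical and do not correspond to actual ZFS kstat names, and the projection assumes that every miss which matched a ghost-list entry would have been a hit had the cache been larger by the amount of data the ghost lists track.

```python
from dataclasses import dataclass


@dataclass
class ArcCounters:
    """Hypothetical snapshot of cache counters (field names are illustrative)."""
    hits: int        # lookups served from the cache
    misses: int      # lookups not served from the cache
    ghost_hits: int  # subset of misses that matched a ghost-list entry


def hit_rate(c: ArcCounters) -> float:
    """Observed hit rate: hits / (hits + misses)."""
    total = c.hits + c.misses
    return c.hits / total if total else 0.0


def projected_hit_rate(c: ArcCounters) -> float:
    """Estimated hit rate if ghost-list hits had been real hits.

    Assumption: a ghost-list hit is a miss that a larger cache
    would have served, so it is counted as a hit here.
    """
    total = c.hits + c.misses
    return (c.hits + c.ghost_hits) / total if total else 0.0


if __name__ == "__main__":
    sample = ArcCounters(hits=800, misses=200, ghost_hits=50)
    print(f"observed:  {hit_rate(sample):.2%}")
    print(f"projected: {projected_hit_rate(sample):.2%}")
```

The gap between the two numbers gives a crude indication of how much additional memory might help; a real implementation would need to account for how much data the ghost lists actually cover.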