
Support external dictionaries #3

@nhz2


Some libraries, such as zlib and zstd, support external dictionaries, which can improve compression of small files: https://facebook.github.io/zstd/zstd_manual.html#Chapter10

The same dictionary is then required to decode the data.
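A minimal illustration using Python's standard `zlib` module, which accepts a preset dictionary through the `zdict` argument. The dictionary bytes and the record are made up for the example. Decoding succeeds only when the decoder is handed the same dictionary the encoder used.

```python
import zlib

# Illustrative dictionary: byte strings expected to recur across many small records.
shared_dict = b'{"station_id": , "temperature": , "humidity": }'
record = b'{"station_id": 42, "temperature": 21.5, "humidity": 0.61}'

compressor = zlib.compressobj(9, zdict=shared_dict)
blob = compressor.compress(record) + compressor.flush()

def try_decode(data: bytes, zdict=None):
    """Return the decoded bytes, or None if zlib rejects the stream."""
    d = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    try:
        return d.decompress(data) + d.flush()
    except zlib.error:
        return None

print(try_decode(blob, shared_dict) == record)  # True: same dictionary
print(try_decode(blob))                         # None: no dictionary supplied
print(try_decode(blob, b"another dictionary"))  # None: dictionary does not match
```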

This could be implemented with the current API by adding a dictionary field to the Codec struct, but there are some complications:
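A rough sketch of the "dictionary field" approach, written in Python for illustration. `ZlibCodec`, `level`, and `dictionary` are hypothetical names, not this project's API. Because the dictionary is part of the codec value, encoder and decoder only agree when both are constructed with the same bytes. Note that this naive version re-applies the raw dictionary on every call.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ZlibCodec:
    """Hypothetical codec whose configuration includes an optional dictionary."""
    level: int = 6
    dictionary: Optional[bytes] = None

    def encode(self, data: bytes) -> bytes:
        # zlib rejects zdict=None, so only pass the dictionary when one is set.
        if self.dictionary is None:
            c = zlib.compressobj(self.level)
        else:
            c = zlib.compressobj(self.level, zdict=self.dictionary)
        return c.compress(data) + c.flush()

    def decode(self, blob: bytes) -> bytes:
        if self.dictionary is None:
            d = zlib.decompressobj()
        else:
            d = zlib.decompressobj(zdict=self.dictionary)
        out = d.decompress(blob) + d.flush()
        if not d.eof:
            raise ValueError("truncated zlib stream")
        return out
```

Decoding data produced by a dictionary codec with a codec that lacks the dictionary raises `zlib.error`.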

  1. When encoding, the raw dictionary isn't directly useful; it first needs to be digested. It would be nice to cache the digested dictionary somewhere, because the same dictionary is often used repeatedly.
  2. Sometimes dictionaries have an associated ID, and the encoded data has a dictionary ID stored in a header. I think the idea is that a decoder could have multiple dictionaries, and then pick one to use based on the ID in the header, though I'm not sure how this would work as part of a larger format like Zarr.
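Both complications can be sketched with `zlib`, again in Python and under stated assumptions. zlib has no separate digested-dictionary object comparable to zstd's `ZSTD_CDict` (created with `ZSTD_createCDict`), so this sketch approximates caching by priming one compressor with the dictionary and copying it with `Compress.copy()` for each call. For IDs, the zlib format (RFC 1950) sets the FDICT bit in the header and stores the Adler-32 checksum of the dictionary as DICTID, which lets a decoder pick among several dictionaries. `DictionaryCodec`, `dict_id`, and `header_dict_id` are hypothetical names.

```python
import zlib
from typing import Dict, List, Optional

def dict_id(dictionary: bytes) -> int:
    # RFC 1950: DICTID is the Adler-32 checksum of the preset dictionary.
    return zlib.adler32(dictionary)

def header_dict_id(blob: bytes) -> Optional[int]:
    """Return DICTID from a zlib header, or None if FDICT (bit 5 of FLG) is unset."""
    if len(blob) < 6 or not blob[1] & 0x20:
        return None
    return int.from_bytes(blob[2:6], "big")  # stored most significant byte first

class DictionaryCodec:
    """Hypothetical codec: caches a primed compressor, picks decoder dict by ID."""

    def __init__(self, dictionaries: List[bytes], level: int = 6):
        if not dictionaries:
            raise ValueError("need at least one dictionary")
        # Decoder side: several dictionaries, keyed by the ID stored in the header.
        self._by_id: Dict[int, bytes] = {dict_id(d): d for d in dictionaries}
        # Encoder side: load the first dictionary once and keep the primed state.
        self._primed = zlib.compressobj(level, zdict=dictionaries[0])

    def encode(self, data: bytes) -> bytes:
        c = self._primed.copy()  # copy the cached state; the original stays unused
        return c.compress(data) + c.flush()

    def decode(self, blob: bytes) -> bytes:
        did = header_dict_id(blob)
        if did is None:
            return zlib.decompress(blob)  # stream was written without a dictionary
        if did not in self._by_id:
            raise KeyError(f"unknown dictionary id {did:#010x}")
        d = zlib.decompressobj(zdict=self._by_id[did])
        return d.decompress(blob) + d.flush()
```

Whether copying a primed compressor is actually cheaper than setting the dictionary again depends on sizes; this shows the shape of the cache, not a benchmark. zstd frames can also carry a dictionary ID (readable with `ZSTD_getDictID_fromFrame`), so the same lookup pattern would apply there.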
