Discussion of zstd and warcs #105

@wumpus

A discussion broke out in the End of Term Archive 2024 Slack about zstd WARCs. zstd is a relatively new compression format for WARC files, and it should be discussed early and often before the archiving community adopts it. In the EOT 2024 case, Archive Team submitted many zstd WARCs.

Proposal: https://iipc.github.io/warc-specifications/specifications/warc-zstd/

One comment already raised is that the dictionary frame that may appear at the start of every WARC might make playback slower.

Another comment is that the zstd dictionary frame is not a WARC record. That might not be a good choice when some (most?) WARC tools don't support zstd and will fail to correctly index, extract, or replay any records from a zstd WARC.
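To make the tool-compatibility concern concrete, here is a minimal sketch of how a tool might sniff the first bytes of a file and report "this is zstd, which I don't support" instead of failing in confusing ways. The function name `sniff_warc_container` and the returned labels are hypothetical. The gzip and zstd magic numbers are the standard ones; the dictionary-frame magic `0x184D2A5D` is my reading of the linked proposal and should be checked against it.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"
ZSTD_FRAME_MAGIC = 0xFD2FB528          # regular zstd frame (little-endian on disk)
ZSTD_SKIPPABLE_BASE = 0x184D2A50       # skippable frames use 0x184D2A50..0x184D2A5F
ZSTD_SKIPPABLE_MASK = 0xFFFFFFF0
# Assumption: the warc-zstd proposal stores its dictionary in a skippable
# frame with this specific magic; verify against the proposal text.
WARC_ZSTD_DICT_MAGIC = 0x184D2A5D


def sniff_warc_container(head: bytes) -> str:
    """Classify a file by its first bytes (hypothetical helper)."""
    if head.startswith(b"WARC/"):
        return "uncompressed"
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if len(head) >= 4:
        (magic,) = struct.unpack("<I", head[:4])
        if magic == ZSTD_FRAME_MAGIC:
            return "zstd"
        if magic == WARC_ZSTD_DICT_MAGIC:
            return "zstd-dictionary-frame"
        if magic & ZSTD_SKIPPABLE_MASK == ZSTD_SKIPPABLE_BASE:
            return "zstd-skippable-frame"
    return "unknown"
```

A tool without zstd support could call this on the first four or five bytes of each input and skip or flag the file cleanly when the result starts with `zstd`, rather than producing an empty or corrupt index.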

But the most important point I'd like to make is that we should discuss these issues early and often, and not after petabytes of WARC files have been generated in a new format that the community never discussed.
