Discussion of zstd and warcs #105
Description
A discussion broke out in the End of Term Archive 2024 Slack about zstd warcs. zstd is a relatively new compression format, and its use for warcs should be discussed early and often before the archive community adopts it. In the EOT 2024 case, Archive Team submitted many zstd warcs.
Proposal: https://iipc.github.io/warc-specifications/specifications/warc-zstd/
One comment brought up already is that the possible dictionary frame at the start of every warc might make playback slower.
Another comment is that the zstd dictionary frame is not a WARC record. That might not be a good choice when some (most?) warc tools don't support zstd, and will fail to correctly index or extract or replay any records from a zstd warc.
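To make the two comments above concrete, here is a minimal Python sketch of how a tool might look at the first bytes of a file and notice a leading non-record frame. It relies only on the magic numbers from the gzip and Zstandard frame formats: a zstd skippable frame starts with a little-endian magic in the range 0x184D2A50 to 0x184D2A5F followed by a 4-byte little-endian payload size. The function name `sniff_warc_compression` and the returned dict layout are made up for illustration, and the sketch does not check whether the skippable frame actually holds a dictionary as the proposal describes.

```python
import struct

GZIP_MAGIC = b"\x1f\x8b"          # gzip member header
ZSTD_FRAME_MAGIC = 0xFD2FB528     # regular zstd frame (little-endian on disk)
SKIPPABLE_BASE = 0x184D2A50       # zstd skippable frames: 0x184D2A50..0x184D2A5F
SKIPPABLE_MASK = 0xFFFFFFF0


def sniff_warc_compression(head: bytes) -> dict:
    """Classify a file by its first bytes (hypothetical helper).

    Returns a dict with:
      kind: "gzip", "zstd", "zstd-skippable", "uncompressed" or "unknown"
      skip: number of bytes before the first frame that can hold WARC records
    """
    if head[:2] == GZIP_MAGIC:
        return {"kind": "gzip", "skip": 0}
    if len(head) >= 4:
        (magic,) = struct.unpack("<I", head[:4])
        if magic == ZSTD_FRAME_MAGIC:
            return {"kind": "zstd", "skip": 0}
        if (magic & SKIPPABLE_MASK) == SKIPPABLE_BASE and len(head) >= 8:
            # The skippable frame is not a WARC record; a reader has to
            # step over (or load) it before it reaches any record.
            (size,) = struct.unpack("<I", head[4:8])
            return {"kind": "zstd-skippable", "skip": 8 + size}
    if head[:5] == b"WARC/":
        return {"kind": "uncompressed", "skip": 0}
    return {"kind": "unknown", "skip": 0}
```

A tool that only knows the gzip and uncompressed cases would land in the `"unknown"` branch for every zstd warc, which is the failure mode the second comment worries about.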
But the most important point that I'd like to make is that we should discuss these issues early and often, and not after petabytes of warc files are generated with a new format that was not discussed by the community.