Correct WARC Content-Type header in revisit records #40

@sebastian-nagel

Version 1.1 of the WARC specification shows an example revisit record with the WARC header Content-Type: message/http. The WARC writer follows this example when writing revisit records.

However, according to iipc/warc-specifications#55, this should instead be Content-Type: application/http;msgtype=response, for consistency with other HTTP response records. See also the WARC spec with community annotations.

Example revisit record (archive link):

WARC/1.0
WARC-Type: revisit
WARC-Date: 2026-02-06T22:04:59Z
WARC-Record-ID: <urn:uuid:ae2ee200-4cf8-424e-ab30-5a6a687bf408>
Content-Length: 3132
Content-Type: message/http
WARC-Warcinfo-ID: <urn:uuid:3e025a91-8f07-4f57-ae26-8a96b477da02>
WARC-Refers-To: <urn:uuid:53e28592-ed1d-443a-bfc7-88056a790a5c>
WARC-IP-Address: 208.80.154.224
WARC-Target-URI: https://de.wikipedia.org/wiki/Wikipedia:WikiCon_2025
WARC-Protocol: h2
WARC-Protocol: tls/1.3
WARC-Cipher-Suite: TLS_AES_128_GCM_SHA256
WARC-Refers-To-Target-URI: https://de.wikipedia.org/wiki/Wikipedia:WikiCon_2025
WARC-Refers-To-Date: 2026-01-24T17:23:54Z
WARC-Profile: http://netpreserve.org/warc/1.1/revisit/server-not-modified
WARC-Block-Digest: sha1:4GRJCHWWKHDYTDWRQQ4TCFO46XUXZLTX

HTTP/1.1 304 
date: Fri, 06 Feb 2026 10:55:35 GMT
...
Content-Length: 0
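
To illustrate the proposed change, here is a minimal Python sketch that rewrites the WARC header block of a revisit record like the one above. It is not the WARC writer's actual code: the function name `fix_revisit_content_type` and the two constants are hypothetical, and the sketch assumes the header block is passed in as a single string, one header per line.

```python
# Hypothetical helper for illustration only; not part of any WARC library.
OLD_TYPE = "message/http"
NEW_TYPE = "application/http;msgtype=response"


def fix_revisit_content_type(warc_headers: str) -> str:
    """Return the WARC header block with Content-Type corrected.

    Only revisit records whose Content-Type is message/http are changed;
    any other header block is returned untouched.
    """
    lines = warc_headers.splitlines()

    # Header names are case-insensitive, so compare in lower case.
    is_revisit = any(
        line.lower().replace(" ", "") == "warc-type:revisit" for line in lines
    )
    if not is_revisit:
        return warc_headers

    fixed = []
    for line in lines:
        name, sep, value = line.partition(":")
        if (
            sep
            and name.strip().lower() == "content-type"
            and value.strip().lower() == OLD_TYPE
        ):
            line = f"Content-Type: {NEW_TYPE}"
        fixed.append(line)

    # WARC header lines are terminated by CRLF.
    return "\r\n".join(fixed)
```

Only the WARC header line changes. Content-Length and WARC-Block-Digest describe the record block (here the HTTP 304 response headers), so they keep their values.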
