New WARC header to provide hint where to find original records for revisit records. #111
Description
The revisit record generally includes WARC-Refers-To-Target-URI and WARC-Refers-To-Date to indicate the URL and timestamp
of the original record that the revisit refers to.
It would also be useful to provide an additional hint as to where the original record may be found, through some sort of user-defined identifier,
which, if provided, could be used to help locate it. The header value can be a URI.
Some name ideas for this header:
WARC-Revisit-Original-Source
WARC-Revisit-Original-Location
WARC-Revisit-Original-Identifier
For our use case, we would use this to store the name of a WACZ file that contains the original, but not necessarily its location, as that may change, e.g.:
WARC-Revisit-Original-Source: file://somefile.wacz
WARC-Revisit-Original-Source: urn:wacz:somefile.wacz
Other identifiers:
WARC-Revisit-Original-Source: https://example.com/path/to/known.warc.gz
WARC-Revisit-Original-Source: urn:sha-256:<...>
WARC-Revisit-Original-Source: urn:customurn:...
The idea is to make this generic so that it could be used for a variety of identifiers, though we could also go with something more specific (like WARC-Revisit-Original-WACZ-Filename instead).
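As a rough illustration of how a reader might consume a generic header like this, here is a minimal Python sketch using only the standard library. It assumes the header ends up being named WARC-Revisit-Original-Source (just one of the candidates above, not part of any WARC spec), and the function names and the "kind" labels are made up for this example. It only classifies the hint by URI scheme; actually resolving it to a file would be up to the tool.

```python
from urllib.parse import urlsplit

# Hypothetical header name: one of the candidates proposed in this issue,
# not defined by any WARC specification.
SOURCE_HEADER = "WARC-Revisit-Original-Source"


def parse_warc_headers(block: str) -> dict:
    """Parse 'Name: value' lines of a WARC header block into a dict.

    Keys are lowercased so lookups are case-insensitive.
    """
    headers = {}
    for line in block.splitlines():
        if ":" not in line:
            continue  # skips the version line (e.g. WARC/1.1) and blank lines
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def classify_source_hint(headers: dict):
    """Return (kind, hint) for the source hint, or None if the header is absent.

    The 'kind' labels are illustrative only.
    """
    hint = headers.get(SOURCE_HEADER.lower())
    if hint is None:
        return None
    scheme = urlsplit(hint).scheme.lower()
    if scheme in ("http", "https"):
        kind = "remote"
    elif scheme == "file":
        kind = "local-file"
    elif scheme == "urn":
        kind = "urn"
    else:
        kind = "opaque"  # treat anything else as an opaque user-defined identifier
    return kind, hint


if __name__ == "__main__":
    revisit = (
        "WARC/1.1\r\n"
        "WARC-Type: revisit\r\n"
        "WARC-Refers-To-Target-URI: https://example.com/\r\n"
        "WARC-Revisit-Original-Source: urn:wacz:somefile.wacz\r\n"
    )
    print(classify_source_hint(parse_warc_headers(revisit)))
```

A reader that does not recognize the scheme can simply ignore the hint and fall back to its usual lookup by URL and timestamp, which is why the header works as a hint rather than a requirement.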
Would anyone else use such a header if it existed? Thoughts on the name, e.g. Source vs. Location?