Should get_memento() ignore the mode in archive.org URLs? #115

@Mr0grog

Currently, get_memento() can be called in a few different ways:

  • get_memento(archived_url) requests a memento using the URL, timestamp, and mode that are baked into the URL. (archived_url means a URL like https://web.archive.org/web/[YYYYMMDDHHmmss][mode]/[url])
  • get_memento(cdx_record, mode=mode) requests a memento with the URL and timestamp from the CDX record object, and the given mode (where mode defaults to original)
  • get_memento(url, timestamp, mode=mode) requests a memento of the given URL at the given timestamp with the given mode (again, mode is optional and defaults to original)
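To make the first form concrete, here is a rough sketch of how an archive URL breaks down into the same three pieces the other two forms take as arguments. The regular expression and the `split_archive_url` name are made up for illustration and are not the library's actual parser; it also assumes a full 14-digit timestamp, as in the pattern above.

```python
import re

# Hypothetical pattern for illustration only; not the library's real parser.
# Assumes a full 14-digit timestamp followed by an optional two-letter mode
# suffix such as "id_" (no suffix at all means view mode).
ARCHIVE_URL = re.compile(
    r"^https?://web\.archive\.org/web/"
    r"(?P<timestamp>\d{14})(?P<mode>[a-z]{2}_)?/(?P<url>.+)$"
)


def split_archive_url(archived_url):
    """Return (url, timestamp, mode) from a Wayback Machine archive URL."""
    match = ARCHIVE_URL.match(archived_url)
    if match is None:
        raise ValueError(f"Not an archive.org URL: {archived_url}")
    # The mode group is None when the URL has no suffix; treat that as "".
    return match["url"], match["timestamp"], match["mode"] or ""


print(split_archive_url(
    "https://web.archive.org/web/20230101000000id_/https://www.epa.gov/"
))
```

The point is only that the mode travels inside the URL string here, whereas in the other two forms it is a separate keyword argument with a default.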

Folks using this library will usually want mode=Mode.original, which is the default in the other two cases. But because an archive URL has the mode baked in, we currently obey whatever mode is in the URL.

The problem is that mode is a somewhat advanced concept and requires extra thought about what you’re asking for. Folks are prone to copying a URL from their browser and dropping it in here to try things out, or to accidentally passing cdx_record.view_url instead of the CDX record itself, without realizing that they are changing modes (or what that even means!). For example, #109 uncovered a legitimate issue with view mode, but the user didn’t actually want to be using view mode at all! (Once I explained that, it turned out the actual issue wasn’t even a blocker for him: he switched to original mode and was good to go.)

So: should calling get_memento(archived_url) ignore the mode that’s in the URL and use whatever one is explicitly set as a parameter instead (as in all other cases, defaulting to original)? For example:

client.get_memento("https://web.archive.org/web/20230101000000/https://www.epa.gov/")

Currently gets you a memento in view mode. The change I’m thinking about would mean you’d get original mode instead here. If you wanted view mode, you’d have to ask for it explicitly:

client.get_memento("https://web.archive.org/web/20230101000000/https://www.epa.gov/", mode=Mode.view)

It would also mean all these calls get you the same result, instead of different ones:

client.get_memento("https://web.archive.org/web/20230101000000/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000id_/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000js_/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000cs_/https://www.epa.gov/")
client.get_memento("https://web.archive.org/web/20230101000000im_/https://www.epa.gov/")
# Note different mode values ---------------------------------^^^
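As a rough sketch of the proposed behavior (not an actual patch), the idea is to throw away whatever mode the URL arrived with and apply the explicit `mode` argument instead. The `Mode` enum below is a stand-in defined just for this example, with values taken from the URL suffixes shown above, and `with_mode` is a hypothetical helper name.

```python
import re
from enum import Enum


class Mode(Enum):
    # Stand-in for the library's Mode enum, defined here so the sketch is
    # self-contained. Values are the URL suffixes from the examples above.
    original = "id_"
    view = ""
    javascript = "js_"
    css = "cs_"
    image = "im_"


# Hypothetical pattern: captures everything up to the 14-digit timestamp and
# matches (but does not keep) an optional two-letter mode suffix.
_MODE_IN_URL = re.compile(r"^(https?://web\.archive\.org/web/\d{14})(?:[a-z]{2}_)?/")


def with_mode(archived_url, mode=Mode.original):
    """Rewrite an archive URL to use `mode`, ignoring the mode it came with."""
    rewritten, count = _MODE_IN_URL.subn(
        lambda m: f"{m.group(1)}{mode.value}/", archived_url
    )
    if count == 0:
        raise ValueError(f"Not an archive.org URL: {archived_url}")
    return rewritten


urls = [
    f"https://web.archive.org/web/20230101000000{suffix}/https://www.epa.gov/"
    for suffix in ("", "id_", "js_", "cs_", "im_")
]
# All five inputs collapse to the same request URL under the proposal.
assert len({with_mode(u) for u in urls}) == 1
```

With something like this in place, view mode is still reachable, but only by asking for it: `with_mode(url, mode=Mode.view)`.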

Labels: question (Further information is requested)
Status: Backlog