Make checksum comparisons more robust to insignificant file changes #16

@ezwelty

The archiving process compares file checksums and only stores a new file if its checksum does not match that of a file already in the archive. This prevents unnecessary duplication, but some source files embed volatile information, such as timestamps, so every download appears new even when the actual content is unchanged. Perhaps this could be addressed by applying small tweaks to the downloaded file (stripping or fixing the volatile parts) before calculating the checksum.
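A minimal sketch of the idea, assuming the checksum is a SHA-256 digest of the file bytes (the issue does not say which hash the archive uses). The names `normalized_checksum` and `drop_iso_timestamps` are hypothetical, and the blanket timestamp replacement is only for illustration: applied to real data it could also hide meaningful changes, so per-format normalizers are probably safer.

```python
import hashlib
import re
from typing import Callable, Iterable

# A normalizer takes the raw file bytes and returns a cleaned copy.
Normalizer = Callable[[bytes], bytes]


def normalized_checksum(data: bytes, normalizers: Iterable[Normalizer] = ()) -> str:
    """Return the SHA-256 hex digest of `data` after applying each normalizer in order."""
    for normalize in normalizers:
        data = normalize(data)
    return hashlib.sha256(data).hexdigest()


def drop_iso_timestamps(data: bytes) -> bytes:
    """Hypothetical normalizer: replace ISO 8601-like timestamps with a fixed token."""
    pattern = rb"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})?"
    return re.sub(pattern, b"<timestamp>", data)
```

The archived file itself would stay untouched; only the bytes fed to the hash are normalized, so the comparison ignores the volatile parts without altering what is stored.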

Files

  • ArcGIS API: Strip timestamps from response (GML, JSON?). Download as CSV if supported?
  • WFS: Strip timestamps from response (GML, JSON). Download as CSV if supported?
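For the API responses, stripping could look roughly like the sketch below. It assumes the volatile value is a top-level `timeStamp` key in JSON output and a `timeStamp` attribute on the root element in GML output; that matches what some WFS servers emit, but it has not been checked against the sources in this archive, and ArcGIS responses may use different names. Both function names are hypothetical.

```python
import json
import re


def strip_json_keys(data: bytes, keys=("timeStamp",)) -> bytes:
    """Drop the given top-level keys and re-serialize in a canonical form."""
    parsed = json.loads(data)
    if isinstance(parsed, dict):
        for key in keys:
            # Only top-level keys are removed, so feature attributes are left alone.
            parsed.pop(key, None)
    # Sorted keys and fixed separators make the output independent of key order.
    return json.dumps(parsed, sort_keys=True, separators=(",", ":")).encode("utf-8")


def strip_gml_timestamp(data: bytes) -> bytes:
    """Remove the first timeStamp="..." attribute (assumed to sit on the root element)."""
    return re.sub(rb'\s+timeStamp="[^"]*"', b"", data, count=1)
```

Re-serializing the JSON also removes differences in key order and whitespace, which may or may not be desirable. The regular expression avoids re-serializing the XML, at the cost of relying on the attribute being written with double quotes.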

Web page

  • MHTML: Strip timestamp and de-randomize boundary hash
  • HTML: Testing needed
  • PDF: Testing needed
  • PNG: Testing needed
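For MHTML, a rough sketch of what "strip timestamp and de-randomize boundary" might mean. It assumes CRLF line endings, a `Date:` header in the top-level header block, and a quoted `boundary="..."` parameter, as in browser-saved snapshots; other producers may differ. `normalize_mhtml` and `FIXED_BOUNDARY` are hypothetical names.

```python
import re

# Placeholder that replaces the randomly generated multipart boundary.
FIXED_BOUNDARY = b"----MultipartBoundary--normalized----"


def normalize_mhtml(data: bytes) -> bytes:
    """Drop the top-level Date header and replace the boundary with a fixed string."""
    # The top-level headers end at the first blank line (CRLF line endings assumed).
    head, sep, body = data.partition(b"\r\n\r\n")
    lines = [
        line for line in head.split(b"\r\n")
        if not line.lower().startswith(b"date:")
    ]
    head = b"\r\n".join(lines)
    # Find the boundary declared in the headers and substitute it everywhere.
    match = re.search(rb'boundary="([^"]+)"', head)
    if match:
        boundary = match.group(1)
        head = head.replace(boundary, FIXED_BOUNDARY)
        body = body.replace(boundary, FIXED_BOUNDARY)
    return head + sep + body
```

This only handles the two items named above. HTML, PDF, and PNG would still need testing to find out which parts of each file change between otherwise identical downloads.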

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
