
GitHub-Based Graph Versioning with Memento Integration

Martynas Jusevičius edited this page Jan 6, 2026 · 1 revision

This document outlines a high-level design and key findings for implementing versioned RDF named graphs using GitHub as a backend and exposing them via the Memento protocol.

Goals

  • Enable immutable, timestamped versions of RDF graphs
  • Support point-in-time retrieval (Accept-Datetime) via Memento
  • Allow undo and restore without complex triplestore features
  • Avoid embedding backend-specific metadata in RDF
  • Use GitHub as the canonical audit and versioning layer

Architecture Overview

Each RDF named graph is represented as a .nt (N-Triples) file in a GitHub repository. The GitHub API is used to:

  • Push new versions (via full file uploads)
  • Retrieve past versions (via commit history)
  • Revert changes (via re-committing previous file states)

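Git commit timestamps map to Memento-Datetime values. Each commit that changes a graph's file implicitly defines a Memento of that graph; commits that touch only other files do not create a new Memento for it.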

File Mapping

  • Each named graph = one .nt file (e.g., graphs/products.nt)
  • Files are committed to a specific branch (e.g., main)
  • Optionally namespaced by dataspace or logical grouping
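The mapping above can be sketched as a small path-building function. This is only an illustration: the function name `graph_path`, the `graphs` root directory default, and the restriction on allowed characters are assumptions, not part of the design.

```python
import re
from typing import Optional

# Assumption: graph and dataspace names are limited to a conservative
# character set so that a name can never escape the root directory.
_SAFE_NAME = re.compile(r"[A-Za-z0-9_-]+")


def graph_path(name: str, dataspace: Optional[str] = None, root: str = "graphs") -> str:
    """Map a named graph (and optional dataspace) to its .nt file path in the repository."""
    for part in (name, dataspace):
        if part is not None and not _SAFE_NAME.fullmatch(part):
            raise ValueError(f"unsafe path segment: {part!r}")
    segments = [root]
    if dataspace is not None:
        segments.append(dataspace)  # optional namespacing by dataspace
    segments.append(f"{name}.nt")
    return "/".join(segments)
```

For example, `graph_path("products")` yields `graphs/products.nt`, and `graph_path("products", "acme")` yields `graphs/acme/products.nt`.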

HTTP Method Mappings

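| HTTP Method | Behavior                             | GitHub API Mapping                    |
| PUT         | Replace full graph                   | PUT /repos/.../contents/graphs/X.nt   |
| PATCH       | Modify graph (send new full content) | Same as PUT (no diff support needed)  |
| DELETE      | Remove graph                         | DELETE /repos/.../contents/graphs/X.nt |
| GET         | Retrieve current or historic version | GET ...?ref=<sha>                     |

Note: when updating or deleting an existing file, the GitHub contents API requires the file's current blob SHA in the request body, so the server has to look up or cache that SHA before writing.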
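As a rough sketch of the mappings above, the function below translates an incoming HTTP method into a GitHub contents API request without sending it. The function name, the `owner` and `repo` parameters, and the default commit message are hypothetical; the body fields (`message`, `content`, `sha`, `branch`) and the `ref` query parameter follow the GitHub REST API for repository contents.

```python
import base64
from typing import Optional, Tuple

API_ROOT = "https://api.github.com"


def contents_request(
    method: str,
    owner: str,
    repo: str,
    path: str,
    *,
    ntriples: Optional[bytes] = None,
    blob_sha: Optional[str] = None,
    ref: Optional[str] = None,
    message: str = "Update graph",  # hypothetical default commit message
    branch: str = "main",
) -> Tuple[str, str, dict]:
    """Return (github_method, url, payload).

    For GET the payload holds query parameters; otherwise it is the JSON body.
    """
    url = f"{API_ROOT}/repos/{owner}/{repo}/contents/{path}"
    method = method.upper()
    if method == "GET":
        # ref may be a commit SHA, branch, or tag; omitted means the default branch
        return "GET", url, ({"ref": ref} if ref else {})
    if method in ("PUT", "PATCH"):
        # PATCH is treated like PUT: the full new graph content is always sent
        if ntriples is None:
            raise ValueError("full N-Triples content is required")
        body = {
            "message": message,
            "content": base64.b64encode(ntriples).decode("ascii"),
            "branch": branch,
        }
        if blob_sha:
            body["sha"] = blob_sha  # required by GitHub when the file already exists
        return "PUT", url, body
    if method == "DELETE":
        if not blob_sha:
            raise ValueError("current blob SHA is required to delete a file")
        return "DELETE", url, {"message": message, "sha": blob_sha, "branch": branch}
    raise ValueError(f"unsupported method: {method}")
```

Keeping request construction separate from the HTTP client makes the mapping easy to unit test; authentication headers and error handling are deliberately left out.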

Version Fingerprinting

  • Blob SHA: Git's hash of the file content (blob <len>\0<bytes>) is the canonical fingerprint
  • Commit SHA: Identifies the full repository state; a commit that changes the graph's file maps to a Memento of that graph
  • No hashes stored in RDF: All fingerprinting is external
  • Optional: ETag = "git-blob-<sha>"
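The blob fingerprint can be computed locally without calling GitHub. The sketch below assumes a SHA-1 repository (the object format GitHub uses) and follows the `blob <len>\0<bytes>` layout described above; the function names are illustrative.

```python
import hashlib


def git_blob_sha(data: bytes) -> str:
    """Compute the Git blob SHA-1 of the given file content."""
    # Git hashes a header ("blob", space, decimal length, NUL) followed by the raw bytes
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def etag_for(data: bytes) -> str:
    """Build the optional ETag value in the "git-blob-<sha>" form."""
    return f'"git-blob-{git_blob_sha(data)}"'
```

The result matches `git hash-object` for the same bytes, so a server can check a serialized graph against the blob SHA reported by GitHub. Because the hash covers raw bytes, two serializations of the same triples in a different line order produce different fingerprints.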

Memento Integration

  • Each named graph has an Original Resource URI (e.g., /data/products)
  • Mementos are identified by commit SHA or timestamped URI
  • Memento headers:
    • Memento-Datetime: Git commit timestamp
    • ETag: derived from the Git blob SHA (e.g., "git-blob-<sha>")
    • Link: to original, timegate, and timemap
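A minimal sketch of how these response headers might be assembled is shown below. The function name and parameters are assumptions, and so is the TimeMap URI passed by the caller, since the design does not specify one. Memento-Datetime uses the HTTP date format in GMT, which `email.utils.format_datetime` produces when `usegmt=True`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def memento_headers(
    original_uri: str,
    timegate_uri: str,
    timemap_uri: str,
    commit_time: datetime,
    blob_sha: str,
) -> dict:
    """Build Memento response headers for one version of a graph.

    commit_time must be timezone-aware; it is normalized to UTC here.
    """
    if commit_time.tzinfo is None:
        raise ValueError("commit_time must be timezone-aware")
    utc_time = commit_time.astimezone(timezone.utc)
    links = [
        f'<{original_uri}>; rel="original"',
        f'<{timegate_uri}>; rel="timegate"',
        f'<{timemap_uri}>; rel="timemap"; type="application/link-format"',
    ]
    return {
        "Memento-Datetime": format_datetime(utc_time, usegmt=True),
        "ETag": f'"git-blob-{blob_sha}"',
        "Link": ", ".join(links),
    }
```

In this design the Original Resource also acts as the TimeGate, so the same URI could plausibly be passed for both `original_uri` and `timegate_uri`.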

TimeGate Resolution Flow

  1. GET /data/products with Accept-Datetime: t
  2. Resolve to the most recent commit that changed graphs/products.nt with a timestamp ≤ t
  3. Return file from that commit with Memento headers
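Step 2 of the flow above can be sketched as a binary search over the file's commit history. The sketch assumes the history is already available as a list of `(commit_time, commit_sha)` pairs sorted by ascending time, for instance from a local index or from GitHub's commit listing filtered by path. The function name is illustrative, and returning `None` for a datetime earlier than the first commit is one possible choice; a deployment could instead fall back to the earliest Memento.

```python
from bisect import bisect_right
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple


def resolve_memento(
    history: List[Tuple[datetime, str]], accept_datetime: str
) -> Optional[str]:
    """Return the SHA of the latest commit at or before the Accept-Datetime value.

    history must be sorted by ascending commit time and use timezone-aware datetimes.
    """
    target = parsedate_to_datetime(accept_datetime)  # parses the HTTP date format
    times = [commit_time for commit_time, _ in history]
    # bisect_right gives the count of commits with time <= target
    index = bisect_right(times, target)
    if index == 0:
        return None  # no version of the graph existed at that time
    return history[index - 1][1]
```

The returned commit SHA can then be used as the `ref` when fetching the file, and the matching commit time becomes the Memento-Datetime of the response.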

Undo and Restore

  • Undo = re-commit a previous version of the file (identified by commit or blob SHA)
  • Restore = fetch snapshot from GitHub, re-import into triplestore or overwrite file
  • No need for SPARQL update logs or quad-store versioning
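The key property of undo in this model is that it appends to history rather than rewriting it. The in-memory sketch below illustrates that idea only; the `Version` type, the `undo` function, and the commit identifiers are hypothetical stand-ins for real Git commits.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    commit: str     # stand-in for a commit SHA
    content: bytes  # full N-Triples content at that commit


def undo(history: List[Version], target_commit: str, new_commit: str) -> List[Version]:
    """Return a new history in which the target version's content is re-committed on top."""
    for version in history:
        if version.commit == target_commit:
            # Earlier versions stay untouched; the restored content becomes the newest one
            return history + [Version(new_commit, version.content)]
    raise KeyError(f"unknown commit: {target_commit}")
```

Against GitHub, the equivalent step would be a contents PUT that carries the old file content together with the current blob SHA, which creates a new commit, and therefore a new Memento, whose content equals the earlier one.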

Trust and Provenance

  • Git history = full audit trail (author, time, reason)
  • Commit metadata can store provenance context (in commit messages or sidecar files)
  • No internal provenance RDF required (optional for advanced cases)

Benefits

  • Git-native tooling (diffs, PRs, CI)
  • Immutable, verifiable graph snapshots
  • Clean separation of RDF content from backend metadata
  • Memento support without triplestore versioning features

Constraints and Tradeoffs

  • No SPARQL queries across versions (e.g., diffs in SPARQL)
  • Restore requires file re-import or overwrite
  • No partial graph updates (always full .nt push)
  • Depends on GitHub availability and API rate limits

Optional Extensions

  • GitHub Actions for RDF validation or auto-snapshotting
  • Support GitHub App auth instead of personal tokens
  • Use blob SHA in URIs (e.g., /data/products?version=<sha>)
  • Maintain local index for fast datetime-to-commit resolution

This design is intended to support a clean, agent-safe, interoperable versioning system for RDF named graphs using widely available infrastructure. It can be adopted incrementally while preserving trust and simplicity.