GitHub‐Based Graph Versioning with Memento Integration
Martynas Jusevičius edited this page Jan 6, 2026 · 1 revision
This document outlines a high-level design and key findings for implementing versioned RDF named graphs using GitHub as a backend and exposing them via the Memento protocol.
- Enable immutable, timestamped versions of RDF graphs
- Support point-in-time retrieval (`Accept-Datetime`) via Memento
- Allow undo and restore without complex triplestore features
- Avoid embedding backend-specific metadata in RDF
- Use GitHub as the canonical audit and versioning layer
Each RDF named graph is represented as a .nt (N-Triples) file in a GitHub repository. The GitHub API is used to:
- Push new versions (via full file uploads)
- Retrieve past versions (via commit history)
- Revert changes (via re-committing previous file states)
Git commit timestamps map to `Memento-Datetime` values. Each commit that changes a graph's file implicitly defines a Memento of that graph.
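As a minimal sketch of the "push new versions" step, the snippet below builds the JSON body for a full-file upload through the GitHub contents endpoint. The function name `build_put_payload` and its parameters are hypothetical; the field names (`message`, `content`, `sha`, `branch`) follow the GitHub REST API for creating or updating file contents. Sending the request (authentication, HTTP client) is deliberately left out.

```python
import base64
from typing import Dict, Optional


def build_put_payload(
    ntriples: str,
    message: str,
    branch: str = "main",
    current_blob_sha: Optional[str] = None,
) -> Dict[str, str]:
    """Build the JSON body for PUT /repos/.../contents/graphs/X.nt.

    Hypothetical helper: the caller is assumed to serialize the graph
    as N-Triples and to send the returned dict as JSON.
    """
    payload = {
        "message": message,
        # The contents API expects the file body base64-encoded.
        "content": base64.b64encode(ntriples.encode("utf-8")).decode("ascii"),
        "branch": branch,
    }
    # When the file already exists, GitHub requires the blob SHA of the
    # version being replaced; omit it when creating a new graph file.
    if current_blob_sha is not None:
        payload["sha"] = current_blob_sha
    return payload
```

Because every write carries the blob SHA of the version it replaces, a concurrent update should surface as a rejected request rather than a silent overwrite.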
- Each named graph = one `.nt` file (e.g., `graphs/products.nt`)
- Files are committed to a specific branch (e.g., `main`)
- Optionally namespaced by dataspace or logical grouping
| HTTP Method | Behavior | GitHub API Mapping |
|---|---|---|
| `PUT` | Replace full graph | `PUT /repos/.../contents/graphs/X.nt` |
| `PATCH` | Modify graph (send new full content) | Same as `PUT` (no diff support needed) |
| `DELETE` | Remove graph | `DELETE /repos/.../contents/graphs/X.nt` |
| `GET` | Retrieve current or historic version | `GET ...?ref=<sha>` |
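The mapping in the table can be sketched as a small routing function. Everything here is illustrative: `github_request_for`, the `owner` and `repo` parameters, and the assumption that graph names map directly to `graphs/<name>.nt` are not prescribed by the design.

```python
from typing import Optional, Tuple
from urllib.parse import quote


def github_request_for(
    method: str,
    graph: str,
    owner: str,
    repo: str,
    ref: Optional[str] = None,
) -> Tuple[str, str]:
    """Translate a graph-store HTTP method into (GitHub method, URL path).

    Hypothetical helper; `ref` is a commit SHA used only for historic GETs.
    """
    path = f"/repos/{quote(owner)}/{quote(repo)}/contents/graphs/{quote(graph)}.nt"
    method = method.upper()
    if method in ("PUT", "PATCH"):
        # PATCH sends the full new content, so it maps to the same upload.
        return ("PUT", path)
    if method == "DELETE":
        return ("DELETE", path)
    if method == "GET":
        # Without ?ref the API returns the file from the default branch.
        return ("GET", f"{path}?ref={quote(ref)}" if ref else path)
    raise ValueError(f"Unsupported method: {method}")
```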
- Blob SHA: Git's hash of the file content (`blob <len>\0<bytes>`) is the canonical fingerprint
- Commit SHA: Identifies full repository state, maps to a Memento
- No hashes stored in RDF: All fingerprinting is external
- Optional: ETag = `"git-blob-<sha>"`
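The blob fingerprint can be reproduced locally without Git. The sketch below assumes a SHA-1 repository (the default object format); `git_blob_sha` and `etag_for` are hypothetical names.

```python
import hashlib


def git_blob_sha(data: bytes) -> str:
    """Hash file bytes the way Git hashes a blob: 'blob <len>\\0<bytes>'."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def etag_for(data: bytes) -> str:
    """Format the optional ETag value as a quoted string."""
    return f'"git-blob-{git_blob_sha(data)}"'
```

The result should match what `git hash-object` reports for the same bytes, so a client can verify a downloaded `.nt` file against the blob SHA returned by the API. Note that the hash is over the serialized bytes: two N-Triples files with the same triples in a different line order yield different fingerprints.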
- Each named graph has an Original Resource URI (e.g., `/data/products`)
- Mementos are identified by commit SHA or timestamped URI
- Memento headers:
  - `Memento-Datetime`: Git commit timestamp
  - `ETag`: Git blob SHA
  - `Link`: to `original`, `timegate`, and `timemap`
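A minimal sketch of assembling these response headers is shown below. The function name `memento_headers` and the three URI parameters are assumptions for illustration; the design does not fix the TimeGate or TimeMap URI layout. `Memento-Datetime` uses the HTTP date format, which `email.utils.format_datetime` produces with `usegmt=True`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def memento_headers(
    commit_time: datetime,
    blob_sha: str,
    original_uri: str,
    timegate_uri: str,
    timemap_uri: str,
) -> Dict[str, str]:
    """Build Memento response headers for one commit of a graph file.

    `commit_time` is assumed to be timezone-aware (Git timestamps carry
    an offset); it is normalized to UTC before formatting.
    """
    links = [
        f'<{original_uri}>; rel="original"',
        f'<{timegate_uri}>; rel="timegate"',
        f'<{timemap_uri}>; rel="timemap"',
    ]
    return {
        "Memento-Datetime": format_datetime(
            commit_time.astimezone(timezone.utc), usegmt=True
        ),
        "ETag": f'"git-blob-{blob_sha}"',
        "Link": ", ".join(links),
    }
```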
- `GET /data/products` with `Accept-Datetime: t`
- Resolve to closest commit ≤ `t` for `graphs/products.nt`
- Return file from that commit with Memento headers
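The resolution step can be sketched as a binary search over a graph file's commit history. This assumes the history has already been fetched (for example from the GitHub commits listing filtered by path) into a list of `(commit_time, commit_sha)` pairs sorted oldest first; `parse_accept_datetime` and `resolve_memento` are hypothetical names.

```python
from bisect import bisect_right
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import List, Optional, Tuple


def parse_accept_datetime(value: str) -> datetime:
    """Parse an Accept-Datetime header value (HTTP date, GMT)."""
    return parsedate_to_datetime(value)


def resolve_memento(
    commits: List[Tuple[datetime, str]],
    accept_datetime: datetime,
) -> Optional[str]:
    """Return the SHA of the latest commit at or before `accept_datetime`.

    `commits` must be sorted ascending by time and use timezone-aware
    datetimes. Returns None when the requested time precedes the first
    commit, i.e. no Memento exists yet.
    """
    times = [t for t, _ in commits]
    i = bisect_right(times, accept_datetime)
    return commits[i - 1][1] if i else None
```

The resolved commit SHA can then be passed as `?ref=<sha>` when fetching the file. How to respond when no commit precedes the requested time (for example, by falling back to the first Memento) is a policy choice this sketch leaves open.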
- Undo = re-commit a previous version of the file (identified by commit or blob SHA)
- Restore = fetch snapshot from GitHub, re-import into triplestore or overwrite file
- No need for SPARQL update logs or quad-store versioning
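An undo can be expressed as an ordinary upload whose content comes from an earlier commit. The sketch below only builds the request body; `build_undo_payload` and its parameters are hypothetical, and fetching the old bytes (via `GET ...?ref=<sha>`) is assumed to have happened already. One detail worth noting: the `sha` field must be the blob SHA of the file as it currently exists, not of the version being restored.

```python
import base64
from typing import Dict


def build_undo_payload(
    old_content: bytes,
    current_blob_sha: str,
    restored_from: str,
    branch: str = "main",
) -> Dict[str, str]:
    """Build a contents-API PUT body that re-commits a previous file state.

    old_content      -- file bytes fetched from the commit being restored
    current_blob_sha -- blob SHA of the file at the branch head
    restored_from    -- commit SHA being restored, recorded in the message
    """
    return {
        "message": f"Restore graph to state at {restored_from}",
        "content": base64.b64encode(old_content).decode("ascii"),
        "sha": current_blob_sha,
        "branch": branch,
    }
```

Since the restore is a new commit, history stays linear and the undone version remains retrievable as its own Memento.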
- Git history = full audit trail (author, time, reason)
- Commit metadata can store provenance context (in commit messages or sidecar files)
- No internal provenance RDF required (optional for advanced cases)
- Git-native tooling (diffs, PRs, CI)
- Immutable, verifiable graph snapshots
- Clean separation of RDF content from backend metadata
- Memento support without triplestore versioning features
- No SPARQL queries across versions (e.g., diffs in SPARQL)
- Restore requires file re-import or overwrite
- No partial graph updates (always full `.nt` push)
- Depends on GitHub availability and API rate limits
- GitHub Actions for RDF validation or auto-snapshotting
- Support GitHub App auth instead of personal tokens
- Use blob SHA in URIs (e.g., `/data/products?version=sha`)
- Maintain local index for fast datetime-to-commit resolution
This design is intended to support a clean, agent-safe, interoperable versioning system for RDF named graphs using widely available infrastructure. It can be iterated incrementally while ensuring trust and simplicity.