Reproducibility tools for research and teaching
A lightweight Python package for tracking provenance and publishing outputs in computational research projects. It supports reproducibility by recording the git state, input/output checksums, and build metadata for each artifact.
- Provenance Tracking: Automatically capture git state, input/output checksums, timestamps, and build commands
- Flexible Publishing: Two-mode system for publishing complete analyses or specific files
- Git Safety Checks: Configurable checks for a clean working tree, artifacts built from the current HEAD, and sync with upstream before publishing
- Teaching-Friendly: Simple API, clear documentation, minimal dependencies
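Provenance tracking starts from the repository's git state. To illustrate what capturing that state can involve, here is a minimal sketch that calls plain git commands through subprocess. The function name sketch_git_state is hypothetical: it is not the package's git_state(), and the real function may differ in details such as how a missing upstream or an empty repository is handled.

```python
import subprocess
from pathlib import Path


def _git(repo_root: Path, *args: str) -> str:
    """Run a git command in repo_root and return its stripped stdout (raises on failure)."""
    result = subprocess.run(
        ["git", *args], cwd=repo_root, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def sketch_git_state(repo_root: Path) -> dict:
    """Hypothetical stand-in for git_state(): commit, branch, dirty flag, ahead/behind."""
    try:
        if _git(repo_root, "rev-parse", "--is-inside-work-tree") != "true":
            return {"is_git_repo": False}
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or repo_root is not inside a work tree
        return {"is_git_repo": False}

    state = {
        "is_git_repo": True,
        "commit": _git(repo_root, "rev-parse", "HEAD"),
        "branch": _git(repo_root, "rev-parse", "--abbrev-ref", "HEAD"),
        # Any porcelain output (modified, staged, or untracked files) means dirty
        "dirty": bool(_git(repo_root, "status", "--porcelain")),
        "ahead": 0,
        "behind": 0,
    }
    try:
        # Prints "<ahead>\t<behind>" relative to the configured upstream branch
        ahead, behind = _git(
            repo_root, "rev-list", "--left-right", "--count", "HEAD...@{upstream}"
        ).split()
        state["ahead"], state["behind"] = int(ahead), int(behind)
    except subprocess.CalledProcessError:
        pass  # no upstream configured; leave both counts at 0
    return state
```

Treating untracked files as "dirty" is a deliberate choice in this sketch: an untracked input file is just as much a threat to reproducibility as an uncommitted edit.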
For local development or teaching:
pip install -e /home/stanton/01_work/infrastructure/40_lib/python/repro-tools

Or add to your conda environment.yml:

dependencies:
  - pip:
    - -e /home/stanton/01_work/infrastructure/40_lib/python/repro-tools

Or install from PyPI:

pip install repro-tools

Basic usage:

from pathlib import Path
from repro_tools import write_build_record
# In your build script
write_build_record(
    out_meta=Path("output/provenance/my_analysis.yml"),
    artifact_name="my_analysis",
    command=["python", "build_my_analysis.py", "--data", "data.csv"],
    repo_root=Path("."),
    inputs=[Path("data.csv")],
    outputs=[Path("output/figure.pdf"), Path("output/table.tex")],
)

from pathlib import Path
from repro_tools import auto_build_record

# Simpler version: auto-detects the artifact name, repo root, and command
auto_build_record(
    out_meta=Path("output/provenance/my_analysis.yml"),
    inputs=[Path("data.csv")],
    outputs=[Path("output/figure.pdf"), Path("output/table.tex")],
)

Publish complete analyses:

from pathlib import Path
from repro_tools import publish_analyses

publish_analyses(
    project_root=Path("."),
    paper_root=Path("paper"),
    analysis_names=["price_base", "remodel_base"],
    kinds=["figures", "tables"],
    require_current_head=True,  # Strict mode: only publish artifacts built from the current HEAD
)

Publish specific files:

from pathlib import Path
from repro_tools import publish_files
publish_files(
    project_root=Path("."),
    paper_root=Path("paper"),
    file_paths=[
        Path("output/figures/figure1.pdf"),
        Path("output/tables/table1.tex"),
    ],
)

Core API:
- git_state(repo_root) - Capture git commit, branch, dirty status, and ahead/behind counts
- sha256_file(path) - Compute the SHA-256 checksum of a file
- write_build_record(...) - Write a complete build provenance record
- auto_build_record(...) - Simplified version with auto-detection
- publish_analyses(...) - Publish all outputs from the specified analyses
- publish_files(...) - Publish specific output files
- copy_if_changed(src, dst) - Copy only if the content differs
- load_yml(path) / save_yml(path, obj) - YAML utilities
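Checksumming and "copy only if changed" are closely related: comparing SHA-256 digests tells you whether two files have the same content. One plausible benefit of skipping identical copies is that the destination's timestamp is left alone, so Make-style tools downstream (for example, a Makefile that builds the paper) do not see spurious updates. The sketch below implements both ideas with the standard library only. sketch_sha256_file and sketch_copy_if_changed are hypothetical names, not the package's sha256_file and copy_if_changed, whose behavior may differ.

```python
import hashlib
import shutil
from pathlib import Path


def sketch_sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large outputs are not loaded into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sketch_copy_if_changed(src: Path, dst: Path) -> bool:
    """Copy src to dst only when dst is missing or its content differs.

    Returns True if a copy was made. When the content is identical, dst is
    not touched, so its modification time stays unchanged.
    """
    # Cheap size check first; only hash when sizes match
    if dst.exists() and dst.stat().st_size == src.stat().st_size:
        if sketch_sha256_file(dst) == sketch_sha256_file(src):
            return False
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)  # copy2 also copies metadata such as mtime
    return True
```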
# Interactive scaffolding
repro-new-project
# Non-interactive with all languages
repro-new-project my-project --python --julia --stata
# Python-only project
repro-new-project my-project --python
# Custom configuration
repro-new-project my-project \
  --python --julia \
  --gpu \
  --studies "analysis1,analysis2"

Creates a complete project structure with:
- Environment setup (Python, Julia, Stata)
- Example scripts for selected languages
- Makefile with build targets
- Git submodule for repro-tools
- Documentation and configuration
# Record provenance for one artifact
repro-record \
  --artifact my_analysis \
  --out-meta output/provenance/my_analysis.yml \
  --inputs data.csv \
  --outputs output/figure.pdf output/table.tex

# Publish complete analyses
repro-publish analyses \
  --paper-root paper \
  --names "price_base remodel_base" \
  --require-current-head

# Publish specific files
repro-publish files \
  --paper-root paper \
  --files "output/figures/fig1.pdf output/tables/tab1.tex"

All publishing functions apply configurable safety checks:
- allow_dirty (default: False) - When False, refuse to publish from a dirty working tree
- require_not_behind (default: True) - When True, refuse to publish if the branch is behind its upstream
- require_current_head (default: False) - When True, require that artifacts were built from the current HEAD commit
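The three safety checks listed above can be read as simple predicates over a git-state record plus the commit at which an artifact was built. The sketch below encodes that reading with the documented defaults. PublishRefused, SafetyConfig, and sketch_check_publish are hypothetical names, not part of repro-tools, and the package's actual error types and messages may differ.

```python
from dataclasses import dataclass


class PublishRefused(RuntimeError):
    """Hypothetical error raised when a safety check blocks publishing."""


@dataclass
class SafetyConfig:
    # Defaults mirror the documented ones
    allow_dirty: bool = False
    require_not_behind: bool = True
    require_current_head: bool = False


def sketch_check_publish(git: dict, record_commit: str, cfg: SafetyConfig) -> None:
    """Raise PublishRefused if any enabled check fails; return None otherwise.

    git is a dict shaped like the 'git' section of a build record;
    record_commit is the commit stored in the artifact's build record.
    """
    if not cfg.allow_dirty and git.get("dirty"):
        raise PublishRefused("working tree is dirty; commit or stash changes first")
    if cfg.require_not_behind and git.get("behind", 0) > 0:
        raise PublishRefused(f"branch is {git['behind']} commit(s) behind upstream; pull first")
    if cfg.require_current_head and record_commit != git.get("commit"):
        raise PublishRefused(
            f"artifact built at {record_commit}, but HEAD is {git.get('commit')}; rebuild first"
        )
```

With the defaults, an artifact built at an older commit can still be published as long as the tree is clean and not behind upstream; setting require_current_head=True is the strict mode that rejects it.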
Build records are stored as YAML:
artifact: price_base
built_at_utc: '2026-01-18T05:30:00+00:00'
command: [python, build_price_base.py, --data, data.csv]
git:
  is_git_repo: true
  commit: cbb163e7a1b2c3d4...
  branch: main
  dirty: false
  ahead: 0
  behind: 0
inputs:
  - path: /path/to/data.csv
    sha256: 48917387ef250e...
    bytes: 325
    mtime: 1737179400.123
outputs:
  - path: /path/to/output/figure.pdf
    sha256: 3855687dcbeff3...
    bytes: 12482
    mtime: 1737179410.456

See examples/makefile_integration/ for complete Makefile templates.
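Each entry under inputs and outputs in the build record format shown earlier carries a path, a SHA-256 digest, a size in bytes, and an mtime. The sketch below shows how such an entry could be computed and how a recorded list of entries could later be checked for drift. sketch_file_entry and sketch_changed_files are hypothetical helpers, not repro-tools functions, and for brevity they read each file into memory in one go.

```python
import hashlib
from pathlib import Path


def sketch_file_entry(path: Path) -> dict:
    """One inputs/outputs entry shaped like the build record: path, sha256, bytes, mtime."""
    p = path.resolve()
    st = p.stat()
    return {
        "path": str(p),
        "sha256": hashlib.sha256(p.read_bytes()).hexdigest(),
        "bytes": st.st_size,
        "mtime": st.st_mtime,
    }


def sketch_changed_files(entries: list[dict]) -> list[str]:
    """Return recorded paths that are now missing or whose SHA-256 no longer matches."""
    changed = []
    for entry in entries:
        p = Path(entry["path"])
        if not p.is_file() or hashlib.sha256(p.read_bytes()).hexdigest() != entry["sha256"]:
            changed.append(entry["path"])
    return changed
```

The comparison uses the digest rather than mtime because timestamps change on checkout or copy even when content does not; mtime is better treated as informational.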
See the examples/ directory:
- basic_usage.py - Simple build script with provenance
- makefile_integration/ - Complete Make-based workflow
- publishing_workflow/ - Two-mode publishing examples
# Set up environment (one command)
make env
# Run tests
make test
# Run tests with coverage
make coverage
# Format code
make format
# Type checking
make typecheck
# Run all checks (lint + test)
make check

MIT License. See the LICENSE file.
This package is primarily for personal research and teaching. Feel free to use and adapt it for your own projects.
If you use this package in your research, please cite:
@software{stanton2026reprotools,
  title = {repro-tools: Reproducibility Tools for Research},
  author = {Stanton, Richard},
  year = {2026},
  url = {https://github.com/rhstanton/repro-tools}
}