A minimal template for reproducible research with provenance tracking and automated builds
This template provides a complete workflow for building research artifacts (figures and tables) with full provenance tracking, separating build outputs from published results.
This template is designed for:
- Economists & Social Scientists conducting empirical research with data analysis
- Researchers who need reproducible, traceable research workflows
- Anyone who wants to:
  - Track exactly what code produced each figure and table
  - Use multiple languages (Python, Julia, Stata) in one project
  - Separate exploratory analysis from publication-ready outputs
  - Ensure their work can be replicated by reviewers and future researchers
  - Meet journal requirements for replication packages
Not a fit if:
- You only need a simple Jupyter notebook (this adds structure for complex projects)
- You don't care about reproducibility or provenance tracking
- You're doing pure software development (not research)
Option 1: Clone with submodules (recommended):
git clone --recursive https://github.com/rhstanton/project_template.git my-project
cd my-project
make environment
Option 2: Clone normally (submodules auto-initialize):
git clone https://github.com/rhstanton/project_template.git my-project
cd my-project
make environment # Automatically initializes git submodules
Do not create or modify the lib/repro-tools/ directory by hand. Let git handle it as a submodule. The Makefile initializes it automatically when you run make environment.
Updating repro-tools:
- Quick update: make update-submodules (updates the submodule only)
- Full update: make update-environment (updates the submodule and reinstalls the environment)
- See docs/submodule_cheatsheet.md for details
First time setup (required once, ~10-15 minutes):
make environment # Install Python, Julia, Stata packages + initialize submodules
Verify setup:
make verify # Quick smoke test (~1 minute)
To build all artifacts:
make all # Build figures + tables + provenance (~5 minutes)
To publish to paper directory:
make publish # Copy outputs to paper/ with provenance
To verify outputs:
make test-outputs # Check all expected files exist
To test setup:
make examples # Run example scripts
Need help? See docs/journal_editor_readme.md for journal editors.
Prefer working in VS Code? Everything works through the UI:
- Install extensions (VS Code will prompt you)
- Press Ctrl+Shift+P → type "task" → browse available tasks
- Press Ctrl+Shift+B to build everything
- Press F5 to debug Python scripts
Full guide: GETTING_STARTED_VSCODE.md
Cheat sheet: .vscode/QUICK_REFERENCE.md
Details: docs/vscode_integration.md
All Make commands are available as VS Code tasks - you can work entirely in the GUI!
- Reproducible builds: GNU Make orchestration with grouped targets
- Provenance tracking: Full git state + input/output SHA256 hashes
- Build/publish separation: Build in output/, publish to paper/
- Multi-language support: Python, Julia, Stata
- Jupyter Notebook support: Parameterized notebooks via papermill with full provenance
- VS Code integration: Complete workflow via GUI (see docs/vscode_integration.md)
- Code quality tools: Integrated linting (ruff), formatting (black + ruff), and type checking (mypy)
- Automated testing: pytest-based test suite for reliability
- Output comparison: Diff current vs. published outputs
- Pre-submission checks: Comprehensive validation before journal submission
- Replication reports: Auto-generated HTML reports for reviewers
- Example workflows: Sample scripts for all three languages
project_template/
├── run_analysis.py          # Unified analysis script (handles all studies)
├── data/                    # Input datasets
├── env/                     # Environment setup (Python/Julia/Stata)
│   └── examples/            # Sample scripts for testing
├── lib/                     # Git submodules (repro-tools)
├── output/                  # Build outputs (can be deleted/rebuilt)
│   ├── figures/             # Generated PDFs
│   ├── tables/              # Generated LaTeX tables
│   ├── provenance/          # Per-artifact build records
│   └── logs/                # Build logs
├── paper/                   # Published outputs (separate git repo)
│   ├── figures/             # Published figures
│   ├── tables/              # Published tables
│   └── provenance.yml       # Aggregated publication provenance
├── scripts/                 # Shared utilities (provenance.py, publish_artifacts.py)
└── shared/                  # Configuration, CLI and validation utilities
    ├── config.py            # Study configurations (STUDIES dictionary)
    ├── cli.py               # Enhanced command-line interface tools
    └── config_validator.py  # Configuration validation
See docs/directory_structure.md for complete details.
Each analysis script follows a standard pattern:
make price_base # Builds one artifact
make all # Builds all artifacts
This produces three outputs per artifact (atomically):
- output/figures/<name>.pdf - The figure
- output/tables/<name>.tex - The table
- output/provenance/<name>.yml - Build metadata
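As a rough illustration of the three-outputs-per-artifact convention, the sketch below lists the files a build of one artifact should leave behind and reports any that are absent. The helper names `expected_outputs` and `missing_outputs` are hypothetical and are not part of the template; only the directory layout is taken from this README.

```python
from pathlib import Path
from typing import List


# Hypothetical helpers for illustration; the template's own checks
# (for example `make test-outputs`) may work differently.
def expected_outputs(name: str, root: Path = Path("output")) -> List[Path]:
    """Return the three files a successful build of `name` should produce."""
    return [
        root / "figures" / f"{name}.pdf",
        root / "tables" / f"{name}.tex",
        root / "provenance" / f"{name}.yml",
    ]


def missing_outputs(name: str, root: Path = Path("output")) -> List[Path]:
    """List expected files that do not exist; an empty list means a complete build."""
    return [p for p in expected_outputs(name, root) if not p.is_file()]


if __name__ == "__main__":
    # Report anything missing for the example artifact used in this README.
    for path in missing_outputs("price_base"):
        print(f"missing: {path}")
```

Because the three files are built together, a partially populated set usually points to an interrupted build, and rebuilding that artifact is the simplest fix.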
make publish # Publish all artifacts
make publish PUBLISH_ARTIFACTS="price_base" # Publish specific ones
make publish REQUIRE_CURRENT_HEAD=1 # Strict: require current HEAD
Publishing enforces git safety checks:
- Working tree must be clean
- Branch must not be behind upstream
- Optionally require artifacts from current HEAD
See docs/publishing.md for details.
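The three safety checks can be pictured with a small sketch like the one below. It is not the template's publishing code: `run_git` and `publish_problems` are hypothetical names, and the git commands are simply one plausible way to ask git the same questions. The `git` parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List, Optional


def run_git(*args: str) -> str:
    """Run a git command in the current directory and return its trimmed stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def publish_problems(
    require_current_head: bool = False,
    artifact_commit: Optional[str] = None,
    git: Callable[..., str] = run_git,
) -> List[str]:
    """Return reasons publishing should be refused (empty list = safe).

    Hypothetical sketch; assumes an upstream branch is configured, otherwise
    the rev-list call fails with CalledProcessError.
    """
    problems = []
    # 1. Working tree must be clean: porcelain output is empty when it is.
    if git("status", "--porcelain"):
        problems.append("working tree is not clean")
    # 2. Branch must not be behind upstream: count commits we lack.
    behind = int(git("rev-list", "--count", "HEAD..@{upstream}"))
    if behind > 0:
        problems.append(f"branch is {behind} commit(s) behind upstream")
    # 3. Optional strict mode: the artifact's recorded commit must match HEAD.
    if require_current_head:
        head = git("rev-parse", "HEAD")
        if not artifact_commit or not head.startswith(artifact_commit):
            problems.append("artifact was not built from the current HEAD")
    return problems
```

Calling `publish_problems(require_current_head=True, artifact_commit="cbb163e")` inside a repository would mirror the strict mode, with the commit value read from the artifact's build record.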
Build provenance (output/provenance/<name>.yml):
artifact: price_base
built_at_utc: '2026-01-17T04:04:49+00:00'
command: [run_analysis.py, price_base]
git:
  commit: cbb163e
  branch: main
  dirty: false
inputs:
  - path: data/housing_panel.csv
    sha256: 48917387...
outputs:
  - path: output/figures/price_base.pdf
    sha256: 3855687d...
Publication provenance (paper/provenance.yml):
- Aggregates all build records
- Tracks when each artifact was published
- Records analysis repo git state at publication time
See docs/provenance.md for complete explanation.
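To make the build record concrete, here is a minimal sketch of how such a record could be assembled with the Python standard library. The functions `sha256_of` and `build_record` are illustrative assumptions, not the API of repro-tools or scripts/provenance.py; the field names follow the example record shown above.

```python
import hashlib
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(
    artifact: str,
    command: List[str],
    inputs: Iterable,
    outputs: Iterable,
    git_state: Dict[str, object],
) -> dict:
    """Assemble a provenance record shaped like the example in this README.

    Hypothetical helper: `git_state` is supplied by the caller, for example
    {"commit": ..., "branch": ..., "dirty": ...}.
    """
    return {
        "artifact": artifact,
        "built_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "command": list(command),
        "git": dict(git_state),
        "inputs": [{"path": str(p), "sha256": sha256_of(p)} for p in inputs],
        "outputs": [{"path": str(p), "sha256": sha256_of(p)} for p in outputs],
    }
```

Since pyyaml is already in the environment, a dictionary like this could be written out with `yaml.safe_dump`. Hashing both inputs and outputs is what lets a later reader confirm that a published figure came from a specific dataset and commit.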
Adding a new analysis takes three steps: configure it in config.py, register it in the Makefile, then build.
- Add to config.py STUDIES:
  STUDIES = {
      "price_base": { ... },
      "remodel_base": { ... },
      "my_new_study": {
          "data": DATA_FILES["housing"],
          "xlabel": "Year",
          "ylabel": "My metric",
          "title": "My analysis title",
          "groupby": "region",
          "yvar": "my_variable",
          "xvar": "year",
          "table_agg": "mean",
          "figure": OUTPUT_DIR / "figures" / "my_new_study.pdf",
          "table": OUTPUT_DIR / "tables" / "my_new_study.tex",
      },
  }
- Add to Makefile ANALYSES and create pattern definition:
  ANALYSES := price_base remodel_base my_new_study
  # Add pattern definition:
  my_new_study.script := run_analysis.py
  my_new_study.runner := $(PYTHON)
  my_new_study.inputs := $(DATA)
  my_new_study.outputs := $(OUT_FIG_DIR)/my_new_study.pdf $(OUT_TBL_DIR)/my_new_study.tex $(OUT_PROV_DIR)/my_new_study.yml
  my_new_study.args := my_new_study
- Build and publish:
  make my_new_study
  make publish PUBLISH_ARTIFACTS="my_new_study"
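Before building a newly added study, it can help to sanity-check its configuration entry. The sketch below is a hypothetical check, not the template's shared/config_validator.py: the set of required keys is an assumption inferred from the example entry above, and `config_problems` is an illustrative name.

```python
from pathlib import Path
from typing import List

# Assumed required keys, taken from the example STUDIES entry in this README.
REQUIRED_KEYS = {
    "data", "xlabel", "ylabel", "title", "groupby",
    "yvar", "xvar", "table_agg", "figure", "table",
}


def config_problems(name: str, study: dict) -> List[str]:
    """Return human-readable problems with one study entry (empty = looks fine)."""
    problems = []
    missing = sorted(REQUIRED_KEYS - set(study))
    if missing:
        problems.append(f"{name}: missing keys {missing}")
    # Output files should carry the study's name and the expected extension,
    # so the Makefile's per-study output paths line up with the config.
    for key, suffix in (("figure", ".pdf"), ("table", ".tex")):
        if key in study:
            path = Path(study[key])
            if path.suffix != suffix:
                problems.append(f"{name}: {key} should end in {suffix}")
            if path.stem != name:
                problems.append(f"{name}: {key} file should be named {name}{suffix}")
    return problems
```

A mismatch between the study name and its output filenames is an easy mistake when copying an existing entry, and it would leave Make waiting for files that are never written.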
Managed via conda with automatic Julia integration:
# Environment wrapper with Julia bridge
env/scripts/runpython script.py
# Direct conda activation (alternative)
conda activate .env
python script.py
Packages (see env/python.yml):
- pandas, matplotlib, numpy
- pyyaml (for provenance)
- juliacall (Python/Julia interop)
- jinja2 (for pandas LaTeX export)
Pure Julia:
env/scripts/runjulia script.jl
Python/Julia interop (via juliacall):
from juliacall import Main as jl
jl.seval("using DataFrames")
df = jl.DataFrame(x=[1,2,3], y=[4,5,6])
Packages (see env/Project.toml):
- PythonCall (Julia/Python interop)
- DataFrames
Julia is auto-installed to .julia/pyjuliapkg/ via juliacall.
env/scripts/runstata script.do
Packages (see env/stata-packages.txt):
- reghdfe, ftools, estout
Installed to .stata/ado/plus/ (local to project).
Test your setup:
make examples # Run all examples
make sample-python # Python example
make sample-julia # Pure Julia example
make sample-juliacall # Python/Julia interop
make sample-stata # Stata example (if installed)
See env/examples/README.md for details.
- OS: Linux or macOS (Windows requires WSL)
- RAM: 8GB minimum (16GB recommended)
- Disk: 5GB (2GB environment + 3GB cache)
- Time: ~15 minutes total (10 min setup + 5 min execution)
- Software: GNU Make 4.3+, conda/mamba (auto-installed if needed)
- Optional: Nix (for reproducible dev shell via flake.nix)
make # Brief guidance (essential commands)
make help # Detailed command reference (all targets)
make info # Comprehensive project information
make environment # Setup Python/Julia/Stata (one-time)
make verify # Verify environment and data (quick check)
make all # Build all artifacts
make <artifact> # Build specific artifact
make test-outputs # Verify all expected outputs exist
make publish # Publish all to paper/
make publish PUBLISH_ARTIFACTS="x y" # Publish specific
make publish REQUIRE_CURRENT_HEAD=1 # Strict: require current HEAD
make test # Run test suite
make lint # Run code linter (ruff)
make format # Auto-format code (black + ruff)
make type-check # Run type checker (mypy)
make check # Run all quality checks (lint + format + type + test)
make diff-outputs # Compare current vs published outputs
make pre-submit # Run pre-submission checklist
make replication-report # Generate replication report
make journal-package # Create journal submission package
make examples # Run example scripts
make clean # Remove all outputs
- QUICKSTART.md - Get up and running in 5 minutes
- CHANGELOG.md - Version history and release notes
- docs/environment.md - Environment setup and management
- docs/provenance.md - Provenance tracking system
- docs/publishing.md - Publishing workflow and safety checks
- docs/vscode_integration.md - Working entirely in VS Code
- docs/directory_structure.md - Project organization
- docs/julia_python_integration.md - Julia/Python bridge configuration
- docs/platform_compatibility.md - System requirements and GPU support
- docs/troubleshooting.md - Common issues and solutions
- docs/journal_editor_readme.md - One-page quick guide for reviewers
- docs/paper_output_mapping.md - Map paper figures/tables to outputs
- docs/expected_outputs.md - Verification checklist
- DATA_AVAILABILITY.md - Data access documentation
See examples/ directory for sample scripts in Python, Julia, and Stata.
Provenance tracking requires git:
git init
git add -A
git commit -m "Initial commit"
make all # Builds include git commit hash
make publish # Tracks publication from specific commit
The paper/ directory is intended as a separate git repository for Overleaf integration.
Quick fixes:
- Import errors: Use env/scripts/runpython, not bare python
- Build failures: make clean && make all
- Environment issues: make cleanall && make environment
Detailed help: See docs/troubleshooting.md for comprehensive solutions.
We welcome contributions! Whether you're fixing bugs, adding features, or improving documentation:
- Bug reports: Open an issue
- Feature requests: Open an issue
- Pull requests: See CONTRIBUTING.md for guidelines
Development setup:
git clone https://github.com/rhstanton/project_template.git
cd project_template
make environment
make check # Run tests, linting, formatting
MIT License - See LICENSE file.
Summary: Free to use, modify, and distribute. Attribution appreciated but not required.
If you use this template in your research, please cite:
@software{stanton2026template,
title = {Reproducible Research Template},
author = {Stanton, Richard},
year = {2026},
url = {https://github.com/rhstanton/project_template}
}
See CITATION.cff for structured metadata.
Current version: 1.0.0
- Check version: env/scripts/runpython run_analysis.py --version or make info
- Version file: _version.py
- Changelog: CHANGELOG.md
Dependencies:
- repro-tools: 0.2.0 (git submodule at lib/repro-tools/)
  - Provides provenance tracking, CLI utilities, publishing tools
  - See docs/submodule_cheatsheet.md for updates
Authors preparing replication packages:
make journal-package # Creates clean replication package
This creates a fresh git repository excluding:
- Development files (.github/, .vscode/, etc.)
- Author-only directories (data-construction/, notes/, paper/)
- Internal documentation (TEMPLATE_USAGE.md, etc.)
See JOURNAL_EXCLUDE for complete list and docs/journal_editor_readme.md for journal editor instructions.
Last updated: January 17, 2026