rhstanton/project_template

Reproducible Research Template

License: MIT · Python 3.11 · Julia 1.10+ · GNU Make 4.3+ · Tests

A minimal template for reproducible research with provenance tracking and automated builds

This template provides a complete workflow for building research artifacts (figures and tables) with full provenance tracking, separating build outputs from published results.


👥 Who Is This For?

This template is designed for:

  • Economists & Social Scientists conducting empirical research with data analysis
  • Researchers who need reproducible, traceable research workflows
  • Anyone who wants to:
    • Track exactly what code produced each figure and table
    • Use multiple languages (Python, Julia, Stata) in one project
    • Separate exploratory analysis from publication-ready outputs
    • Ensure their work can be replicated by reviewers and future researchers
    • Meet journal requirements for replication packages

Not a fit if:

  • You only need a simple Jupyter notebook (this adds structure for complex projects)
  • You don't care about reproducibility or provenance tracking
  • You're doing pure software development (not research)

🎯 Creating a New Project from This Template

Option 1: Clone with submodules (recommended):

git clone --recursive https://github.com/rhstanton/project_template.git my-project
cd my-project
make environment

Option 2: Clone normally (submodules auto-initialize):

git clone https://github.com/rhstanton/project_template.git my-project
cd my-project
make environment  # Automatically initializes git submodules

⚠️ IMPORTANT: When creating a new project, do NOT manually copy the lib/repro-tools/ directory. Let git handle it as a submodule. The Makefile automatically initializes it when you run make environment.

Updating repro-tools:

  • Quick update: make update-submodules (updates submodule only)
  • Full update: make update-environment (updates submodule + reinstalls environment)
  • See docs/submodule_cheatsheet.md for details

🚀 Quick Start

First time setup (required once, ~10-15 minutes):

make environment    # Install Python, Julia, Stata packages + initialize submodules

Verify setup:

make verify         # Quick smoke test (~1 minute)

To build all artifacts:

make all           # Build figures + tables + provenance (~5 minutes)

To publish to paper directory:

make publish       # Copy outputs to paper/ with provenance

To verify outputs:

make test-outputs  # Check all expected files exist

To test setup:

make examples      # Run example scripts

Need help? See docs/journal_editor_readme.md for journal editors.


VS Code Users: No Command Line Required!

Prefer working in VS Code? Everything works through the UI:

  1. Install extensions (VS Code will prompt you)
  2. Press Ctrl+Shift+P → type "task" → browse available tasks
  3. Press Ctrl+Shift+B to build everything
  4. Press F5 to debug Python scripts

  • Full guide: GETTING_STARTED_VSCODE.md
  • Cheat sheet: .vscode/QUICK_REFERENCE.md
  • Details: docs/vscode_integration.md

All Make commands are available as VS Code tasks - you can work entirely in the GUI!


📊 What This Template Provides

Core Features

  • Reproducible builds: GNU Make orchestration with grouped targets
  • Provenance tracking: Full git state + input/output SHA256 hashes
  • Build/publish separation: Build in output/, publish to paper/
  • Multi-language support: Python, Julia, Stata
  • Jupyter Notebook support: Parameterized notebooks via papermill with full provenance
  • VS Code integration: Complete workflow via GUI (see docs/vscode_integration.md)
  • Code quality tools: Integrated linting (ruff), formatting (black + ruff), and type checking (mypy)
  • Automated testing: pytest-based test suite for reliability
  • Output comparison: Diff current vs. published outputs
  • Pre-submission checks: Comprehensive validation before journal submission
  • Replication reports: Auto-generated HTML reports for reviewers
  • Example workflows: Sample scripts for all three languages

Directory Structure

project_template/
├── run_analysis.py    # Unified analysis script (handles all studies)
├── data/              # Input datasets
├── env/               # Environment setup (Python/Julia/Stata)
│   └── examples/      # Sample scripts for testing
├── lib/               # Git submodules (repro-tools)
├── output/            # Build outputs (can be deleted/rebuilt)
│   ├── figures/       # Generated PDFs
│   ├── tables/        # Generated LaTeX tables
│   ├── provenance/    # Per-artifact build records
│   └── logs/          # Build logs
├── paper/             # Published outputs (separate git repo)
│   ├── figures/       # Published figures
│   ├── tables/        # Published tables
│   └── provenance.yml # Aggregated publication provenance
├── scripts/           # Shared utilities (provenance.py, publish_artifacts.py)
└── shared/            # Configuration, CLI and validation utilities
    ├── config.py      # Study configurations (STUDIES dictionary)
    ├── cli.py         # Enhanced command-line interface tools
    └── config_validator.py  # Configuration validation

See docs/directory_structure.md for complete details.


🎯 Workflows

Building Artifacts

Each analysis script follows a standard pattern:

make price_base       # Builds one artifact
make all              # Builds all artifacts

This produces three outputs per artifact (atomically):

  • output/figures/<name>.pdf - The figure
  • output/tables/<name>.tex - The table
  • output/provenance/<name>.yml - Build metadata
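As a rough illustration of what a check like make test-outputs has to verify, the sketch below derives the three expected paths for an artifact and reports any that are missing. The function names expected_outputs and missing_outputs are hypothetical, and the template's own check may be implemented differently.

```python
from pathlib import Path


def expected_outputs(name: str, out_dir: Path = Path("output")) -> list[Path]:
    """Return the three files one artifact is expected to produce."""
    return [
        out_dir / "figures" / f"{name}.pdf",      # the figure
        out_dir / "tables" / f"{name}.tex",       # the table
        out_dir / "provenance" / f"{name}.yml",   # build metadata
    ]


def missing_outputs(names: list[str], out_dir: Path = Path("output")) -> list[Path]:
    """List every expected file that does not exist on disk."""
    return [
        path
        for name in names
        for path in expected_outputs(name, out_dir)
        if not path.exists()
    ]
```

Calling missing_outputs(["price_base"]) from the project root returns an empty list only when all three files for that artifact are present.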

Publishing Results

make publish                              # Publish all artifacts
make publish PUBLISH_ARTIFACTS="price_base"  # Publish specific ones
make publish REQUIRE_CURRENT_HEAD=1         # Strict: require current HEAD

Publishing enforces git safety checks:

  • Working tree must be clean
  • Branch must not be behind upstream
  • Optionally require artifacts from current HEAD

See docs/publishing.md for details.
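The three checks above can be sketched as a small pure function. This is an assumption-laden illustration, not the template's publish_artifacts.py: the names git and publish_problems are hypothetical, and the inputs are the raw results of standard git commands.

```python
import subprocess


def git(*args: str) -> str:
    """Run a git command in the current directory and return its stdout."""
    result = subprocess.run(
        ["git", *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def publish_problems(
    porcelain: str,
    behind: int,
    built_commit: str,
    head_commit: str,
    require_current_head: bool = False,
) -> list[str]:
    """Return reasons publishing should be refused (empty list means OK)."""
    problems = []
    if porcelain.strip():  # any output from `git status --porcelain` means changes
        problems.append("working tree is not clean")
    if behind > 0:
        problems.append(f"branch is {behind} commit(s) behind upstream")
    # Build records may store an abbreviated hash, so compare by prefix.
    if require_current_head and (
        not built_commit or not head_commit.startswith(built_commit)
    ):
        problems.append("artifact was not built from current HEAD")
    return problems


# Example wiring (requires a git repository with an upstream branch):
# problems = publish_problems(
#     git("status", "--porcelain"),
#     int(git("rev-list", "--count", "HEAD..@{upstream}")),
#     built_commit="cbb163e",
#     head_commit=git("rev-parse", "HEAD"),
#     require_current_head=True,
# )
```

Keeping the decision logic separate from the git calls makes it testable without a repository.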

Provenance Chain

Build provenance (output/provenance/<name>.yml):

artifact: price_base
built_at_utc: '2026-01-17T04:04:49+00:00'
command: [run_analysis.py, price_base]
git:
  commit: cbb163e
  branch: main
  dirty: false
inputs:
  - path: data/housing_panel.csv
    sha256: 48917387...
outputs:
  - path: output/figures/price_base.pdf
    sha256: 3855687d...
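A record with this shape could be assembled roughly as follows. This is a minimal sketch using only the standard library; sha256_of and build_record are hypothetical names, the git state is passed in by the caller, and the template's scripts/provenance.py may differ in detail.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large datasets are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_record(artifact, command, inputs, outputs, git_state):
    """Assemble a build record mirroring the fields of the example above."""

    def entries(paths):
        return [{"path": str(p), "sha256": sha256_of(Path(p))} for p in paths]

    return {
        "artifact": artifact,
        "built_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "command": list(command),
        "git": dict(git_state),  # e.g. {"commit": ..., "branch": ..., "dirty": ...}
        "inputs": entries(inputs),
        "outputs": entries(outputs),
    }
```

The returned dictionary can then be serialized with pyyaml, which the environment already lists for provenance.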

Publication provenance (paper/provenance.yml):

  • Aggregates all build records
  • Tracks when each artifact was published
  • Records analysis repo git state at publication time

See docs/provenance.md for complete explanation.
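To make the aggregation step concrete, here is one way the three points above might combine. The layout of paper/provenance.yml is not shown in this README, so the key names (published_at_utc, analysis_git, build) and the functions publish_entry and aggregate are assumptions for illustration only.

```python
from datetime import datetime, timezone
from typing import Optional


def publish_entry(
    build_record: dict, repo_git: dict, published_at: Optional[datetime] = None
) -> dict:
    """Wrap one build record with publication time and analysis-repo git state."""
    when = published_at or datetime.now(timezone.utc)
    return {
        "published_at_utc": when.isoformat(timespec="seconds"),
        "analysis_git": dict(repo_git),
        "build": build_record,
    }


def aggregate(
    existing: dict,
    build_records: list,
    repo_git: dict,
    published_at: Optional[datetime] = None,
) -> dict:
    """Merge newly published artifacts into the existing aggregate.

    Entries are keyed by artifact name, so republishing one artifact
    replaces only its own entry and leaves the others untouched.
    """
    merged = dict(existing)
    for record in build_records:
        merged[record["artifact"]] = publish_entry(record, repo_git, published_at)
    return merged
```

Keying by artifact name is a design choice that lets selective publishing (PUBLISH_ARTIFACTS) update a single entry without rewriting the rest.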


🔧 Adding New Analyses

Adding a new analysis takes three steps: configure the study, register it in the Makefile, then build and publish.

  1. Add an entry to the STUDIES dictionary in shared/config.py:

    STUDIES = {
        "price_base": { ... },
        "remodel_base": { ... },
        "my_new_study": {
            "data": DATA_FILES["housing"],
            "xlabel": "Year",
            "ylabel": "My metric",
            "title": "My analysis title",
            "groupby": "region",
            "yvar": "my_variable",
            "xvar": "year",
            "table_agg": "mean",
            "figure": OUTPUT_DIR / "figures" / "my_new_study.pdf",
            "table": OUTPUT_DIR / "tables" / "my_new_study.tex",
        },
    }
  2. Add to Makefile ANALYSES and create pattern definition:

    ANALYSES := price_base remodel_base my_new_study
    
    # Add pattern definition:
    my_new_study.script  := run_analysis.py
    my_new_study.runner  := $(PYTHON)
    my_new_study.inputs  := $(DATA)
    my_new_study.outputs := $(OUT_FIG_DIR)/my_new_study.pdf $(OUT_TBL_DIR)/my_new_study.tex $(OUT_PROV_DIR)/my_new_study.yml
    my_new_study.args    := my_new_study
  3. Build and publish:

    make my_new_study
    make publish PUBLISH_ARTIFACTS="my_new_study"
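Because the study name has to agree across the STUDIES entry, the output filenames, and the Makefile, a quick consistency check can catch typos early. The sketch below is hypothetical: REQUIRED_KEYS is inferred from the example entry above, and validate_study is not the template's shared/config_validator.py.

```python
from pathlib import Path

# Keys taken from the example STUDIES entry; the real validator may require others.
REQUIRED_KEYS = {
    "data", "xlabel", "ylabel", "title", "groupby",
    "yvar", "xvar", "table_agg", "figure", "table",
}


def validate_study(name: str, cfg: dict) -> list[str]:
    """Return a list of human-readable problems with one study configuration."""
    errors = []
    missing = sorted(REQUIRED_KEYS - set(cfg))
    if missing:
        errors.append(f"{name}: missing keys {missing}")
    # Output filenames should match the study name so the Makefile targets line up.
    for key, suffix in (("figure", ".pdf"), ("table", ".tex")):
        if key in cfg:
            path = Path(cfg[key])
            if path.stem != name or path.suffix != suffix:
                errors.append(
                    f"{name}: '{key}' should be named {name}{suffix}, got {path.name}"
                )
    return errors
```

Running validate_study over every item in STUDIES before a build gives one list of problems instead of a failure partway through make.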

🐍 Python Environment

Managed via conda with automatic Julia integration:

# Environment wrapper with Julia bridge
env/scripts/runpython script.py

# Direct conda activation (alternative)
conda activate .env
python script.py

Packages (see env/python.yml):

  • pandas, matplotlib, numpy
  • pyyaml (for provenance)
  • juliacall (Python/Julia interop)
  • jinja2 (for pandas LaTeX export)

📚 Julia Environment

Pure Julia:

env/scripts/runjulia script.jl

Python/Julia interop (via juliacall):

from juliacall import Main as jl
jl.seval("using DataFrames")
df = jl.DataFrame(x=[1,2,3], y=[4,5,6])

Packages (see env/Project.toml):

  • PythonCall (Julia/Python interop)
  • DataFrames

Julia is auto-installed to .julia/pyjuliapkg/ via juliacall.


📊 Stata Environment (Optional)

env/scripts/runstata script.do

Packages (see env/stata-packages.txt):

  • reghdfe, ftools, estout

Installed to .stata/ado/plus/ (local to project).


🧪 Examples

Test your setup:

make examples          # Run all examples
make sample-python     # Python example
make sample-julia      # Pure Julia example
make sample-juliacall  # Python/Julia interop
make sample-stata      # Stata example (if installed)

See env/examples/README.md for details.


⚙️ System Requirements

  • OS: Linux or macOS (Windows requires WSL)
  • RAM: 8GB minimum (16GB recommended)
  • Disk: 5GB (2GB environment + 3GB cache)
  • Time: ~15 minutes total (10 min setup + 5 min execution)
  • Software: GNU Make 4.3+, conda/mamba (auto-installed if needed)
  • Optional: Nix (for reproducible dev shell via flake.nix)

🔍 Makefile Targets

make                  # Brief guidance (essential commands)
make help             # Detailed command reference (all targets)
make info             # Comprehensive project information

make environment      # Setup Python/Julia/Stata (one-time)
make verify           # Verify environment and data (quick check)
make all              # Build all artifacts
make <artifact>       # Build specific artifact

make test-outputs     # Verify all expected outputs exist
make publish          # Publish all to paper/
make publish PUBLISH_ARTIFACTS="x y"  # Publish specific
make publish REQUIRE_CURRENT_HEAD=1   # Strict: require current HEAD

make test             # Run test suite
make lint             # Run code linter (ruff)
make format           # Auto-format code (black + ruff)
make type-check       # Run type checker (mypy)
make check            # Run all quality checks (lint + format + type + test)
make diff-outputs     # Compare current vs published outputs
make pre-submit       # Run pre-submission checklist
make replication-report  # Generate replication report
make journal-package  # Create journal submission package
make examples         # Run example scripts
make clean            # Remove all outputs

📖 Documentation

Quick Start

Detailed Guides

For Journal Submission

Examples

See examples/ directory for sample scripts in Python, Julia, and Stata.


🔒 Git Integration

Provenance tracking requires git:

git init
git add -A
git commit -m "Initial commit"
make all              # Builds include git commit hash
make publish          # Tracks publication from specific commit

The paper/ directory is intended as a separate git repository for Overleaf integration.


📞 Troubleshooting

Quick fixes:

  • Import errors: Use env/scripts/runpython not bare python
  • Build failures: make clean && make all
  • Environment issues: make cleanall && make environment

Detailed help: See docs/troubleshooting.md for comprehensive solutions.


🀝 Contributing

We welcome contributions, whether you're fixing bugs, adding features, or improving documentation.

Development setup:

git clone https://github.com/rhstanton/project_template.git
cd project_template
make environment
make check  # Run all quality checks (lint + format + type + test)

📄 License

MIT License - See LICENSE file.

Summary: Free to use, modify, and distribute. Attribution appreciated but not required.


📚 Citation

If you use this template in your research, please cite:

@software{stanton2026template,
  title = {Reproducible Research Template},
  author = {Stanton, Richard},
  year = {2026},
  url = {https://github.com/rhstanton/project_template}
}

See CITATION.cff for structured metadata.


🏷️ Version

Current version: 1.0.0

  • Check version: env/scripts/runpython run_analysis.py --version or make info
  • Version file: _version.py
  • Changelog: CHANGELOG.md

Dependencies:

  • repro-tools: 0.2.0 (git submodule at lib/repro-tools/)

🎯 For Journal Submission

Authors preparing replication packages:

make journal-package    # Creates clean replication package

This creates a fresh git repository excluding:

  • Development files (.github/, .vscode/, etc.)
  • Author-only directories (data-construction/, notes/, paper/)
  • Internal documentation (TEMPLATE_USAGE.md, etc.)

See JOURNAL_EXCLUDE for complete list and docs/journal_editor_readme.md for journal editor instructions.


Last updated: January 17, 2026
