Stata Project Template for Reproducible Research

A template repository for reproducible Stata analysis projects using modern workflow tools and best practices. This template integrates IPA's Data Cleaning Guide and Stata coding standards with established practices from leading development economics research groups.

Warning

NEVER COMMIT DATA FILES TO GITHUB.

NEVER USE AI ASSISTANTS WITH PERSONALLY IDENTIFIABLE DATA.

YOU ARE REQUIRED TO REMOVE IDENTIFYING INFORMATION BEFORE CONNECTING AI ASSISTANTS OR STORING DATA IN ANY UNENCRYPTED LOCATION.

Use This Template

If you want to use this template for your own project:

  1. Click the green Use this template button at the top of this page
  2. Select Create a new repository
  3. On the "Create a new repository" page:
    1. Start with a template: PovertyAction/ipa-stata-template
    2. Include all branches: Off
    3. Select Owner, Repository name, Description, and Configuration as desired.
  4. Click the green Create repository button.

Quick Start (Minimal Setup)

Tip

If you are new to this repository, start here but be sure to read the full README as well as the Getting Started Guide.

Get started with just Git and Stata - no additional tools required.

Prerequisites

  • Git installed (download)
    • Windows: winget install --id Git.Git -e
    • macOS: brew install git
    • Linux: Use your package manager, e.g., sudo apt install git, brew install git
  • Stata 17+ installed and licensed

Steps

  1. Clone the repository

    git clone https://github.com/PovertyAction/ipa-stata-template.git
    cd ipa-stata-template
  2. Configure your Stata path

    Copy .env-example to .env and set your Stata executable path:

    # Windows example
    STATA_CMD='C:\Program Files\Stata18\StataSE-64.exe'
    STATA_EDITION='se'
    
    # macOS example
    # STATA_CMD='/Applications/Stata/StataSE.app/Contents/MacOS/StataSE'
    
    # Linux example
    # STATA_CMD='/usr/local/stata18/stata-se'
  3. One-time setup (install setroot package)

    # From command line - run once to install dependencies
    just stata-setup
    
    # Or from Stata directly:
    do setup.do
  4. Run the analysis pipeline

    # Full pipeline
    just stata-run
    
    # Or run a single script
    just stata-script 01_data_cleaning
    
    # Or open Stata and run directly
    # IMPORTANT: First change to the project directory in Stata:
    cd ~/code/ipa-stata-template
    do do_files/00_run.do
    do do_files/00_run.do "01_data_cleaning"  // single script

    Tip: If you get a "Root folder of project not found" error, make sure you've changed to the project directory in Stata using cd ~/code/ipa-stata-template before running the do-file.

  5. Check outputs

    • Tables: outputs/tables/
    • Figures: outputs/figures/
    • Logs: logs/

That's it! You now have a reproducible Stata workflow.

How Path Resolution Works

The template uses setroot to automatically find the project root by looking for a .here marker file. This means:

  • No c(pwd) dependency - scripts work regardless of where Stata is launched
  • No user-specific if c(user) blocks - paths resolve automatically
  • Full adopath isolation - only BASE + local ado/ for reproducibility
  • Runner pattern - run individual scripts with proper environment setup
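
To make this concrete, here is a minimal sketch of the pattern the runner relies on. The setroot call and its r(root) return value are assumptions for illustration only; see help setroot (installed by setup.do) for the actual syntax.

* Illustrative only: exact setroot syntax and return values are assumptions
setroot                                   // searches upward for the .here marker
global project_path "`r(root)'"           // assumed return value name

* Other paths are built from the discovered root, never from c(pwd)
global scripts  "${project_path}/do_files"
global data_raw "${project_path}/data/raw"

* Adopath isolation: keep Stata's BASE plus only the project-local ado/ folder
adopath - PERSONAL
adopath - PLUS
adopath - SITE
adopath ++ "${project_path}/ado"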

Separating Code and Data Paths

The template supports storing code and data in separate locations. This is useful when:

  • Data is stored in a Cryptomator vault
  • Data is synced via Box/Dropbox/OneDrive to different locations on each machine
  • Multiple team members work from different directory structures
  • You want to keep large data files outside your git repository

Setup for Separate Data Storage

  1. Copy the template file:

    # macOS/Linux or Git Bash
    cp config.do.template config.do
    
    # Windows Command Prompt
    copy config.do.template config.do
    
    # Windows PowerShell
    Copy-Item config.do.template config.do

    Important: Never commit config.do - it's gitignored because it contains user-specific paths. Always commit config.do.template so others know how to configure theirs.

  2. Edit config.do to set your data path:

    // Example: Dropbox
    global data_root "C:/Users/YourName/Dropbox/Research/ProjectName/data"
    
    // Example: External drive (macOS)
    global data_root "/Volumes/ExternalDrive/research-data/ProjectName"
  3. Run your analysis as usual - paths are resolved automatically:

    just stata-run

Default Behavior (Everything Together)

If you don't create a config.do file, the template uses default paths:

data/raw/     -> [project_root]/data/raw/
data/clean/   -> [project_root]/data/clean/
data/final/   -> [project_root]/data/final/

Path Variables Reference

After setup, these globals are available in all scripts:

Data paths (customizable via config.do):

  • ${data_root} - Root of all data folders
  • ${data_raw} - Raw/original data
  • ${data_clean} - Cleaned data
  • ${data_final} - Final analysis datasets

Code/output paths (always in project root):

  • ${project_path} - Project root (from setroot)
  • ${scripts} - Do-files directory
  • ${outputs} - All outputs
  • ${tables} - Regression tables
  • ${figures} - Figures and graphs
  • ${logs} - Log files
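
For example, a do-file can reference these globals directly; the dataset and figure names below are hypothetical and only illustrate the path pattern.

* Hypothetical file names, used only to illustrate the path globals
use "${data_raw}/baseline_survey.dta", clear
* ... cleaning steps ...
save "${data_clean}/baseline_survey.dta", replace

* After drawing a graph:
graph export "${figures}/enrollment_by_arm.pdf", replace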

Tip

Want more automation? See Advanced Setup below for:

  • just task runner for common commands
  • scons for dependency tracking (rebuild only what changed)
  • nbstata for running Stata interactively in VS Code
  • Pre-commit hooks for automatic code quality checks

Project Structure

├── README.md                           # Important information about the project. Keep this updated and provide additional documentation as needed in `/documentation`.
├── .here                               # Project root marker (for setroot)
├── .env                                # Stata configuration (gitignored; copy from .env-example)
├── config.do.template                  # Template for user-specific data paths
├── config.do                           # User-specific data paths (gitignored; copy from config.do.template)
├── .config/                            # Configuration files for packages and tools
│   ├── quarto/                         # Configuration for formatting Quarto Markdown documents
│   └── stata/                          # Stata package requirements
│       ├── install_packages.do         # Script to install required Stata packages
│       └── stata_requirements.txt      # List of required Stata packages
├── setup.do                            # One-time setup script
├── data/                               # Data files (DO NOT COMMIT SENSITIVE DATA OR LARGE FILES TO GIT/GITHUB)
│   ├── raw/                            # Original, immutable data files
│   ├── clean/                          # Cleaned data (intermediate)
│   └── final/                          # Analysis-ready datasets
├── do_files/                           # Stata do-files (files here are illustrative; actual do-files may vary)
│   ├── 00_run.do                       # Master do-file (controls pipeline + runner)
│   ├── 01_data_cleaning.do
│   ├── 02_data_preparation.do
│   ├── 03_descriptive_analysis.do
│   ├── 04_main_analysis.do
│   ├── 05_robustness_checks.do
│   ├── 06_generate_figures.do
│   └── functions.do                    # Reusable helper functions
├── ado/                                # Local Stata packages (for reproducibility)
├── outputs/
│   ├── figures/                        # Figures (.pdf, .png files)
│   └── tables/                         # Regression tables (.tex, .md files)
├── logs/                               # Log files from Stata runs (should be gitignored)
├── reports/                            # Generated reports (e.g., Quarto, LaTeX)
├── src/                                # Additional scripts (e.g., Python for data processing)
└── documentation/                      # Project documentation

Understanding 00_run.do

The master do-file orchestrates your entire analysis pipeline. It uses control switches to run specific sections:

// Change to 0 to skip during development
local data_cleaning         = 1
local data_preparation      = 1
local descriptive_analysis  = 1
local main_analysis         = 1
local robustness_checks     = 1
local generate_figures      = 1

This allows you to quickly iterate on specific parts without re-running everything.
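
For example, each switch typically gates its section with a simple if block; this is an illustrative sketch, and the template's actual 00_run.do may differ in detail.

* Run the cleaning stage only when its switch is turned on
if `data_cleaning' == 1 {
    do "${scripts}/01_data_cleaning.do"
}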


Advanced Setup

For teams wanting additional automation, code quality tools, and VS Code integration.

Prerequisites for Advanced Features

  • Everything from Quick Start, plus:
  • just command runner (install)
  • For full setup: uv Python manager, Node.js (for linting)

Option A: Task Runner Only (+just)

Install just and use simple commands instead of typing full Stata paths:

# Windows
winget install --id Casey.Just -e

# macOS/Linux
brew install just

Now you can run:

just stata-setup                    # One-time setup (install setroot + packages)
just stata-run                      # Run the full pipeline
just stata-script 01_data_cleaning  # Run a single script
just stata-config                   # Show your Stata configuration
just help                           # See available commands

Option B: Full Development Environment

For the complete setup including Python tools, nbstata, and pre-commit hooks:

just get-started

This installs:

  • uv for Python environment management
  • Git for version control
  • GitHub CLI for interaction with GitHub
  • Quarto for reports and presentations
  • markdownlint-cli2 for Markdown formatting
  • Python virtual environment with nbstata (run Stata in VS Code/Jupyter)
  • Stata packages from .config/stata/stata_requirements.txt

After installation, verify your setup:

just stata-check-installation

VS Code Integration with nbstata

For interactive Stata execution in VS Code (similar to Ctrl+D workflow):

  1. Install the vscode-stata extension
  2. Test with demo files in do_files/demo/
  3. Select the nbstata kernel at .venv/Scripts/python.exe (Windows) or .venv/bin/python (macOS/Linux)

See the nbstata User Guide for details.

Dependency Tracking with scons (Advanced)

For large projects where full rebuilds take >5 minutes, use scons to only rebuild changed files:

just stata-build        # Build with dependency tracking
just stata-data         # Build only data pipeline
just stata-analysis     # Build only analysis
just stata-clean        # Clean all outputs

The SConstruct file defines dependencies between do-files and their outputs. When you modify 01_data_cleaning.do, scons knows to re-run downstream scripts but not unrelated ones.

Note

For most projects, the simple 00_run.do approach is sufficient. Only adopt scons if you have genuinely slow builds that would benefit from incremental rebuilding.


Best Practices Built In

Workflow Features

  • IPA Data Standards: Follows IPA Data Cleaning Guide and Stata coding best practices
  • Defensive programming: Uses assert statements and quality checks throughout
  • Extended missing values: Implements IPA's .d/.o/.n/.r/.s conventions (both practices are illustrated in the sketch after this list)
  • Reproducible package management: Requirements-based Stata package installation
  • Comprehensive logging: All Stata runs generate detailed log files
  • Publication-ready outputs: Tables in LaTeX format, figures in PDF
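
The two practices flagged above can look like this in a cleaning script; the variable names and survey codes are hypothetical, and the missing-value labels follow the usual reading of IPA's convention.

* Defensive check: household IDs should be unique and never missing
isid hhid
assert !missing(hhid)

* Extended missing values (commonly: .d don't know, .o other, .n not applicable,
* .r refused, .s skipped)
replace income = .d if income == -999    // hypothetical "don't know" survey code
replace income = .r if income == -888    // hypothetical "refused" survey code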

Stata Package Management

# Install all required packages from requirements file
just stata-install-packages

Packages are listed in .config/stata/stata_requirements.txt.
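
As a sketch of what requirements-driven installation can look like (the template's actual .config/stata/install_packages.do may differ), a do-file can read the list line by line and install anything that is not already available:

* Illustrative sketch: install each listed package whose main command is
* not already on the adopath
file open req using ".config/stata/stata_requirements.txt", read text
file read req line
while r(eof) == 0 {
    local pkg = strtrim("`line'")
    if "`pkg'" != "" {
        capture which `pkg'
        if _rc ssc install `pkg', replace
    }
    file read req line
}
file close req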

Code Quality with stata_linter

just lint-stata                                    # Lint all do-files
just lint-stata-file do_files/01_data_cleaning.do  # Lint specific file

Reports saved to logs/stata_linter_report.xlsx.

IPA Visualizations (for IPA Staff)

net install github, from("https://haghish.github.io/github/")
github install PovertyAction/ipaplots

The template automatically uses IPA branding when ipaplots is available.
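
A do-file can check for the scheme before switching to it. The sketch below assumes the scheme file is named scheme-ipaplots.scheme; confirm the actual scheme name in the ipaplots documentation.

* Illustrative only: the scheme name is an assumption
capture findfile scheme-ipaplots.scheme
if !_rc {
    set scheme ipaplots
}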


Troubleshooting

Command not found errors:

  • Verify Stata path in .env file
  • Ensure quotes around paths with spaces (Windows)

Permission errors (macOS/Linux):

  • Check file permissions on Stata executable

Batch mode issues:

  • Ensure your Stata license supports batch processing

Acknowledgments and References

This template builds upon established best practices and tools from the development economics and data science communities:

Primary Guidelines and Standards

  • IPA Data Cleaning Guide (Website): Comprehensive guide for data cleaning best practices

    • Organization: Innovations for Poverty Action (IPA)
    • Covers: Raw data management, variable management, dataset documentation, data aggregation
  • IPA Stata Tutorials (Website): Stata coding standards and best practices

    • Organization: Innovations for Poverty Action (IPA)
    • Covers: Stata syntax, data processing, coding standards
  • Data Carpentry Stata Economics (Website): Research-grade Stata programming curriculum

    • Organization: Data Carpentry
    • Covers: Data exploration, quality assessment, transformation, combination, programming, loops, advanced techniques
    • License: CC BY 4.0

Core Dependencies

  • ipaplots (GitHub): IPA-branded Stata graphing scheme

    • Authors: Ronny Condor, Kelly Montaño (IPA Peru)
    • Organization: Innovations for Poverty Action
    • Features: Professional visualization theme with IPA branding
  • (Optional) statacons (GitHub | Documentation): Python package for managing Stata workflows

    • Authors: Brian Quistorff and colleagues
    • License: MIT License

Coding Standards and Best Practices

  • Sean Higgins Stata Guide (GitHub): Comprehensive coding style and workflow recommendations

    • Author: Sean Higgins
    • License: Creative Commons
  • DIME Analytics Data Handbook (Website): World Bank DIME team coding standards

    • Organization: World Bank Development Impact Evaluation (DIME)
    • License: MIT License
  • World Bank Reproducible Research Repository (GitHub): Guidelines for reproducible research

  • Code and Data for the Social Sciences: A Practitioner's Guide (Website): Stata coding style guide

    • Authors: Matthew Gentzkow and Jesse M. Shapiro
    • Copyright (c) 2014, Matthew Gentzkow and Jesse M. Shapiro.

Development Tools

  • uv (Documentation): Fast Python package installer and resolver
  • Just (GitHub): Command runner for development tasks
  • Quarto (Website): Scientific and technical publishing system

Advanced Workflow with SCons

Automated Build System (Recommended) - SConstruct

just stata-full     # Complete pipeline with build system
# OR use scons directly:
scons              # Builds entire analysis pipeline
scons data         # Builds only data cleaning/preparation
scons analysis     # Builds only analysis outputs
scons figures      # Builds only figures
scons -c           # Clean all outputs

License

This template is released under the MIT License. See LICENSE for details.

While this template is MIT licensed, please respect the licenses of the constituent tools and the intellectual contributions of the referenced guides and best practices.
