A template repository for reproducible Stata analysis projects using modern workflow tools and best practices. This template integrates IPA's Data Cleaning Guide and Stata coding standards with established practices from leading development economics research groups.
> [!WARNING]
> NEVER COMMIT DATA FILES TO GITHUB.
> NEVER USE AI ASSISTANTS WITH PERSONALLY IDENTIFIABLE DATA.
> YOU ARE REQUIRED TO REMOVE IDENTIFYING INFORMATION BEFORE CONNECTING AI ASSISTANTS OR STORING DATA IN ANY UNENCRYPTED LOCATION.
If you want to use this template for your own project:
- Click the green Use this template button at the top of this page
- Select Create a new repository
- On the "Create a new repository" page:
  - Start with a template: PovertyAction/ipa-stata-template
  - Include all branches: Off
  - Select Owner, Repository name, Description, and Configuration as desired
- Click the green Create repository button
> [!TIP]
> If you are new to this repository, start here, but be sure to read the full README as well as the Getting Started Guide.
Get started with just Git and Stata - no additional tools required.
- Git installed (download)
  - Windows: `winget install --id Git.Git -e`
  - macOS: `brew install git`
  - Linux: use your package manager, e.g., `sudo apt install git`
- Stata 17+ installed and licensed
- Clone the repository

  ```bash
  git clone https://github.com/PovertyAction/ipa-stata-template.git
  cd ipa-stata-template
  ```

- Configure your Stata path

  Copy `.env-example` to `.env` and set your Stata executable path:

  ```bash
  # Windows example
  STATA_CMD='C:\Program Files\Stata18\StataSE-64.exe'
  STATA_EDITION='se'

  # macOS example
  # STATA_CMD='/Applications/Stata/StataSE.app/Contents/MacOS/StataSE'

  # Linux example
  # STATA_CMD='/usr/local/stata18/stata-se'
  ```
- One-time setup (install the `setroot` package)

  ```bash
  # From the command line - run once to install dependencies
  just stata-setup

  # Or from Stata directly:
  do setup.do
  ```
- Run the analysis pipeline

  ```bash
  # Full pipeline
  just stata-run

  # Or run a single script
  just stata-script 01_data_cleaning

  # Or open Stata and run directly
  # IMPORTANT: First change to the project directory in Stata:
  cd ~/code/ipa-stata-template
  do do_files/00_run.do
  do do_files/00_run.do "01_data_cleaning" // single script
  ```

  > [!TIP]
  > If you get a "Root folder of project not found" error, make sure you've changed to the project directory in Stata using `cd ~/code/ipa-stata-template` before running the do-file.
- Check outputs
  - Tables: `outputs/tables/`
  - Figures: `outputs/figures/`
  - Logs: `logs/`
That's it! You now have a reproducible Stata workflow.
The template uses setroot to automatically find the project root by looking for
a .here marker file. This means:
- No `c(pwd)` dependency - scripts work regardless of where Stata is launched
- No user-specific `if c(user)` blocks - paths resolve automatically
- Full adopath isolation - only BASE plus the local `ado/` folder, for reproducibility
- Runner pattern - run individual scripts with proper environment setup (a short sketch follows this list)
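To make this concrete, here is a minimal sketch contrasting the user-specific path switches the template avoids with the globals it sets up. The usernames, paths, and dataset name below are hypothetical illustrations, not code shipped with the template:

```stata
* The pattern this template avoids: hard-coded, user-specific path switches.
* if "`c(username)'" == "alice"  global project_path "C:/Users/alice/my-project"
* if "`c(username)'" == "bob"    global project_path "/home/bob/my-project"

* With setroot and the .here marker, 00_run.do resolves the root once, so every
* script simply relies on the shared globals (hypothetical file name below):
use "${data_raw}/household_survey.dta", clear
save "${data_clean}/household_survey.dta", replace
```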
The template supports storing code and data in separate locations. This is useful when:
- Data is stored in a Cryptomator vault
- Data is synced via Box/Dropbox/OneDrive to different locations on each machine
- Multiple team members work from different directory structures
- You want to keep large data files outside your git repository
- Copy the template file:

  ```bash
  # Windows Command Prompt or bash
  cp config.do.template config.do

  # Windows PowerShell
  Copy-Item config.do.template config.do
  ```

  > [!IMPORTANT]
  > Never commit `config.do` - it's gitignored for a reason (it contains user-specific paths). Always commit `config.do.template` so others know how to configure theirs.

- Edit `config.do` to set your data path:

  ```stata
  // Example: Dropbox
  global data_root "C:/Users/YourName/Dropbox/Research/ProjectName/data"

  // Example: External drive (macOS)
  global data_root "/Volumes/ExternalDrive/research-data/ProjectName"
  ```

- Run your analysis as usual - paths are resolved automatically:

  ```bash
  just stata-run
  ```
If you don't create a config.do file, the template uses default paths:
```text
data/raw/   -> [project_root]/data/raw/
data/clean/ -> [project_root]/data/clean/
data/final/ -> [project_root]/data/final/
```

After setup, these globals are available in all scripts:
Data paths (customizable via `config.do`):

- `${data_root}` - Root of all data folders
- `${data_raw}` - Raw/original data
- `${data_clean}` - Cleaned data
- `${data_final}` - Final analysis datasets

Code/output paths (always in the project root):

- `${project_path}` - Project root (from setroot)
- `${scripts}` - Do-files directory
- `${outputs}` - All outputs
- `${tables}` - Regression tables
- `${figures}` - Figures and graphs
- `${logs}` - Log files
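For example, an analysis script can reference only these globals and nothing machine-specific. This is an illustrative sketch with hypothetical dataset, variable, and output names, not code included in the template:

```stata
* Illustrative only: hypothetical dataset, variables, and output file names.
use "${data_final}/analysis_sample.dta", clear

* Hypothetical model for the sketch.
regress outcome treatment, vce(robust)

* Outputs land in the shared output folders, never in hard-coded paths.
graph bar (mean) outcome, over(treatment)
graph export "${figures}/outcome_by_treatment.pdf", replace
```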
> [!TIP]
> Want more automation? See Advanced Setup below for:
>
> - `just` task runner for common commands
> - `scons` for dependency tracking (rebuild only what changed)
> - `nbstata` for running Stata interactively in VS Code
> - Pre-commit hooks for automatic code quality checks
├── README.md # Important information about the project. Keep this updated and provide additional documentation as needed in `/documentation`.
├── .here # Project root marker (for setroot)
├── .env # Stata configuration (gitignored) (copy from .env-example)
├── config.do.template # Template for user-specific data paths
├── config.do # User-specific data paths (gitignored) (copy from config.do.template)
├── .config/ # Configuration files for packages and tools
│ ├── quarto/ # Configuration for formatting Quarto Markdown documents
│ └── stata/ # Stata package requirements
│ ├── install_packages.do # Script to install required Stata packages
│ └── stata_requirements.txt # List of required Stata packages
├── setup.do # One-time setup script
├── data/ # Data files (DO NOT COMMIT SENSITIVE DATA OR LARGE FILES TO GIT/GITHUB)
│ ├── raw/ # Original, immutable data files
│ ├── clean/ # Cleaned data (intermediate)
│ └── final/ # Analysis-ready datasets
├── do_files/ # Stata do-files (files here are illustrative; actual do-files may vary)
│ ├── 00_run.do # Master do-file (controls pipeline + runner)
│ ├── 01_data_cleaning.do
│ ├── 02_data_preparation.do
│ ├── 03_descriptive_analysis.do
│ ├── 04_main_analysis.do
│ ├── 05_robustness_checks.do
│ ├── 06_generate_figures.do
│ └── functions.do # Reusable helper functions
├── ado/ # Local Stata packages (for reproducibility)
├── outputs/
│ ├── figures/ # Figures (.pdf, .png files)
│ └── tables/ # Regression tables (.tex, .md files)
├── logs/ # Log files from Stata runs (should be gitignored)
├── reports/ # Generated reports (e.g., Quarto, LaTeX)
├── src/ # Additional scripts (e.g., Python for data processing)
└── documentation/ # Project documentation
The master do-file orchestrates your entire analysis pipeline. It uses control switches to run specific sections:
```stata
// Change to 0 to skip during development
local data_cleaning        = 1
local data_preparation     = 1
local descriptive_analysis = 1
local main_analysis        = 1
local robustness_checks    = 1
local generate_figures     = 1
```

This allows you to quickly iterate on specific parts without re-running everything.
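Each switch then gates the corresponding stage of the pipeline. The snippet below is a sketch of that pattern; the actual 00_run.do in the template may structure it differently:

```stata
* Sketch of how the control switches gate each stage in the master do-file.
if `data_cleaning' == 1 {
    do "${scripts}/01_data_cleaning.do"
}
if `main_analysis' == 1 {
    do "${scripts}/04_main_analysis.do"
}
```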
For teams wanting additional automation, code quality tools, and VS Code integration.
- Everything from Quick Start, plus:
  - `just` command runner (install)
  - For full setup: the `uv` Python manager and Node.js (for linting)
Install just and use simple commands instead of typing full Stata paths:

```bash
# Windows
winget install --id Casey.Just -e

# macOS/Linux
brew install just
```

Now you can run:

```bash
just stata-setup                    # One-time setup (install setroot + packages)
just stata-run                      # Run the full pipeline
just stata-script 01_data_cleaning  # Run a single script
just stata-config                   # Show your Stata configuration
just help                           # See available commands
```

For the complete setup including Python tools, nbstata, and pre-commit hooks:

```bash
just get-started
```

This installs:

- `uv` for Python environment management
- Git for version control
- GitHub CLI for interaction with GitHub
- Quarto for reports and presentations
- `markdownlint-cli2` for Markdown formatting
- A Python virtual environment with `nbstata` (run Stata in VS Code/Jupyter)
- Stata packages from `.config/stata/stata_requirements.txt`
After installation, verify your setup:

```bash
just stata-check-installation
```

For interactive Stata execution in VS Code (similar to the Ctrl+D workflow):
- Install the vscode-stata extension
- Test with the demo files in `do_files/demo/`
- Select the nbstata kernel at `.venv/Scripts/python.exe` (Windows) or `.venv/bin/python` (macOS/Linux)
See the nbstata User Guide for details.
For large projects where full rebuilds take more than 5 minutes, use scons to rebuild only what changed:

```bash
just stata-build     # Build with dependency tracking
just stata-data      # Build only the data pipeline
just stata-analysis  # Build only the analysis
just stata-clean     # Clean all outputs
```

The SConstruct file defines dependencies between do-files and their outputs. When you modify 01_data_cleaning.do, scons knows to re-run downstream scripts but not unrelated ones.
> [!NOTE]
> For most projects, the simple 00_run.do approach is sufficient. Only adopt scons if you have genuinely slow builds that would benefit from incremental rebuilding.
- IPA Data Standards: Follows IPA Data Cleaning Guide and Stata coding best practices
- Defensive programming: Uses assert statements and quality checks throughout
- Extended missing values: Implements IPA's .d/.o/.n/.r/.s conventions
- Reproducible package management: Requirements-based Stata package installation
- Comprehensive logging: All Stata runs generate detailed log files
- Publication-ready outputs: Tables in LaTeX format, figures in PDF
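To make the defensive-programming and extended-missing-value conventions concrete, here is a small sketch. The variable names and the numeric survey codes being recoded are hypothetical, and the letter mapping shown (.d don't know, .r refusal, .n not applicable) follows IPA's convention as commonly described; consult the IPA Data Cleaning Guide for the authoritative definitions:

```stata
* Illustrative sketch: hypothetical variables and survey codes.
* Defensive check: fail loudly if an impossible value slips through.
assert inrange(age, 0, 120) if !missing(age)

* Extended missing values (IPA convention, assumed code mapping):
replace income = .d if income == -999   // don't know
replace income = .r if income == -888   // refused
replace income = .n if income == -777   // not applicable
```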
```bash
# Install all required packages from the requirements file
just stata-install-packages
```

Packages are listed in `.config/stata/stata_requirements.txt`.
```bash
just lint-stata                                    # Lint all do-files
just lint-stata-file do_files/01_data_cleaning.do  # Lint a specific file
```

Reports are saved to `logs/stata_linter_report.xlsx`.
```stata
net install github, from("https://haghish.github.io/github/")
github install PovertyAction/ipaplots
```

The template automatically uses IPA branding when ipaplots is available.
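If you want to apply the scheme explicitly in your own graphs, a minimal sketch follows. It assumes the scheme installs under the name ipaplots; check the package documentation if the name differs on your system:

```stata
* Sketch: apply the IPA scheme to a quick example graph.
set scheme ipaplots            // assumed scheme name from the ipaplots package
sysuse auto, clear             // built-in example dataset shipped with Stata
scatter mpg weight
graph export "${figures}/example_ipaplots.pdf", replace
```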
Command not found errors:
- Verify the Stata path in your `.env` file
- Ensure quotes around paths with spaces (Windows)
Permission errors (macOS/Linux):
- Check file permissions on Stata executable
Batch mode issues:
- Ensure your Stata license supports batch processing
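If the pipeline runs but paths or packages still look wrong, a few built-in Stata commands can help diagnose the setup. This is a suggested check, not part of the template itself:

```stata
* Quick diagnostics to run inside Stata after executing 00_run.do:
adopath                     // should list BASE plus the project-local ado/ folder
which setroot               // confirms setroot is installed and on the adopath
display "$project_path"     // should print the project root found via .here
display "$data_raw"         // should point at your raw-data location
```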
This template builds upon established best practices and tools from the development economics and data science communities:
- IPA Data Cleaning Guide (Website): Comprehensive guide for data cleaning best practices
  - Organization: Innovations for Poverty Action (IPA)
  - Covers: Raw data management, variable management, dataset documentation, data aggregation
- IPA Stata Tutorials (Website): Stata coding standards and best practices
  - Organization: Innovations for Poverty Action (IPA)
  - Covers: Stata syntax, data processing, coding standards
- Data Carpentry Stata Economics (Website): Research-grade Stata programming curriculum
  - Organization: Data Carpentry
  - Covers: Data exploration, quality assessment, transformation, combination, programming, loops, advanced techniques
  - License: CC BY 4.0
- ipaplots (GitHub): IPA-branded Stata graphing scheme
  - Authors: Ronny Condor, Kelly Montaño (IPA Peru)
  - Organization: Innovations for Poverty Action
  - Features: Professional visualization theme with IPA branding
- (Optional) statacons (GitHub | Documentation): Python package for managing Stata workflows
  - Authors: Brian Quistorff and colleagues
  - License: MIT License
- Sean Higgins Stata Guide (GitHub): Comprehensive coding style and workflow recommendations
  - Author: Sean Higgins
  - License: Creative Commons
- DIME Analytics Data Handbook (Website): World Bank DIME team coding standards
  - Organization: World Bank Development Impact Evaluation (DIME)
  - License: MIT License
- World Bank Reproducible Research Repository (GitHub): Guidelines for reproducible research
  - Organization: World Bank
  - License: Mozilla Public License 2.0
- Code and Data for the Social Sciences: A Practitioner's Guide (Website): Stata coding style guide
  - Authors: Matthew Gentzkow and Jesse M. Shapiro
  - Copyright (c) 2014, Matthew Gentzkow and Jesse M. Shapiro
- uv (Documentation): Fast Python package installer and resolver
- Just (GitHub): Command runner for development tasks
- Quarto (Website): Scientific and technical publishing system
```bash
just stata-full      # Complete pipeline with the build system

# OR use scons directly:
scons                # Builds the entire analysis pipeline
scons data           # Builds only data cleaning/preparation
scons analysis       # Builds only analysis outputs
scons figures        # Builds only figures
scons -c             # Cleans all outputs
```

This template is released under the MIT License. See LICENSE for details.
While this template is MIT licensed, please respect the licenses of the constituent tools and the intellectual contributions of the referenced guides and best practices.