This repository contains a fully reproducible empirical machine learning pipeline for predicting U.S. recessions using the Treasury yield curve, principal component analysis (PCA), macroeconomic indicators, and modern classification models. The project emphasizes time-respecting validation, robust threshold selection, and scenario-based evaluation (Global Financial Crisis vs. COVID-19).
The structure and workflow are designed to match best practices in applied ML and empirical macro-finance research.
The goal of this project is to evaluate whether information in the U.S. yield curve—summarized via PCA—and macroeconomic variables can predict NBER recessions at a fixed forecast horizon.
Key questions:
- How much predictive power is contained in yield curve principal components beyond simple spreads?
- Does combining yield curve information with macroeconomic variables improve performance?
- How stable are results across different validation schemes?
- Why do models that perform well during the GFC struggle during COVID?
- Binary indicator of an NBER recession at horizon $t + h$
- Default horizon: 12 months ahead
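A minimal sketch of how such a target could be built, assuming a monthly 0/1 recession indicator and reading "at horizon $t + h$" as $y_t = \text{recession}_{t+h}$. The function name `make_target` and the toy series are hypothetical and are not taken from the repository:

```python
import numpy as np

def make_target(recession, horizon=12):
    """Return y[t] = recession[t + horizon]; the last `horizon` entries are NaN."""
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    recession = np.asarray(recession, dtype=float)
    target = np.full(recession.shape, np.nan)
    n = len(recession)
    if horizon < n:
        # Shift the indicator back in time so each row sees the future state.
        target[: n - horizon] = recession[horizon:]
    return target

# Toy monthly indicator (assumption): 24 months, recession in months 14-17.
usrec = np.zeros(24)
usrec[14:18] = 1
y = make_target(usrec, horizon=12)
```

The trailing `horizon` observations have no known outcome yet, so they are left as NaN and would need to be dropped before training.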
- Yield curve levels (3m to 30y)
- Yield spreads (10y–3m, 10y–2y)
- Yield curve PCA (level, slope, curvature)
- Macroeconomic variables
  - Unemployment rate
  - Inflation (YoY)
  - Industrial production (YoY)
  - Consumer sentiment
  - Payroll employment (YoY)
- Policy / credit indicators
- Regime dummies (GFC, COVID, ZLB/QE)
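As an illustration of the spread and PCA features listed above, here is a rough sketch on synthetic yields. The tenor subset, the data-generating process, and all variable names are assumptions made for the example; the repository's own feature code may differ. PCA is computed with a plain SVD of the demeaned yield matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
maturities = ["3m", "2y", "5y", "10y", "30y"]  # hypothetical subset of tenors

# Synthetic monthly yields: a random-walk level, a fixed term premium, and noise.
n_months = 240
level = 3.0 + np.cumsum(rng.normal(0, 0.1, n_months))
term_premium = np.array([0.0, 0.4, 0.8, 1.2, 1.5])
noise = rng.normal(0, 0.05, (n_months, len(maturities)))
yields = level[:, None] + term_premium[None, :] + noise

# Spreads: long rate minus short rate.
col = {m: i for i, m in enumerate(maturities)}
spread_10y_3m = yields[:, col["10y"]] - yields[:, col["3m"]]
spread_10y_2y = yields[:, col["10y"]] - yields[:, col["2y"]]

# PCA via SVD of the demeaned yield matrix.
centered = yields - yields.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
n_components = 3
scores = centered @ Vt[:n_components].T  # component scores per month
explained = S**2 / np.sum(S**2)          # share of variance per component
```

The first three components are conventionally read as level, slope, and curvature. In a time-respecting setup, the column means and loadings would presumably be estimated on the training window only and then applied to later observations.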
- Ridge and Elastic Net
- Logistic regression (elastic net)
- Random forest
- Gradient boosting
- XGBoost (optional)
- Expanding-window cross-validation (for threshold tuning)
- Multiple holdout splits (sensitivity analysis)
- Scenario-based testing:
  - GFC: 2007–2009
  - COVID: 2019–2021
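The expanding-window scheme and threshold tuning can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the helper names, the fold layout, the threshold grid, and the choice of F1 as the tuning criterion are all hypothetical.

```python
import numpy as np

def expanding_window_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, val_idx) pairs where training always precedes validation."""
    if min_train >= n_samples:
        raise ValueError("min_train must be smaller than n_samples")
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(n_folds):
        train_end = min_train + k * fold_size
        # The last fold absorbs any leftover observations.
        val_end = n_samples if k == n_folds - 1 else train_end + fold_size
        yield np.arange(0, train_end), np.arange(train_end, val_end)

def f1_at_threshold(y_true, prob, threshold):
    """F1 score of the rule `prob >= threshold` against binary labels."""
    pred = prob >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0

def pick_threshold(y_true, prob, grid=None):
    """Return the first grid value that maximizes F1 (criterion is an assumption)."""
    if grid is None:
        grid = np.linspace(0.05, 0.95, 19)
    scores = [f1_at_threshold(y_true, prob, t) for t in grid]
    return float(grid[int(np.argmax(scores))])

# Toy usage: 120 months, 4 folds, at least 60 months of initial training data.
splits = list(expanding_window_splits(n_samples=120, n_folds=4, min_train=60))

# Toy out-of-fold probabilities and labels for threshold selection.
y_val = np.array([0, 0, 1, 1])
p_val = np.array([0.12, 0.38, 0.62, 0.91])
best_threshold = pick_threshold(y_val, p_val)
```

One plausible design is to pool the out-of-fold probabilities from all validation windows before calling `pick_threshold`, so the chosen cutoff never sees data from the final test period.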
.
├── data/ # Raw and processed datasets
├── experiments/ # Experiment entry points
├── models/ # Trained models
├── reports/ # LaTeX paper, tables, and figures
│ ├── figures/
│ ├── tables/
│ └── main.tex
├── src/ # Core library code
│ ├── data/ # Data ingestion and construction
│ ├── features/ # PCA and feature engineering
│ ├── models/ # Model definitions
│ ├── evaluations/ # Metrics, thresholds, validation
│ ├── visualizations/ # Publication-quality figures
│ └── utils/ # Helpers (LaTeX export, misc)
├── environment.yml # Conda environment (reproducible)
├── Makefile # One-command replication
└── README.md
All experiments are fully reproducible using Conda.
conda env create -f environment.yml
conda activate ec48e-recession
export FREDAPI="YOUR_FRED_API_KEY"
make run
This will:
- Download and process data from FRED
- Construct features and PCA representations
- Train all models
- Run holdout sensitivity and scenario tests
- Save results to outputs/
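The `FREDAPI` environment variable is presumably read by the data-ingestion step. A minimal sketch of how a script might validate it before any download is attempted; the helper name `get_fred_api_key` and the error message are hypothetical:

```python
import os

def get_fred_api_key(env=None):
    """Read the FRED key from the FREDAPI variable and fail early if it is unset."""
    env = os.environ if env is None else env
    key = env.get("FREDAPI", "").strip()
    # Treat the README placeholder as "not configured".
    if not key or key == "YOUR_FRED_API_KEY":
        raise RuntimeError("Set FREDAPI to a valid FRED API key before running.")
    return key
```

Failing early this way gives a clearer message than an authentication error raised midway through the pipeline.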
The final paper is fully automated.
make report
This compiles reports/main.tex, which pulls in:
- Tables generated from model outputs
- Figures produced by the pipeline
- Modular section files
See LICENSE for details.