
Recession Prediction with Yield Curve PCA and Macroeconomic Variables

This repository contains a fully reproducible empirical machine learning pipeline for predicting U.S. recessions using the Treasury yield curve, principal component analysis (PCA), macroeconomic indicators, and modern classification models. The project emphasizes time-respecting validation, robust threshold selection, and scenario-based evaluation (Global Financial Crisis vs. COVID-19).

The structure and workflow are designed to match best practices in applied ML and empirical macro-finance research.


1️⃣ Research Objective

The goal of this project is to evaluate whether information in the U.S. yield curve—summarized via PCA—and macroeconomic variables can predict NBER recessions at a fixed forecast horizon.

Key questions:

  • How much predictive power is contained in yield curve principal components beyond simple spreads?
  • Does combining yield curve information with macroeconomic variables improve performance?
  • How stable are results across different validation schemes?
  • Why do models that perform well during the GFC struggle during COVID?

2️⃣ Methodological Overview

Target Variable

  • Binary indicator equal to 1 if an NBER recession is in force at $t + h$
  • Default horizon: 12 months ahead
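As a minimal sketch of how the target can be built (assuming the monthly NBER indicator is available as a pandas Series, e.g. FRED series USREC; the function name is illustrative, not the repo's API):

```python
import pandas as pd

def make_target(usrec: pd.Series, horizon: int = 12) -> pd.Series:
    """Label month t with 1 if a recession is in force at t + horizon."""
    return usrec.shift(-horizon).dropna().astype(int)

# Toy example: 24 months, with the last 4 months in recession.
idx = pd.date_range("2000-01-01", periods=24, freq="MS")
usrec = pd.Series([0] * 20 + [1] * 4, index=idx)
y = make_target(usrec, horizon=12)   # months t whose t+12 falls in a recession
print(y.tail())
```

Shifting the indicator backward (rather than shifting features forward) keeps the feature matrix aligned to observation dates, which makes leakage checks simpler.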

Feature Blocks

  • Yield curve levels (3m to 30y)

  • Yield spreads (10y–3m, 10y–2y)

  • Yield curve PCA (level, slope, curvature)

  • Macroeconomic variables

    • Unemployment rate
    • Inflation (YoY)
    • Industrial production (YoY)
    • Consumer sentiment
    • Payroll employment (YoY)
  • Policy / credit indicators

  • Regime dummies (GFC, COVID, ZLB/QE)
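The PCA block above can be illustrated with a small sketch on synthetic yields; the maturities, sample size, and factor structure here are assumptions for demonstration, not the repository's actual data schema:

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulate a yield panel driven by a level factor and a slope factor.
rng = np.random.default_rng(0)
maturities = np.array([0.25, 2, 5, 10, 30])            # years (illustrative)
level = rng.normal(4.0, 1.0, size=(240, 1))            # 240 synthetic months
slope = rng.normal(0.0, 0.5, size=(240, 1))
yields = level + slope * np.log(maturities) + rng.normal(0, 0.05, (240, 5))

# The first three PCs are conventionally read as level, slope, curvature.
pca = PCA(n_components=3)
factors = pca.fit_transform(yields - yields.mean(axis=0))
print(pca.explained_variance_ratio_)   # first two PCs dominate by design
```

Because the simulated curve is two-factor, almost all variance loads on PC1 and PC2; on real Treasury data a small curvature component typically appears as PC3.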

Models

  • Ridge and Elastic Net
  • Logistic regression (elastic net)
  • Random forest
  • Gradient boosting
  • XGBoost (optional)
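A minimal sketch of the elastic-net logistic baseline in scikit-learn; the penalty mix, regularization strength, and solver are illustrative choices, not the repository's tuned settings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in feature matrix and binary target (synthetic, for illustration).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = (X[:, 0] - X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

# Elastic-net logistic regression requires the saga solver in scikit-learn.
clf = LogisticRegression(
    penalty="elasticnet", solver="saga", l1_ratio=0.5, C=1.0, max_iter=5000
)
clf.fit(X, y)
proba = clf.predict_proba(X)[:, 1]   # predicted recession probabilities
```

The probabilities in `proba` are what the threshold-selection step operates on, rather than hard 0/1 predictions.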

Validation Strategy

  • Expanding-window cross-validation (for threshold tuning)

  • Multiple holdout splits (sensitivity analysis)

  • Scenario-based testing:

    • GFC: 2007–2009
    • COVID: 2019–2021
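The expanding-window scheme can be sketched with scikit-learn's `TimeSeriesSplit`: each fold trains on all data up to a cutoff and validates on the next block, so the model never sees the future. The fold count and sample size here are illustrative:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n = 120                                      # e.g. 120 monthly observations
X = np.arange(n).reshape(-1, 1)              # placeholder feature matrix

tscv = TimeSeriesSplit(n_splits=4)
for fold, (train_idx, val_idx) in enumerate(tscv.split(X)):
    # Training indices always precede validation indices: no look-ahead.
    assert train_idx.max() < val_idx.min()
    print(fold, len(train_idx), len(val_idx))
```

This is why the scheme is appropriate for threshold tuning: the classification threshold is chosen only from out-of-sample probabilities that respect time ordering.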

3️⃣ Repository Structure

.
├── data/                   # Raw and processed datasets
├── experiments/            # Experiment entry points
├── models/                 # Trained models
├── reports/                # LaTeX paper, tables, and figures
│   ├── figures/
│   ├── tables/
│   └── main.tex
├── src/                    # Core library code
│   ├── data/               # Data ingestion and construction
│   ├── features/           # PCA and feature engineering
│   ├── models/             # Model definitions
│   ├── evaluations/        # Metrics, thresholds, validation
│   ├── visualizations/     # Publication-quality figures
│   └── utils/              # Helpers (LaTeX export, misc)
├── environment.yml         # Conda environment (reproducible)
├── Makefile                # One-command replication
└── README.md

4️⃣ Reproducibility and Environment Setup

All experiments are fully reproducible using Conda.

Step 1: Create the environment

conda env create -f environment.yml
conda activate ec48e-recession

Step 2: Set FRED API key

export FREDAPI="YOUR_FRED_API_KEY"

5️⃣ Running the Full Pipeline

Run all experiments

make run

This will:

  • Download and process data from FRED
  • Construct features and PCA representations
  • Train all models
  • Run holdout sensitivity and scenario tests
  • Save results to outputs/

6️⃣ Building the LaTeX Report

The final paper is built automatically.

make report

This compiles:

reports/main.tex

which pulls in:

  • Tables generated from model outputs
  • Figures produced by the pipeline
  • Modular section files

7️⃣ License

See LICENSE for details.