Scrapes real-time retail prices to estimate inflation before official CPI releases. The idea is simple: if you track enough product prices daily, you can see inflation trends 2-3 weeks before the BLS publishes its numbers.
- Scrape prices from Amazon and Walmart (mock implementations — real scraping would need proxy rotation and careful rate limiting to avoid getting blocked)
- ETL pipeline cleans, validates, and stores the data as parquet files
- Nowcast model computes weighted price indices using CPI basket weights
- Forecast module runs ARIMA/SARIMA for short-term predictions
- Airflow DAG orchestrates the whole thing on a daily schedule
- Streamlit dashboard for visualization
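The nowcast step can be sketched as an elementary index per category: a Jevons (geometric-mean) index of price relatives, the standard choice at the item level. The helper name and dict-based interface here are illustrative, not the repo's actual API:

```python
from math import exp, log

def category_index(base_prices, current_prices):
    """Jevons elementary index: geometric mean of price relatives, scaled to 100.
    base_prices / current_prices are dicts keyed by product id (hypothetical shape).
    Only products present in both periods are used, so out-of-stock items drop out."""
    common = base_prices.keys() & current_prices.keys()
    if not common:
        raise ValueError("no overlapping products between periods")
    mean_log = sum(log(current_prices[p] / base_prices[p]) for p in common) / len(common)
    return 100.0 * exp(mean_log)

# e.g. milk up 4%, eggs up 5% -> category index ~104.5
idx = category_index({"milk": 3.50, "eggs": 2.00}, {"milk": 3.64, "eggs": 2.10})
```

The geometric mean is preferred over an arithmetic mean of relatives because it is symmetric in price increases and decreases and less sensitive to single large relatives.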
The backtesting framework is implemented but results depend heavily on data source quality. With the mock scrapers (hardcoded product catalogs), the pipeline runs end-to-end and produces forecasts, but the metrics aren't meaningful — you're basically forecasting data you generated yourself.
With real price feeds, the literature suggests web-scraped price indices can lead official CPI by 2-3 weeks with reasonable accuracy. The value here is the pipeline architecture, not the mock data results.
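The backtest itself is a walk-forward loop: fit on everything up to day t, predict day t, score, advance. A minimal sketch with a naive last-value baseline (function names are illustrative, not the repo's):

```python
def expanding_window_backtest(series, fit_forecast, min_train=24):
    """Walk forward through the series: for each t, call fit_forecast on the
    history series[:t], compare the one-step forecast to series[t], and
    return the mean absolute error over all evaluation points."""
    errors = []
    for t in range(min_train, len(series)):
        pred = fit_forecast(series[:t])
        errors.append(abs(pred - series[t]))
    return sum(errors) / len(errors)

# Naive "last value" forecaster as a baseline; any model that can't beat
# this on real data isn't adding value.
mae = expanding_window_backtest(list(range(30)), lambda h: h[-1], min_train=24)
```

With mock data the baseline and the model both score near zero, which is exactly why the metrics aren't meaningful until real feeds are plugged in.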
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

# Run the ETL pipeline
python -m src.pipeline.etl --output ./data/processed

# Compute nowcast
python -m src.models.nowcast --data-path ./data/processed

# Dashboard
streamlit run streamlit_app/app.py

# Airflow (copy DAG first)
cp dags/scraping_dag.py $AIRFLOW_HOME/dags/
```

Tracking 8 categories with BLS-approximate weights:
| Category | Weight |
|---|---|
| Grocery | 14.3% |
| Housing | 42.4% |
| Transportation | 16.0% |
| Medical | 8.5% |
| Education | 6.2% |
| Recreation | 5.4% |
| Apparel | 2.6% |
| Other | 4.6% |
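These weights plug directly into the headline aggregation. A sketch that also renormalizes over whichever categories are present, so a single failed scraper doesn't silently deflate the index (the constant and function names are illustrative):

```python
CPI_WEIGHTS = {  # BLS-approximate weights from the table above, as fractions
    "grocery": 0.143, "housing": 0.424, "transportation": 0.160,
    "medical": 0.085, "education": 0.062, "recreation": 0.054,
    "apparel": 0.026, "other": 0.046,
}

def headline_index(category_indices):
    """Weighted average of category indices. Weights are renormalized over
    the categories actually present, so missing data shifts weight to the
    remaining categories instead of dragging the index toward zero."""
    present = {c: w for c, w in CPI_WEIGHTS.items() if c in category_indices}
    if not present:
        raise ValueError("no recognized categories")
    total = sum(present.values())
    return sum(category_indices[c] * w / total for c, w in present.items())
```

Renormalization is a judgment call: an alternative is to carry forward the last observed index for a missing category rather than reweighting.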
- Scraping is legally tricky: Amazon and Walmart both restrict automated scraping in their ToS. I ended up using mock data for the portfolio version — in a real setting you'd need to negotiate data access or use an API
- Seasonal adjustment is harder than it looks: Tried X-13ARIMA-SEATS but it needs a lot of data to work well. Fell back to simpler decomposition for now
- Data quality is the real bottleneck: Spent more time on validation and outlier detection than on the actual models. Price data from scraping is noisy — products go out of stock, prices spike during sales, units change
- ARIMA order selection: Grid search over (p, d, q) is slow and the AIC-optimal model isn't always the best for forecasting. Would use `auto_arima` from `pmdarima` next time
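On the data-quality point: a rolling-median filter catches most sale spikes and unit-change jumps without assuming normality. A sketch with illustrative window and threshold values:

```python
from statistics import median

def flag_price_outliers(prices, window=7, threshold=0.5):
    """Flag observations that deviate from the median of the previous
    `window` prices by more than `threshold` as a fraction of that median.
    The first `window` points are never flagged (no history yet).
    Window and threshold here are illustrative, not tuned values."""
    flags = [False] * len(prices)
    for i in range(window, len(prices)):
        med = median(prices[i - window:i])
        if med > 0 and abs(prices[i] - med) / med > threshold:
            flags[i] = True
    return flags

# A week of stable prices, then a sale spike, then back to normal:
flags = flag_price_outliers([10.0] * 7 + [25.0, 10.0])
```

The median (rather than the mean) keeps a single earlier spike from contaminating the baseline for subsequent points.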
- Use a proper price aggregator API instead of scraping (BLS actually publishes microdata, just with a lag)
- Try a state-space model or dynamic factor model instead of just ARIMA
- The Laspeyres index calculation is simplified — should handle substitution bias and quality adjustment
- Great Expectations integration is mostly scaffolded, not fully wired up
- The Airflow DAG works but the alerting is just a print statement
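On the substitution-bias point above: a superlative index such as the Törnqvist averages expenditure shares across the two periods, so it partially captures consumers substituting away from items whose prices rose. A sketch, assuming per-item expenditure shares were available (which scraped data alone does not provide):

```python
from math import exp, log

def tornqvist_index(p0, p1, s0, s1):
    """Tornqvist index, scaled to 100: geometric mean of price relatives
    weighted by the average of base- and current-period expenditure shares.
    p0/p1 are prices and s0/s1 expenditure shares, dicts keyed by item;
    each period's shares should sum to 1. Interface is hypothetical."""
    return 100.0 * exp(sum(
        0.5 * (s0[i] + s1[i]) * log(p1[i] / p0[i]) for i in p0
    ))
```

With constant, equal shares this reduces to the Jevons geometric mean; the benefit appears only when shares actually shift between periods.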
This is a proof of concept. Don't use it for actual trading decisions. Always refer to official CPI releases from the BLS for real inflation data.