MarketMoLe

(/ˈmɑːrkɪt moʊl/)

MktML is a local-first stock research pipeline that uses machine learning to generate daily BUY / HOLD / SELL recommendations across 900+ tickers. It pulls price data from up to 11 providers, trains an ensemble model (Random Forest + Gradient Boosting + optional XGBoost) across three time horizons (5-day, 10-day, 30-day), and produces human-readable reports you can act on or feed into other tools.

It runs entirely on your machine — no cloud account, no subscription, no data leaving your system.

Live daily reports and model performance dashboard: smkwray.github.io/mktml

What does it actually do?

Pulls market data from yfinance and up to 10 fallback providers, so a single API outage never stops you.
Computes signals from technicals (RSI, MACD, Bollinger Bands, etc.), macro indicators (yield curve, VIX, credit spreads via FRED), qualitative features (sector, moat, debt level), and a daily news assessment via Gemini CLI (trade policy, geopolitical, regulatory, energy supply, and monetary surprises).
Trains an ensemble ML model (Random Forest + Gradient Boosting + optional XGBoost) on 5 years of history, scoring each ticker across 5d/10d/30d horizons.
Scans your universe and synthesizes a BUY/HOLD/SELL recommendation for every ticker, with calibrated confidence scores.
Generates a daily report (Markdown + JSON) with top picks, exit alerts for your holdings, and data-health checks.
Tracks its own accuracy via an audit system that records every recommendation and measures outcomes against SPY, other benchmarks, or a composite.
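The README doesn't spell out how three horizon scores become a single label. As a rough illustration only, one plausible scheme is a weighted average of per-horizon probabilities passed through BUY/SELL thresholds. The weights, thresholds, and the `synthesize_recommendation` helper below are hypothetical, not MktML's actual scanner logic.

```python
from dataclasses import dataclass

# Hypothetical horizon weights; MktML's real weighting is not documented here.
HORIZON_WEIGHTS = {"5d": 0.25, "10d": 0.35, "30d": 0.40}


@dataclass
class Recommendation:
    ticker: str
    action: str        # "BUY", "HOLD", or "SELL"
    confidence: float  # blended probability that the ticker goes up


def synthesize_recommendation(ticker, horizon_probs, buy_threshold=0.60, sell_threshold=0.40):
    """Blend per-horizon 'probability of going up' into one BUY/HOLD/SELL label.

    horizon_probs maps a horizon name ("5d", "10d", "30d") to a calibrated
    probability in [0, 1]. Missing horizons are skipped and the remaining
    weights are renormalized.
    """
    used = {h: w for h, w in HORIZON_WEIGHTS.items() if h in horizon_probs}
    if not used:
        raise ValueError(f"no horizon scores for {ticker}")
    total = sum(used.values())
    score = sum(w * horizon_probs[h] for h, w in used.items()) / total
    if score >= buy_threshold:
        action = "BUY"
    elif score <= sell_threshold:
        action = "SELL"
    else:
        action = "HOLD"
    return Recommendation(ticker, action, round(score, 4))
```

For example, scores of 0.70 / 0.65 / 0.72 blend to about 0.69 and produce a BUY under these assumed thresholds.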

Quick Start

# 1. Clone and set up
git clone https://github.com/smkwray/market.git
cd market
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2. Configure
cp examples/public/config.public.example.py config.py
cp examples/public/.env.example .env
# Edit .env with your API keys (at minimum, get a free FRED key)
# Edit config.py to add your PORTFOLIO_HOLDINGS and WATCHLIST

# 3. Initialize and run
python src/main.py --init-db
python src/main.py --pipeline full

The full pipeline will: download data → compute signals → train models → scan → generate a report. First run takes longer (downloading 5 years of history); subsequent daily runs are much faster.
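The pipeline modes differ mainly in which stages run. A minimal sketch of that ordering, assuming stage names that mirror the steps listed above; the CLI reference describes `daily` as skipping training and reusing the existing model. `PIPELINES` and `run_pipeline` are hypothetical, not the real `src/main.py` code.

```python
from typing import Callable

# Stage order taken from the README; the handlers themselves are placeholders.
STAGES = ["download", "signals", "train", "scan", "report"]

PIPELINES = {
    "full": STAGES,
    # "daily" reuses the existing model, so training is skipped.
    "daily": [s for s in STAGES if s != "train"],
}


def run_pipeline(name: str, handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Run each stage of the named pipeline in order and return the stages executed."""
    if name not in PIPELINES:
        raise ValueError(f"unknown pipeline: {name}")
    executed = []
    for stage in PIPELINES[name]:
        handlers[stage]()  # a failing stage raises and stops the run
        executed.append(stage)
    return executed
```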

Architecture

flowchart LR
  A["Universe + Config"] --> B["Data Loader\n(yfinance + provider fallbacks)"]
  B --> C["Signal Engine\n(technicals + macro + qual)"]
  C --> D["ML Engine\n(5d/10d/30d ensemble)"]
  D --> E["Replay + Calibration\n(model_predictions + artifacts)"]
  D --> F["Scanner\n(BUY/HOLD/SELL synthesis)"]
  E --> F
  F --> G["Reporter\n(Markdown + JSON + OPENCLAW blocks)"]
  G --> H["Dashboard + Notifications"]
  I["Artifact Verification\n(manifest + runtime skew)"] --> F
  I --> D

CLI Reference

Daily use

| Command | What it does |
|---|---|
| `python src/main.py --pipeline full` | Run everything end-to-end (data → train → scan → report) |
| `python src/main.py --pipeline daily` | Daily refresh (skip training, reuse the existing model) |
| `python src/main.py --pipeline daily_auto` | Same as `daily`, but designed for unattended/cron use |
| `python src/main.py --scan` | Scan only (generate recommendations from the existing model) |
| `python src/main.py --update-news` | Fetch the daily market news assessment via Gemini CLI |
| `python src/main.py --report` | Regenerate the report from the latest scan results |

Training and calibration

| Command | What it does |
|---|---|
| `python src/main.py --train-ml` | Retrain the ML models on current data |
| `python src/main.py --replay-scan --start YYYY-MM-DD --end YYYY-MM-DD` | Backtest: replay historical scans to build prediction history |
| `python src/main.py --build-calibration-artifacts --start YYYY-MM-DD --end YYYY-MM-DD` | Build probability calibration from replay data |
| `python src/main.py --verify-artifacts` | Check model/calibration artifact integrity |
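Replay and calibration are only meaningful if every replayed prediction was made without seeing the future. A minimal sketch of an expanding-window walk-forward splitter with a purge gap, assuming each row's label looks `horizon` rows ahead (for example, a 5-day forward return). The function name and parameters are illustrative, not MktML's API.

```python
def purged_walk_forward_splits(n_samples, n_folds, horizon, embargo=0):
    """Yield (train_indices, test_indices) for expanding-window walk-forward folds.

    A row at time t carries a label that depends on prices up to t + horizon,
    so training rows whose label window reaches into the test block are purged.
    `embargo` adds an extra buffer of rows on top of the purge gap.
    """
    fold_size = n_samples // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        test_end = n_samples if k == n_folds else test_start + fold_size
        # Row t's label window overlaps the test block when t + horizon >= test_start.
        train_end = test_start - horizon - embargo
        if train_end <= 0:
            continue  # too little history before this test block
        yield list(range(0, train_end)), list(range(test_start, test_end))
```

Training only ever uses rows strictly before each test block, so every test prediction is out-of-sample.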

Audit and maintenance

| Command | What it does |
|---|---|
| `python src/main.py --audit` | Score past recommendations against actual outcomes |
| `python src/main.py --backfill-labels` | Backfill outcome labels for older recommendations |
| `python src/main.py --notify` | Send daily notification (webhook or ntfy) |
| `python src/main.py --notify-weekly` | Send weekly performance summary |

Dashboard

A local web UI for controlling runs, viewing reports, and editing config:

python src/dashboard.py
# Open http://127.0.0.1:5050

From the dashboard you can:

Start/stop any pipeline step (scan, train, audit, news, etc.)
Schedule recurring jobs (including morning/evening news assessment)
Browse reports and logs
Edit config with history and rollback
View live run status and analytics

Data Providers

MktML tries providers in order until one returns data. If yfinance is down, it moves to Alpaca, then Tiingo, and so on. You don't need all of them: yfinance works with no API key, and each additional provider adds resilience.

| Provider | Key needed? | What it provides |
|---|---|---|
| yfinance | No | Primary source for bulk price data |
| Alpaca | Yes | High-throughput OHLCV fallback |
| Tiingo | Yes | Daily bars fallback |
| Stooq | No | Public endpoint fallback |
| Twelve Data | Yes | CSV API fallback |
| Finnhub | Yes | Price data + fundamentals |
| Polygon | Yes | Price data + fundamentals |
| Alpha Vantage | Yes | OHLCV fallback (supports multiple keys) |
| FMP | Yes | Financial Modeling Prep fallback |
| EODHD | Yes | End-of-day historical fallback |
| FRED | Yes (free) | Macro indicators (VIX, yield curve, credit spreads) |

Set your keys in .env — see examples/public/.env.example for the full list.
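The fallback behaviour amounts to trying each downloader in order and returning the first non-empty result. A minimal sketch, assuming each provider is a function that takes a ticker and returns a list of daily bars (or raises on failure); the provider functions and `fetch_with_fallback` are hypothetical stand-ins, not the real `src/data_loader.py` code.

```python
import logging
from typing import Callable, Sequence

logger = logging.getLogger("fallback_sketch")

# A provider takes a ticker and returns a list of daily bars (possibly empty) or raises.
Provider = Callable[[str], list]


def fetch_with_fallback(ticker: str, providers: Sequence[tuple[str, Provider]]) -> tuple[str, list]:
    """Try providers in order; return (provider_name, bars) from the first that yields data."""
    errors = {}
    for name, fetch in providers:
        try:
            bars = fetch(ticker)
        except Exception as exc:  # network errors, rate limits, bad keys, ...
            errors[name] = repr(exc)
            logger.warning("%s failed for %s: %s", name, ticker, exc)
            continue
        if bars:
            return name, bars
        errors[name] = "empty response"
    raise RuntimeError(f"all providers failed for {ticker}: {errors}")
```

In this shape, adding a provider means writing one more function and inserting it at the desired position in the list.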

Gemini CLI (optional)

Gemini CLI is used for three things:

Daily news assessment — grounded web search produces signed risk scores for trade policy, geopolitical, regulatory, monetary policy, energy/commodity, and event shock categories. Runs twice daily (morning pre-market + evening post-close) with automatic model fallback.
Qualitative features — sector/industry classification and moat/debt/maturity ratings for each ticker.
Price data recovery — last-resort fallback when all standard data providers fail.

Gemini CLI features are disabled automatically if the Gemini CLI binary isn't found. News and qualitative features then default to neutral values.
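The neutral fallback can be pictured as a small normalization step. A sketch, assuming the assessment arrives as JSON mapping category names to signed scores in [-1, 1]. The category keys, the score range, and `parse_news_scores` are assumptions; the README only says the scores are signed and default to neutral.

```python
import json
import math

# Categories named in the README; the exact keys MktML uses are an assumption.
NEWS_CATEGORIES = ("trade_policy", "geopolitical", "regulatory",
                   "monetary_policy", "energy_commodity", "event_shock")
NEUTRAL = 0.0


def parse_news_scores(raw):
    """Turn a raw assessment (JSON string or None) into one signed score per category.

    Missing, malformed, or non-numeric values fall back to NEUTRAL, and
    out-of-range values are clipped to [-1, 1].
    """
    scores = dict.fromkeys(NEWS_CATEGORIES, NEUTRAL)
    if not raw:
        return scores  # Gemini CLI unavailable or returned nothing
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return scores
    if not isinstance(payload, dict):
        return scores
    for key in NEWS_CATEGORIES:
        value = payload.get(key)
        if isinstance(value, (int, float)) and not isinstance(value, bool) and math.isfinite(value):
            scores[key] = max(-1.0, min(1.0, float(value)))
    return scores
```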

Reports and Agent Integration

Each scan produces a Markdown report with:

Market regime summary (bull/bear/neutral based on macro signals)
Data health stats (coverage, stale tickers, fetch failures)
Top BUY and SELL recommendations with confidence scores
Portfolio holdings status and exit alerts
Watchlist updates

Reports include stable OPENCLAW:SUMMARY and OPENCLAW:JSON markers so downstream tools or AI agents can parse them programmatically. See examples/public/market_report.sample.md for the format.
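A downstream consumer might pull the machine-readable block out of a report like this. The exact wrapping around the markers isn't shown here (the sample report is the reference), so this sketch assumes only that the `OPENCLAW:JSON` marker appears before a JSON object, and skips whatever comment or fence surrounds it. `extract_openclaw_json` is a hypothetical helper.

```python
import json


def extract_openclaw_json(report_text: str, marker: str = "OPENCLAW:JSON"):
    """Return the first JSON object that follows `marker` in a Markdown report, or None.

    Assumes the payload is a JSON object; any text between the marker and the
    opening brace (an HTML comment, a code fence, etc.) is ignored.
    """
    pos = report_text.find(marker)
    if pos == -1:
        return None
    start = report_text.find("{", pos + len(marker))
    if start == -1:
        return None
    # raw_decode parses one JSON value starting at `start` and ignores trailing text.
    obj, _end = json.JSONDecoder().raw_decode(report_text, start)
    return obj
```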

Notifications

Send daily or weekly summaries to:

Any webhook endpoint (set MARKET_NOTIFICATION_WEBHOOK_URL)
An ntfy topic (set NOTIFICATION_NTFY_TOPIC)
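ntfy publishes a message when you send an HTTP POST to `https://ntfy.sh/<topic>` (or your own server) with the message as the body and an optional `Title` header. A standard-library sketch of building that request, assuming the topic comes from `NOTIFICATION_NTFY_TOPIC`; `build_ntfy_request` is illustrative and not necessarily how `src/notifier.py` does it.

```python
import os
import urllib.request


def build_ntfy_request(message, title=None, topic=None, server="https://ntfy.sh"):
    """Build (but don't send) a POST request that publishes `message` to an ntfy topic."""
    topic = topic or os.environ.get("NOTIFICATION_NTFY_TOPIC")
    if not topic:
        raise ValueError("no ntfy topic configured")
    headers = {"Title": title} if title else {}
    return urllib.request.Request(
        f"{server.rstrip('/')}/{topic}",
        data=message.encode("utf-8"),
        headers=headers,
        method="POST",
    )

# Sending it is one call: urllib.request.urlopen(req, timeout=10)
```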

How the ML Works

Ensemble: Random Forest + Gradient Boosting + optional XGBoost, weighted and averaged.
Three horizons: Separate models for 5-day, 10-day, and 30-day predictions.
Feature contract: A strict ordered list of 124 features (technicals, macro, news, qualitative) ensures training and inference always use the same inputs. No silent drift.
Walk-forward validation: Out-of-sample only, with purged/embargo-aware splits to prevent lookahead bias.
Calibration: Probability outputs are calibrated per horizon so a "70% confidence" score means roughly 70% historical accuracy at that threshold.
Asset buckets: Optionally trains separate models for equities, ETFs, and bonds when sample sizes are large enough.
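A rough scikit-learn sketch of the ensemble-plus-calibration idea for a single horizon: a soft-voting `VotingClassifier` (which averages predicted probabilities, optionally weighted) over Random Forest and Gradient Boosting, followed by isotonic calibration fitted on the most recent held-out rows. The weights, estimator settings, calibration split, and binary target are assumptions; XGBoost is omitted because it is optional, and MktML's own calibration method may differ.

```python
import numpy as np
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.isotonic import IsotonicRegression


def fit_horizon_model(X, y, calib_fraction=0.25, weights=(0.5, 0.5), seed=0):
    """Fit a soft-voting RF + GB ensemble, then calibrate it on the most recent rows.

    Rows are assumed to be in time order, so the last `calib_fraction` of them
    serve as an out-of-sample calibration set.
    """
    split = int(len(X) * (1 - calib_fraction))
    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, min_samples_leaf=5, random_state=seed)),
            ("gb", GradientBoostingClassifier(random_state=seed)),
        ],
        voting="soft",          # average predicted probabilities
        weights=list(weights),  # hypothetical per-model weights
    )
    ensemble.fit(X[:split], y[:split])
    raw = ensemble.predict_proba(X[split:])[:, 1]
    # Isotonic regression maps raw ensemble scores to observed hit rates.
    calibrator = IsotonicRegression(out_of_bounds="clip", y_min=0.0, y_max=1.0)
    calibrator.fit(raw, y[split:])
    return ensemble, calibrator


def predict_calibrated(ensemble, calibrator, X):
    """Calibrated probability that the (hypothetical) binary target is 1."""
    return calibrator.predict(ensemble.predict_proba(X)[:, 1])

# One pair per horizon, e.g. {h: fit_horizon_model(X_h, y_h) for h in ("5d", "10d", "30d")}
```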

Extending MktML

| Want to... | Where to look |
|---|---|
| Add a new data provider | `src/data_loader.py`: add a downloader function and insert it into the fallback chain |
| Add new signal features | `src/signals.py` for technicals, `src/macro_loader.py` for macro, `src/news_loader.py` for news, `scripts/update_qual_features.py` for qualitative |
| Add a new ML model type | `src/ml_engine.py` and `config.py` |
| Add a notification channel | `src/notifier.py` |
| Add dashboard controls | `src/dashboard.py` |
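New features have to respect the ordered feature contract, or training and inference inputs drift apart. A minimal sketch of enforcing such a contract, assuming features arrive as a dict per ticker; `FEATURE_CONTRACT` here is a short hypothetical stand-in for the real 124-name list, and `to_feature_vector` is illustrative.

```python
# Hypothetical, truncated contract; the real one lists 124 features in a fixed order.
FEATURE_CONTRACT = ("rsi_14", "macd_hist", "bb_width", "vix_level",
                    "yield_curve_10y2y", "news_trade_policy")


class FeatureContractError(ValueError):
    pass


def to_feature_vector(features, contract=FEATURE_CONTRACT, allow_extra=False):
    """Return feature values in contract order, failing loudly on any mismatch."""
    missing = [name for name in contract if name not in features]
    extra = sorted(set(features) - set(contract))
    if missing:
        raise FeatureContractError(f"missing features: {missing}")
    if extra and not allow_extra:
        raise FeatureContractError(f"features not in contract: {extra}")
    return [float(features[name]) for name in contract]
```

In this sketch, a new feature only reaches a model once its name is appended to the contract and the model is retrained; until then it is rejected rather than silently passed through.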

Project Structure

src/
  main.py          Entry point and CLI
  data_loader.py   Multi-provider data acquisition with fallback
  signals.py       Technical indicator computation
  macro_loader.py  FRED macro data and regime features
  news_loader.py   Daily market news assessment via Gemini CLI
  ml_engine.py     Model training, inference, and calibration
  scanner.py       BUY/HOLD/SELL recommendation synthesis
  reporter.py      Markdown + JSON report generation
  audit.py         Recommendation outcome tracking
  dashboard.py     Flask web UI
  notifier.py      Webhook and ntfy notifications
  storage.py       SQLite database layer
  universe.py      Ticker universe management
scripts/           Maintenance, migration, and utility scripts
examples/public/   Sanitized starter config and templates

Performance Tuning

Threading is auto-configured based on your CPU count. Override with environment variables if needed:

CPU_RESERVED_CORES — cores to leave free (default: 4)
ML_N_JOBS — parallel jobs for model training
SCANNER_WORKERS — parallel workers for data fetching
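One way such auto-configuration could work, assuming the reserved-core count is subtracted from `os.cpu_count()` and an explicit override wins when set. Only the variable names and the default of 4 come from the README; the formula and `resolve_workers` are illustrative.

```python
import os


def resolve_workers(var_name, env=None, cpu_count=None):
    """Pick a worker count: explicit env override first, else CPUs minus reserved cores."""
    env = os.environ if env is None else env
    override = env.get(var_name)
    if override:
        return max(1, int(override))
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    reserved = int(env.get("CPU_RESERVED_CORES", "4"))
    return max(1, cpus - reserved)  # never drop below one worker

# e.g. ml_n_jobs = resolve_workers("ML_N_JOBS"); scanner_workers = resolve_workers("SCANNER_WORKERS")
```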

Public Site

A sanitized public site is automatically exported to smkwray.github.io/mktml after each daily scan and audit. It includes:

Daily market report — top BUY/SELL signals, safe assets, market summary (portfolio holdings and watchlist are stripped)
Model performance dashboard — interactive Plotly charts showing strategy vs. SPY, confidence calibration, and monthly returns
Audit summary — excess return, beat rate, information ratio, and directional accuracy
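The audit metrics listed above have common textbook definitions. A sketch computing them from paired per-recommendation returns, assuming simple returns for each pick and its benchmark over the same window. MktML's exact conventions (annualization, composite benchmarks, how SELL and HOLD calls enter the excess-return figures) may differ, and `audit_metrics` is a hypothetical helper.

```python
import statistics


def audit_metrics(pick_returns, benchmark_returns, predicted_up):
    """Summarize recommendation outcomes against a benchmark.

    pick_returns[i] / benchmark_returns[i]: realized returns over the same window.
    predicted_up[i]: True for a BUY-style call, False for a SELL-style call.
    """
    n = len(pick_returns)
    if not (n == len(benchmark_returns) == len(predicted_up)) or n < 2:
        raise ValueError("need at least two aligned observations")
    excess = [p - b for p, b in zip(pick_returns, benchmark_returns)]
    mean_excess = statistics.fmean(excess)
    tracking_error = statistics.stdev(excess)  # sample std of excess returns
    return {
        "mean_excess_return": mean_excess,
        "beat_rate": sum(e > 0 for e in excess) / n,
        # Per-period information ratio (not annualized).
        "information_ratio": mean_excess / tracking_error if tracking_error > 0 else float("nan"),
        "directional_accuracy": sum((r > 0) == up for r, up in zip(pick_returns, predicted_up)) / n,
    }
```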

The export is handled by scripts/export_public_site.py and deployed via GitHub Actions to GitHub Pages.

Current Status

MktML is actively maintained and in daily use. The audit system is still accumulating samples for full statistical reliability — see the live performance dashboard for current metrics.

Publishing Safety

This repo is designed to be safe for public hosting. Credentials, portfolio data, reports, and logs are all gitignored. The public site export (scripts/export_public_site.py) strips portfolio holdings, watchlist, exit alerts, sector/country breakdowns, and internal markers before publishing. A pre-push hook (scripts/public_push_guard.sh) blocks accidental pushes of sensitive files. See PUBLIC_RELEASE.md for the full checklist.
