(/ˈmɑːrkɪt moʊl/)
MktML is a local-first stock research pipeline that uses machine learning to generate daily BUY / HOLD / SELL recommendations across 900+ tickers. It pulls price data from up to 11 providers, trains an ensemble model (Random Forest + Gradient Boosting + XGBoost) across three time horizons (5-day, 10-day, 30-day), and produces human-readable reports you can act on or feed into other tools.
It runs entirely on your machine — no cloud account, no subscription, no data leaving your system.
Live daily reports and model performance dashboard
- Pulls market data from yfinance and up to 10 fallback providers, so a single API outage never stops you.
- Computes signals from technicals (RSI, MACD, Bollinger Bands, etc.), macro indicators (yield curve, VIX, credit spreads via FRED), qualitative features (sector, moat, debt level), and daily news assessment (trade policy, geopolitical, regulatory, energy supply, monetary surprises via Gemini CLI).
- Trains an ensemble ML model (Random Forest + Gradient Boosting + optional XGBoost) on 5-year history, scoring each ticker across 5d/10d/30d horizons.
- Scans your universe and synthesizes a BUY/HOLD/SELL recommendation for every ticker, with calibrated confidence scores.
- Generates a daily report (Markdown + JSON) with top picks, exit alerts for your holdings, and data-health checks.
- Tracks its own accuracy via an audit system that records every recommendation and measures outcomes against SPY, other benchmarks, or a composite.
```bash
# 1. Clone and set up
git clone https://github.com/smkwray/market.git
cd market
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 2. Configure
cp examples/public/config.public.example.py config.py
cp examples/public/.env.example .env
# Edit .env with your API keys (at minimum, get a free FRED key)
# Edit config.py to add your PORTFOLIO_HOLDINGS and WATCHLIST

# 3. Initialize and run
python src/main.py --init-db
python src/main.py --pipeline full
```

The full pipeline will: download data → compute signals → train models → scan → generate a report. The first run takes longer (it downloads 5 years of history); subsequent daily runs are much faster.
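The step order above can be pictured as a simple ordered runner. This is a minimal illustration, not the actual `main.py` internals, and the step names are assumptions:

```python
# Hypothetical sketch of the full-pipeline order; step names are
# illustrative, not the real main.py step registry.
PIPELINE_STEPS = ["download", "signals", "train", "scan", "report"]

def run_pipeline(registry, steps=PIPELINE_STEPS):
    """Run each named step in order, passing accumulated results along."""
    results = {}
    for name in steps:
        results[name] = registry[name](results)
    return results
```

A daily run would simply pass a shorter step list that omits `train`, which is why it is much faster than the full pipeline.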
```mermaid
flowchart LR
    A["Universe + Config"] --> B["Data Loader\n(yfinance + provider fallbacks)"]
    B --> C["Signal Engine\n(technicals + macro + qual)"]
    C --> D["ML Engine\n(5d/10d/30d ensemble)"]
    D --> E["Replay + Calibration\n(model_predictions + artifacts)"]
    D --> F["Scanner\n(BUY/HOLD/SELL synthesis)"]
    E --> F
    F --> G["Reporter\n(Markdown + JSON + OPENCLAW blocks)"]
    G --> H["Dashboard + Notifications"]
    I["Artifact Verification\n(manifest + runtime skew)"] --> F
    I --> D
```
| Command | What it does |
|---|---|
| `python src/main.py --pipeline full` | Run everything end-to-end (data → train → scan → report) |
| `python src/main.py --pipeline daily` | Daily refresh (skips training, reuses the existing model) |
| `python src/main.py --pipeline daily_auto` | Same as `daily`, but designed for unattended/cron use |
| `python src/main.py --scan` | Scan only (generate recommendations from the existing model) |
| `python src/main.py --update-news` | Fetch the daily market news assessment via Gemini CLI |
| `python src/main.py --report` | Regenerate the report from the latest scan results |
| Command | What it does |
|---|---|
| `python src/main.py --train-ml` | Retrain the ML models on current data |
| `python src/main.py --replay-scan --start YYYY-MM-DD --end YYYY-MM-DD` | Backtest: replay historical scans to build prediction history |
| `python src/main.py --build-calibration-artifacts --start YYYY-MM-DD --end YYYY-MM-DD` | Build probability calibration from replay data |
| `python src/main.py --verify-artifacts` | Check model/calibration artifact integrity |
| Command | What it does |
|---|---|
| `python src/main.py --audit` | Score past recommendations against actual outcomes |
| `python src/main.py --backfill-labels` | Backfill outcome labels for older recommendations |
| `python src/main.py --notify` | Send the daily notification (webhook or ntfy) |
| `python src/main.py --notify-weekly` | Send the weekly performance summary |
A local web UI for controlling runs, viewing reports, and editing config:
```bash
python src/dashboard.py
# Open http://127.0.0.1:5050
```

From the dashboard you can:
- Start/stop any pipeline step (scan, train, audit, news, etc.)
- Schedule recurring jobs (including morning/evening news assessment)
- Browse reports and logs
- Edit config with history and rollback
- View live run status and analytics
MktML tries providers in order until it gets data. If yfinance is down, it moves to Alpaca, then Tiingo, and so on. You don't need all of them — yfinance works with no API key. More providers = more resilience.
| Provider | Key needed? | What it provides |
|---|---|---|
| yfinance | No | Primary source for bulk price data |
| Alpaca | Yes | High-throughput OHLCV fallback |
| Tiingo | Yes | Daily bars fallback |
| Stooq | No | Public endpoint fallback |
| Twelve Data | Yes | CSV API fallback |
| Finnhub | Yes | Price data + fundamentals |
| Polygon | Yes | Price data + fundamentals |
| Alpha Vantage | Yes | OHLCV fallback (supports multiple keys) |
| FMP | Yes | Financial Modeling Prep fallback |
| EODHD | Yes | End-of-day historical fallback |
| FRED | Yes (free) | Macro indicators (VIX, yield curve, credit spreads) |
Set your keys in `.env` — see `examples/public/.env.example` for the full list.
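The fallback behavior can be pictured as a loop over downloader functions. This is a sketch under assumed names, not the real `src/data_loader.py` API:

```python
# Try each provider in order until one returns usable data. Provider names
# and the downloader signature are illustrative assumptions.
def fetch_with_fallback(ticker, providers):
    errors = {}
    for name, download in providers:
        try:
            data = download(ticker)
            if data:  # treat an empty result as a miss and keep going
                return name, data
        except Exception as exc:
            errors[name] = exc
    raise RuntimeError(f"all providers failed for {ticker}: {errors}")
```

Because failures are caught per provider, a single outage (or an empty response) silently cascades to the next source instead of aborting the run.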
Gemini CLI is used for three things:
- Daily news assessment — grounded web search produces signed risk scores for trade policy, geopolitical, regulatory, monetary policy, energy/commodity, and event shock categories. Runs twice daily (morning pre-market + evening post-close) with automatic model fallback.
- Qualitative features — sector/industry classification and moat/debt/maturity ratings for each ticker.
- Price data recovery — last-resort fallback when all standard data providers fail.
The integration disables itself automatically if the Gemini CLI binary isn't found; news and qualitative features gracefully default to neutral values when unavailable.
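That graceful degradation can be sketched like this (the binary name and the risk-category keys are illustrative assumptions, not the project's actual schema):

```python
import shutil

# Assumed category names for illustration; the real assessment may differ.
NEUTRAL_NEWS = {
    "trade_policy": 0.0,
    "geopolitical": 0.0,
    "regulatory": 0.0,
    "monetary": 0.0,
    "energy": 0.0,
    "event_shock": 0.0,
}

def news_assessment(binary="gemini"):
    """Return neutral risk scores when the CLI binary is unavailable."""
    if shutil.which(binary) is None:
        return dict(NEUTRAL_NEWS)
    # ...otherwise invoke the CLI and parse its grounded risk scores...
    raise NotImplementedError
```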
Each scan produces a Markdown report with:
- Market regime summary (bull/bear/neutral based on macro signals)
- Data health stats (coverage, stale tickers, fetch failures)
- Top BUY and SELL recommendations with confidence scores
- Portfolio holdings status and exit alerts
- Watchlist updates
Reports include stable OPENCLAW:SUMMARY and OPENCLAW:JSON markers so downstream tools or AI agents can parse them programmatically. See examples/public/market_report.sample.md for the format.
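If the markers bracket a JSON payload, a downstream tool might extract it roughly like this. The exact marker syntax here (`BEGIN`/`END` suffixes) is an assumption; check the sample report for the real format:

```python
import json
import re

# Marker spellings below are assumptions for illustration; see
# examples/public/market_report.sample.md for the actual format.
def extract_openclaw_json(report_text):
    """Pull the machine-readable payload from between OPENCLAW markers."""
    match = re.search(
        r"OPENCLAW:JSON:BEGIN\s*(\{.*?\})\s*OPENCLAW:JSON:END",
        report_text,
        flags=re.DOTALL,
    )
    return json.loads(match.group(1)) if match else None
```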
Send daily or weekly summaries to:
- Any webhook endpoint (set `MARKET_NOTIFICATION_WEBHOOK_URL`)
- An ntfy topic (set `NOTIFICATION_NTFY_TOPIC`)
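An ntfy post needs nothing beyond the standard library. A minimal sketch (the title header and default server are illustrative choices, not the project's notifier code):

```python
import urllib.request

def build_ntfy_request(topic, message, server="https://ntfy.sh"):
    """Build (but don't send) a POST request for an ntfy topic."""
    return urllib.request.Request(
        f"{server}/{topic}",
        data=message.encode("utf-8"),
        headers={"Title": "MktML daily summary"},  # illustrative title
        method="POST",
    )

# To actually send:
#   urllib.request.urlopen(build_ntfy_request("my-topic", "3 BUY signals"))
```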
- Ensemble: Random Forest + Gradient Boosting + optional XGBoost, weighted and averaged.
- Three horizons: Separate models for 5-day, 10-day, and 30-day predictions.
- Feature contract: A strict ordered list of 124 features (technicals, macro, news, qualitative) ensures training and inference always use the same inputs. No silent drift.
- Walk-forward validation: Out-of-sample only, with purged/embargo-aware splits to prevent lookahead bias.
- Calibration: Probability outputs are calibrated per horizon so a "70% confidence" score means roughly 70% historical accuracy at that threshold.
- Asset buckets: Optionally trains separate models for equities, ETFs, and bonds when sample sizes are large enough.
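The calibration idea can be illustrated with a simple histogram scheme: map each raw model probability to the hit rate historically observed in its bucket. This is a deliberately simplified sketch, not the per-horizon method the project actually uses:

```python
# Simplified histogram calibration, for illustration only: each raw
# probability is replaced by the empirical accuracy of its bucket.
def fit_bucket_calibration(raw_probs, outcomes, n_buckets=10):
    buckets = [[] for _ in range(n_buckets)]
    for p, hit in zip(raw_probs, outcomes):
        idx = min(int(p * n_buckets), n_buckets - 1)
        buckets[idx].append(hit)
    # Empty buckets fall back to the bucket midpoint.
    return [
        (sum(b) / len(b)) if b else (i + 0.5) / n_buckets
        for i, b in enumerate(buckets)
    ]

def calibrate(p, table):
    idx = min(int(p * len(table)), len(table) - 1)
    return table[idx]
```

With enough replay samples per bucket, a reported "70% confidence" then corresponds to roughly 70% observed accuracy, which is exactly the property the calibration step is after.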
| Want to... | Where to look |
|---|---|
| Add a new data provider | src/data_loader.py — add a downloader function and insert it into the fallback chain |
| Add new signal features | src/signals.py for technicals, src/macro_loader.py for macro, src/news_loader.py for news, scripts/update_qual_features.py for qualitative |
| Add a new ML model type | src/ml_engine.py and config.py |
| Add a notification channel | src/notifier.py |
| Add dashboard controls | src/dashboard.py |
```
src/
  main.py           Entry point and CLI
  data_loader.py    Multi-provider data acquisition with fallback
  signals.py        Technical indicator computation
  macro_loader.py    FRED macro data and regime features
  news_loader.py    Daily market news assessment via Gemini CLI
  ml_engine.py      Model training, inference, and calibration
  scanner.py        BUY/HOLD/SELL recommendation synthesis
  reporter.py       Markdown + JSON report generation
  audit.py          Recommendation outcome tracking
  dashboard.py      Flask web UI
  notifier.py       Webhook and ntfy notifications
  storage.py        SQLite database layer
  universe.py       Ticker universe management
scripts/            Maintenance, migration, and utility scripts
examples/public/    Sanitized starter config and templates
```
Threading is auto-configured based on your CPU count. Override with environment variables if needed:
- `CPU_RESERVED_CORES` — cores to leave free (default: 4)
- `ML_N_JOBS` — parallel jobs for model training
- `SCANNER_WORKERS` — parallel workers for data fetching
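The auto-configuration with environment overrides might look like this. The defaults mirror the text above, but the exact derivation of `ML_N_JOBS` and `SCANNER_WORKERS` is an assumption:

```python
import os

def env_int(name, default):
    """Read an integer setting from the environment, falling back to default."""
    return int(os.environ.get(name, default))

CPU_RESERVED_CORES = env_int("CPU_RESERVED_CORES", 4)
# Assumed derivation: use whatever cores remain after the reservation,
# but never fewer than one.
ML_N_JOBS = env_int("ML_N_JOBS", max(1, (os.cpu_count() or 4) - CPU_RESERVED_CORES))
SCANNER_WORKERS = env_int("SCANNER_WORKERS", ML_N_JOBS)
```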
A sanitized public site is automatically exported to smkwray.github.io/mktml after each daily scan and audit. It includes:
- Daily market report — top BUY/SELL signals, safe assets, market summary (portfolio holdings and watchlist are stripped)
- Model performance dashboard — interactive Plotly charts showing strategy vs. SPY, confidence calibration, and monthly returns
- Audit summary — excess return, beat rate, information ratio, and directional accuracy
The export is handled by scripts/export_public_site.py and deployed via GitHub Actions to GitHub Pages.
MktML is actively maintained and in daily use. The audit system is still accumulating samples for full statistical reliability — see the live performance dashboard for current metrics.
This repo is designed to be safe for public hosting. Credentials, portfolio data, reports, and logs are all gitignored. The public site export (scripts/export_public_site.py) strips portfolio holdings, watchlist, exit alerts, sector/country breakdowns, and internal markers before publishing. A pre-push hook (scripts/public_push_guard.sh) blocks accidental pushes of sensitive files. See PUBLIC_RELEASE.md for the full checklist.
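The stripping step can be pictured as a key filter over the report payload. The key names here are assumptions for illustration; the real (more thorough) logic lives in `scripts/export_public_site.py`:

```python
# Assumed key names for illustration; the actual export script strips
# more than this (sector/country breakdowns, internal markers, etc.).
SENSITIVE_KEYS = {"portfolio_holdings", "watchlist", "exit_alerts"}

def sanitize_for_public(report):
    """Drop private sections before publishing a report payload."""
    return {k: v for k, v in report.items() if k not in SENSITIVE_KEYS}
```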
