Card similarity, search, and deck operations for Magic: The Gathering, Pokemon TCG, and Yu-Gi-Oh!.
DeckSage combines tournament co-occurrence embeddings (PecanPy + Word2Vec on 184K deck lists), card attribute fusion, text search (MeiliSearch + Qdrant), visual embeddings (SigLIP2), and Jaccard co-occurrence to find similar cards, complete partial decks, and surface synergies / substitutes / upgrades.
Requires Python 3.11+ and Docker (for MeiliSearch + Qdrant). uv is recommended.

```bash
uv sync --extra embeddings
```

For development (adds ruff, pytest, playwright):

```bash
uv sync --extra dev --extra embeddings
```

Embedding files, graph databases, and processed CSVs are not stored in git (they are too large). To set up a new machine:
- Obtain the data archive (tarball, ~2 GB compressed / ~6 GB uncompressed)
- Extract into the repo root:
  ```bash
  tar xzf decksage-demo-data.tar.gz
  ```
- This creates the required files under `data/embeddings/`, `data/graphs/`, `data/processed/`, and `data/cache/`
Required assets (referenced in `.env.example`):

- `data/embeddings/magic_v7_spectral_mu35.wv` (14 MB)
- `data/embeddings/pokemon_v7_fused.wv` (2.3 MB)
- `data/embeddings/yugioh_v7_spectral_mu3.wv` (7.4 MB)
- `data/graphs/{magic,pokemon,yugioh}_unified.db` (5.3 GB total -- card metadata, co-occurrence)
- `data/processed/pairs_*.csv` (456 MB -- Jaccard signal)
- `data/processed/card_attributes_*_enriched.csv` (46 MB)
- `data/cache/text_embeddings/` (205 MB -- optional, rebuilds on startup)
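Before starting the API, it can help to sanity-check the extraction. A minimal sketch that verifies the three embedding files listed above (the path list is not exhaustive; extend it with the other assets as needed):

```python
from pathlib import Path

# Paths copied from the required-assets list; add the graph DBs and CSVs
# from that list if you want a full check.
REQUIRED_ASSETS = [
    "data/embeddings/magic_v7_spectral_mu35.wv",
    "data/embeddings/pokemon_v7_fused.wv",
    "data/embeddings/yugioh_v7_spectral_mu3.wv",
]

def missing_assets(root: str = ".") -> list[str]:
    """Return the subset of REQUIRED_ASSETS not present under root."""
    return [p for p in REQUIRED_ASSETS if not (Path(root) / p).exists()]
```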
```bash
# 1. Start search backends (MeiliSearch + Qdrant)
docker compose up -d meilisearch qdrant

# 2. Copy and edit .env (defaults in .env.example work if data assets are in place)
cp .env.example .env

# 3. Start the API (loads all 3 games, ~40s startup)
uv run uvicorn src.ml.api.api:app --host 127.0.0.1 --port 8001
```

Open http://localhost:8001 for the web UI, or http://localhost:8001/docs for interactive API docs.
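Since startup takes roughly 40 seconds while embeddings load, a client can poll the `/ready` probe before issuing queries. A minimal sketch using only the standard library:

```python
import urllib.request

def api_ready(base: str = "http://127.0.0.1:8001", timeout: float = 2.0) -> bool:
    """Return True once the /ready probe answers 200, False otherwise."""
    try:
        with urllib.request.urlopen(f"{base}/ready", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Connection refused or timed out: server not up yet.
        return False
```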
```bash
DECKSAGE="uv run src/ml/cli/main.py"
$DECKSAGE --game magic similar "Lightning Bolt" --k 5 --output table
$DECKSAGE --game magic search "destroy all creatures" --limit 5
$DECKSAGE --game pokemon similar "Ultra Ball" --k 5
$DECKSAGE health --output json
```

| Endpoint | Method | Description |
|---|---|---|
| `/live`, `/ready` | GET | Liveness / readiness probes |
| `/v1/games` | GET | List loaded games and card counts |
| `/v1/health?game=magic` | GET | Per-game health (card count, embedding dim) |
| `/v1/similar` | POST | Card similarity (all 6 modes, custom weights) |
| `/v1/cards/{name}/similar` | GET | Card similarity (convenience GET) |
| `/v1/cards/{name}/contextual` | GET | Contextual suggestions (synergies, alternatives, upgrades, downgrades) |
| `/v1/cards?prefix=Light` | GET | Card name autocomplete |
| `/v1/search` | GET/POST | Hybrid text + vector search |
| `/v1/deck/complete` | POST | Deck completion (greedy fill to target size) |
| `/v1/deck/suggest_actions` | POST | Deck improvement suggestions |
| `/v1/deck/apply_patch` | POST | Apply add/remove operations to a deck |
| `/v1/feedback` | POST | Submit user feedback |
Interactive docs at /docs when the server is running.
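As an illustration of calling the similarity endpoint programmatically: the JSON field names below are assumptions about the `/v1/similar` request schema, not the authoritative shape -- consult `/docs` (OpenAPI) for the real contract.

```python
import json
import urllib.request

API = "http://127.0.0.1:8001"

def similar_payload(game: str, card: str, k: int = 10, mode: str = "fusion") -> dict:
    # Hypothetical request body for POST /v1/similar; verify against /docs.
    return {"game": game, "card": card, "k": k, "mode": mode}

def post_similar(payload: dict) -> dict:
    """Send the payload to the (assumed) /v1/similar route and decode JSON."""
    req = urllib.request.Request(
        f"{API}/v1/similar",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

payload = similar_payload("magic", "Lightning Bolt", k=5, mode="substitute")
```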
| Mode | Method | Use case |
|---|---|---|
| `substitute` | Embedding cosine | Functional replacements (same role/effect) |
| `synergy` | Jaccard co-occurrence | Cards that go in the same deck |
| `meta` | Meta fusion | Competitive metagame pairings |
| `fusion` | Weighted late fusion | Blended signal (all methods) |
| `embedding` | Raw embedding cosine | Direct embedding similarity |
| `jaccard` | Jaccard index | Direct co-occurrence overlap |
184K tournament deck lists (Magic 83K, Yu-Gi-Oh 77K, Pokemon 24K) from MTGGoldfish, MTGTop8, Limitless TCG, MasterDuelMeta, YGOProDeck, Archidekt. Six similarity signals: co-occurrence embeddings (PecanPy + Word2Vec, 128D), text embeddings (E5-base-v2), visual embeddings (SigLIP2), Jaccard co-occurrence, functional tag matching, and archetype similarity. See scripts/training/ for the full pipeline and docs/experimental_narrative.md for a walkthrough of 63 experiments.
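The Jaccard co-occurrence signal reduces to set overlap over deck lists; a minimal sketch:

```python
def jaccard(decks_a: set[int], decks_b: set[int]) -> float:
    """Jaccard index over the sets of deck IDs each card appears in."""
    union = decks_a | decks_b
    return len(decks_a & decks_b) / len(union) if union else 0.0
```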
100K LLM-generated annotations across 3 games ($25 total via Groq 70B + Cerebras 235B cascade). See data/experiments/SUMMARY.md for all 63 experiments.
Co-occurrence embeddings (v7, substitute nDCG@10, saturated):
| Game | nDCG@10 | Condensed nDCG@10 | Gap |
|---|---|---|---|
| Magic | 0.525 | 0.527 | 0.002 |
| Pokemon | 0.437 | 0.438 | 0.001 |
| Yu-Gi-Oh | 0.478 | 0.482 | 0.004 |
Text embeddings (E5-base-v2, condensed substitute nDCG@10, 14-25% better):
| Game | Condensed nDCG | vs Co-occurrence |
|---|---|---|
| Magic | 0.613 | +22% |
| Pokemon | 0.518 | +25% |
| Yu-Gi-Oh | 0.532 | +14% |
Eval scripts in scripts/evaluation/: eval_per_mode.py, eval_search_relevance.py, eval_deck_completion.py, intrinsic_eval.py.
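The nDCG@10 numbers above follow the standard definition: discounted cumulative gain of the returned ranking, divided by the DCG of the ideal ordering. A self-contained sketch (the relevance scale is whatever the annotation set uses):

```python
import math

def ndcg_at_k(ranked: list[str], relevance: dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query: DCG of the ranking over DCG of the ideal order."""
    def dcg(gains: list[float]) -> float:
        # Log-discount by rank position (rank 1 -> log2(2), rank 2 -> log2(3), ...).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    got = dcg([relevance.get(card, 0.0) for card in ranked[:k]])
    ideal = dcg(sorted(relevance.values(), reverse=True)[:k])
    return got / ideal if ideal else 0.0
```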
```
src/ml/        Python ML code (similarity, deck building, search, API, CLI)
src/ml/tests/  Test suite (818 tests)
src/backend/   Go backend (data extraction for 6 games)
frontend/      Web frontend (unified light theme)
scripts/       Training, annotation, evaluation, data processing
tests/e2e/     Playwright E2E tests (45 tests)
data/          Embeddings, pairs, enriched CSVs, annotations, test sets
```
```bash
just test            # full Python test suite (818 tests)
just lint            # ruff check + format
npx playwright test  # E2E tests (45 tests, requires running server + search backends)
```

Search backends via Docker:

```bash
just qa-deps-up    # start MeiliSearch + Qdrant
just qa-deps-down  # stop
```

Further documentation:

- `docs/experimental_narrative.md` -- walkthrough of 63 experiments (12 phases), intended for ML audiences
- `data/experiments/SUMMARY.md` -- experiment index with metrics
- `docs/failure_taxonomy.md` -- categorized failure modes from manual analysis
- `docs/figures/experiment_progression.png` -- nDCG progression across experiments
MIT; see LICENSE.