PokerMon

Deep Counterfactual Regret Minimization (Deep CFR) for 6-player No-Limit Texas Hold'em.

PokerMon trains neural networks to approximate Nash equilibrium strategies in multiplayer poker using external sampling MCCFR with function approximation. The implementation is validated on Kuhn poker and Leduc Hold'em before scaling to full 6-max NLHE.

Project Structure

pokermon/
  game/          # Game logic: cards, deck, hand eval, engine, Kuhn, Leduc
  cfr/           # CFR algorithms: Deep CFR, tabular CFR+, traversal, regret matching
  net/           # Neural networks: advantage net, strategy net, encoders
  eval/          # Evaluation: exploitability, arena, baselines, metrics
  train/         # Training: config, trainer, checkpointing
  utils/         # Logging, card utilities
scripts/         # Training and evaluation entry points
configs/         # YAML configs for Kuhn, Leduc, and 6-max NLHE
tests/           # Test suite (75+ tests)

How It Works

Deep CFR replaces the regret tables in vanilla CFR with neural networks:

1. Traverse the game tree using external sampling MCCFR.
2. Collect counterfactual regrets and strategy samples into reservoir buffers.
3. Train advantage networks (one per player) from scratch each iteration to predict regrets.
4. Train a strategy network on weighted strategy samples to produce the average policy.
5. In two-player zero-sum games, the strategy network's output approximates a Nash equilibrium as iterations increase; with more than two players, CFR carries no such guarantee.
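
The regret-matching step that turns predicted advantages into a policy during traversal can be sketched as follows. This is a minimal, framework-free illustration: the function name `regret_matching` and the uniform fallback are assumptions, not PokerMon's actual code (the Deep CFR paper instead plays the highest-advantage action when no advantage is positive).

```python
def regret_matching(advantages):
    """Map per-action advantages (regret estimates) to a probability distribution.

    Hypothetical helper for illustration only.
    """
    # Only positive regrets contribute to the policy.
    positive = [max(a, 0.0) for a in advantages]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # Assumed fallback: uniform play when no action has positive regret.
    return [1.0 / len(advantages)] * len(advantages)


# Two actions share all the positive regret, so they split the probability.
policy = regret_matching([2.0, -1.0, 2.0])
```

In a Deep CFR traversal, `advantages` would be the advantage network's output for the legal actions at the current information set.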

Key implementation details:

- Advantage networks are rebuilt from scratch each iteration (not fine-tuned)
- Samples are weighted by iteration^1.5, a steeper variant of Linear CFR's iteration^1 weighting, to prioritize later, more accurate samples
- Reservoir sampling keeps memory bounded regardless of iteration count
- Exploitability is computed via best-response traversal for small games
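
As a rough illustration of why reservoir sampling bounds memory, the sketch below keeps a fixed-size uniform sample of everything ever added (Algorithm R). The class name `ReservoirBuffer` and its attributes are hypothetical and are not taken from the PokerMon source.

```python
import random


class ReservoirBuffer:
    """Fixed-capacity buffer holding a uniform random sample of all items seen."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.items = []
        self.seen = 0  # total number of items offered so far
        self.rng = random.Random(seed)

    def add(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Buffer not yet full: always keep the item.
            self.items.append(item)
        else:
            # Keep the new item with probability capacity / seen,
            # evicting a uniformly chosen resident item.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


buffer = ReservoirBuffer(capacity=10, seed=0)
for sample in range(1000):
    buffer.add(sample)
```

However many samples the traversals produce, the buffer never holds more than `capacity` entries, and each sample seen so far is equally likely to be among them.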

Setup

python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev]"

Requires Python 3.10+ and PyTorch 2.0+.

Training

Kuhn poker (3-card toy game, ~5 seconds):

python scripts/train_kuhn.py

Leduc Hold'em (6-card game, ~40 minutes on CPU):

python scripts/train_leduc.py --iterations 200

Heads-up No-Limit Hold'em (GPU, e.g. 3070 Ti):

python - <<'EOF'
from pokermon.train.config import TrainConfig
from pokermon.train.trainer import Trainer

config = TrainConfig(
    game="nlhe_hu",
    num_players=2,
    small_blind=50,
    big_blind=100,
    starting_stack=10000,
    num_iterations=100,
    traversals_per_iter=5000,
    hidden_dim=256,
    num_layers=4,
    num_actions=7,
    advantage_sgd_steps=2000,
    strategy_sgd_steps=2000,
    strategy_train_every=5,
    lr=1e-3,
    batch_size=2048,
    buffer_capacity=1000000,
    checkpoint_dir="checkpoints/nlhe_hu",
    checkpoint_every=25,
    log_dir="runs/nlhe_hu",
    device="cuda",
    seed=42,
)
Trainer(config).train()
EOF

6-max No-Limit Hold'em (full game):

python scripts/train_nlhe.py --config configs/nlhe6.yaml

Evaluation

Run baseline matchups (Random, CallStation, FoldBot, AggressiveBot):

python scripts/evaluate.py --hands 10000

Play interactively against the agent:

python scripts/play.py --num-players 2

Validation Results

The implementation is validated through a series of gates on progressively harder games:

| Game | Method | Exploitability | Threshold |
| --- | --- | --- | --- |
| Kuhn | Deep CFR (150 iter) | 0.141 | < 0.15 |
| Kuhn | Tabular CFR+ (10K iter) | 0.002 | < 0.01 |
| Leduc | Tabular CFR+ (10K iter) | 0.011 | < 0.10 |
| Leduc | Deep CFR (200 iter) | 0.171 | < 0.20 |

For reference, the Nash equilibrium value of Kuhn poker for the first player is -1/18 (~-0.0556); tabular CFR+ reaches -0.0499.
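
The -1/18 figure can be checked by hand. The sketch below evaluates the textbook one-parameter family of Kuhn poker equilibria (player 1 bluffs the Jack with probability alpha, for alpha between 0 and 1/3) using exact fractions. It is independent of PokerMon's code; the function name `kuhn_value` is an assumption made for illustration, and payoffs assume a 1-chip ante and 1-chip bet.

```python
from fractions import Fraction
from itertools import permutations


def kuhn_value(alpha):
    """Expected payoff to player 1 under the standard equilibrium family."""
    J, Q, K = 0, 1, 2
    third = Fraction(1, 3)
    # Player 1: probability of betting first, and of calling after check-bet.
    p1_bet = {J: alpha, Q: Fraction(0), K: 3 * alpha}
    p1_call = {J: Fraction(0), Q: alpha + third, K: Fraction(1)}
    # Player 2: probability of betting after a check, and of calling a bet.
    p2_bet = {J: third, Q: Fraction(0), K: Fraction(1)}
    p2_call = {J: Fraction(0), Q: third, K: Fraction(1)}

    total = Fraction(0)
    for c1, c2 in permutations((J, Q, K), 2):
        sign = 1 if c1 > c2 else -1  # showdown result from player 1's view
        # Player 1 bets: player 2 calls (showdown for 2) or folds (player 1 wins 1).
        bet_ev = p2_call[c2] * 2 * sign + (1 - p2_call[c2]) * 1
        # Check then bet: player 1 calls (showdown for 2) or folds (loses 1).
        check_bet_ev = p1_call[c1] * 2 * sign + (1 - p1_call[c1]) * (-1)
        # Player 1 checks: player 2 bets, or checks back (showdown for 1).
        check_ev = p2_bet[c2] * check_bet_ev + (1 - p2_bet[c2]) * sign
        total += p1_bet[c1] * bet_ev + (1 - p1_bet[c1]) * check_ev
    return total / 6  # six equally likely deals


value = kuhn_value(Fraction(1, 6))
```

The result is the same for every alpha in the family, which is why the game value is a useful fixed target when checking a solver's output.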

Tests

pytest                          # all tests
pytest -k "not deep_cfr"       # skip slow Deep CFR tests (~15s)

Configuration

Training configs are in configs/. Key parameters:

| Parameter | Kuhn | Leduc | NLHE 6-max |
| --- | --- | --- | --- |
| `hidden_dim` | 64 | 128 | 512 |
| `num_layers` | 2 | 3 | 4 |
| `num_actions` | 2 | 3 | 7 |
| `buffer_capacity` | 100K | 500K | 2M |
| `traversals_per_iter` | 1000 | 2000 | 10K |
| `batch_size` | 256 | 512 | 2048 |

References

Deep Counterfactual Regret Minimization (Brown et al., 2019)
Regret Minimization in Games with Incomplete Information (Zinkevich et al., 2007)
