TCG training #430

layterz · 2025-11-28T18:46:09Z

Added the plumbing and multi-agent setup for the TCG environment so it can be trained and evaluated from puffer. Note that I haven't spent much time on reward shaping and have left it as a placeholder. I don't have the compute to train a real policy for this anyway, and I wanted to keep the PR focused on getting the puffer plumbing working. Running eval without a trained policy shows random actions being taken, which I think demonstrates that it works.

Key changes:

Add the Python code, binding, .ini config and environment setup so that tcg can be run as an env from the puffer command.
Define NUM_PLAYERS (default 2) and write observations separately for each player. This is then used for self-play when training/evaluating.
Hide debug statements by default. To enable them, run the executable with TCG_DEBUG=true.
Define a very basic reward signal: 1/-1 for a win/loss and -0.5 for a draw. A draw is triggered when MAX_TURNS is reached.
Convert turn from a 1/0 toggle to a count of turns and use % NUM_PLAYERS to get the current active player.
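
For reviewers, here is a rough Python sketch of the turn and reward logic described in the last two items. It is illustrative only and is not the code in this PR: the function names, the MAX_TURNS value, and the choice to trigger the draw at `turn >= MAX_TURNS` are all assumptions made for the example. Only NUM_PLAYERS = 2 and the 1 / -1 / -0.5 reward values come from the description above.

```python
NUM_PLAYERS = 2   # default stated in the PR description
MAX_TURNS = 200   # hypothetical value; the PR does not state the actual limit

WIN_REWARD = 1.0
LOSS_REWARD = -1.0
DRAW_REWARD = -0.5


def active_player(turn, num_players=NUM_PLAYERS):
    """Index of the player to act, derived from a running turn count."""
    return turn % num_players


def terminal_rewards(winner, turn, num_players=NUM_PLAYERS, max_turns=MAX_TURNS):
    """Return one reward per player, or None if the game is still running.

    winner is a player index, or None when nobody has won yet.
    Treating turn >= max_turns as the draw condition is an assumption.
    """
    if winner is not None:
        return [WIN_REWARD if p == winner else LOSS_REWARD
                for p in range(num_players)]
    if turn >= max_turns:
        return [DRAW_REWARD] * num_players
    return None
```

With a turn count instead of a 1/0 toggle, the same modulo expression keeps working if NUM_PLAYERS is ever raised above 2, which a toggle could not do.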

add python binding for tcg

cbbcd57
