
@layterz layterz commented Nov 28, 2025

Added the plumbing and multi-agent setup for the TCG environment so it can be trained and eval'd from puffer. Note: I've not spent much time defining any reward shaping and have left it as a placeholder. I don't have the compute to train a real policy for this anyway and wanted to keep the PR focused on getting the plumbing to puffer working. Running eval without a trained policy shows random actions being taken, which I think demonstrates this is working.

Key changes:

  • Add the Python code, bindings, .ini, and environment setup so that tcg can be run as an env from the puffer command
  • Define NUM_PLAYERS (default 2) and write observations separately for each player. This is then used for self-play when training/evaluating
  • Hide debug statements by default. To enable them, run the executable with TCG_DEBUG=true
  • Define a very basic reward signal of +1/-1 for a win/loss and -0.5 for a draw. A draw is triggered when MAX_TURNS is reached
  • Convert turn from a 1/0 toggle to a count of turns and use turn % NUM_PLAYERS to get the current active player
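The last two bullets can be sketched in C roughly as below. This is a minimal illustration, not the PR's actual code: the `TCG` struct, its field names, and the `MAX_TURNS` value are hypothetical stand-ins.

```c
#define NUM_PLAYERS 2
#define MAX_TURNS 100  /* hypothetical cap; the real value lives in the env config */

/* hypothetical minimal env state */
typedef struct {
    int turn;                      /* running count of turns, not a 1/0 toggle */
    float rewards[NUM_PLAYERS];    /* per-player reward written at episode end */
} TCG;

/* turn is a count, so the active player falls out of a modulo */
int current_player(const TCG *env) {
    return env->turn % NUM_PLAYERS;
}

/* winner: index of the winning player, or -1 for a draw (MAX_TURNS reached) */
void assign_rewards(TCG *env, int winner) {
    for (int p = 0; p < NUM_PLAYERS; p++) {
        if (winner < 0)
            env->rewards[p] = -0.5f;   /* draw penalty for everyone */
        else if (p == winner)
            env->rewards[p] = 1.0f;    /* win */
        else
            env->rewards[p] = -1.0f;   /* loss */
    }
}
```

With self-play, writing a symmetric +1/-1 into each player's reward slot keeps the signal zero-sum, while the shared -0.5 draw penalty discourages policies from stalling until MAX_TURNS.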


