This is my first attempt at building a project that uses a neural network based on TensorFlow.NET. The project is at an early alpha stage of development. About 80% of the code was generated by Gemini AI through its chat interface. The project was developed using the Windsurf AI IDE (https://windsurf.com). Please use it with caution: there are many issues and bugs, although the project compiles and runs. The main areas of concern are:
- TensorFlow.NET and dependent packages compatibility.
- The code needs to be tested for RAM and GPU memory leaks.
- The initial training data file can reach ~20 GB and needs to be optimized.
This document provides a technical overview of the Ataxx AI Training Solution, a distributed multi-project system designed to train a high-performance game-playing agent for the game of Ataxx.
The architecture follows a modern, AlphaZero-like approach, creating a self-improvement "flywheel." In this loop, the AI generates its own training data through self-play, a trainer consumes this data to produce stronger models, and an evaluator promotes the best-performing models. This process allows the AI to continuously improve its strategic understanding of the game.
The system is composed of four distinct but interconnected projects, communicating through a central web API and a shared file system, enabling it to scale across multiple machines for efficient, parallelized training.
The solution operates on a distributed Controller/Worker pattern. The core components work in a continuous cycle:
- Self-Play: SelfPlayWorker instances fetch the current best model from the Controller. They play thousands of games against themselves, using a Monte Carlo Tree Search (MCTS) guided by the model's predictions.
- Data Aggregation: The results of every game (each move, the board state, the MCTS policy, and the final game outcome) are logged as (State, Policy, Value) tuples into a central training_data.jsonl file on a shared drive.
- Training: The Trainer application continuously monitors the shared drive. It loads the new game data, preprocesses it into tensors, and uses it to train the neural network, producing a new candidate model.
- Evaluation: The Controller detects the new candidate model. It orchestrates a head-to-head match of ~100 games between the current best model and the new candidate.
- Promotion: If the candidate model wins the evaluation match by a statistically significant margin, the Controller promotes it to become the new best model.
This cycle then repeats, with the now-stronger model generating higher-quality data for the next round of training.
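The promotion step depends on what "statistically significant" means for a match of about 100 games. The source does not say which test the Controller applies, so the following is only a minimal sketch of one common choice: a one-sided binomial test over decisive games. The function names, the decision to ignore draws, and the 0.05 threshold are all assumptions made for illustration (Python is used for brevity; the project itself is .NET).

```python
from math import comb


def binomial_p_value(wins: int, games: int) -> float:
    """Probability of at least `wins` wins in `games` decisive games
    if both models were equally strong (win probability 0.5)."""
    favourable = sum(comb(games, k) for k in range(wins, games + 1))
    return favourable / 2 ** games


def should_promote(wins: int, losses: int, alpha: float = 0.05) -> bool:
    """Hypothetical promotion rule: promote the candidate only when its
    win count would be unlikely under the equal-strength assumption.
    Draws are ignored here, which is an assumption of this sketch."""
    decisive = wins + losses
    if decisive == 0:
        return False
    return binomial_p_value(wins, decisive) < alpha
```

Under this rule a 60-40 result would lead to promotion, while a 55-45 result would not, which shows why a match of roughly 100 games can only separate models with a fairly clear strength gap.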
This is the foundational class library shared by all other projects in the solution. It contains the essential logic and data structures for the game and the AI.
- Purpose: To provide a single, reusable engine for game mechanics, AI search, and neural network interaction.
- Key Components: AtaxxLogic, BitboardState, MctsEngine, MCTSNode, PredictionService, TrainingGameLog.
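The internals of MctsEngine and MCTSNode are not described here. As a rough illustration of how a search can be guided by network predictions, AlphaZero-style engines commonly pick the next child with the PUCT rule, which balances the child's average value against its prior probability and visit count. The sketch below assumes that rule; the `Node` class, its fields, and the `c_puct` constant are hypothetical and are not the project's actual types.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical search node (not the project's MCTSNode)."""
    prior: float                 # policy-head probability for the move leading here
    visits: int = 0
    value_sum: float = 0.0       # summed values, from the parent's player's point of view
    children: dict = field(default_factory=dict)

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def select_child(parent: Node, c_puct: float = 1.5):
    """Return the (move, child) pair with the highest PUCT score."""
    def score(child: Node) -> float:
        exploration = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
        return child.mean_value() + exploration

    return max(parent.children.items(), key=lambda item: score(item[1]))
```

With this rule, unvisited moves with a high prior are tried early, and moves that keep returning poor values lose priority as their visit counts grow.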
Ataxx.SelfPlayWorker is a console application responsible for the "Self-Play" phase of the training loop. Multiple instances can be run in parallel.
- Purpose: To generate high-quality training data by playing games using the current best AI model.
- Key Components: SelfPlayJob, GameSimulator.
- Interactions: Calls the Ataxx.Controller API to get the latest model and writes game logs to the shared drive.
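To make the game-log format more concrete, here is a minimal sketch of appending (State, Policy, Value) samples to a JSON Lines file, one JSON object per line. The field names `state`, `policy`, and `value`, and the choice to store the policy as sparse `[index, probability]` pairs, are assumptions for illustration; the actual TrainingGameLog schema may differ.

```python
import json


def append_game_log(path, samples):
    """Append one JSON object per line for each (state, policy, value) sample.

    `state` is any JSON-serialisable board encoding, `policy` is a list of
    [move_index, probability] pairs, and `value` is the final game outcome
    from the point of view of the player to move (assumed layout).
    """
    with open(path, "a", encoding="utf-8") as f:
        for state, policy, value in samples:
            record = {"state": state, "policy": policy, "value": value}
            f.write(json.dumps(record) + "\n")
```

Storing only the non-zero policy entries, rather than all 1176 probabilities, is one possible way to keep the file smaller, though whether it fits the existing preprocessing code would need to be checked.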
Ataxx.Trainer is a console application at the heart of the learning process, designed to run on a machine with a powerful GPU.
- Purpose: To train new, improved neural network models from the data generated by the SelfPlayWorker instances.
- Key Components: ModelTrainer, DataPreprocessor, TrainingJob.
- Interactions: Reads game logs from the shared drive and writes new _candidate models back to it.
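Because the training data file can grow to roughly 20 GB, loading it in one piece is likely to be a problem. One possible approach, sketched below, is to stream the JSON Lines file and hand records to the training step in fixed-size batches so that only one batch is held in memory at a time. The function name and the batch size are hypothetical, and this is not necessarily how DataPreprocessor works.

```python
import json
from typing import Iterator


def iter_batches(path: str, batch_size: int) -> Iterator[list]:
    """Yield lists of parsed records from a JSON Lines file, reading it
    line by line instead of loading the whole file."""
    batch = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            batch.append(json.loads(line))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly smaller, batch
```

A training loop could then iterate with `for batch in iter_batches("training_data.jsonl", 256): ...` and convert each batch to tensors as it arrives.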
Ataxx.Controller is an ASP.NET Core web application that acts as the central coordinator for the entire distributed system.
- Purpose: To manage the model registry and orchestrate the evaluation process.
- Key Components: ModelController, ModelRegistryService, EvaluationJob.
- Interactions: Manages model files on the shared drive and responds to API requests from workers.
The "brain" of the AI is a deep neural network, implemented in Ataxx.Trainer, with an architecture inspired by AlphaZero. This design allows the network to learn complex spatial patterns and game strategies directly from the board state.
- Input Tensor: The network takes a 7x7x4 tensor as input, representing the complete state of the game from the current player's perspective.
  - Channel 1: A plane with 1s representing the current player's pieces, 0s otherwise.
  - Channel 2: A plane with 1s representing the opponent's pieces, 0s otherwise.
  - Channel 3: A plane indicating the positions of permanently blocked squares.
  - Channel 4: A plane filled entirely with a constant value indicating whose turn it is, providing the model with context.
- Network Body: The core of the network consists of several convolutional layers (Conv2D). These layers are exceptionally effective at recognizing spatial patterns and relationships between pieces on the 7x7 game board.
- Dual-Output Heads: The network has two distinct outputs, which are trained simultaneously:
  - Policy Head: A vector of 1176 probabilities (49 'from' squares * 24 possible moves), processed through a Softmax activation. This head predicts the probability distribution of the best possible moves from the current state. It is used by the MCTS engine to guide its search towards more promising actions.
  - Value Head: A single scalar value, processed through a Tanh activation to be between -1 and 1. This head predicts the expected outcome of the game from the current state (-1 = likely loss, +1 = likely win). This is used to evaluate leaf nodes in the MCTS, replacing the need for random rollouts.
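The input planes and the 1176-entry policy vector described above can be illustrated with a small sketch. The 24 moves per square are assumed here to be the 24 offsets within a 5x5 window around the source square (8 adjacent clone moves plus 16 jumps), which matches the 49 * 24 = 1176 count. The offset ordering, the `from * 24 + offset` index formula, and the 1.0/0.0 turn constant are assumptions for illustration, not the project's confirmed encoding.

```python
BOARD = 7

# Assumed move ordering: every (row, col) offset in a 5x5 window except (0, 0).
OFFSETS = [(dr, dc) for dr in range(-2, 3) for dc in range(-2, 3) if (dr, dc) != (0, 0)]


def encode_state(own, opp, blocked, first_player_to_move):
    """Build a 7x7x4 nested list indexed as [row][col][channel].

    `own`, `opp`, and `blocked` are collections of (row, col) squares.
    Channel 3 (the fourth plane) is constant: 1.0 or 0.0 depending on
    whose turn it is (assumed values)."""
    turn = 1.0 if first_player_to_move else 0.0
    planes = [[[0.0, 0.0, 0.0, turn] for _ in range(BOARD)] for _ in range(BOARD)]
    for channel, squares in enumerate((own, opp, blocked)):
        for r, c in squares:
            planes[r][c][channel] = 1.0
    return planes


def policy_index(from_row, from_col, to_row, to_col):
    """Map a move to its slot in the 1176-entry policy vector
    (hypothetical layout: source square * 24 + offset index)."""
    offset = OFFSETS.index((to_row - from_row, to_col - from_col))
    return (from_row * BOARD + from_col) * len(OFFSETS) + offset
```

Under this layout the largest index is 48 * 24 + 23 = 1175, so every move fits in the policy vector. Moves that would leave the board still have a slot and would simply need to be masked out before use.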
The system is designed for flexible deployment across multiple machines to maximize training efficiency. Communication is handled via the ASP.NET Core API for control and a shared network drive (e.g., a Samba share) for high-volume data transfer.
The ideal configuration uses three machines, assigning a specialized role to each.
- Machine #1 (24-core Xeon CPU): Controller & Data Hub
  - Responsibilities: Hosts the Ataxx.Controller API, manages the shared data drive, and runs CPU-based instances of the Ataxx.SelfPlayWorker to contribute to data generation.
- Machine #2 (mid-level GPU): Self-Play & Evaluation Worker
  - Responsibilities: Its primary role is to run the Ataxx.SelfPlayWorker, leveraging its GPU for fast model inference during MCTS. Its secondary role is to perform the evaluation matches between candidate and best models when tasked by the Controller.
- Machine #3 (high-spec GPU): Primary Training Worker
  - Responsibilities: This machine's sole focus is running the Ataxx.Trainer application. It continuously ingests data from the shared drive and uses its powerful GPU for the heavy lifting of network training.
The system is fully functional in a two-machine setup, consolidating roles effectively.
- Machine #1 (24-core Xeon CPU): The "Controller & Thinker"
  - Responsibilities: Runs the Ataxx.Controller API, hosts the shared drive, runs CPU-based SelfPlayWorker instances, and takes on the role of the Evaluation Machine to compare models.
- Machine #2 (high-spec GPU): The "Trainer & Power-Player"
  - Responsibilities: Runs the Ataxx.Trainer application to handle all network training. It also runs the Ataxx.SelfPlayWorker, using its GPU to generate high-quality game data at high speed.
To start the full AI training pipeline, the applications should be launched in the following order:
- Start the Controller: Run the Ataxx.Controller web application on the designated machine.
- Start the Trainer: On the primary GPU machine, run the Ataxx.Trainer console application.
- Start the Self-Play Workers: On all participating machines, run instances of the Ataxx.SelfPlayWorker console application.
Once all components are running, the system is fully operational and will autonomously work to improve the Ataxx AI.