Connect 4 AI is a computer-powered opponent that leverages advanced algorithms to challenge human players. Far more than a simple game of dropping pieces into a grid, it is a battle of tactics, foresight, and strategy. Every move influences your opponent’s subsequent choices, requiring not only immediate tactical responses but also long-term planning.
Our system systematically analyzes board states to identify winning patterns, block threats, and plan multi-turn strategies. By combining simulation techniques with deep learning models, we have built an AI that not only plays Connect 4 at a high level but also provides a practical example of applying advanced machine learning to strategic decision-making.
This project details our approach—from generating training data using Monte Carlo Tree Search (MCTS) to training both Convolutional Neural Network (CNN) and Transformer models, and finally deploying the solution using Docker on AWS Lightsail.
This repository does not include the data folder produced by generate_data.py, because the generated boards and moves were far too large to include. To run the code, create a directory named "data" and store the generated data inside it. Thank you!
- Purpose: Creates a new board state by placing a piece of the given color in the specified column.
- Inputs:
  - `board_temp`: Current 6×7 board state (NumPy array).
  - `color`: Either `'plus'` (represents 1) or `'minus'` (represents -1).
  - `column`: Column index (0–6) where the piece is dropped.
- Output: Returns the updated board state.
- Purpose: Places a piece of the given color in the specified column in-place and returns the row where it was placed.
- Inputs:
  - `board`: The current board state.
  - `color`: `'plus'` or `'minus'`.
  - `column`: Column index for the move.
- Output: The row index where the piece lands, or -1 if the column is full.
- Purpose: Resets the board cell at the given row and column back to 0 (empty).
- Inputs:
  - `board`: The current board state.
  - `row`: The row index of the move.
  - `column`: The column index of the move.
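The three board helpers described above (copying drop, in-place drop, and undo) can be sketched as follows. This is a minimal illustration, not the project's code: the function names are hypothetical, and it assumes row 0 is the top of the board so that pieces settle at the highest-numbered empty row.

```python
import numpy as np

ROWS, COLS = 6, 7
VALUE = {"plus": 1, "minus": -1}  # colour encoding as described above


def drop_in_place(board, color, column):
    """Drop a piece into `column`; return the landing row, or -1 if full."""
    # Assumption: row 0 is the top, so scan from the bottom row upwards.
    for row in range(ROWS - 1, -1, -1):
        if board[row, column] == 0:
            board[row, column] = VALUE[color]
            return row
    return -1


def drop_copy(board, color, column):
    """Return a new board with the piece added; the input is left untouched."""
    new_board = board.copy()
    drop_in_place(new_board, color, column)
    return new_board


def undo_move(board, row, column):
    """Reset a single cell to empty."""
    board[row, column] = 0


board = np.zeros((ROWS, COLS), dtype=int)
row = drop_in_place(board, "plus", 3)   # lands on the bottom row
after = drop_copy(board, "minus", 3)    # stacks on top, in a copy only
```

The in-place variant paired with `undo_move` avoids allocating a new array for every candidate move, which matters when many positions are tried during search.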
- Purpose: Checks the entire board for a winning sequence (vertical, horizontal, or diagonal) using a brute-force approach.
- Inputs:
  - `board`: The current 6×7 board state.
- Output: Returns a string indicating the winning pattern (e.g., `'v-plus'`, `'h-minus'`, etc.) or `'nobody'` if no win is detected.
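A brute-force check of this kind can be sketched by summing every window of four cells in each direction. The function name is hypothetical, and the `'d'` prefix for diagonal wins is an assumption, since only the `'v-...'` and `'h-...'` codes are named above.

```python
import numpy as np


def check_win_brute_force(board):
    """Scan every 4-cell window; return e.g. 'v-plus', 'h-minus', or 'nobody'."""
    rows, cols = board.shape
    # (row step, column step, label prefix); the 'd' prefix is an assumed name.
    directions = [(1, 0, "v"), (0, 1, "h"), (1, 1, "d"), (1, -1, "d")]
    for r in range(rows):
        for c in range(cols):
            for dr, dc, tag in directions:
                end_r, end_c = r + 3 * dr, c + 3 * dc
                if not (0 <= end_r < rows and 0 <= end_c < cols):
                    continue  # window would leave the board
                # Cells are -1, 0, or 1, so a sum of +4 or -4 means four alike.
                total = sum(board[r + i * dr, c + i * dc] for i in range(4))
                if total == 4:
                    return f"{tag}-plus"
                if total == -4:
                    return f"{tag}-minus"
    return "nobody"
```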
- Purpose: Checks if the last move (placed in column `col`) resulted in a win.
- Inputs:
  - `board`: The current board state.
  - `col`: Column index of the last move.
- Output: Returns a string representing the winning pattern (e.g., `'v-plus'`) or `'nobody'` if there is no winner.
- Purpose: Determines which columns still have available space (i.e., legal moves).
- Inputs:
  - `board`: Current board state (either 6×7 or 6×7×2; if 6×7×2, it is converted).
- Output: Returns a list of column indices (0–6) where a move can be made.
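A legal-move lookup along these lines might look like the sketch below. The function name is hypothetical, and two details are assumptions: that row 0 is the top of the board, and that a 6×7×2 board is converted by subtracting the 'minus' channel from the 'plus' channel.

```python
import numpy as np


def legal_moves(board):
    """Return the columns whose top cell is still empty."""
    board = np.asarray(board)
    if board.ndim == 3:
        # Assumed conversion: channel 0 holds 'plus', channel 1 holds 'minus'.
        board = board[:, :, 0] - board[:, :, 1]
    # Assumption: row 0 is the top row, so a column is full once board[0, c] != 0.
    return [c for c in range(board.shape[1]) if board[0, c] == 0]
```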
- Purpose: Checks all legal moves to see if any move results in an immediate win for the given color.
- Inputs:
  - `board`: Current board state.
  - `color`: `'plus'` or `'minus'`.
- Output: Returns the column index of the winning move, or -1 if none is found.
- Purpose: Identifies all legal moves that do not immediately allow the opponent to win.
- Inputs:
  - `board`: Current board state.
  - `color`: The player’s color.
- Output: Returns a list of column indices that are considered safe moves.
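The two tactical pre-checks above (find an immediate win, and list moves that do not hand the opponent one) can be sketched together, since the second is built on the first. All function names here are hypothetical, and the board orientation (row 0 at the top) is an assumption.

```python
import numpy as np

VALUE = {"plus": 1, "minus": -1}
OTHER = {"plus": "minus", "minus": "plus"}


def has_four(board, value):
    """True if `value` occupies four cells in a row in any direction."""
    rows, cols = board.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (0, 1), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < rows and 0 <= cc < cols and board[rr, cc] == value
                       for rr, cc in cells):
                    return True
    return False


def drop(board, color, column):
    """Place a piece in-place; return its row, or -1 if the column is full."""
    for row in range(board.shape[0] - 1, -1, -1):
        if board[row, column] == 0:
            board[row, column] = VALUE[color]
            return row
    return -1


def winning_move(board, color):
    """Return a column that wins immediately for `color`, or -1."""
    for col in range(board.shape[1]):
        row = drop(board, color, col)
        if row == -1:
            continue
        won = has_four(board, VALUE[color])
        board[row, col] = 0  # undo the trial move
        if won:
            return col
    return -1


def safe_moves(board, color):
    """Return legal columns after which the opponent has no immediate win."""
    safe = []
    for col in range(board.shape[1]):
        row = drop(board, color, col)
        if row == -1:
            continue
        if winning_move(board, OTHER[color]) == -1:
            safe.append(col)
        board[row, col] = 0  # undo the trial move
    return safe
```

With three 'plus' pieces in columns 0 to 2 of the bottom row, `winning_move(board, "plus")` finds column 3, and `safe_moves(board, "minus")` leaves only the blocking move in column 3.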
- Purpose: Updates the MCTS dictionary (`md`) for each board state in the given path based on the simulation outcome.
- Inputs:
  - `winner`: The winning pattern from the simulation.
  - `path`: List of board states (as tuples) traversed during simulation.
  - `color0`: The player for whom MCTS is running.
  - `md`: Dictionary mapping board states to statistics `[visits, score]`.
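The update step can be illustrated with a short sketch. The function name and the scoring rule (+1 when `color0` wins, -1 when the opponent wins, 0 otherwise) are assumptions; the text above only states that each state stores `[visits, score]`.

```python
def backpropagate(winner, path, color0, md):
    """Update [visits, score] for every board state visited in one simulation."""
    # Assumed scoring: +1 if color0 won, -1 if the opponent won, 0 otherwise.
    if winner in ("nobody", "tie"):
        reward = 0
    elif winner.endswith(color0):
        reward = 1
    else:
        reward = -1
    for state in path:
        stats = md.setdefault(state, [0, 0])
        stats[0] += 1        # visits
        stats[1] += reward   # cumulative score
```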
- Purpose: Performs a randomized simulation (rollout) from the current board until a terminal state (win/tie) is reached.
- Inputs:
  - `board`: Current board state.
  - `next_player`: The color to move next.
  - `debug`: Optional flag to print debugging output.
- Output: Returns the outcome of the simulation (`'nobody'`, `'tie'`, or a win code).
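A random rollout can be sketched as below. This is a simplified illustration under assumptions: the helper names are hypothetical, diagonal wins are labelled with an assumed `'d'` prefix, row 0 is taken to be the top row, and the optional `rng` parameter is added here only to make the sketch reproducible.

```python
import random

import numpy as np

VALUE = {"plus": 1, "minus": -1}
OTHER = {"plus": "minus", "minus": "plus"}


def find_winner(board):
    """Return a win code such as 'h-plus', or 'nobody'."""
    rows, cols = board.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc, tag in ((1, 0, "v"), (0, 1, "h"), (1, 1, "d"), (1, -1, "d")):
                rr, cc = r + 3 * dr, c + 3 * dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    total = sum(board[r + i * dr, c + i * dc] for i in range(4))
                    if abs(total) == 4:
                        return f"{tag}-plus" if total > 0 else f"{tag}-minus"
    return "nobody"


def rollout(board, next_player, rng=random):
    """Play uniformly random legal moves on a copy until a win or a full board."""
    board = board.copy()
    player = next_player
    while True:
        result = find_winner(board)
        if result != "nobody":
            return result
        moves = [c for c in range(board.shape[1]) if board[0, c] == 0]
        if not moves:
            return "tie"
        col = rng.choice(moves)
        # The piece settles in the lowest empty cell of the chosen column.
        row = max(r for r in range(board.shape[0]) if board[r, col] == 0)
        board[row, col] = VALUE[player]
        player = OTHER[player]
```

Scanning the whole board after every move keeps the sketch short; checking only the lines through the last move, as the last-move win check described earlier does, would be cheaper.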
- Purpose: Runs Monte Carlo Tree Search for a specified number of iterations (`nsteps`) to determine the best move.
- Inputs:
  - `board_temp`: Current board state.
  - `color0`: The AI’s color.
  - `nsteps`: Number of iterations for the search (higher value improves quality but increases computation).
- Output: Returns the column index of the best move found by MCTS.
- Purpose: Prints an ASCII representation of the board to the console using:
  - `'X'` for player +1
  - `'O'` for player -1
  - Blanks for empty cells.
- Inputs:
  - `board`: Current board state.
- Output: Displays the board along with column indices.
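One way to produce such a display is sketched below. The function name and the exact layout (pipe-separated cells) are assumptions; only the symbols and the column-index line come from the description above. Returning a string rather than printing directly keeps the function easy to test.

```python
def render_board(board):
    """Build an ASCII picture of the board with column indices underneath."""
    symbols = {1: "X", -1: "O", 0: " "}
    lines = ["|" + "|".join(symbols[int(v)] for v in row) + "|" for row in board]
    lines.append(" " + " ".join(str(c) for c in range(len(board[0]))))
    return "\n".join(lines)


demo = [[0] * 7 for _ in range(6)]
demo[5][3] = 1    # 'plus' piece on the bottom row
demo[4][3] = -1   # 'minus' piece stacked on top
print(render_board(demo))
```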
In the main block (when running `python connect4.py`), the following steps occur:
- Initialization:
  - An empty board is created.
  - The player chooses whether to go first.
- Game Loop:
  - The board is displayed.
  - If it's the human's turn, input is requested until a legal move is made.
  - If it's the AI's turn, MCTS is used to determine the move.
  - The move is applied using `update_board()`, and win status is checked.
  - Turns alternate until a winner or tie is declared.
- Final Display:
  - The final board state is printed and the result (win/tie) is announced.
To train our Connect 4 AI effectively, we needed a vast dataset of board states and optimal moves. Manually labeling millions of positions was impractical, so we leveraged Monte Carlo Tree Search (MCTS) to generate high-quality training data through self-play.
- Simulating Possible Games: MCTS simulates numerous games from various board positions. In each simulation (or rollout), the algorithm randomly plays out moves to explore potential outcomes.
- Tracking Move Effectiveness: Every move is recorded along with the eventual win/loss result from the simulation. This statistical evaluation helps the AI determine which moves lead to better outcomes.
- Balancing Exploration and Exploitation: The algorithm employs the Upper Confidence Bound (UCB1) formula to strike a balance between:
  - Exploration: Trying new moves to discover their potential.
  - Exploitation: Refining moves that are already known to be effective.
- Refining Move Selection: As more simulations are run, MCTS learns to select statistically better moves. It also includes pre-checks for immediate wins or blocks, ensuring that obvious tactical moves are prioritized.
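The UCB1 trade-off mentioned above can be made concrete with a small sketch. The standard formula adds an exploration bonus, c·sqrt(ln N / n), to a move's average score, where N is the parent's visit count and n the move's own. The function names and the constant c = sqrt(2) are assumptions; the project's actual constant is not stated.

```python
import math


def ucb1(child_visits, child_score, parent_visits, c=math.sqrt(2)):
    """Average score plus an exploration bonus that shrinks with more visits."""
    if child_visits == 0:
        return math.inf  # unvisited moves are always tried first
    exploit = child_score / child_visits
    explore = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploit + explore


def select_move(stats, parent_visits):
    """`stats` maps column -> (visits, score); pick the highest UCB1 value."""
    return max(stats, key=lambda col: ucb1(stats[col][0], stats[col][1], parent_visits))
```

A rarely visited move receives a large bonus and gets explored; once visit counts even out, the average score dominates and the search exploits the stronger move.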
- Self-Play: MCTS is used for self-play, where the AI plays roughly 40,000 games against itself, each time recording the best move for every board state.
- Recording Moves: Each board state is captured as a 6×7 tensor (or optionally as a 6×7×2 tensor with separate channels for each player), and the optimal move is stored.
- Data Augmentation: The board states are mirrored horizontally to double the dataset size, preserving the strategic context while enhancing data diversity.
- Parallel Processing: Multiple simulations run concurrently via multiprocessing, drastically reducing the time needed to generate the large dataset.
- Efficient Storage: Given the large volume (over 1.8 million snapshots), we use NumPy memmaps to manage and update the dataset without exceeding memory limits.
This extensive MCTS-driven self-play process provided the robust training data necessary to develop our AI models.
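The mirroring and memmap steps can be illustrated with a short sketch. It is not the project's `generate_data.py`: the function name, file name, and array shapes are assumptions chosen for the example. Mirroring a board left-to-right also requires remapping the recorded move from column `c` to column `6 - c`.

```python
import os
import tempfile

import numpy as np


def mirror_sample(board, move):
    """Flip the board left-to-right and remap the move to the mirrored column."""
    mirrored = np.flip(board, axis=1).copy()   # axis 1 is the column axis
    return mirrored, board.shape[1] - 1 - move


board = np.zeros((6, 7), dtype=np.int8)
board[5, 0] = 1
mirrored, mirrored_move = mirror_sample(board, move=1)

# Disk-backed storage: the array lives in a file rather than in RAM.
path = os.path.join(tempfile.mkdtemp(), "boards.dat")   # hypothetical file name
store = np.memmap(path, dtype=np.int8, mode="w+", shape=(2, 6, 7))
store[0], store[1] = board, mirrored
store.flush()
```

Because axis 1 is the column axis for both layouts, the same function works unchanged on 6×7×2 boards.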
Our first AI agent is built using Convolutional Neural Networks (CNNs), which are particularly well-suited for grid-based games like Connect 4. CNNs excel at capturing local spatial patterns—critical for detecting winning formations, blocking moves, and setting up tactical traps.
- Input Representation: The game board is represented as a 6×7×2 tensor. Each channel corresponds to one player's pieces:
  - Channel 0: Indicates a 'plus' (represented by +1)
  - Channel 1: Indicates a 'minus' (represented by -1)
- Initial Convolutional Layer:
  - Uses 64 filters of size 3×3 with "same" padding to preserve the board dimensions.
  - Applies L2 regularization (1e-4), followed by Batch Normalization and ReLU activation to extract low-level spatial features.
- Residual Blocks:
  - Inspired by ResNet, multiple residual blocks are used to allow the network to learn deeper features without the vanishing gradient problem.
  - Each block consists of two convolutional layers (with batch normalization and ReLU) and includes a skip connection that adds the block’s input to its output.
- Max Pooling Layers:
  - Applied after selected residual blocks to downsample the spatial dimensions.
  - This reduction in size helps the model focus on larger, more abstract patterns and reduces computational complexity.
- Fully Connected Layers:
  - The output from the convolutional and pooling layers is flattened into a one-dimensional vector.
  - A dense layer with 512 units (with ReLU activation) processes these features, followed by Batch Normalization and Dropout (50%) for regularization.
- Output Layer:
  - A final dense layer with 7 units and Softmax activation produces a probability distribution over the 7 possible moves (columns), indicating the likelihood of each move leading to a win.
- Data: The CNN is trained on a vast dataset of board states generated via MCTS self-play. Each board is converted into a 6×7×2 tensor.
- Optimization:
  - The model is optimized using Adam (or another suitable optimizer) with a scheduled learning rate.
  - Regularization through dropout and batch normalization ensures the network generalizes well.
- Results:
  - The CNN-based model achieves a validation accuracy of approximately 76%.
  - It is particularly effective at recognizing localized spatial patterns, which are vital for blocking opponent moves and executing immediate winning strategies.
- Spatial Pattern Recognition: CNNs naturally excel at detecting local patterns, which is crucial for recognizing winning configurations in Connect 4.
- Efficiency and Robustness: The architecture is computationally efficient and benefits from residual connections and regularization, resulting in a model that trains quickly and performs reliably across various board states.
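The conversion from a signed 6×7 board to the 6×7×2 input can be sketched as follows. The function name is hypothetical; the channel order (channel 0 for 'plus', channel 1 for 'minus') follows the description above.

```python
import numpy as np


def encode_board(board):
    """Turn a 6x7 board of {-1, 0, 1} into a 6x7x2 one-hot style tensor."""
    board = np.asarray(board)
    # Channel 0 marks 'plus' pieces (+1); channel 1 marks 'minus' pieces (-1).
    planes = np.stack([board == 1, board == -1], axis=-1)
    return planes.astype(np.float32)
```

Empty cells are zero in both channels, so the network sees occupancy per player rather than a signed value.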
```python
import tensorflow as tf

# `input_shape` (6, 7, 2), `num_classes` (7), and `residual_block` are
# assumed to be defined elsewhere in the project.

# Input shape: 6x7x2
inputs = tf.keras.Input(shape=input_shape)

# Initial convolution with 3x3 filters
x = tf.keras.layers.Conv2D(64, (3, 3), padding='same',
                           kernel_regularizer=tf.keras.regularizers.l2(1e-4),
                           use_bias=False)(inputs)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.ReLU()(x)

# Residual Blocks
x = residual_block(x, 64)
x = residual_block(x, 64)

# Max Pooling to downsample spatial dimensions
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = residual_block(x, 128)
x = residual_block(x, 128)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2))(x)
x = residual_block(x, 256)
x = residual_block(x, 256)

# Fully connected layers
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(512, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4))(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Dropout(0.5)(x)

# Output layer with softmax for 7 possible moves
outputs = tf.keras.layers.Dense(num_classes, activation='softmax')(x)
```
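The `residual_block` helper called in the model code is not shown. A minimal sketch consistent with the description (two convolutions with batch normalization and ReLU, plus a skip connection) might look like the following. The 1×1 projection on the shortcut when the filter count changes and the `weight_decay` parameter are assumptions, not details confirmed by the project.

```python
import tensorflow as tf


def residual_block(x, filters, weight_decay=1e-4):
    """Two 3x3 conv layers with batch norm, added back onto the block input."""
    reg = tf.keras.regularizers.l2(weight_decay)
    shortcut = x

    y = tf.keras.layers.Conv2D(filters, (3, 3), padding="same",
                               use_bias=False, kernel_regularizer=reg)(x)
    y = tf.keras.layers.BatchNormalization()(y)
    y = tf.keras.layers.ReLU()(y)
    y = tf.keras.layers.Conv2D(filters, (3, 3), padding="same",
                               use_bias=False, kernel_regularizer=reg)(y)
    y = tf.keras.layers.BatchNormalization()(y)

    # Assumption: project the shortcut with a 1x1 conv when channel counts differ,
    # otherwise the element-wise addition below would fail.
    if shortcut.shape[-1] != filters:
        shortcut = tf.keras.layers.Conv2D(filters, (1, 1), padding="same",
                                          use_bias=False)(shortcut)

    y = tf.keras.layers.Add()([shortcut, y])
    return tf.keras.layers.ReLU()(y)
```

The projection matters at the points where the model widens from 64 to 128 and from 128 to 256 filters, since a plain addition requires matching channel counts.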

While Convolutional Neural Networks (CNNs) excel at spatial pattern recognition, Transformers offer a unique approach by focusing on sequence-based decision-making. In Connect 4, understanding board states holistically—beyond just local patterns—is crucial for strategic play. Our Transformer-based model leverages self-attention mechanisms to analyze the game board and predict optimal moves.
Unlike CNNs, which focus on nearby spatial dependencies, the Transformer model considers global relationships across the board. The architecture consists of several key components:
- The board is represented as a 6×7×2 tensor, where:
  - Channel 0 represents the `plus` player's pieces.
  - Channel 1 represents the `minus` player's pieces.
- This format ensures that the model understands both player perspectives.
- The board tensor is passed through a dense (fully connected) layer to map board values into a high-dimensional feature space.
- This step helps the model extract meaningful numerical representations of board states.
- Since Transformers were originally designed for sequences (e.g., text data), they lack an inherent sense of position.
- A Positional Encoding Layer is added to embed spatial information into the board representation.
- This enables the Transformer to recognize board locations and their significance.
The core of our model is built using multiple Transformer blocks, each consisting of:
- Multi-Head Self-Attention:
  - This mechanism enables the model to focus on different board areas simultaneously.
  - It helps detect relationships between pieces that may be far apart on the board.
- Feedforward Network (FFN):
  - A two-layer dense network that applies non-linear transformations to refine feature representations.
  - The first layer uses GELU activation, known for smoother gradient updates.
- Layer Normalization & Dropout:
  - Used for stabilizing training and reducing overfitting.
- The final board representation is flattened and passed through dense layers:
  - 256 dense units → ReLU activation
  - 128 dense units → ReLU activation
  - Dropout layers for regularization
- The output layer is a softmax layer with 7 units, corresponding to the probability distribution over valid columns (0–6).
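To show what self-attention computes over the 42 board cells, here is a single-head sketch in plain NumPy. It is illustrative only: the function name, random weights, and toy embedding width of 16 are assumptions, and the actual model uses a multi-head Keras layer that runs several such heads in parallel.

```python
import numpy as np


def self_attention(tokens, w_q, w_k, w_v):
    """Scaled dot-product attention for one head over a (cells, dim) matrix."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over all cells
    return weights @ v, weights


rng = np.random.default_rng(0)
tokens = rng.normal(size=(42, 16))   # 42 board cells, toy embedding width 16
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
```

Each row of `weights` is a distribution over all 42 cells, which is how a cell in one corner can draw on information from the opposite corner in a single layer.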
```python
import tensorflow as tf
from tensorflow.keras.layers import (Dense, Dropout, Embedding, Flatten,
                                     Input, Reshape)
from tensorflow.keras.models import Model

# `TransformerBlock` is assumed to be defined elsewhere in the project.


class PositionalEncoding(tf.keras.layers.Layer):
    def __init__(self, embed_dim, height, width):
        super(PositionalEncoding, self).__init__()
        self.embed_dim = embed_dim
        self.height = height
        self.width = width
        # One learned embedding vector per board cell
        self.position_embeddings = Embedding(input_dim=height * width, output_dim=embed_dim)

    def call(self, inputs):
        position_indices = tf.range(start=0, limit=self.height * self.width, delta=1)
        position_embeddings = self.position_embeddings(position_indices)
        # Add a batch axis so the embeddings broadcast over the batch
        position_embeddings = tf.expand_dims(position_embeddings, axis=0)
        return inputs + position_embeddings


def create_transformer_model(input_shape, embed_dim=128, num_heads=8, ff_dim=256,
                             num_transformer_blocks=3, dropout_rate=0.2):
    inputs = Input(shape=input_shape)  # Expected shape: (6, 7, 2)
    x = Dense(embed_dim)(inputs)  # Linear projection

    # Flatten the 6x7 board into a sequence of 42 tokens
    x = Reshape((input_shape[0] * input_shape[1], embed_dim))(x)

    # Add positional encoding
    x = PositionalEncoding(embed_dim, input_shape[0], input_shape[1])(x)

    # Transformer blocks
    for _ in range(num_transformer_blocks):
        x = TransformerBlock(embed_dim, num_heads, ff_dim, dropout_rate)(x)

    x = Flatten()(x)
    x = Dense(256, activation='relu')(x)
    x = Dropout(0.3)(x)
    x = Dense(128, activation='relu')(x)
    x = Dropout(0.2)(x)

    # Output layer for move prediction
    outputs = Dense(7, activation='softmax')(x)
    model = Model(inputs=inputs, outputs=outputs)
    return model
```
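A softmax head assigns probability to all seven columns, including any that are already full, so the prediction has to be filtered against the legal moves at play time. How the project does this is not shown; one possible approach, with a hypothetical function name, is to mask illegal columns before taking the argmax.

```python
import numpy as np


def pick_legal_move(probs, legal_columns):
    """Return the highest-probability column among the legal ones."""
    probs = np.asarray(probs, dtype=float)
    masked = np.full_like(probs, -np.inf)           # illegal columns can never win
    masked[legal_columns] = probs[legal_columns]    # keep scores for legal columns
    return int(np.argmax(masked))
```

In the example `pick_legal_move([0.1, 0.05, 0.5, 0.2, 0.05, 0.05, 0.05], [0, 1, 3])`, column 2 has the highest probability but is not legal, so column 3 is chosen. The caller is expected to handle a full board (no legal columns) before calling this.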

In our exploration of deep learning approaches for Connect 4, we implemented two distinct models:
- CNN-based Model (ResNet-inspired)
- Transformer-based Model (Self-Attention Mechanism)
Each model was trained on the same dataset generated by Monte Carlo Tree Search (MCTS), but their performances varied significantly due to differences in architectural strengths and limitations.
Model | Accuracy | Training Speed | Computational Cost | Strengths | Weaknesses |
---|---|---|---|---|---|
CNN-based Model | 76% | Fast | Lower | Excellent pattern recognition, effective at tactical moves | Struggles with long-term planning |
Transformer Model | 67% | Slow | Higher | Holistic board awareness, flexible for long-term strategies | Requires more data, struggles with short-term tactics |
🔹 CNNs excelled at recognizing localized spatial patterns essential for short-term tactical play.
🔹 Transformers, while better at long-term strategic planning, struggled with Connect 4’s spatial dependencies.
This section details the Docker-based deployment of the Connect 4 AI game on AWS Lightsail. The project integrates Django, TensorFlow, and Anvil Uplink within a containerized environment for efficient deployment and management.
By using Docker and Docker Compose, we ensure:
- Portability: The application runs in an isolated container across different environments.
- Scalability: Future improvements can leverage Kubernetes for container orchestration.
- Automation: CI/CD pipelines can be incorporated for continuous deployment.
The deployment process includes:
- Setting up Docker on the AWS Lightsail instance.
- Transferring project files (codebase, models, and dependencies).
- Building & launching the Docker container to host the backend AI service.
The Dockerfile defines the containerized environment, ensuring all dependencies (Python, TensorFlow, Django, Anvil Uplink) are installed and managed.
The Dockerfile copies project files, sets the working directory, and manages dependencies within the container, while the Docker Compose file defines the services and restarts containers automatically. We encountered challenges along the way, including understanding image vs. container workflows, handling large dependencies such as TensorFlow models, and ensuring proper file transfer to AWS using FileZilla.
Below are the essential commands to build, launch, and debug the container:
```bash
# Build the Docker image
sudo docker compose build

# Run the container in detached mode
sudo docker compose up -d

# View logs for debugging
sudo docker compose logs

# Verify running containers
sudo docker compose ps

# Stop and remove containers
sudo docker compose down
```
Debugging best practices include verifying file permissions, fine-tuning TensorFlow GPU settings, and routinely removing unused images and containers to maintain an efficient environment:
```bash
docker system prune -a
```
Future enhancements focus on automating deployments through CI/CD pipelines, leveraging Kubernetes for scalable container orchestration, and optimizing GPU utilization to improve performance within the Docker environment.
The game experiences some lag because the backend server processes board images, handles move logic, and returns responses to the frontend on every turn. This delay impacts responsiveness; speeding up these steps is left as future work to improve the user experience.
Our models performed exceptionally well in recognizing obvious wins and blocking immediate threats. Some examples of boards where the AI excelled included:
- Direct wins – The AI instantly played in the winning column when a four-in-a-row opportunity was available.
- Immediate blocks – When the opponent was one move away from winning, the model correctly placed a piece to prevent a loss.
- Tactical setups – The AI identified double trap strategies, in which it set up two different winning moves simultaneously. This forced the opponent into a losing position, as they could only block one threat while the AI capitalized on the other.
We encountered multiple challenges during deployment:
- Understanding Image vs. Container Workflows: Differentiating between image creation and container execution required fine-tuning.
- Handling Large TensorFlow Dependencies: The first deployment failed due to excessive memory usage when loading deep learning models.
- File Transfer Issues: Using FileZilla to transfer project files to AWS ensured complete dependency management.
To enhance performance, we propose:
- CI/CD Pipeline Automation: Automating deployments via GitHub Actions or Jenkins.
- Using Kubernetes: For scalable container orchestration and efficient GPU allocation.
- Optimizing AI Response Time: Currently, the backend processes the board state before returning the next move. Future improvements will focus on reducing response delay.