Commit 2ae429b ("Update readme")
1 parent b25d9e5

1 file changed: README.md (28 additions, 11 deletions)
@@ -48,12 +48,14 @@ pip install -r requirements.txt
 > [!NOTE]
 > This section is highly technical; feel free to skip it and play with the code right away.
 
-#### Agents
+#### Algorithms for Board Games
 
 Most computer algorithms discretize the game into states and actions. Here, the state is the position of the pawns and the available actions are the possible moves of the pawns.
 
 Squadro is a finite state machine, meaning that the next state of the game is completely determined by the current state and the action played. With this definition, one can see that the game is a Markov Decision Process (MDP). At each state, the current player can play different actions, which lead to different states. Then the next player can play different actions from any of those new states, and so on. The future of the game can be represented as a tree whose branches are the actions leading to different states.
 
+#### Exploration - Exploitation trade-off
+
 An algorithm can explore that space of possibilities to infer the best move to play now. As the tree is huge, it is not possible to explore all the possible paths until the end of the game. Such algorithms typically explore only a small fraction of the tree and then use the information gathered from those states to make a decision. More precisely, the two phases are:
 
 * **State exploration**: exploring the space of states by a careful choice of actions. The most common exploration methods are Minimax and Monte Carlo Tree Search (MCTS). Minimax explores all the states up to a specific depth, while MCTS navigates until it finds a state that has not been visited yet. Minimax can be sped up by skipping the search in the parts of the tree that won't affect the final decision; this method is called alpha-beta pruning.
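The minimax exploration with alpha-beta pruning described above can be sketched generically. This is a minimal illustration, not the squadro implementation: the `children` and `evaluate` callables are hypothetical stand-ins for the game's move generator and evaluation function.

```python
def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    """Return the minimax value of `state`, pruning branches that
    cannot affect the final decision (alpha-beta pruning)."""
    kids = children(state)
    if depth == 0 or not kids:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in kids:
            value = max(value, alphabeta(child, depth - 1, alpha, beta,
                                         False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:  # the minimizing opponent will avoid this branch
                break
        return value
    value = float("inf")
    for child in kids:
        value = min(value, alphabeta(child, depth - 1, alpha, beta,
                                     True, children, evaluate))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

# Toy tree: states are integers, leaves evaluate to fixed scores.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6], 3: [], 4: [], 5: [], 6: []}
leaf_value = {3: 3, 4: 5, 5: 2, 6: 9}
best = alphabeta(0, 2, float("-inf"), float("inf"), True,
                 lambda s: tree[s], lambda s: leaf_value.get(s, 0))
print(best)  # 3: max(min(3, 5), min(2, ...)) with the 9-leaf pruned
```

Note how the branch under state 6 is never evaluated: once the minimizing node below state 2 finds the value 2, no outcome there can beat the 3 already guaranteed on the other side.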
@@ -63,13 +65,15 @@ An algorithm can explore that space of possibilities to infer the best move to p
 * Q value function, a lookup table for each state and action;
 * deep Q network (DQN), a neural network that approximates the Q value function, which is necessary when the state space is huge (i.e., cannot be stored in memory).
 
-List of available agents:
+#### Agents
+
+At least 8 agents, each running a different algorithm, have been implemented to play the game:
 
 * _human_: another local human player (i.e., both playing on the same computer)
 * _random_: a computer that plays randomly among all available moves
 * _advancement_: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement
 * _relative_advancement_: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement compared to the other player
-* _ab_relative_advancement_: a computer that plays minimax with alpha-beta pruning (depth ~4), where the evaluation function is the player's advancement compared to the other player
+* _ab_relative_advancement_: a computer that plays minimax with alpha-beta pruning, where the evaluation function is the player's advancement compared to the other player
 * _mcts_advancement_: Monte Carlo tree search, where the evaluation function is the player's advancement compared to the other player
 * _mcts_rollout_: Monte Carlo tree search, where the evaluation function is determined by a random playout until the end of the game
 * _mcts_q_learning_: Monte Carlo tree search, where the evaluation function is determined by a lookup table
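The Q value lookup table mentioned above can be sketched with a simple temporal-difference update. This is an illustrative sketch assuming hashable states and a generic reward signal, not the squadro implementation; all names are hypothetical.

```python
from collections import defaultdict

alpha, gamma = 0.1, 0.99   # learning rate and discount factor (assumed values)
Q = defaultdict(float)     # lookup table: (state, action) -> estimated value

def q_update(state, action, reward, next_state, next_actions):
    """One temporal-difference update of the lookup table:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# Toy update: from state "s0", playing "move_pawn_2" earned reward 1.
q_update("s0", "move_pawn_2", 1.0, "s1", ["a", "b"])
print(Q[("s0", "move_pawn_2")])  # 0.1
```

When the state space is too large for such a table, the DQN replaces the dictionary lookup with a neural-network forward pass, as described above.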
@@ -83,6 +87,18 @@ import squadro
 print(squadro.AVAILABLE_AGENTS)
 ```
 
+#### Benchmark
+
+All the agents have been evaluated against each other under controlled conditions:
+- Max 3 sec per move
+- 100 games, exactly balanced across the four starting configurations (which color and who starts), except when a human is involved (then only 5 games)
+- Original grid (5 x 5)
+
+Here is the comparison:
+
+TODO
+
+The deep Q-learning algorithm outperforms all other players, including the human (myself, an average player).
 
 ## Usage
 
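The benchmark protocol above amounts to a round-robin tournament. A minimal sketch, assuming a `play_game` stand-in that runs one game and returns the winner (simplified to alternating who moves first; the full color balancing and the squadro API are not shown):

```python
import itertools

def round_robin(agents, games_per_pair, play_game):
    """Play every pair of agents against each other and tally wins."""
    wins = {a: 0 for a in agents}
    for a, b in itertools.combinations(agents, 2):
        for g in range(games_per_pair):
            # Alternate who plays first to balance the starting configurations.
            first, second = (a, b) if g % 2 == 0 else (b, a)
            wins[play_game(first, second)] += 1
    return wins

# Toy playout where the lexicographically smaller name always wins.
result = round_robin(["mcts_rollout", "random"], 4, lambda p, q: min(p, q))
print(result)  # {'mcts_rollout': 4, 'random': 0}
```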
@@ -150,22 +166,23 @@ Here are the online pre-trained models:
 | Q-Learning | 2 | 18 kB |
 | Q-Learning | 3 | 6.2 MB |
 
-| Agent | # pawns | # CNN layers | # blocks | # params | size |
-| --------------- | ------- | ------------ | -------- | -------- | ------ |
-| Deep Q-Learning | 3 | 64 | 4 | 380 k | 1.5 MB |
-| Deep Q-Learning | 4 | 128 | 6 | 1.8 M | 7.1 MB |
-| Deep Q-Learning | 5 | 128 | 6 | 1.8 M | 7.1 MB |
+| Agent           | # pawns | # CNN layers | # res blocks | # params | size   |
+| --------------- | ------- | ------------ | ------------ | -------- | ------ |
+| Deep Q-Learning | 3       | 64           | 4            | 380 k    | 1.5 MB |
+| Deep Q-Learning | 4       | 128          | 6            | 1.8 M    | 7.1 MB |
+| Deep Q-Learning | 5       | 128          | 6            | 1.8 M    | 7.1 MB |
 
 Those models are all very lightweight, making them convenient even for machines with limited resources, while keeping games fast.
 
 To use those models, simply instantiate the corresponding agent **without** passing the `model_path` argument (this is how the package distinguishes loading an online model from creating a new one).
 
 ```python
-from squadro import MonteCarloDeepQLearningAgent, MonteCarloQLearningAgent
+import squadro
 
-agent_ql = MonteCarloQLearningAgent()  # Deep Q-Learning
-agent_dql = MonteCarloDeepQLearningAgent()  # Q-Learning
+dql = squadro.MonteCarloDeepQLearningAgent()  # Deep Q-Learning Opponent
+ql = squadro.MonteCarloQLearningAgent()  # Q-Learning Opponent
+
+squadro.GamePlay(agent_1=dql).run()
 ```
 
 ### Training
