If some dependencies release changes that break the code, you can install the pinned requirements:

```
pip install -r requirements.txt
```
## Background
> [!NOTE]
> This section is highly technical; feel free to skip it and play with the code right away.
#### Agents
Most computer algorithms discretize the game into states and actions. Here, the state is the position of the pawns and the available actions are the possible moves of the pawns.
Squadro is a finite state machine, meaning that the next state of the game is completely determined by the current state and the action played. With this definition, one can see that the game is a Markov Decision Process (MDP). At each state, the current player can play different actions, which lead to different states. Then the next player can play different actions from any of those new states, etc. The future of the game can be represented as a tree, whose branches are the actions that lead to different states.
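To make this concrete, here is a minimal, hypothetical sketch (not the package's actual data structures) of a deterministic game state whose next state depends only on the current state and the chosen action — the MDP property described above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Toy illustration: each player's pawns are integer positions."""
    pawns: tuple      # positions of the current player's pawns
    opponent: tuple   # positions of the opponent's pawns

    def actions(self):
        # An action is the index of the pawn to move (all moves legal here).
        return range(len(self.pawns))

    def play(self, action):
        # Deterministic transition: the next state is fully determined by
        # the current state and the action played.
        moved = tuple(p + 1 if i == action else p
                      for i, p in enumerate(self.pawns))
        # Turns alternate, so the opponent becomes the current player.
        return State(self.opponent, moved)

s = State(pawns=(0, 0, 0), opponent=(0, 0, 0))
s2 = s.play(1)
print(s2.opponent)  # (0, 1, 0): pawn 1 advanced, and the turn switched
```

Enumerating `actions()` from each successor state and recursing is exactly what builds the game tree described above.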
An algorithm can explore that space of possibilities to infer the best move to play now. As the tree is huge, it is not possible to explore every path until the end of the game. Typically, algorithms explore only a small fraction of the tree and then use the information gathered from those states to make a decision. More precisely, those two phases are:
* **State exploration**: exploring the space of states by a careful choice of actions. The most common exploration methods are Minimax and Monte Carlo Tree Search (MCTS). Minimax explores all the states up to a specific depth, while MCTS navigates until it finds a state that has not been visited yet. Minimax can be sped up by skipping the search in the parts of the tree that won't affect the final decision; this method is called alpha-beta pruning.
* **State evaluation**: evaluating a state. If one has a basic understanding of the game and how to win, one can design a heuristic (state evaluation function) that estimates how good it is to be in that state / position. Otherwise, it can often be better to use a computer algorithm to evaluate the state.
  * The simplest way to evaluate a state is to let the game play out randomly until it is over (i.e., pick random actions for both players). Repeated enough times, this gives an estimate of the probability of winning from that state.
  * More complex, and hence more accurate, algorithms use reinforcement learning. They learn from experience by storing information about each state/action pair in one of:
    * a Q value function, a lookup table for each state and action;
    * a deep Q network (DQN), a neural network that approximates the Q value function, which is necessary when the state space is huge (i.e., cannot be stored in memory).
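As an illustration of those two phases, here is a minimal sketch of minimax with alpha-beta pruning on a toy take-away game (not Squadro): a pile of stones, each player removes 1 or 2, and whoever takes the last stone wins. The `rollout` function shows the simplest state evaluation, a random playout; the game and all names here are only for illustration.

```python
import math
import random

def alphabeta(stones, maximizing, depth, alpha=-math.inf, beta=math.inf):
    """Depth-limited minimax with alpha-beta pruning (+1 = max player wins)."""
    if stones == 0:
        # The player who just moved took the last stone and won.
        return -1 if maximizing else 1
    if depth == 0:
        return 0  # placeholder heuristic for unexplored positions
    best = -math.inf if maximizing else math.inf
    for take in (1, 2):
        if take > stones:
            break
        value = alphabeta(stones - take, not maximizing, depth - 1, alpha, beta)
        if maximizing:
            best = max(best, value)
            alpha = max(alpha, value)
        else:
            best = min(best, value)
            beta = min(beta, value)
        if beta <= alpha:
            break  # prune: this subtree cannot affect the final decision
    return best

def rollout(stones, maximizing):
    """Simplest state evaluation: a random playout until the game is over."""
    while stones > 0:
        stones -= random.choice([t for t in (1, 2) if t <= stones])
        maximizing = not maximizing
    return -1 if maximizing else 1

# A pile of 6 is lost for the side to move (a multiple of 3), so the
# maximizing player scores -1 with optimal play; with 4 stones they win.
print(alphabeta(6, True, depth=10), alphabeta(4, True, depth=10))  # -1 1
```

In a real agent, the `depth == 0` case would return a heuristic value (such as the relative advancement used below) or an averaged `rollout` estimate instead of 0.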
List of available agents:
* _human_: another local human player (i.e., both playing on the same computer)
* _random_: a computer that plays randomly among all available moves
* _advancement_: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement
* _relative_advancement_: a computer that lists the possible moves from the current position and evaluates them directly (i.e., it "thinks" only one move ahead), where the evaluation function is the player's advancement compared to the other player
* _ab_relative_advancement_: a computer that plays minimax with alpha-beta pruning (depth ~4), where the evaluation function is the player's advancement compared to the other player
* _mcts_advancement_: Monte Carlo tree search, where the evaluation function is the player's advancement compared to the other player
* _mcts_rollout_: Monte Carlo tree search, where the evaluation function is determined by a random playout until the end of the game
* _mcts_q_learning_: Monte Carlo tree search, where the evaluation function is determined by a lookup table
* _mcts_deep_q_learning_: Monte Carlo tree search, where the evaluation function is determined by a convolutional neural network
You can also access the most up-to-date list of available agents with:
```python
import squadro
print(squadro.AVAILABLE_AGENTS)
```
## Usage
This package can be used in many interesting ways. You can play the game, train an AI agent, run simulations, and analyze animations.
### Play
You can play against someone else or many different types of computer algorithms.
> [!TIP]
> If you run into the following error on a Linux machine when launching the game:
### Training
One can train a model using reinforcement learning (RL) algorithms. Currently, Squadro supports two such algorithms: