Commit c29dcae

Add Play against an online pre-trained AI

1 parent 0acf709 commit c29dcae

File tree: 1 file changed (+21 −18 lines)


README.md

Lines changed: 21 additions & 18 deletions
````diff
@@ -95,34 +95,37 @@ squadro.GamePlay(agent_1='random').run()
 
 #### Play against your trained AI
 
-After training your AI, you will be able to play against her (see [Training](#Training) section)
+After training your AI (see the [Training](#Training) section), you will be able to play against it.
 
+#### Play against an online pre-trained AI
 
-[//]: # (TODO)
-[//]: # (#### Play against a benchmarked AI)
 
-[//]: # ()
-[//]: # (If you do not want to train a model, as described in the [Training](#Training) section, you can still play against a benchmarked model available online. After passing `init_from='online'`, you can set `model_path` to any of those currently supported models:)
+If you do not want to train a model, as described in the [Training](#Training) section, you can still play against a model that the developers have already trained for you. The models are available on [Hugging Face](https://huggingface.co/martin-shark/squadro/tree/main); there is no need to download them from the browser, as they are downloaded automatically when you run the code below.
 
-[//]: # ()
-[//]: # (| `model_path` | # layers | # heads | embed dims | # params | size |)
+Here are the online pre-trained models:
 
-[//]: # (|--------------|----------|---------|------------|----------|--------|)
+| Agent      | # pawns | size   |
+| ---------- | ------- | ------ |
+| Q-Learning | 2       | 18 kB  |
+| Q-Learning | 3       | 6.2 MB |
 
-[//]: # (| `...` | 12 | 12 | 768 | 124M | 500 MB |)
+| Agent           | # pawns | # CNN layers | # blocks | # params | size   |
+| --------------- | ------- | ------------ | -------- | -------- | ------ |
+| Deep Q-Learning | 3       | 64           | 4        | 380 k    | 1.5 MB |
+| Deep Q-Learning | 4       | 128          | 6        | 1.8 M    | 7.1 MB |
+| Deep Q-Learning | 5       | 128          | 6        | 1.8 M    | 7.1 MB |
 
-[//]: # ()
-[//]: # (Note that the first time you use a model, it needs to be downloaded from the internet; so it can take a few minutes.)
+These models are all lightweight, making them convenient even on machines with limited resources, and keeping games fast.
 
-[//]: # ()
-[//]: # (Example:)
+To use these models, simply instantiate the corresponding agent **without** passing the `model_path` argument (this is how the package distinguishes loading an online model from creating a new one).
 
-[//]: # ()
-[//]: # (```python)
+```python
+from squadro import MonteCarloDeepQLearningAgent, MonteCarloQLearningAgent
 
-[//]: # (...)
+agent_ql = MonteCarloQLearningAgent()       # Q-Learning
+agent_dql = MonteCarloDeepQLearningAgent()  # Deep Q-Learning
 
-[//]: # (```)
+```
 
 #### Agents
 
@@ -223,7 +226,7 @@ Below is an example of good training metrics.
 - The self-play win rate stays around 50%.
 - The replay buffer samples remain diverse (above 80%).
 - The policy and value losses slowly decrease.
-- The win rate against its checkpoint is above 50% (checkpoint is replaced by the current model when win rate goes above 70%)
+- The win rate against its checkpoint is above 50% (the checkpoint is replaced by the current model when the win rate goes above 70%).
 - The elo is smoothly increasing.
 
 Note that in reinforcement learning, the loss is not a key metric for measuring model improvement, as the training samples are constantly improving; better metrics are the win rate against the checkpoint and the elo. Once these metrics stabilize, the model has reached its peak.
````
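The `model_path` convention this commit documents (argument absent: load the published online weights; argument present: create or load a local model) can be sketched generically. This is not squadro's actual implementation; `make_agent`, `Agent`, and the `source` field are hypothetical names used only to illustrate the dispatch:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Agent:
    # Hypothetical stand-in for a squadro agent; records where weights come from.
    source: str

def make_agent(model_path: Optional[str] = None) -> Agent:
    """Mimic the dispatch the README describes for pre-trained agents."""
    if model_path is None:
        # No path given: the real package would fetch the pre-trained
        # weights from Hugging Face on first use.
        return Agent(source="online")
    # A path given: the real package would create (or load) a local model there.
    return Agent(source=f"local:{model_path}")

print(make_agent().source)                 # online
print(make_agent("runs/model.pt").source)  # local:runs/model.pt
```

The point of the design is that callers never spell out "online" themselves; omitting the path is the signal.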

0 commit comments
