After training your AI (see [Training](#Training) section), you will be able to play against her.
#### Play against an online pre-trained AI
If you do not want to train a model, as described in the [Training](#Training) section, you can still play against a model that the developers have already trained for you. The models are available on [Hugging Face](https://huggingface.co/martin-shark/squadro/tree/main); there is no need to download them manually, as they are downloaded automatically the first time you run the code below.
Those models are all very lightweight, which keeps games fast and makes them convenient even on machines with limited resources.
To use those models, simply instantiate the corresponding agent **without** passing the `model_path` argument (this is how the package distinguishes between loading an online model and creating a new model).
```python
from squadro import MonteCarloDeepQLearningAgent, MonteCarloQLearningAgent

agent_ql = MonteCarloQLearningAgent()  # Q-Learning
agent_dql = MonteCarloDeepQLearningAgent()  # Deep Q-Learning
```
Below is an example of good training metrics.
- The self-play win rate stays around 50%.
- The replay buffer samples remain diverse (above 80%).
- The policy and value losses slowly decrease.
- The win rate against its checkpoint is above 50% (the checkpoint is replaced by the current model when the win rate goes above 70%).
- The Elo rating is smoothly increasing.
Note that in reinforcement learning, the loss is not a key metric for measuring model improvement, because the training samples themselves are constantly improving. Better metrics include the win rate against the checkpoint and the Elo rating. Once these metrics stabilize, the model has reached its peak.
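To make the two key metrics concrete, here is a minimal sketch of a checkpoint gate and a standard Elo update. This is an illustration of the general technique, not the `squadro` package's actual implementation; the function names and the 70% threshold default are assumptions based on the description above.

```python
def should_replace_checkpoint(wins: int, games: int, threshold: float = 0.70) -> bool:
    """Gate: the frozen checkpoint is replaced only once the current
    model's evaluation win rate against it exceeds the threshold."""
    return games > 0 and wins / games > threshold


def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Standard Elo update: score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return rating + k * (score - expected)


# Example: 36 wins out of 50 evaluation games is a 72% win rate, above 70%,
# so the checkpoint would be replaced by the current model.
print(should_replace_checkpoint(36, 50))  # True
```

A stable ~50% win rate against a frequently refreshed checkpoint, together with a rising Elo, indicates steady improvement even while the losses plateau.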