Commit 4c5e8a9

fix pre code bug, add alt texts
1 parent 9c5e0c9 commit 4c5e8a9

1 file changed: +13 -13 lines changed

_posts/2022-09-21-reinforcement-learning-beginners.md

Lines changed: 13 additions & 13 deletions
@@ -133,7 +133,7 @@ def update(state, action, value, value_dict):
 else:
 value_dict[state] = {}
 value_dict[state][action] = value
-</pre></code>
+</code></pre>
 {% endraw %}
 
 The policy:
@@ -157,7 +157,7 @@ def policy(values, state, epsilon = 0.1):
 best_value = q(state, best_action, values)
 
 return best_action, best_value
-</pre></code>
+</code></pre>
 {% endraw %}
 </div>
 
@@ -190,7 +190,7 @@ for episoden in range(EPISODES):
 current_state = next_state
 action = next_action
 action_v = next_action_v
-</pre></code>
+</code></pre>
 {% endraw %}
 </div>
 
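The three hunks above make the same fix. Judging by the corrected closing order, each code block opens with `<pre><code>`, so the old `</pre></code>` closed the outer `<pre>` while the inner `<code>` was still open. Well-formed HTML closes the innermost element first, hence `</code></pre>`. Because HTML parsers are error-tolerant, this kind of mistake can go unnoticed. As a minimal sketch (not part of this commit or of the blog's build), mis-nested closing tags can be flagged with Python's standard-library `html.parser`; `NestingChecker` and `check_nesting` are hypothetical names, and the void-element set below is deliberately incomplete.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Track open tags on a stack and record every closing tag that does
    not match the most recently opened tag (e.g. </pre> closing a <code>)."""

    # Hypothetical, incomplete set of void elements: these never get a
    # closing tag, so they are not pushed onto the stack.
    VOID = {"br", "img", "hr", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, innermost last
        self.errors = []  # (expected_closing_tag, actual_closing_tag)

    def handle_starttag(self, tag, attrs):
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> open and close in one step.
        pass

    def handle_endtag(self, tag):
        expected = self.stack[-1] if self.stack else None
        if tag != expected:
            self.errors.append((expected, tag))
        # Remove the innermost open tag with this name, if any, so that
        # checking can continue after an error.
        if tag in self.stack:
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx]


def check_nesting(html):
    """Return a list of (expected, got) pairs for mis-nested closing tags."""
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.errors
```

For example, `check_nesting("<pre><code>x = 1\n</pre></code>")` returns `[('code', 'pre')]` (the parser expected `</code>` but got `</pre>`), while the corrected `</code></pre>` ordering returns an empty list.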
@@ -201,44 +201,44 @@ I ran the code for multiple mazes, and was happy to see all of the results were
 Here is our agent solving a very simple maze: a wall running across the middle. The agent is the blue square, the goal -an apple- is the red one.
 
 Before training:
-![](resources/post_image/first_iter_1.gif){: loading='lazy' style="width:30%"}
+![First iteration before training (the algorithm fails to solve the maze)](resources/post_image/first_iter_1.gif){: loading='lazy' style="width:30%"}
 
 After training:
-![](resources/post_image/last_iter_1.gif){: loading='lazy' style="width:30%"}
+![Last iteration after training (algorithm solved the maze fast)](resources/post_image/last_iter_1.gif){: loading='lazy' style="width:30%"}
 
 
 For a more advanced challenge, I tried a hockey-stick shape, where it needs to go through a narrow passage. It actually took it less time to learn this pattern, I guess because it was more constrained in the possible movements it could make.
 
 Before training:
-![](resources/post_image/first_iter_2.gif){: loading='lazy' style="width:30%"}
+![First iteration before training (the algorithm fails to solve the maze except by random walk)](resources/post_image/first_iter_2.gif){: loading='lazy' style="width:30%"}
 
 After training:
-![](resources/post_image/last_iter_2.gif){: loading='lazy' style="width:30%"}
+![Last iteration after training (the algorithm solved the maze fast)](resources/post_image/last_iter_2.gif){: loading='lazy' style="width:30%"}
 
 It performed similarly with a cross, even though in this case it had to back-pedal a bit.
 
 Before training:
-![](resources/post_image/first_iter_3.gif){: loading='lazy' style="width:30%"}
+![First iteration before training (the algorithm fails to solve the cross shaped maze except by random walk)](resources/post_image/first_iter_3.gif){: loading='lazy' style="width:30%"}
 
 After training:
-![](resources/post_image/last_iter_3.gif){: loading='lazy' style="width:30%"}
+![Last iteration after training (the algorithm solved the cross shaped maze fast)](resources/post_image/last_iter_3.gif){: loading='lazy' style="width:30%"}
 
 Then I tried making it go through narrow passages, one way and the other. This one took a long time for the random agent to crack.
 
 Before training:
-![](resources/post_image/first_iter_4.gif){: loading='lazy' style="width:30%"}
+![First iteration before training (the algorithm fails to solve the maze except by random walk)](resources/post_image/first_iter_4.gif){: loading='lazy' style="width:30%"}
 
 After training:
-![](resources/post_image/last_iter_4.gif){: loading='lazy' style="width:30%"}
+![Last iteration after training (the algorithm solved the maze quickly)](resources/post_image/last_iter_4.gif){: loading='lazy' style="width:30%"}
 
 
 And finally, just to see it could learn anything: what if it had to go through a wall that divided the whole map in half, and then follow it closely back in the other direction?
 
 Before training:
-![](resources/post_image/first_iter_5.gif){: loading='lazy' style="width:30%"}
+![First iteration before training (the algorithm fails to solve the trickiest maze except by random walk)](resources/post_image/first_iter_5.gif){: loading='lazy' style="width:30%"}
 
 After training:
-![](resources/post_image/last_iter_5.gif){: loading='lazy' style="width:30%"}
+![Last iteration after training (the algorithm solved the trickiest maze quickly)](resources/post_image/last_iter_5.gif){: loading='lazy' style="width:30%"}
 
 In conclusion, this maze solver is a-mazing!
 
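The last hunk fills in alt text for the ten maze GIFs, which were previously written as `![](...)`. With empty alt text, a screen reader has nothing to announce for the image. As a minimal sketch, assuming every image in the post uses inline `![alt](path)` syntax (reference-style images are not handled), a short Python check can list the images that still lack alt text; `images_missing_alt` is a hypothetical helper, not something in the repository.

```python
import re

# Inline Markdown image syntax ![alt](path): captures the alt text and the
# path. Alt text may contain parentheses, as in the new captions above.
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def images_missing_alt(markdown):
    """Return the paths of images whose alt text is empty or whitespace."""
    return [
        path
        for alt, path in IMAGE_RE.findall(markdown)
        if not alt.strip()
    ]
```

Run against the pre-commit version of the post, it would list all ten `first_iter_*.gif` and `last_iter_*.gif` paths; after this commit it should return an empty list. Trailing kramdown attribute lists such as `{: loading='lazy' style="width:30%"}` follow the closing parenthesis and do not affect the match.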