Commit 9c5e0c9

committed
gist -> hljs in RL post
1 parent 24be359 commit 9c5e0c9

2 files changed: +78 -4 lines changed


_posts/2022-09-06-ant-colony-optimization-tsp.md

Lines changed: 0 additions & 1 deletion
@@ -272,5 +272,4 @@ I would like to try Ant Colony Optimization for problems other than TSP in the f
 });
 });
 </script>
-
 {% endraw %}

_posts/2022-09-21-reinforcement-learning-beginners.md

Lines changed: 78 additions & 3 deletions
@@ -9,6 +9,11 @@ importance: 8
 sitemap: true
 ---
 
+{% raw %}
+<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/styles/default.min.css">
+<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/highlight.min.js"></script>
+{% endraw %}
+
 What could we do if, instead of coding programs from the ground up, we could just specify the rules for a task, the success criteria, and make AI learn to complete it?
 
 Imagine the blessings humanity could unlock if we could automate all the tasks nobody wants to do, all the unsafe, unhealthy or uninspiring jobs. Or the ways scientific progress could be sped up if big parts of the research process were accelerated.
@@ -115,18 +120,78 @@ Here are the relevant snippets, but you can also check the [GitHub Project](http
 
 The lookup table:
 
-{% raw %} <script src="https://gist.github.com/StrikingLoo/1a7abd14725455e9200ae2fa256e8a83.js"></script> {% endraw %}
+{% raw %}
+<pre><code class="language-python">
+def q(state, action, value_dict):
+    result = value_dict.get(state, {}).get(action, 10)
+    return result
+
+def update(state, action, value, value_dict):
+    state_values = value_dict.get(state, None)
+    if state_values:
+        state_values[action] = value
+    else:
+        value_dict[state] = {}
+        value_dict[state][action] = value
+</code></pre>
+{% endraw %}
 
 The policy:
 
 <div class="wide-eighty">
-{% raw %} <script src="https://gist.github.com/StrikingLoo/6055bf081061b74c3d71adaaf3155cdc.js"></script> {% endraw %}
+{% raw %}
+<pre><code class="language-python">
+def policy(values, state, epsilon = 0.1):
+    best_action = None
+    best_value = float('-inf')
+    allowed = allowed_actions(state) # filter by possible actions. No bumping into walls.
+    random.shuffle(allowed) # shuffle to avoid bias
+    for action in allowed:
+        if q(state, action, values) > best_value:
+            best_value = q(state, action, values)
+            best_action = action
+
+    r_var = random.random()
+    if r_var < epsilon: # with probability epsilon...
+        best_action = random.choice(allowed) # choose at random
+        best_value = q(state, best_action, values)
+
+    return best_action, best_value
+</code></pre>
+{% endraw %}
 </div>
 
 The main loop (minus the prints and after cleaning). I used an *α* value of 0.5, and a *γ* of 0.9.
 
 <div class="wide-eighty">
-{% raw %} <script src="https://gist.github.com/StrikingLoo/a73489773ca8c11028047d142b8024f2.js"></script> {% endraw %}
+{% raw %}
+<pre><code class="language-python">
+for episoden in range(EPISODES):
+    env.reset() # Make player go back to square one
+    current_state = env.compressed_state_rep() # get unique representation of state
+    use_epsilon = 0.1
+    if episoden > 200:
+        use_epsilon = 0.0
+    action, action_v = policy(values, current_state, epsilon = use_epsilon)
+
+    while (not env.over):
+        reward = env.move(action)
+        total_reward += reward
+        next_state = env.compressed_state_rep()
+
+        next_action, next_action_v = policy(values, next_state, epsilon = use_epsilon) # obtain action from policy
+
+        if env.over: # one can only win.
+            next_action_v = 100
+
+        delta = next_action_v*GAMMA + reward - action_v # SARSA update rule
+        new_value = action_v + delta*alpha # apply update
+        update(current_state, action, new_value, values)
+        current_state = next_state
+        action = next_action
+        action_v = next_action_v
+</code></pre>
+{% endraw %}
 </div>
 
 I polished them a bit compared with the ones on GitHub, for clarity's sake.
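
For reference, written out as equations, the `delta` and `new_value` lines in the loop above are the standard SARSA update in two steps, with `next_state`/`next_action` playing the role of s'/a' and, as the post notes, α = 0.5 and γ = 0.9:

$$ \delta = r + \gamma\, Q(s', a') - Q(s, a) $$

$$ Q(s, a) \leftarrow Q(s, a) + \alpha\, \delta $$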
@@ -219,3 +284,13 @@ From the wiki:
 
 
 _If this post was useful or interesting, please share it on social media._
+
+{% raw %}
+<script>
+window.addEventListener('load', (event) => {
+  document.querySelectorAll('pre code').forEach((el) => {
+    hljs.highlightElement(el);
+  });
+});
+</script>
+{% endraw %}
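
As a usage note on the script above: calling `hljs.highlightElement` on every `pre code` element is the explicit form; highlight.js 11 also ships `hljs.highlightAll()`, which performs the same scan internally, so a minimal equivalent hook (a sketch, not what this commit uses) would be:

<script>
// hljs.highlightAll() finds every <pre><code> block on the page and highlights it
window.addEventListener('load', () => hljs.highlightAll());
</script>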
