What if, instead of coding programs from the ground up, we could just specify the rules of a task and its success criteria, and have an AI learn to complete it?

Imagine the benefits humanity could unlock if we could automate all the tasks nobody wants to do: the unsafe, unhealthy, or uninspiring jobs. Or how much scientific progress could speed up if big parts of the research process were automated.
@@ -115,18 +120,78 @@ Here are the relevant snippets, but you can also check the [GitHub Project](http
The lookup table:

-{% raw %} <script src="https://gist.github.com/StrikingLoo/1a7abd14725455e9200ae2fa256e8a83.js"></script> {% endraw %}
+{% raw %}
+<pre><code class="language-python">
+def q(state, action, value_dict):
+    result = value_dict.get(state, {}).get(action, 10)
+    return result
+
+def update(state, action, value, value_dict):
+    state_values = value_dict.get(state, None)
+    if state_values:
+        state_values[action] = value
+    else:
+        value_dict[state] = {}
+        value_dict[state][action] = value
+</code></pre>
+{% endraw %}

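As a quick illustration of how the lookup table behaves, here is a minimal sketch. It restates the two helpers in condensed form (using `dict.setdefault`, which is a substitution and not the code from the post), and the state and action names (`"s0"`, `"up"`, `"down"`) are made up for the example. The point it shows: any state-action pair that has never been updated falls back to the default value of 10.

```python
def q(state, action, value_dict):
    # Unseen (state, action) pairs fall back to the default value of 10.
    return value_dict.get(state, {}).get(action, 10)

def update(state, action, value, value_dict):
    # Condensed variant: create the inner dict on first use, then store the value.
    value_dict.setdefault(state, {})[action] = value

values = {}                             # empty table: nothing learned yet
unseen = q("s0", "up", values)          # never updated, so the default applies
update("s0", "up", 2.5, values)         # store a learned estimate
seen = q("s0", "up", values)            # now reads back the stored estimate
sibling = q("s0", "down", values)       # same state, different action: still the default
```

Because the table is a plain nested dictionary, only the pairs the agent has actually visited take up memory.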
The policy:

<div class="wide-eighty">
-{% raw %} <script src="https://gist.github.com/StrikingLoo/6055bf081061b74c3d71adaaf3155cdc.js"></script> {% endraw %}
+{% raw %}
+<pre><code class="language-python">
+def policy(values, state, epsilon = 0.1):
+    best_action = None
+    best_value = float('-inf')
+    allowed = allowed_actions(state) # filter by possible actions. No bumping into walls.
+    random.shuffle(allowed) # shuffle to avoid bias
+    for action in allowed:
+        if q(state, action, values) > best_value:
+            best_value = q(state, action, values)
+            best_action = action
+
+    r_var = random.random()
+    if r_var < epsilon: # with probability epsilon...
+        best_action = random.choice(allowed) # choose at random
+        best_value = q(state, best_action, values)
+
+    return best_action, best_value
+</code></pre>
+{% endraw %}
</div>

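To see the two extremes of the epsilon-greedy policy, here is a small self-contained sketch. The `allowed_actions` stub, the `ACTIONS` list, and the sample values are assumptions made for the example (the real `allowed_actions` depends on the environment), and `policy` is a condensed restatement of the function above rather than a copy of it.

```python
import random

ACTIONS = ["left", "right", "up"]  # hypothetical action set

def allowed_actions(state):
    # Hypothetical stand-in: every action is legal in every state.
    return list(ACTIONS)

def q(state, action, value_dict):
    return value_dict.get(state, {}).get(action, 10)

def policy(values, state, epsilon=0.1):
    allowed = allowed_actions(state)
    random.shuffle(allowed)  # break ties between equal values at random
    # Greedy choice: the allowed action with the highest stored value.
    best_action = max(allowed, key=lambda a: q(state, a, values))
    if random.random() < epsilon:  # with probability epsilon, explore
        best_action = random.choice(allowed)
    return best_action, q(state, best_action, values)

values = {"s0": {"left": 1.0, "right": 5.0, "up": -2.0}}
greedy = policy(values, "s0", epsilon=0.0)   # never explores
explore = policy(values, "s0", epsilon=1.0)  # always picks at random
```

With `epsilon=0.0` the call always returns the highest-valued action, while `epsilon=1.0` ignores the table entirely. The default of 0.1 sits close to the greedy end.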
The main loop, with the print statements removed and the code cleaned up. I used a learning rate *α* of 0.5 and a discount factor *γ* of 0.9.
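To make the roles of *α* and *γ* concrete, here is a minimal sketch of the standard tabular Q-learning update, assuming that is the rule the loop applies. The function name `q_learning_update` and the sample numbers are made up for illustration.

```python
ALPHA = 0.5  # alpha: how far each update moves the old estimate
GAMMA = 0.9  # gamma: how much future value counts relative to immediate reward

def q_learning_update(old_value, reward, best_next_value, alpha=ALPHA, gamma=GAMMA):
    # Standard Q-learning rule:
    # Q(s, a) <- Q(s, a) + alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
    target = reward + gamma * best_next_value
    return old_value + alpha * (target - old_value)

# Hypothetical step: current estimate 10, reward -1, best next-state value 10.
new_value = q_learning_update(old_value=10, reward=-1, best_next_value=10)
```

With *α* = 0.5 the new estimate lands halfway between the old value and the target, so in this example it moves from 10 toward the target of 8.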

<div class="wide-eighty">
-{% raw %} <script src="https://gist.github.com/StrikingLoo/a73489773ca8c11028047d142b8024f2.js"></script> {% endraw %}
+{% raw %}
+<pre><code class="language-python">
+for episoden in range(EPISODES):
+    env.reset() # Make player go back to square one
+    current_state = env.compressed_state_rep() # get unique representation of state