Commit 9a0fc58

authored
Update README.md
Added some instructions for the demo.
1 parent fc23f12 commit 9a0fc58

1 file changed: +11 -0 lines changed

README.md

Lines changed: 11 additions & 0 deletions
@@ -71,3 +71,14 @@ export default defineConfig([
},
])
```

## Some Instructions

The goal of the demo is to get the agent (black dot) to reach the goal cell in the fewest steps the grid world allows. Success shows up in two places: in the middle column of the interface, as bright arrows along a path that the agent always follows, and in the reward plots in the rightmost column, as increasing lines (or flat lines with relatively large values).

The sliders in the far-left column control the environment parameters; all but the "step cost" slider are self-explanatory. The "step cost" slider lets you add a penalty to the reward the agent sees per episode, which encourages them to take fewer steps. The sliders in the middle column control the agent parameters.

- Memory damping slider: determines how quickly the agent forgets their past actions. A high value means the agent will quickly forget what they did in past episodes.
- Reward coupling slider: sets the factor by which the agent's reward is scaled, with higher values magnifying the reward they receive.
- Glow decay slider: controls how strongly past rewards count, so a small value means that all past rewards have equal strength.
- Exploration and Temperature parameter sliders: determine how often the agent acts on what they learned versus randomly, with larger values of either producing more random movements. The difference is that "exploration" is a probability of acting randomly instead of using what they learned, whereas a large "temperature parameter" temporarily scrambles what they learned into noise.
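
To make the agent sliders more concrete, here is a minimal TypeScript sketch of one way such parameters could interact. It is not taken from the demo's source code: the names (`AgentParams`, `softmax`, `chooseAction`, `updateMemory`) are hypothetical, and the update rule, epsilon-style exploration, and softmax temperature are assumptions chosen to match the descriptions above.

```typescript
// Hypothetical parameter names; the demo's actual identifiers may differ.
interface AgentParams {
  memoryDamping: number;  // assumed in [0, 1]; higher = faster forgetting
  rewardCoupling: number; // multiplier applied to the reward
  glowDecay: number;      // assumed in [0, 1]; 0 = past actions keep full credit
  exploration: number;    // probability of taking a uniformly random action
  temperature: number;    // softmax temperature, must be > 0
}

// Turn learned action values into probabilities.
// A large temperature flattens the distribution toward uniform noise.
function softmax(values: number[], temperature: number): number[] {
  const max = Math.max(...values); // subtract the max for numerical stability
  const exps = values.map((v) => Math.exp((v - max) / temperature));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Pick an action index: uniformly random with probability `exploration`,
// otherwise sampled from the softmax over the learned values.
function chooseAction(
  values: number[],
  p: AgentParams,
  rng: () => number = Math.random,
): number {
  if (rng() < p.exploration) {
    return Math.floor(rng() * values.length);
  }
  const probs = softmax(values, p.temperature);
  let r = rng();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r < 0) return i;
  }
  return probs.length - 1; // guard against floating-point round-off
}

// One learning step (assumed rule): fade the glow of earlier actions, mark the
// chosen action, then damp old values and add the scaled, glow-weighted reward.
function updateMemory(
  values: number[],
  glow: number[],
  chosen: number,
  reward: number,
  p: AgentParams,
): void {
  for (let i = 0; i < glow.length; i++) {
    glow[i] *= 1 - p.glowDecay;
  }
  glow[chosen] = 1;
  for (let i = 0; i < values.length; i++) {
    values[i] =
      (1 - p.memoryDamping) * values[i] + p.rewardCoupling * reward * glow[i];
  }
}
```

Under these assumptions, setting `exploration` to 1 ignores the learned values entirely, while a very large `temperature` keeps using them but makes every action almost equally likely. A `glowDecay` of 0 leaves every previously chosen action with full credit for later rewards.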
