<br/><b>s</b> – current position of the agent,
<br/><b>a</b> – current action,
<br/><b>λ</b> – learning rate,
<br/><b>γ</b> – gamma (reward decay, discount factor),
<br/><b>r</b> – reward received in the current position,
<br/><b>s_</b> – next position, chosen according to the next action,
<br/><b>a_</b> – next chosen action.
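As a reference, here is a minimal sketch of how these symbols fit together in one Q-learning update step. It is an illustration only: the table size, the function name `update_q`, and the default values of λ and γ are assumptions, not the exact code of this repository.

```python
import numpy as np

# Hypothetical environment size: 100 discrete positions, 4 possible moves (assumed values).
n_states, n_actions = 100, 4
Q = np.zeros((n_states, n_actions))   # Q-table: one row per state, one column per action

def update_q(s, a, r, s_, lam=0.9, gamma=0.9):
    """One Q-learning step: Q(s, a) += λ * (r + γ * max_a_ Q(s_, a_) - Q(s, a))."""
    a_ = np.argmax(Q[s_])                               # best next action a_ in the next state s_
    Q[s, a] += lam * (r + gamma * Q[s_, a_] - Q[s, a])  # move Q(s, a) toward the learning target
```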
The major component of the RL method is the table of weights, the <b>Q-table</b> of system states. <b>Matrix Q</b> holds all possible states of the system together with the weights of the system's responses to different actions. While trying to traverse the given environment, the mobile robot learns how to avoid obstacles and how to find the path to the destination point; as a result, the <b>Q-table</b> is built. Looking at the values of the table, it is possible to see the decision about the next action made by the agent (mobile robot).
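Reading the agent's decision out of the table is then a row lookup followed by an argmax. Below is a hedged sketch, reusing the `Q` and `n_actions` placeholders from the example above and assuming an ε-greedy exploration strategy (the value of ε is an assumption, not taken from this repository):

```python
def choose_action(s, epsilon=0.1):
    """Pick the next action for state s from the Q-table (ε-greedy, assumed strategy)."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)   # explore: occasionally try a random action
    return int(np.argmax(Q[s]))               # exploit: the action with the largest weight for this state
```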
<br/>Experimental results with different environments are shown and described below.
<br/>The code is extensively commented and will guide you step by step through the entire idea of the implementation.