 
 Inverse Reinforcement Learning Algorithm implementation with python.
 
-# Exploring Maximum Entropy Inverse Reinforcement Learning
-
-My seminar paper can be found in [paper](https://github.com/HokageM/IRLwPython/tree/main/paper), which is based on
-IRLwPython version 0.0.1
-
 # Implemented Algorithms
 
-## Maximum Entropy IRL (MEIRL):
-Implementation of the maximum entropy inverse reinforcement learning algorithm from [1] and is based on the implementation
+## Maximum Entropy IRL:
+
+An implementation of the Maximum Entropy inverse reinforcement learning algorithm from [1], based on the implementation
 of [lets-do-irl](https://github.com/reinforcement-learning-kr/lets-do-irl/tree/master/mountaincar/maxent).
-It is an IRL algorithm using q-learning with a maximum entropy update function for the IRL reward estimation.
-The next action is selected based on the maximum of the q-values.
+It is an IRL algorithm using Q-learning with a Maximum Entropy update function for the IRL reward estimation.
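The combination of tabular Q-learning with a Maximum Entropy reward update can be sketched roughly as below. This is an illustrative sketch only, not the IRLwPython API: the function names are hypothetical, and the IRL reward is assumed to be linear in state features, i.e. `reward(s) = theta @ phi(s)`.

```python
import numpy as np

# Hypothetical sketch, not the IRLwPython API.

def maxent_reward_update(theta, expert_fe, learner_fe, lr=0.05):
    """Maximum Entropy IRL gradient step on the linear reward weights.

    The gradient is the difference between the expert's feature
    expectations and the learner's current feature expectations.
    """
    return theta + lr * (expert_fe - learner_fe)

def q_update(q_table, s, a, s_next, irl_reward, alpha=0.1, gamma=0.99):
    """Tabular Q-learning step driven by the estimated IRL reward."""
    td_target = irl_reward + gamma * np.max(q_table[s_next])
    q_table[s, a] += alpha * (td_target - q_table[s, a])
    return q_table
```

In a full training loop, `q_update` would run every environment step with the current estimated reward, while `maxent_reward_update` would run periodically after fresh learner feature expectations have been collected.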
 
-## Maximum Entropy Deep IRL (MEDIRL:
-An implementation of the maximum entropy inverse reinforcement learning algorithm, which uses a neural-network for the
+## Maximum Entropy Deep IRL:
+
+An implementation of the Maximum Entropy inverse reinforcement learning algorithm, which uses a neural network for the
 actor.
-The estimated irl-reward is learned similar as in MEIRL.
-It is an IRL algorithm using deep q-learning with a maximum entropy update function.
-The next action is selected based on an epsilon-greedy algorithm and the maximum of the q-values.
-
-## Maximum Entropy Deep RL (MEDRL):
-MEDRL is a RL implementation of the MEDIRL algorithm.
-This algorithm gets the real rewards directly from the environment,
-instead of estimating IRL rewards.
-The NN architecture and action selection is the same as in MEDIRL.
+The estimated IRL reward is learned similarly to Maximum Entropy IRL.
+It is an IRL algorithm using deep Q-learning with a Maximum Entropy update function.
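Deep Q-learning variants like this one typically pick the next action with an epsilon-greedy rule over the Q-values produced by the actor network. A minimal sketch of that selection step, with all names illustrative rather than taken from the IRLwPython code:

```python
import numpy as np

# Illustrative sketch: epsilon-greedy selection over the Q-values
# that the actor network outputs for the current state.

def epsilon_greedy(q_values, epsilon, rng=None):
    """Explore with probability epsilon, otherwise act greedily."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # random exploratory action
    return int(np.argmax(q_values))              # greedy action (max Q-value)
```

With `epsilon = 0.0` this reduces to pure greedy selection; in practice epsilon is usually annealed from a high value toward a small floor during training.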
+
+## Maximum Entropy Deep RL:
+
+An implementation of the Maximum Entropy reinforcement learning algorithm.
+This algorithm serves as an RL baseline for comparison with the IRL algorithms.
 
 # Experiment
 