Inverse Reinforcement Learning Algorithm implementation with Python.
# Exploring Maximum Entropy Inverse Reinforcement Learning

My seminar paper can be found in [paper](https://github.com/HokageM/IRLwPython/tree/main/paper), which is based on IRLwPython version 0.0.1.

# Implemented Algorithms
## Maximum Entropy IRL (MEIRL):

Implementation of the maximum entropy inverse reinforcement learning algorithm from [1], based on the implementation
of [lets-do-irl](https://github.com/reinforcement-learning-kr/lets-do-irl/tree/master/mountaincar/maxent).
It is an IRL algorithm using Q-learning with a maximum entropy update function for the IRL reward estimation.
The next action is selected based on the maximum of the Q-values.
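The maximum entropy reward update can be sketched as follows. This is a minimal illustrative sketch, not the actual IRLwPython code: the variable names (`theta`, `expert_visitation`, `learner_visitation`, `q_table`) and the toy numbers are assumptions.

```python
import numpy as np

n_states = 5
theta = np.zeros(n_states)  # one learnable reward weight per state

# State-visitation frequencies: counted from expert demonstrations vs. the
# learner's own trajectories (toy numbers, for illustration only).
expert_visitation = np.array([0.4, 0.3, 0.1, 0.1, 0.1])
learner_visitation = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

# Maximum entropy gradient: expert feature expectations minus the learner's.
gradient = expert_visitation - learner_visitation
theta += 0.05 * gradient  # gradient ascent on the reward weights

irl_reward = theta  # estimated per-state IRL reward

# The actor is plain Q-learning; the next action is the argmax of the Q-values.
q_table = np.zeros((n_states, 3))  # 3 actions, as in MountainCar
state = 0
action = int(np.argmax(q_table[state]))
```

The learner's Q-table is then updated with `irl_reward` in place of the environment reward, and the visitation counts are refreshed for the next gradient step.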
## Maximum Entropy Deep IRL (MEDIRL):

An implementation of the maximum entropy inverse reinforcement learning algorithm that uses a neural network for the
actor.
The estimated IRL reward is learned similarly to MEIRL.
It is an IRL algorithm using deep Q-learning with a maximum entropy update function.
The next action is selected with an epsilon-greedy policy over the Q-values.
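The epsilon-greedy selection described above can be sketched like this. The function name and the fixed Q-values are illustrative assumptions, not the project's API:

```python
import random

def select_action(q_values, epsilon):
    """With probability epsilon explore uniformly, otherwise exploit."""
    if random.random() < epsilon:
        # Explore: pick a uniformly random action index.
        return random.randrange(len(q_values))
    # Exploit: index of the maximum Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])

q_values = [0.1, 0.9, 0.3]
greedy = select_action(q_values, epsilon=0.0)  # epsilon=0 always exploits
```

In practice `q_values` would come from a forward pass of the actor network, and `epsilon` is typically annealed from near 1 toward a small floor over training.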
## Maximum Entropy Deep RL (MEDRL):

MEDRL is an RL implementation of the MEDIRL algorithm.
This algorithm gets the real rewards directly from the environment instead of estimating IRL rewards.
The NN architecture and action selection are the same as in MEDIRL.
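The only difference between the MEDRL and MEDIRL training loops is where the reward comes from, which a toy sketch can make concrete. All names here (`reward_for_update`, `irl_reward_table`, the transition tuple layout) are hypothetical:

```python
def reward_for_update(transition, use_irl_reward, irl_reward_table):
    """Pick the reward signal fed into the deep Q-learning update."""
    state, action, env_reward, next_state = transition
    if use_irl_reward:
        # MEDIRL: learn from the estimated IRL reward.
        return irl_reward_table[state]
    # MEDRL: learn directly from the real environment reward.
    return env_reward

irl_rewards = {0: -0.5}
r_medrl = reward_for_update((0, 1, -1.0, 2), False, irl_rewards)
r_medirl = reward_for_update((0, 1, -1.0, 2), True, irl_rewards)
```

Everything downstream of this choice, including the network architecture and epsilon-greedy action selection, is shared between the two algorithms.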
# Experiment