Inverse Reinforcement Learning algorithm implementation with Python.

# Exploring Maximum Entropy Inverse Reinforcement Learning
9 | | -## Maximum Entropy IRL: |
| 9 | +My seminar paper can be found in [paper](https://github.com/HokageM/IRLwPython/tree/main/paper), which is based on |
| 10 | +IRLwPython version 0.0.1 |
10 | 11 |
|

# Implemented Algorithms

## Maximum Entropy IRL (MEIRL):

Implementation of the maximum entropy inverse reinforcement learning algorithm from [1], based on the implementation
of [lets-do-irl](https://github.com/reinforcement-learning-kr/lets-do-irl/tree/master/mountaincar/maxent).
It is an IRL algorithm using Q-learning with a maximum entropy update function for the IRL reward estimation.
The next action is selected as the maximum of the Q-values.
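
The MEIRL update described above can be sketched as follows. This is an illustrative outline, not the IRLwPython API: the function names, the one-hot feature matrix, and the learning rate are assumptions for the example. The reward weights move along the gradient between expert and learner state-visitation features, the estimated IRL reward is linear in the state features, and the next action is the argmax of the Q-values.

```python
import numpy as np

def maxent_irl_update(theta, expert_features, learner_features, lr=0.05):
    """One maximum entropy gradient step on the reward weights."""
    return theta + lr * (expert_features - learner_features)

def irl_reward(feature_matrix, theta):
    """Estimated IRL reward for every state under the current weights."""
    return feature_matrix.dot(theta)

def greedy_action(q_table, state):
    """MEIRL picks the next action as the argmax of the Q-values."""
    return int(np.argmax(q_table[state]))

# Tiny illustration: 4 states with one-hot features, 2 actions.
feature_matrix = np.eye(4)
theta = np.zeros(4)
expert = np.array([0.7, 0.2, 0.1, 0.0])     # expert state-visitation frequencies
learner = np.array([0.25, 0.25, 0.25, 0.25])  # learner state-visitation frequencies

theta = maxent_irl_update(theta, expert, learner)
rewards = irl_reward(feature_matrix, theta)  # states the expert visits often score higher

q_table = np.zeros((4, 2))
q_table[0] = [0.1, 0.4]
action = greedy_action(q_table, 0)
```

States the expert visits more often than the learner receive a positive weight update, so the estimated reward pushes the Q-learner toward expert-like behavior.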

## Maximum Entropy Deep IRL (MEDIRL):

An implementation of the maximum entropy inverse reinforcement learning algorithm, which uses a neural network for the
actor.
The estimated IRL reward is learned similarly to MEIRL.
It is an IRL algorithm using deep Q-learning with a maximum entropy update function.
The next action is selected by an epsilon-greedy strategy over the Q-values.
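
The epsilon-greedy selection used in MEDIRL can be sketched as below. This is a minimal illustration with assumed names; `q_values` stands in for the Q-values predicted by the neural network for the current state.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon take a random action, else the Q-value argmax."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit

rng = np.random.default_rng(0)
q_values = np.array([0.1, 0.9, 0.3])           # e.g. network output for one state
action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)  # greedy when epsilon is 0
```

With `epsilon > 0` the agent occasionally explores random actions, which is the only difference from the purely greedy selection in MEIRL.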

## Maximum Entropy Deep RL (MEDRL):

MEDRL is an RL implementation of the MEDIRL algorithm.
This algorithm gets the real rewards directly from the environment,
instead of estimating IRL rewards.
The NN architecture and action selection are the same as in MEDIRL.

# Experiment