
Commit 4736757

Feat: MaxEntropyDeep (#10)

* initial iq learn
* update Max entropy algorithms
* version using max not softmax with working grad
* working MaxEntropyDeepIRL
* refactor
* refactor demos
* final MaxEntropyDeep

1 parent 43b964e commit 4736757

48 files changed: +625 −407 lines


README.md

Lines changed: 90 additions & 13 deletions
@@ -4,24 +4,101 @@
  Inverse Reinforcement Learning Algorithm implementation with Python.

- Implemented Algorithms:
- - Maximum Entropy IRL: [1]
- - Discrete Maximum Entropy Deep IRL: [2, 3]
- - IQ-Learn
+ # Implemented Algorithms

- Experiment:
- - Mountaincar: [gym](https://www.gymlibrary.dev/environments/classic_control/mountain_car/)
+ ## Maximum Entropy IRL: [1]

- The implementation of MaxEntropyIRL and MountainCar is based on the implementation of:
- [lets-do-irl](https://github.com/reinforcement-learning-kr/lets-do-irl/tree/master/mountaincar/maxent)
+ ## Maximum Entropy Deep IRL

- # References
+ # Experiments

- [1] [BD. Ziebart, et al., "Maximum Entropy Inverse Reinforcement Learning", AAAI 2008](https://cdn.aaai.org/AAAI/2008/AAAI08-227.pdf).
+ ## Mountaincar-v0
+ [gym](https://www.gymlibrary.dev/environments/classic_control/mountain_car/)
+
+ The expert demonstrations for Mountaincar-v0 are the same as those used in [lets-do-irl](https://github.com/reinforcement-learning-kr/lets-do-irl/tree/master/mountaincar/maxent).
+
+ *Heatmap of expert demonstrations with 400 states*:
+
+ <img src="demo/heatmaps/expert_state_frequencies_mountaincar.png">
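The 400-state heatmap implies that MountainCar's continuous (position, velocity) observation is discretized; below is a minimal sketch of how such a visitation heatmap can be computed, assuming a 20 × 20 binning (the names and demonstration data are illustrative, not the repo's code).

```python
# Sketch: a 400-state visitation heatmap for MountainCar-v0, assuming the
# 2-D observation (position, velocity) is binned into a 20 x 20 grid.
import numpy as np

N_BINS = 20                      # 20 x 20 = 400 discrete states
LOW = np.array([-1.2, -0.07])    # MountainCar-v0 observation lower bounds
HIGH = np.array([0.6, 0.07])     # MountainCar-v0 observation upper bounds

def state_cell(obs):
    """Map a continuous (position, velocity) observation to a grid cell."""
    ratio = (np.asarray(obs, dtype=float) - LOW) / (HIGH - LOW)
    return tuple(np.clip((ratio * N_BINS).astype(int), 0, N_BINS - 1))

def visitation_counts(trajectories):
    """Count visits per cell over a list of observation sequences."""
    counts = np.zeros((N_BINS, N_BINS))
    for trajectory in trajectories:
        for obs in trajectory:
            counts[state_cell(obs)] += 1
    return counts

# With hypothetical expert trajectories:
#   import matplotlib.pyplot as plt
#   plt.imshow(visitation_counts(expert_trajectories), origin="lower")
#   plt.colorbar(); plt.show()
```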
+ ### Maximum Entropy Inverse Reinforcement Learning
+
+ IRL using Q-Learning with a Maximum Entropy update function.
+
+ #### Training
+
+ *Learner training for 29000 episodes*:
+
+ <img src="demo/learning_curves/leaner_maxent_29000_episodes.png">
+
+ #### Heatmaps
+
+ *Learner state frequencies after 1000 episodes*:
+
+ <img src="demo/heatmaps/learner_maxent_1000_episodes.png">
+
+ *Learner state frequencies after 29000 episodes*:
+
+ <img src="demo/heatmaps/leaner_maxent_29000_episodes.png">
+
+ *State rewards heatmap after 1000 episodes*:
+
+ <img src="demo/heatmaps/rewards_maxent_1000_episodes.png">
+
+ *State rewards heatmap after 29000 episodes*:
+
+ <img src="demo/heatmaps/rewards_maxent_29000_episodes.png">
+
+ #### Testing
+
+ *Testing results of the model after 29000 episodes*:
+
+ <img src="demo/test_results/test_maxent_29000_episodes.png">
- [2] [Wulfmeier, et al., "Maximum entropy deep inverse reinforcement learning." arXiv preprint arXiv:1507.04888 (2015).](https://arxiv.org/abs/1507.04888)
- [3] [Xi-liang Chen, et al., "A Study of Continuous Maximum Entropy Deep Inverse Reinforcement Learning", Mathematical Problems in Engineering, vol. 2019, Article ID 4834516, 8 pages, 2019. https://doi.org/10.1155/2019/4834516](https://www.hindawi.com/journals/mpe/2019/4834516/)
+ ### Deep Maximum Entropy Inverse Reinforcement Learning
+
+ IRL using Deep Q-Learning with a Maximum Entropy update function.
+
+ #### Training
+
+ *Learner training for 29000 episodes*:
+
+ <img src="demo/learning_curves/learner_maxentropy_deep_29000_episodes.png">
+
+ #### Heatmaps
+
+ *Learner state frequencies after 1000 episodes*:
+
+ <img src="demo/heatmaps/learner_maxentropydeep_1000_episodes.png">
+
+ *Learner state frequencies after 29000 episodes*:
+
+ <img src="demo/heatmaps/learner_maxentropydeep_29000_episodes.png">
+
+ *State rewards heatmap after 1000 episodes*:
+
+ <img src="demo/heatmaps/rewards_maxentropydeep_1000_episodes.png">
+
+ *State rewards heatmap after 29000 episodes*:
+
+ <img src="demo/heatmaps/rewards_maxentropydeep_29000_episodes.png">
+
+ #### Testing
+
+ *Testing results of the model after 29000 episodes*:
+
+ <img src="demo/test_results/test_maxentropydeep_best_model_results.png">
+ ### Deep Maximum Entropy Inverse Reinforcement Learning with Critic
+
+ Coming soon...
+
+ # References
+
+ The implementation of MaxEntropyIRL and MountainCar is based on the implementation of:
+ [lets-do-irl](https://github.com/reinforcement-learning-kr/lets-do-irl/tree/master/mountaincar/maxent)
+
+ [1] [BD. Ziebart, et al., "Maximum Entropy Inverse Reinforcement Learning", AAAI 2008](https://cdn.aaai.org/AAAI/2008/AAAI08-227.pdf).

  # Installation

@@ -38,7 +115,7 @@ usage: irl [-h] [--version] [--training] [--testing] [--render] ALGORITHM
  Implementation of IRL algorithms

  positional arguments:
-   ALGORITHM   Currently supported training algorithm: [max-entropy, discrete-max-entropy-deep]
+   ALGORITHM   Currently supported training algorithms: [max-entropy, max-entropy-deep]

  options:
    -h, --help  show this help message and exit
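Given that usage string, a training run would presumably be launched as `irl --training max-entropy-deep` (optionally with `--render`), and a trained model evaluated with `irl --testing max-entropy-deep`.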
(The remaining changed files are binary image assets and are not shown.)
