
Commit 43b964e

update learning curves and README (#8)
1 parent 67539d1 commit 43b964e

File tree

9 files changed: +15 -139 lines changed


README.md

Lines changed: 8 additions & 3 deletions
@@ -5,8 +5,9 @@
 Inverse Reinforcement Learning Algorithm implementation with python.
 
 Implemented Algorithms:
-- Maximum Entropy IRL
-- Maximum Entropy Deep IRL
+- Maximum Entropy IRL: [1]
+- Discrete Maximum Entropy Deep IRL: [2, 3]
+- IQ-Learn
 
 Experiment:
 - Mountaincar: [gym](https://www.gymlibrary.dev/environments/classic_control/mountain_car/)
@@ -16,7 +17,11 @@ The implementation of MaxEntropyIRL and MountainCar is based on the implementati
 
 # References
 
-...
+[1] [B. D. Ziebart, et al., "Maximum Entropy Inverse Reinforcement Learning", AAAI 2008](https://cdn.aaai.org/AAAI/2008/AAAI08-227.pdf)
+
+[2] [M. Wulfmeier, et al., "Maximum Entropy Deep Inverse Reinforcement Learning", arXiv preprint arXiv:1507.04888, 2015](https://arxiv.org/abs/1507.04888)
+
+[3] [Xi-liang Chen, et al., "A Study of Continuous Maximum Entropy Deep Inverse Reinforcement Learning", Mathematical Problems in Engineering, vol. 2019, Article ID 4834516, 2019](https://www.hindawi.com/journals/mpe/2019/4834516/)
 
 # Installation
 
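For orientation, below is a minimal sketch of the Maximum Entropy IRL weight update described in reference [1] above; the function and variable names (maxent_irl_step, feat_matrix, expert_svf, theta) are illustrative and are not taken from this repository's code.

import numpy as np

def maxent_irl_step(feat_matrix, expert_svf, learner_svf, theta, learning_rate=0.05):
    # feat_matrix: (n_states, n_features) state feature matrix
    # expert_svf:  (n_states,) empirical state-visitation frequencies of the expert demos
    # learner_svf: (n_states,) expected visitation frequencies under the current reward
    # theta:       (n_features,) reward weights, so reward(s) = feat_matrix[s] @ theta
    # Gradient of the MaxEnt log-likelihood: expert feature expectations minus
    # the feature expectations induced by the current reward estimate.
    grad = feat_matrix.T @ (expert_svf - learner_svf)
    return theta + learning_rate * grad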

src/irlwpython/DiscreteMaxEntropyDeepIRL.py

Lines changed: 7 additions & 4 deletions
@@ -136,11 +136,14 @@ def train(self):
             score_avg = np.mean(scores)
             print('{} episode score is {:.2f}'.format(episode, score_avg))
             plt.plot(episodes, scores, 'b')
-            plt.savefig("./learning_curves/maxent_30000_network.png")
+            plt.savefig("./learning_curves/discretemaxentdeep_30000.png")
 
-        torch.save(self.q_network.state_dict(), "./results/maxent_30000_q_network.pth")
+        torch.save(self.actor_network.state_dict(), "./results/discretemaxentdeep_30000_actor.pth")
+        torch.save(self.critic_network.state_dict(), "./results/discretemaxentdeep_30000_critic.pth")
 
     def test(self):
+        assert 1 == 0  # TODO: not implemented yet
+
         episodes, scores = [], []
 
         for episode in range(10):
@@ -151,7 +154,7 @@ def test(self):
                 self.target.env_render()
                 state_tensor = torch.tensor(state, dtype=torch.float32).unsqueeze(0)
 
-                action = torch.argmax(self.q_network(state_tensor)).item()
+                action = torch.argmax(self.actor_network(state_tensor)).item()
                 next_state, reward, done, _, _ = self.target.env_step(action)
 
                 score += reward
@@ -161,7 +164,7 @@ def test(self):
                     scores.append(score)
                     episodes.append(episode)
                     plt.plot(episodes, scores, 'b')
-                    plt.savefig("./learning_curves/maxent_test_30000_network.png")
+                    plt.savefig("./learning_curves/discretemaxentdeep_test_30000.png")
                     break
 
             if episode % 1 == 0:
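As a usage note, here is a hypothetical evaluation snippet that reloads the actor checkpoint saved above and queries it greedily, mirroring the torch.argmax call in test(); ActorNetwork and its constructor arguments are placeholders, not the repository's actual class signature.

import torch

# Rebuild the actor with the architecture used in training (placeholder signature),
# then restore the weights written by train() in this commit.
actor = ActorNetwork(state_dim=2, n_actions=3)
actor.load_state_dict(torch.load("./results/discretemaxentdeep_30000_actor.pth"))
actor.eval()

# Greedy action for a single MountainCar state (position, velocity).
state_tensor = torch.tensor([-0.5, 0.0], dtype=torch.float32).unsqueeze(0)
with torch.no_grad():
    action = torch.argmax(actor(state_tensor)).item()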
5 binary/image files changed (-5.34 MB, 2.97 KB, 9.02 KB, 357 Bytes, 0 Bytes); file names and contents not shown.

src/irlwpython/utils/utils.py

Lines changed: 0 additions & 46 deletions
This file was deleted.

src/irlwpython/utils/zfilter.py

Lines changed: 0 additions & 86 deletions
This file was deleted.

0 commit comments

Comments
 (0)