In this project, I implement Deep Q-Networks (DQN) and train them with the Q-learning algorithm.
I also run experiments to examine Q-learning in more depth and to understand how a policy gradient algorithm compares to it.
The experiments I conduct are:
- Dependence on the discount factor gamma
- Policy gradients vs. Q-learning
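
To make the role of gamma concrete, here is a minimal sketch of the standard one-step Q-learning target, `r + gamma * max_a' Q(s', a')`, evaluated for two values of gamma. It is not the project's actual training code: the function name `td_targets`, the array shapes, and the toy numbers are assumptions chosen for illustration.

```python
import numpy as np

def td_targets(rewards, next_q_values, dones, gamma):
    """One-step Q-learning targets (hypothetical helper, not from the project).

    rewards:       shape (batch,)
    next_q_values: shape (batch, num_actions), Q(s', a') for every action
    dones:         shape (batch,), True where the episode ended at s'
    gamma:         discount factor in [0, 1]
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    next_q_values = np.asarray(next_q_values, dtype=np.float64)
    dones = np.asarray(dones, dtype=np.float64)
    # Greedy bootstrap value: max over actions in the next state.
    best_next = next_q_values.max(axis=1)
    # Terminal transitions do not bootstrap, so their target is the reward alone.
    return rewards + gamma * (1.0 - dones) * best_next

# Toy batch: the first transition is non-terminal, the second is terminal.
rewards = [1.0, 0.0]
next_q = [[0.5, 2.0], [1.0, 3.0]]
dones = [False, True]

for gamma in (0.5, 0.99):
    print(gamma, td_targets(rewards, next_q, dones, gamma))
```

A larger gamma gives the bootstrapped future value more weight in the target, which is the dependence the first experiment looks at. The terminal transition is unaffected by gamma because nothing is bootstrapped from it.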