Q-learning is an off-policy temporal-difference control algorithm. It learns the value of the optimal action, independent of the action actually taken by the agent.

shaheennabi/Q-Learning-Off-policy


Q-Learning from Scratch (Model-Free Reinforcement Learning)

This repository contains a from-scratch, modular implementation of Q-learning,
an off-policy, model-free reinforcement learning algorithm, implemented on a custom
GridWorld environment built without any pre-made RL libraries.

The focus of this project is algorithmic clarity, correct temporal logic, and the on-/off-policy distinction,
not performance optimization or framework usage.


Why this project

Many reinforcement learning examples:

  • rely on Gym or other pre-built environments
  • hide the learning loop behind abstractions
  • obscure the difference between behavior policies and target policies

This project does the opposite:

  • the environment is implemented manually
  • the Q-learning update is written explicitly
  • the behavior policy is separated from value learning
  • the training loop shows the full (s, a, r, s') transition logic
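The four points above can be sketched in a few dozen lines. This is only an illustrative sketch, not the repository's actual code: the `GridWorld` class, the `epsilon_greedy` and `train` functions, the 4x4 layout, the reward values, and all hyperparameters are assumptions chosen for the example.

```python
import random


class GridWorld:
    """Hypothetical 4x4 grid: start at top-left, goal at bottom-right."""

    ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.state = (0, 0)

    def reset(self):
        self.state = (0, 0)
        return self.state

    def step(self, action):
        # Moves that would leave the grid are clamped to the border.
        dr, dc = self.ACTIONS[action]
        row = min(max(self.state[0] + dr, 0), self.size - 1)
        col = min(max(self.state[1] + dc, 0), self.size - 1)
        self.state = (row, col)
        done = self.state == self.goal
        reward = 1.0 if done else -0.01  # assumed reward scheme
        return self.state, reward, done


def epsilon_greedy(q, state, n_actions, epsilon, rng):
    """Behavior policy: random action with probability epsilon, else greedy."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = [q.get((state, a), 0.0) for a in range(n_actions)]
    return values.index(max(values))


def train(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0, max_steps=200):
    rng = random.Random(seed)
    env = GridWorld()
    n_actions = len(GridWorld.ACTIONS)
    q = {}  # maps (state, action) -> estimated value, default 0.0
    for _ in range(episodes):
        s = env.reset()
        for _ in range(max_steps):
            a = epsilon_greedy(q, s, n_actions, epsilon, rng)  # behavior policy
            s_next, r, done = env.step(a)                      # (s, a, r, s')
            # Target policy is greedy: bootstrap from the best next action,
            # regardless of what the behavior policy will actually do next.
            best_next = 0.0 if done else max(
                q.get((s_next, b), 0.0) for b in range(n_actions)
            )
            old = q.get((s, a), 0.0)
            q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s_next
            if done:
                break
    return q
```

Keeping `epsilon_greedy` separate from the update inside `train` mirrors the separation of the behavior policy from value learning: the exploration strategy can be swapped without touching the update line.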

The goal is to understand off-policy, model-free control from first principles.


What Q-learning is (core idea)

Q-learning is an off-policy temporal-difference control algorithm.
It learns the optimal action-value function directly, independent of the policy the agent actually follows while collecting experience.

At each step, Q-learning updates the value of the current state–action pair using
the maximum action-value in the next state, assuming greedy behavior in the future.
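Written as an equation, the update described above takes the standard form below. The symbols are conventional notation rather than names taken from this repository: \alpha is the learning rate, \gamma the discount factor, r the reward received, and s' the next state.

```latex
Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]
```

The \max_{a'} term is where the greedy assumption about future behavior enters: the target never depends on which action the agent actually selects in s'.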

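This single detail is the defining difference between Q-learning and SARSA: SARSA, an on-policy method, bootstraps from the action the agent actually takes in the next state rather than from the maximum.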
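A small sketch makes the contrast concrete. The function names `q_learning_target` and `sarsa_target` and the numbers used are hypothetical, chosen only for illustration; they are not taken from this repository.

```python
def q_learning_target(q_next, reward, gamma):
    """Off-policy target: bootstrap from the best action value in s'."""
    return reward + gamma * max(q_next)


def sarsa_target(q_next, next_action, reward, gamma):
    """On-policy target: bootstrap from the action actually chosen in s'."""
    return reward + gamma * q_next[next_action]


# Assumed action values in the next state, one entry per action.
q_next = [0.25, 1.0, 0.5]

# Suppose the behavior policy explores and picks action 2 in s'.
off_policy = q_learning_target(q_next, reward=1.0, gamma=0.5)
on_policy = sarsa_target(q_next, next_action=2, reward=1.0, gamma=0.5)
```

Here the Q-learning target uses the value 1.0 of the greedy action, while the SARSA target uses the value 0.5 of the exploratory action, so the two targets differ whenever the behavior policy does not act greedily.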

