This repository contains a from-scratch, modular implementation of Q-learning,
an off-policy, model-free reinforcement learning algorithm, built on a custom
GridWorld environment with no pre-made RL libraries.
The focus of this project is algorithmic clarity, correct temporal logic, and the on-/off-policy distinction,
not performance optimization or framework usage.
Many reinforcement learning examples:
- rely on Gym or other pre-built environments
- hide the learning loop behind abstractions
- obscure the difference between behavior policies and target policies
This project does the opposite:
- the environment is implemented manually
- the Q-learning update is written explicitly
- the behavior policy is separated from value learning
- the training loop shows the full (s, a, r, s') transition logic (see the sketch below)
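
To make that concrete, here is a minimal sketch of such an explicit loop. This is not the repository's actual code: the one-dimensional corridor environment, the constants, and all names below are invented purely for illustration.

```python
import random

# Toy stand-in environment (the repository uses its own GridWorld):
# a 1-D corridor with states 0..4; the agent starts at 0 and
# receives reward +1 on reaching the terminal state 4.
N_STATES = 5
ACTIONS = [-1, +1]  # move left / move right

def step(state, action):
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state == N_STATES - 1
    return next_state, reward, done

# Tabular Q-values: Q[state][action_index], initialized to zero.
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.99, 0.1

for episode in range(500):
    s, done = 0, False
    while not done:
        # Behavior policy: epsilon-greedy over the current Q-values.
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[s][i])

        # One (s, a, r, s') transition.
        s_next, r, done = step(s, ACTIONS[a])

        # Off-policy target: bootstrap from the greedy (max) action in s',
        # regardless of what the behavior policy does next.
        target = r + (0.0 if done else gamma * max(Q[s_next]))
        Q[s][a] += alpha * (target - Q[s][a])

        s = s_next
```

Note how the epsilon-greedy choice (behavior) and the max in the target (learning) are two separate decisions; that separation is exactly what makes the method off-policy.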
The goal is to understand off-policy, model-free control from first principles.
Q-learning is an off-policy temporal-difference control algorithm.
It learns the optimal action-value function regardless of which actions the agent actually takes while exploring.
At each step, Q-learning updates the value of the current state–action pair using
the maximum action-value in the next state, assuming greedy behavior in the future.
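
Written out, this is the standard tabular Q-learning update, where $\alpha$ is the step size and $\gamma$ the discount factor:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$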
This max over next-state actions is the single detail that distinguishes Q-learning from SARSA.
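
For contrast, SARSA's on-policy update bootstraps from the next action $a'$ that the behavior policy actually selects, rather than from the maximum:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \, Q(s', a') - Q(s, a) \right]$$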