This project provides a dynamic, visual comparison of two classic Reinforcement Learning (RL) algorithms: SARSA (State-Action-Reward-State-Action) and Expected SARSA.
The core of the project is a custom-built, procedurally generated grid-world environment. Agents are trained to navigate from a starting quadrant to a goal in the opposite quadrant while avoiding procedurally placed obstacles. The project emphasizes the visualization of the learning process, offering both real-time training animations via OpenCV and final static path comparisons to analyze how different algorithms converge on a solution.
- Dual Algorithm Implementation: Direct comparison of On-Policy SARSA vs. Expected SARSA learning behaviors.
- Procedural Environment: A highly modular grid-world where grid size and wall density are configurable, ensuring no two training runs are exactly the same.
- Sophisticated Reward Shaping:
- Goal: High positive reward (+).
- Walls/Obstacles: Large negative penalty (-).
- Heuristics: Small proximity "nudges" (positive/negative) based on distance to the objective to guide exploration.
- OpenCV Visualization:
- Live Training: Watch the agent explore (Blue Block) and learn in real-time.
- Static Comparison: Automatically generates a side-by-side image of the optimal paths learned by both agents.
- Language: Python
- Libraries:
numpy: Matrix operations, Q-Table management, and grid logic.opencv-python: Real-time rendering and final image generation.random: Procedural generation logic for obstacles and start/end points.
The agent operates in a flexible grid (e.g., 10x10 or 20x20). To maximize path complexity, start and end points are generated in opposite corners. Walls are generated randomly based on a density parameter.
Both agents utilize a Q-Table to store state-action values, but they differ in how they update those values during training.
SARSA updates the Q-value based on the action actually taken by the current policy. The update rule is:
Where
Expected SARSA updates the Q-value based on the expected value of the next state, averaging over all possible actions weighted by their probability. This usually results in lower variance updates.
The visualization window uses the following color scheme:
- 🟦 Blue Block: The Agent
- 🟩 Green Block: The Goal
- ⬛ Black Blocks: Walls/Obstacles
- 🔴 Red/Green Lines: The optimal path taken during final evaluation.
git clone [git@github.com:dagaaryan011/GridRunner.git]
cd ReinforcentsEnsure you have Python installed, then run:
pip install numpy opencv-pythonTo start the training process and visualization:
python Expected_sarsa.py(Note: If your main script has a different name, replace Expected_sarsa.py with your filename).
Upon completion of the training episodes, the program will:
- Save the Q-Tables.
- Display a Comparison Image.
This image visualizes the path taken by the SARSA agent versus the path taken by the Expected SARSA agent, allowing for immediate visual analysis of which algorithm found a safer or shorter path.
Contributions are welcome! If you have ideas for new environments, different algorithms (like Q-Learning), or better visualization features:
- Fork the Project
- Create your Branch
- Commit your Changes (
git commit -m 'commit msg') - Push to the Branch (
git push origin branch_name) - Open a Pull Request