A reinforcement learning environment where an agent navigates a grid to reach a goal while avoiding patrolling enemies.
This project uses Three.js for rendering and Q-learning for the agent's decision-making.
- 15x17 Grid Environment: Includes walls, cover, a start point (S), and a goal (G).
- Reinforcement Learning Agent: Learns optimal paths using Q-learning while avoiding enemies.
- Dynamic Enemies: 5 enemies with 4-step back-and-forth patrol patterns.
- Reward System: Balanced rewards and penalties for progress, safety, and efficiency.
- Visualization: Real-time 3D rendering using Three.js.
- Clone the repository:
  ```bash
  git clone https://github.com/Bachkhairi/Stealth-simulator
  cd Stealth-simulator
  ```
- Install dependencies:
  ```bash
  npm install
  ```
- Run the application:
  ```bash
  npm run dev
  ```
- Open your browser and visit:
  ```
  http://localhost:3000
  ```
- Start / Pause: Click Start to begin the simulation, and Pause to stop it.
- Reset: Click Reset to reposition the agent at the Start (S).
- Simulation Speed: Adjust with the slider (100ms to 2000ms per step).
- Q-Learning Parameters: Modify learning rate, discount factor, and epsilon via UI sliders.
- Export Metrics: Click Export Metrics as CSV to save simulation metrics (see the sketch after this list).
- Line of Sight: Toggle the enemy line-of-sight (LOS) display: `radius`, `line`, or `none`.
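
For context on the export button: a common way to produce a CSV download in the browser is a `Blob` plus a temporary link. A minimal sketch, assuming made-up metric fields (`episode`, `steps`, `reward`, `detected`) rather than the app's actual schema:

```js
// Illustrative only: the real export lives in the app's code, and these
// metric field names are hypothetical examples.
function exportMetricsAsCsv(rows, filename = 'metrics.csv') {
  const header = Object.keys(rows[0]).join(',');
  const body = rows.map((r) => Object.values(r).join(',')).join('\n');
  const blob = new Blob([`${header}\n${body}`], { type: 'text/csv' });
  const url = URL.createObjectURL(blob);
  const a = document.createElement('a');
  a.href = url;
  a.download = filename; // triggers a browser download
  a.click();
  URL.revokeObjectURL(url);
}

exportMetricsAsCsv([
  { episode: 1, steps: 42, reward: -3.2, detected: 1 },
  { episode: 2, steps: 35, reward: 1.8, detected: 0 },
]);
```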
- Grid:
  - Size: 15x17
  - Symbols:
    - `W`: Wall
    - `C`: Cover
    - `S`: Start
    - `G`: Goal
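
For illustration, one plausible way to encode such a grid (not necessarily how `GridWorld.js` stores it) is an array of row strings using the symbols above, shortened here for brevity:

```js
// Illustrative excerpt, not the project's actual 15x17 map: each row is a
// 17-character string of cells; a full map would have 15 rows.
const rows = [
  'WWWWWWWWWWWWWWWWW',
  'WS....C.....W...W',
  'W.WW..W..C..W.C.W',
  'W....C....W....GW',
  'WWWWWWWWWWWWWWWWW',
];

// Parse the characters into cell objects a renderer or agent could consume.
const grid = rows.map((row, y) =>
  [...row].map((ch, x) => ({
    x, y,
    wall: ch === 'W',
    cover: ch === 'C',
    start: ch === 'S',
    goal: ch === 'G',
  })),
);
```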
- Enemies:
  - Count: 5
  - Patrol: 4-step loops
  - Detection: Adjustable LOS radius
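
A 4-step back-and-forth patrol can be expressed as a ping-pong walk over a short waypoint list. A minimal sketch with illustrative coordinates, not the project's actual enemy code:

```js
// Four waypoints walked 0 → 1 → 2 → 3 → 2 → 1 → 0 → … (ping-pong).
const route = [{ x: 3, y: 5 }, { x: 4, y: 5 }, { x: 5, y: 5 }, { x: 6, y: 5 }];
let i = 0;   // current waypoint index
let dir = 1; // +1 walking forward, -1 walking back

function patrolStep() {
  i += dir;
  if (i === 0 || i === route.length - 1) dir = -dir; // turn around at the ends
  return route[i];
}
```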
- Q-Learning:
  - Set in `GridWorld.js`
  - Default parameters: `alpha: 0.5`, `epsilon: 0.5`, `gamma: 0.9`
All parameters below are adjustable and influence how the RL agent behaves and learns. You can tweak these in the UI or code to experiment with different strategies:

| Parameter | Description |
|---|---|
| `alpha` (α) | Learning rate (e.g., 0.5) – how quickly the agent updates Q-values |
| `gamma` (γ) | Discount factor (e.g., 0.8) – weights future rewards over immediate ones |
| `epsilon` (ε) | Exploration rate (e.g., 0.5) – balance between exploring and exploiting |
| `epsilonDecay` | Decay rate (e.g., 0.999) – gradually reduces ε to favor exploitation over time |
| `minEpsilon` | Minimum ε (e.g., 0.01) – ensures some randomness always remains |
| `timePenalty` | Penalty per step (e.g., -0.1) – encourages efficiency |
| `forwardReward` | Reward for progress toward the goal (e.g., 1) – motivates forward movement |
| `detectionPenalty` | Penalty for enemy detection (e.g., -10) – discourages unsafe actions |
| `enemyRadius` | Enemy vision range (e.g., 1.5 tiles) – affects the difficulty of stealth |
| `stealthReward` | Reward for using cover (e.g., 0.1) – promotes strategic hiding |
| `coverStreakBonus` | Bonus for consecutive cover use (e.g., 0.1) – reinforces stealth behavior |

These parameters let you strike a balance between aggressive, stealthy, and cautious navigation behaviors.
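
To make the interplay concrete, here is a textbook sketch of the ε-greedy choice and tabular Q-learning update these parameters feed into; it mirrors the standard algorithm rather than quoting `GridWorld.js`:

```js
// ε-greedy action selection and the tabular Q-learning update:
//   Q(s,a) ← Q(s,a) + α · (r + γ · max_a' Q(s',a') − Q(s,a))
function chooseAction(Q, s, actions, epsilon) {
  if (Math.random() < epsilon) {
    return actions[Math.floor(Math.random() * actions.length)]; // explore
  }
  return actions.reduce((best, a) =>
    (Q[s]?.[a] ?? 0) > (Q[s]?.[best] ?? 0) ? a : best);         // exploit
}

function updateQ(Q, s, a, reward, sNext, actions, { alpha, gamma }) {
  const q = Q[s]?.[a] ?? 0;
  const maxNext = Math.max(...actions.map((a2) => Q[sNext]?.[a2] ?? 0));
  (Q[s] ??= {})[a] = q + alpha * (reward + gamma * maxNext - q);
}

// After each step or episode, decay exploration toward a floor:
//   epsilon = Math.max(minEpsilon, epsilon * epsilonDecay);
```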
- Tech Stack:
- Three.js — 3D rendering
- TWEEN.js — smooth animations
- Tailwind CSS — UI styling via CDN
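
As a rough sketch of how these pieces typically fit together (assuming the `@tweenjs/tween.js` module build; `moveAgent` and `animate` are hypothetical names, not the project's):

```js
import * as TWEEN from '@tweenjs/tween.js';

// Glide the agent's THREE.Mesh from its current tile to the next one.
function moveAgent(mesh, nextTile, durationMs) {
  new TWEEN.Tween(mesh.position)
    .to({ x: nextTile.x, z: nextTile.y }, durationMs)
    .easing(TWEEN.Easing.Quadratic.Out)
    .start();
}

// Drive both libraries from one render loop.
function animate(renderer, scene, camera) {
  requestAnimationFrame(() => animate(renderer, scene, camera));
  TWEEN.update();                 // advance any active tweens
  renderer.render(scene, camera); // redraw the Three.js scene
}
```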
