Development of a reinforcement learning-based drone surveillance system for efficient area coverage and bandit detection in obstacle-rich environments.
Date: Early Development
Objective: Create foundational grid-based drone simulation
- 10x10 grid environment with sprites (ground, obstacle, drone, bandit)
- Multiple drones with random movement patterns
- Basic obstacle avoidance
- Coverage tracking with blue flags for visited areas
- Red flags for bandit detection
- Visual sensory field display (3x3 vision range)
- Problem: Random movement was inefficient
- Problem: No learning mechanism
- Problem: Drones often got stuck in local areas
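As a reference point for the later sessions, here is a minimal sketch of the kind of grid world and random-movement baseline described above. The class and helper names (GridWorld, step, in_bounds) and the obstacle/bandit counts are illustrative assumptions, not the project's actual code.

```python
import random

class GridWorld:
    """Illustrative 10x10 grid; obstacle and bandit counts are assumed values."""
    SIZE = 10
    ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0),
               "right": (1, 0), "stay": (0, 0)}

    def __init__(self, n_obstacles=15, n_bandits=3):
        cells = [(x, y) for x in range(self.SIZE) for y in range(self.SIZE)]
        random.shuffle(cells)
        self.obstacles = set(cells[:n_obstacles])
        self.bandits = set(cells[n_obstacles:n_obstacles + n_bandits])
        self.visited = set()    # cells marked with blue flags
        self.detected = set()   # bandits marked with red flags

    def in_bounds(self, pos):
        x, y = pos
        return 0 <= x < self.SIZE and 0 <= y < self.SIZE

    def step(self, pos, action):
        """Apply one move; basic obstacle avoidance keeps the drone in place."""
        dx, dy = self.ACTIONS[action]
        new_pos = (pos[0] + dx, pos[1] + dy)
        if not self.in_bounds(new_pos) or new_pos in self.obstacles:
            return pos
        self.visited.add(new_pos)
        if new_pos in self.bandits:
            self.detected.add(new_pos)
        return new_pos

# Random-movement baseline from this phase: every drone picks a random action.
world = GridWorld()
drones = [(0, 0), (9, 9)]
for _ in range(200):
    drones = [world.step(p, random.choice(list(world.ACTIONS))) for p in drones]
print(f"coverage: {len(world.visited)} cells, bandits detected: {len(world.detected)}")
```

Running this baseline a few times makes the limitations listed above easy to see: coverage plateaus quickly because random walks keep revisiting the same neighbourhood.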
Date: Development Session 1
Objective: Improve visual debugging and information display
- Added console panel below game grid
- Real-time statistics display (coverage, bandits detected, simulation status)
- Message logging system with scrolling capability
- Separation of game area from information display
- Better visual debugging
- Non-intrusive information display
- Enhanced user experience
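A rough sketch of how such a scrolling console buffer can be kept separate from the game grid; the line budget, header format, and class name are assumptions for illustration only.

```python
from collections import deque

class ConsolePanel:
    """Scrolling message log plus a one-line statistics header."""
    def __init__(self, max_lines=6):
        self.messages = deque(maxlen=max_lines)   # oldest lines scroll off automatically

    def log(self, text):
        self.messages.append(text)

    def render(self, coverage_pct, bandits_found, running):
        status = "running" if running else "stopped"
        header = f"coverage: {coverage_pct:.0f}% | bandits: {bandits_found} | {status}"
        return "\n".join([header, *self.messages])

panel = ConsolePanel()
panel.log("drone 1 detected bandit at (4, 7)")
print(panel.render(coverage_pct=32, bandits_found=1, running=True))
```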
Date: Development Session 2
Objective: Introduce reinforcement learning for intelligent drone behavior
- Tabular Q-learning with state = (x, y) position
- Action space: [Up, Down, Left, Right, Stay]
- Reward structure:
  - +1 for visiting new cells
  - +10 for detecting bandits
  - -10 for hitting obstacles
- Epsilon-greedy exploration with decay
- Problem: Drone repeatedly hit walls/obstacles
- Problem: Drone stuck in repetitive movement loops
- Problem: Insufficient exploration of right side of grid
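A minimal sketch of the tabular Q-learning setup listed above, with the state as the drone's (x, y) cell and a NumPy Q-table (the array representation is what Session 4 later replaces with a dictionary). The learning rate, discount factor, and epsilon-decay values here are placeholder assumptions.

```python
import numpy as np

N, N_ACTIONS = 10, 5                      # grid size; actions: up, down, left, right, stay
Q = np.zeros((N, N, N_ACTIONS))           # Q-table indexed by (x, y, action)
alpha, gamma = 0.1, 0.95                  # assumed learning rate and discount factor
epsilon, eps_min, eps_decay = 1.0, 0.05, 0.99   # assumed epsilon-greedy schedule

def reward_for(outcome):
    """Reward structure at this stage: new cell +1, bandit +10, obstacle -10."""
    return {"new_cell": 1, "bandit": 10, "obstacle": -10}.get(outcome, 0)

def choose_action(state, eps):
    """Epsilon-greedy: explore with probability eps, otherwise act greedily."""
    if np.random.rand() < eps:
        return np.random.randint(N_ACTIONS)
    x, y = state
    return int(np.argmax(Q[x, y]))

def q_update(state, action, reward, next_state):
    """Standard one-step Q-learning update."""
    x, y = state
    nx, ny = next_state
    target = reward + gamma * np.max(Q[nx, ny])
    Q[x, y, action] += alpha * (target - Q[x, y, action])

# Example of one learning step; epsilon decays once per episode.
s, s_next = (0, 0), (0, 1)
a = choose_action(s, epsilon)
q_update(s, a, reward_for("new_cell"), s_next)
epsilon = max(eps_min, epsilon * eps_decay)
```

With only (x, y) as state, the agent cannot distinguish situations that differ only in their surroundings, which helps explain the looping and wall-hitting problems noted above.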
Date: Development Session 3
Objective: Address exploration and repetitive behavior issues
- Enhanced reward structure:
  - +2 reward for new cell visits (increased from +1)
  - -1 penalty for revisiting already-covered areas
  - Maintained +10 bonus for bandit detection
- Improved epsilon scheduling:
  - Initial epsilon: 1.0 (full exploration)
  - Final epsilon: 0.05
  - Decay rate: 0.995 per episode
- Vision-based coverage:
  - Drone covers entire 3x3 sensory field, not just current position
  - Reward based on newly visible cells in sensory range
- Reduced wall-hitting behavior
- Improved exploration patterns
- Still some repetitive behavior remained
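One plausible reading of the vision-based coverage reward above, sketched below: everything in the 3x3 sensory field counts as covered, new cells earn +2 each, a step that reveals nothing new takes the -1 revisit penalty, and bandits in view keep the +10 bonus. The helper names and the per-cell interpretation of the +2 reward are assumptions.

```python
def sensory_field(pos, size=10):
    """All in-bounds cells in the 3x3 neighbourhood around the drone."""
    x, y = pos
    return {(x + dx, y + dy)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1)
            if 0 <= x + dx < size and 0 <= y + dy < size}

def coverage_reward(pos, visited, bandits):
    """+2 per newly seen cell, -1 if the field was already covered, +10 per bandit in view."""
    seen = sensory_field(pos)
    new_cells = seen - visited
    visited |= seen                          # the entire sensory field counts as covered
    reward = 2 * len(new_cells) if new_cells else -1
    return reward + 10 * len(seen & bandits)

# Epsilon schedule introduced in this session, applied once per episode:
epsilon, eps_min, eps_decay = 1.0, 0.05, 0.995
epsilon = max(eps_min, epsilon * eps_decay)

visited, bandits = set(), {(4, 7)}
print(coverage_reward((4, 6), visited, bandits))   # 9 new cells (+18) and one bandit in view (+10) -> 28
```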
Date: Development Session 4
Objective: Solve persistent looping issues through better state encoding
- Rich state representation:
  - State = (x, y, tuple of 3x3 sensory field)
  - Q-table changed from NumPy array to dictionary
  - States include local environment context
- Optimistic initialization:
  - New states initialized with Q-values = 50 (optimistic)
  - Encourages exploration of unseen state-action pairs
- Legal action filtering:
  - Prevented selection of actions leading to obstacles
- Eliminated wall-hitting behavior completely
- Significant reduction in repetitive behavior
- More intelligent navigation around obstacles
- Better coverage patterns
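A sketch of how the three Session 4 changes fit together: a dictionary Q-table keyed by the rich (x, y, sensory field) state, optimistic defaults of 50 for unseen states, and action selection restricted to legal moves. The is_blocked callback and other helper names are illustrative assumptions.

```python
from collections import defaultdict
import numpy as np

ACTIONS = [(0, -1), (0, 1), (-1, 0), (1, 0), (0, 0)]   # up, down, left, right, stay
OPTIMISTIC_Q = 50.0

# Dictionary Q-table: unseen states start with optimistic values for every action,
# so untried state-action pairs look attractive and get explored.
Q = defaultdict(lambda: np.full(len(ACTIONS), OPTIMISTIC_Q))

def make_state(pos, field):
    """Rich state: grid position plus the flattened 3x3 sensory field contents."""
    return (pos[0], pos[1], tuple(field))

def legal_actions(pos, is_blocked):
    """Keep only actions that do not lead into an obstacle or off the grid."""
    return [i for i, (dx, dy) in enumerate(ACTIONS)
            if not is_blocked((pos[0] + dx, pos[1] + dy))]

def choose_action(state, pos, is_blocked, epsilon):
    """Epsilon-greedy restricted to legal actions, so walls are never hit."""
    allowed = legal_actions(pos, is_blocked)
    if np.random.rand() < epsilon:
        return int(np.random.choice(allowed))
    q = Q[state]
    return max(allowed, key=lambda a: q[a])

# Example: a drone at (2, 3) whose 3x3 field contains an obstacle to the east.
field = ("ground",) * 5 + ("obstacle",) + ("ground",) * 3
state = make_state((2, 3), field)
blocked = {(3, 3)}
def is_blocked(c):
    return c in blocked or not (0 <= c[0] < 10 and 0 <= c[1] < 10)
print(choose_action(state, (2, 3), is_blocked, 0.1))
```

Because the Q-table key includes the local field, positions with different surroundings no longer share values, which is what breaks the repetitive loops noted above.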
Date: Development Session 5
Objective: Improve training efficiency and visualization
- Fast headless training: No rendering during learning phase
- Selective visualization: Show only first 5, best 5, and last 5 episodes
- Performance tracking: Episode statistics storage and analysis
- Dramatically reduced training time
- Better insight into learning progression
- Maintained visual debugging capabilities
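A sketch of the training flow this session introduces: learn with rendering off, store per-episode statistics, then replay only the first 5, best 5, and last 5 episodes with rendering on. The run_episode stand-in and the seed-based replay are assumptions about how the real loop is organised.

```python
import random

def run_episode(seed, render=False):
    """Stand-in for the real episode loop; returns the episode's coverage."""
    rng = random.Random(seed)
    return rng.randint(20, 100)          # placeholder coverage percentage

def train(n_episodes=500):
    stats = []                                        # (episode, seed, coverage)
    for ep in range(n_episodes):
        seed = ep                                     # remember seeds so episodes can be replayed
        coverage = run_episode(seed, render=False)    # fast headless pass: no drawing
        stats.append((ep, seed, coverage))

    best = sorted(stats, key=lambda s: s[2], reverse=True)[:5]
    selected = sorted(set(stats[:5] + best + stats[-5:]))
    for ep, seed, _ in selected:                      # first 5, best 5, last 5 only
        run_episode(seed, render=True)
    return stats

episode_stats = train(50)
```

Skipping rendering during learning is where the speedup comes from; the stored per-episode statistics are what drives both the first/best/last selection and the later analysis of learning progression.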
Date: Current Session
Objective: Implement systematic solution for remaining stuck behaviors
- Stuck Detection Algorithm: