Skip to content

Prajwalkadam29/SentineRL

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

RL Drone Surveillance Project - Development Log

Project Overview

Development of a reinforcement learning-based drone surveillance system for efficient area coverage and bandit detection in obstacle-rich environments.

Timeline and Development Phases

Phase 1: Basic Pygame Simulation (Initial)

Date: Early Development
Objective: Create foundational grid-based drone simulation

Implementation

  • 10x10 grid environment with sprites (ground, obstacle, drone, bandit)
  • Multiple drones with random movement patterns
  • Basic obstacle avoidance
  • Coverage tracking with blue flags for visited areas
  • Red flags for bandit detection
  • Visual sensory field display (3x3 vision range)

Issues Identified

  • Random movement was inefficient
  • No learning mechanism
  • Drones often got stuck in local areas

Phase 2: Console Panel Integration

Date: Development Session 1
Objective: Improve visual debugging and information display

Implementation

  • Added console panel below game grid
  • Real-time statistics display (coverage, bandits detected, simulation status)
  • Message logging system with scrolling capability
  • Separation of game area from information display

Benefits

  • Better visual debugging
  • Non-intrusive information display
  • Enhanced user experience

Phase 3: Basic Q-Learning Integration

Date: Development Session 2
Objective: Introduce reinforcement learning for intelligent drone behavior

Implementation

  • Tabular Q-learning with state = (x, y) position
  • Action space: [Up, Down, Left, Right, Stay]
  • Reward structure:
    • +1 for visiting new cells
    • +10 for detecting bandits
    • -10 for hitting obstacles
  • Epsilon-greedy exploration with decay

Issues Encountered

  • Problem: Drone repeatedly hit walls/obstacles
  • Problem: Drone stuck in repetitive movement loops
  • Problem: Insufficient exploration of right side of grid

Phase 4: Reward Engineering and Exploration Fixes

Date: Development Session 3
Objective: Address exploration and repetitive behavior issues

Fixes Applied

  1. Enhanced reward structure:

    • +2 reward for new cell visits (increased from +1)
    • -1 penalty for revisiting already-covered areas
    • Maintained +10 bonus for bandit detection
  2. Improved epsilon scheduling:

    • Initial epsilon: 1.0 (full exploration)
    • Final epsilon: 0.05
    • Decay rate: 0.995 per episode
  3. Vision-based coverage:

    • Drone covers entire 3x3 sensory field, not just current position
    • Reward based on newly visible cells in sensory range

Partial Success

  • Reduced wall-hitting behavior
  • Improved exploration patterns
  • Still some repetitive behavior remained

Phase 5: Advanced State Representation

Date: Development Session 4
Objective: Solve persistent looping issues through better state encoding

Major Changes

  1. Rich state representation:

    • State = (x, y, tuple of 3x3 sensory field)
    • Q-table changed from NumPy array to dictionary
    • States include local environment context
  2. Optimistic initialization:

    • New states initialized with Q-values = 50 (optimistic)
    • Encourages exploration of unseen state-action pairs
  3. Legal action filtering:

    • Prevented selection of actions leading to obstacles
    • Eliminated wall-hitting behavior completely

Results

  • Significant reduction in repetitive behavior
  • More intelligent navigation around obstacles
  • Better coverage patterns

Phase 6: Training Optimization

Date: Development Session 5
Objective: Improve training efficiency and visualization

Implementation

  • Fast headless training: No rendering during learning phase
  • Selective visualization: Show only first 5, best 5, and last 5 episodes
  • Performance tracking: Episode statistics storage and analysis

Benefits

  • Dramatically reduced training time
  • Better insight into learning progression
  • Maintained visual debugging capabilities

Phase 7: Stuck Detection and Intervention System

Date: Current Session
Objective: Implement systematic solution for remaining stuck behaviors

New Features Implemented

  1. Stuck Detection Algorithm:

Releases

No releases published

Packages

No packages published

Languages

  • Python 100.0%