
Bridge Fleet Management with Deep Reinforcement Learning

Python 3.12+ PyTorch CUDA License: MIT

A deep reinforcement learning system for optimizing bridge maintenance decisions across municipal infrastructure fleets, implementing cross-subsidy budget sharing and cooperative multi-agent learning.

Overview

This project applies Deep Q-Networks (DQN) to the problem of managing a fleet of 100 bridge assets under budget constraints. The system learns maintenance policies that balance infrastructure health, lifecycle costs, and inter-departmental cooperation.

Key Features

  • Multi-Agent Coordination: Urban (20 bridges) and Rural (80 bridges) agents cooperate
  • Cross-Subsidy Budget Sharing: Flexible budget reallocation between departments
  • Unified Municipality Reward: Agents optimize shared objectives, not individual metrics
  • Realistic Constraints: Budget deficits, no carryover, diverse bridge ages (20-50 years)
  • GPU Acceleration: CUDA-enabled training for 2-5x speedup
  • Comprehensive Experiments: 4 budget scenarios from surplus to extreme deficit

System Architecture

graph TB
    subgraph Municipality["Municipality Management System"]
        Budget["Unified Budget Pool<br/>$18k-$70k/year"]
        
        subgraph Urban["Urban Agent"]
            U_State["State: 81D<br/>20 bridges x 4 features"]
            U_DQN["DQN Network<br/>81→512→1024→512→100"]
            U_Action["Actions: 20 x 5<br/>None/Light/Medium/Major/Replace"]
        end
        
        subgraph Rural["Rural Agent"]
            R_State["State: 10D<br/>Fleet statistics + budget"]
            R_DQN["DQN Network<br/>10→256→512→256→8"]
            R_Action["Strategies: 8<br/>Reactive/Preventive/Adaptive"]
        end
        
        Budget -->|"60%"| U_State
        Budget -->|"40%"| R_State
        
        U_State --> U_DQN
        U_DQN --> U_Action
        U_Action --> Env["Environment<br/>100 Bridges"]
        
        R_State --> R_DQN
        R_DQN --> R_Action
        R_Action --> Env
        
        Env --> Reward["Municipality Reward<br/>Urban + Rural"]
        Reward -->|"Cooperative Bonus<br/>+10% if both > 0"| U_DQN
        Reward --> R_DQN
    end
    
    style Budget fill:#FFD700
    style Reward fill:#90EE90
    style Env fill:#87CEEB
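The layer sizes in the diagram correspond to two feed-forward Q-networks. Below is a minimal PyTorch sketch of how they could be defined, assuming plain fully connected layers with ReLU activations; the class name and construction style are illustrative, not the repository's actual code.

```python
# Minimal sketch of the two Q-networks, assuming fully connected ReLU layers
# with the sizes from the diagram above. Names are illustrative.
import torch.nn as nn


class QNetwork(nn.Module):
    def __init__(self, dims):
        super().__init__()
        layers = []
        for i in range(len(dims) - 2):
            layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        layers.append(nn.Linear(dims[-2], dims[-1]))  # raw Q-values, no activation
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# Urban agent: 81-D state -> 100 Q-values (20 bridges x 5 maintenance actions)
urban_q = QNetwork([81, 512, 1024, 512, 100])
# Rural agent: 10-D state -> 8 Q-values (one per fleet-level strategy)
rural_q = QNetwork([10, 256, 512, 256, 8])
```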

DQN Learning Flow

The complete training pipeline with experience replay and target networks:

graph TB
    Start([Start Training]) --> Init[Initialize Environment<br/>and Agent Networks]
    Init --> Episode[Start Episode t]
    
    Episode --> Reset["Reset Environment<br/>Budget, Bridge States"]
    Reset --> Year[Year i = 1]
    
    Year --> GetState["Get State<br/>Urban: 81D | Rural: 10D"]
    GetState --> Explore{"ε-greedy<br/>Exploration?"}
    
    Explore -->|"rand < ε"| Random[Select Random<br/>Actions/Strategy]
    Explore -->|"rand ≥ ε"| Greedy["Select Greedy<br/>a = argmax Q(s)"]
    
    Random --> Execute[Execute Actions]
    Greedy --> Execute
    
    Execute --> ApplyActions["Apply Maintenance<br/>Update States, Spend Budget"]
    ApplyActions --> Degrade[Natural Degradation<br/>Age += 1]
    Degrade --> ComputeReward["Compute Rewards<br/>Urban & Rural"]
    
    ComputeReward --> Unify["Unified Reward<br/>R_muni = R_urban + R_rural"]
    Unify --> CoopCheck{"Both agents<br/>R > 0?"}
    CoopCheck -->|Yes| Bonus["Add Bonus<br/>R_muni += 0.1 × R_muni"]
    CoopCheck -->|No| Store
    Bonus --> Store["Store Experience<br/>(s, a, R_muni, s', done)"]
    
    Store --> BufferCheck{"Buffer Size<br/>≥ Batch?"}
    BufferCheck -->|No| NextYear
    BufferCheck -->|Yes| Sample[Sample Minibatch]
    
    Sample --> ComputeQ["Compute Q(s,a)<br/>from Policy Network"]
    ComputeQ --> ComputeTarget["Compute Target<br/>y = R + γ·max Q_target(s')"]
    ComputeTarget --> Loss["MSE Loss<br/>L = (Q - y)²"]
    Loss --> Backprop["Backpropagation<br/>Update θ_policy"]
    Backprop --> Clip[Gradient Clipping<br/>clip_norm = 10.0]
    Clip --> SyncCheck{"Step mod<br/>sync_freq = 0?"}
    
    SyncCheck -->|Yes| Sync["Sync Target Network<br/>θ_target ← θ_policy"]
    SyncCheck -->|No| NextYear
    Sync --> NextYear
    
    NextYear --> YearCheck{"Year i<br/>= 30?"}
    YearCheck -->|No| Year
    YearCheck -->|Yes| Decay["Decay ε<br/>ε = max(0.05, ε - decay)"]
    
    Decay --> EpisodeCheck{"Episode t<br/>= max_episodes?"}
    EpisodeCheck -->|No| Episode
    EpisodeCheck -->|Yes| Save[Save Models<br/>& Training Stats]
    
    Save --> End([Training Complete])
    
    style Start fill:#90EE90
    style End fill:#FFB6C1
    style Execute fill:#87CEEB
    style Backprop fill:#FFA07A
    style Bonus fill:#FFD700
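The loop above condenses to a short per-step update. The sketch below is illustrative only: `policy_net`, `target_net`, and the minibatch layout are placeholder names, and the values of γ and the target-sync frequency are assumptions; the clip norm of 10.0, the +10% cooperative bonus, and the ε floor of 0.05 come from the diagram.

```python
# Sketch of the ε-greedy / replay / target-network update from the flow above.
# `policy_net`, `target_net`, and the batch layout are placeholders; GAMMA and
# SYNC_FREQ are assumed values, not taken from the repository.
import random
import torch
import torch.nn.functional as F

GAMMA, CLIP_NORM, SYNC_FREQ, EPS_MIN = 0.99, 10.0, 1000, 0.05


def select_action(policy_net, state, epsilon, n_actions):
    """ε-greedy: random action with probability ε, otherwise a = argmax Q(s)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(policy_net(state.unsqueeze(0)).argmax(dim=1).item())


def municipality_reward(r_urban, r_rural):
    """Unified reward R_muni = R_urban + R_rural, +10% bonus if both are positive."""
    r = r_urban + r_rural
    if r_urban > 0 and r_rural > 0:
        r += 0.1 * r
    return r


def dqn_update(policy_net, target_net, optimizer, batch, step):
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) from the policy network for the actions actually taken
    q_sa = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target: y = R + γ · max_a' Q_target(s', a'), with no bootstrap at episode end
    with torch.no_grad():
        y = rewards + GAMMA * target_net(next_states).max(dim=1).values * (1 - dones)
    loss = F.mse_loss(q_sa, y)                      # L = (Q - y)²
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy_net.parameters(), CLIP_NORM)
    optimizer.step()
    if step % SYNC_FREQ == 0:                       # θ_target ← θ_policy
        target_net.load_state_dict(policy_net.state_dict())
    return loss.item()


def decay_epsilon(epsilon, decay):
    """End-of-episode decay with the floor from the diagram: ε = max(0.05, ε - decay)."""
    return max(EPS_MIN, epsilon - decay)
```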

Version History & Evolution

v0.1 - Single Bridge MVP (2024)

  • Goal: Proof of concept for DQN on bridge maintenance
  • Features:
    • 3-state MDP (Good/Fair/Poor)
    • 6 discrete actions based on NBI work types
    • Single bridge, 30-year horizon
    • GPU acceleration support
  • Key Learning: Successfully demonstrated DQN convergence

v0.2-0.3 - Enhanced Single Bridge (2024)

  • Improvements:
    • Refined reward function balancing health vs cost
    • Better hyperparameter tuning
    • Monte Carlo validation framework
    • Baseline comparisons (do-nothing, reactive)
  • Key Learning: Importance of reward engineering

v0.4 - Fleet Management (2025)

  • Scale-up: 1 bridge → 100 bridges
  • Architecture:
    • Urban Agent: 20 high-traffic bridges (individual management)
    • Rural Agent: 80 low-traffic bridges (strategy-based)
  • Challenges:
    • Urban bridges degraded continuously
    • Budget allocation inflexible
    • Agents competed for resources
  • Key Learning: Need for departmental cooperation

v0.5 - Cooperative Learning (2025-12-05)

  • Major Redesign: Cross-subsidy + unified rewards

  • New Features:

    1. Cross-Subsidy Budget Sharing (sketched in code below)

      • Unified budget pool with flexible allocation
      • Up to 30% of the budget can be transferred between departments
    2. Unified Municipality Reward

      • Both agents optimize same objective
      • 10% cooperative bonus when both succeed
    3. Diverse Bridge Ages

      • Realistic 20-50 year distribution
      • Age-dependent initialization
    4. Enhanced Urban Penalties

      • Degradation penalty: 10.0
      • Critical penalty: 30.0 (state < 6)
    5. Expanded Rural Strategies

      • Added: Preventive, Balanced, Adaptive
      • Total: 8 strategic options
  • Results: +7.1% total reward improvement
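A minimal sketch of the cross-subsidy allocation above, assuming the 60/40 base split shown in the architecture diagram and the 30% transfer cap; the function name and signature are hypothetical, not the repository's API.

```python
# Illustrative cross-subsidy allocation: a unified pool split 60/40 (per the
# architecture diagram), with at most 30% of a department's base allocation
# transferable to the other. Function and parameter names are hypothetical.
def allocate_budget(total_budget, transfer_fraction=0.0,
                    urban_share=0.6, max_transfer=0.3):
    """transfer_fraction > 0 shifts urban budget to rural; < 0 shifts rural to urban."""
    transfer_fraction = max(-max_transfer, min(max_transfer, transfer_fraction))
    urban = total_budget * urban_share
    rural = total_budget * (1.0 - urban_share)
    if transfer_fraction >= 0:
        shift = urban * transfer_fraction
        return urban - shift, rural + shift
    shift = rural * (-transfer_fraction)
    return urban + shift, rural - shift


# Example: a $56k pool with 10% of the urban allocation subsidizing rural bridges
urban_budget, rural_budget = allocate_budget(56_000, transfer_fraction=0.10)
```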

v0.5.1 - Budget Realism Experiments (2025-12-06)

Systematic exploration of budget constraints to find system limits:

| Experiment | Budget | Deficit | Episodes | Goal |
|------------|--------|---------|----------|------|
| Test | $70k | 0% | 100 | Initial validation |
| Long | $70k | 0% | 2000 | Long-term learning |
| Lack | $56k | 20% | 2000 | Realistic shortage |
| Alert | $37k | 47% | 2000 | Critical deficit |
| Empty | $18k | 74% | 2000 | System collapse |

Key Findings:

  • Long: 40.2% budget usage → unrealistic surplus
  • Lack: 60.5% usage, 89% cooperation → functional
  • Alert: [Running] Testing cooperation threshold
  • Empty: 64.0% usage, 1% cooperation → complete breakdown

Critical Threshold: The system remains functional at a 20% deficit but collapses at a 74% deficit; the 47% (Alert) run is still in progress.

Installation

Requirements

  • Python 3.12+
  • PyTorch 2.6+ (with CUDA 12.4+ for GPU)
  • NumPy, Matplotlib

Setup

# Clone repository
git clone https://github.com/yourusername/dql-bridge-maintenance.git
cd dql-bridge-maintenance

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install numpy matplotlib pyyaml

Usage

Training

# Quick test (100 episodes)
python train_fleet_v05.py --episodes 100 --output outputs_v051_test

# Long-term training (2000 episodes)
python train_fleet_v05.py --episodes 2000 --output outputs_v051_long

# Budget deficit scenario
python train_fleet_v05.py --episodes 2000 --output outputs_v051_lack

# CPU mode (no GPU)
python train_fleet_v05.py --episodes 100 --device cpu

Visualization

# Training curves
python visualize_fleet_v05.py --checkpoint outputs_v051_test/models/final_checkpoint.pt --output outputs_v051_test/plots

# Action analysis
python analyze_actions_v05.py --models_dir outputs_v051_test/models --output outputs_v051_test/plots

Experimental Results

Performance Comparison (2000 Episodes)

| Metric | Long ($70k) | Lack ($56k) | Empty ($18k) |
|--------|-------------|-------------|--------------|
| Municipality Reward | 7,184.86 | 6,446.76 (-10.3%) | 4,224.75 (-41.2%) |
| Urban Component | 1,322.63 | 581.44 (-56.0%) | -983.86 (negative!) |
| Rural Component | 5,295.99 | 5,416.89 (+2.3%) | 5,045.18 (-7.1%) |
| Average Cost | $587k | $666k (+13.5%) | $338k (-49.3%) |
| Budget Usage | 40.2% | 60.5% | 64.0% |
| Cooperation Rate | 100% | 89% | 1% (collapse) |

Training Performance

  • Hardware: NVIDIA GeForce RTX 4060 Ti (16GB)
  • Speed: ~1.9 seconds/episode
  • 100 episodes: ~2 minutes
  • 2000 episodes: ~60 minutes

Project Structure

dql-bridge-maintenance/
├── README.md                           # This file
├── Cross-subsidy_Lessons.md           # Experimental insights
├── requirements.txt
├── src/
│   └── fleet_environment_v05.py       # Core environment
├── train_fleet_v05.py                 # Training script
├── visualize_fleet_v05.py             # Visualization tools
├── analyze_actions_v05.py             # Action analysis
├── outputs_v051_*/                     # Experiment outputs
│   ├── models/
│   │   ├── checkpoint_ep*.pt
│   │   ├── urban_agent_final.pt
│   │   ├── rural_agent_final.pt
│   │   └── final_checkpoint.pt
│   └── plots/
│       ├── training_curves_v05.png
│       └── action_analysis_v05.png
└── 0_LogBAK/                          # Version archives
    ├── v0.1/, v0.2/, v0.3/, v0.4/
    └── README_v*.md

Key Insights

1. Long-term Role Specialization

  • Test (100ep) → Long (2000ep):
    • Urban contribution: -38.9% (2,167 → 1,323)
    • Rural contribution: +16.3% (4,555 → 5,296)
    • Rural becomes dominant, handling 73.7% of reward

2. Budget Realism Matters

  • Long: 40.2% usage → agents too conservative
  • Lack: 60.5% usage → realistic behavior
  • Empty: 64.0% usage but cooperation collapses

3. Cooperation Under Stress

  • Cooperation holds up to a 20% deficit (89% cooperation rate)
  • Catastrophic failure at a 74% deficit (1% cooperation rate)
  • The smaller department (Urban) fails first

4. Cross-Subsidy Value

  • Enables 89% cooperation under a 20% deficit
  • Without the unified pool, total failure would be likely
  • Urban reduces its own share to support system stability

5. Reward Design Limitation

  • Current design: the cooperative bonus requires both agents to earn positive rewards
  • Problem: under extreme deficits the Urban reward turns negative, so the bonus never triggers
  • Possible fix: reward relative improvement, or grant a partial bonus (one possible shape is sketched below)
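As a purely hypothetical illustration of the "partial bonus" fix mentioned above (not implemented in this repository), the cooperative bonus could shrink rather than vanish when one agent's reward is negative:

```python
# Hypothetical "partial bonus" variant of the unified reward: the cooperative
# bonus is reduced, not removed, when only the combined reward is positive.
def partial_bonus_reward(r_urban, r_rural, bonus=0.1):
    base = r_urban + r_rural
    if r_urban > 0 and r_rural > 0:
        return base * (1 + bonus)        # full bonus (current v0.5 design)
    if base > 0:
        return base * (1 + 0.5 * bonus)  # partial bonus: municipality still net positive
    return base                          # no bonus under an overall loss
```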

Future Directions

  1. Improved Reward Mechanism

    • Relative improvement bonuses
    • Partial cooperation rewards
    • Risk-adjusted metrics (CVaR)
  2. Dynamic Budget Allocation

    • Learn optimal allocation ratios
    • Demand forecasting
    • Multi-year planning
  3. Uncertainty Modeling

    • Budget fluctuations
    • Disaster events
    • Policy changes
  4. Explainability

    • Decision visualization
    • Strategy interpretation
    • Stakeholder communication
  5. Real-world Validation

    • Actual bridge data
    • Field testing
    • Human-AI collaboration

References

  • DQN: Mnih et al. (2015). Human-level control through deep reinforcement learning. Nature.
  • Multi-Agent RL: Lowe et al. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.
  • Bridge Management: AASHTO (2011). Manual for Bridge Evaluation.
  • Implementation: Lapan (2020). Deep Reinforcement Learning Hands-On (2nd ed.).

Citation

If you use this work, please cite:

@software{bridge_dqn_2025,
  author = {Your Name},
  title = {Bridge Fleet Management with Deep Reinforcement Learning},
  year = {2025},
  url = {https://github.com/yourusername/dql-bridge-maintenance}
}

License

MIT License - See LICENSE file for details.

Research and educational use encouraged.


Last Updated: 2025-12-06
Version: 0.5.1
Status: Active Development
Python: 3.12.10 | PyTorch: 2.6.0+cu124 | CUDA: 12.4