
Bridge Fleet Management with Deep Reinforcement Learning

Python 3.12+ PyTorch CUDA License: MIT

A deep reinforcement learning system for optimizing bridge maintenance decisions across municipal infrastructure fleets, implementing cross-subsidy budget sharing and cooperative multi-agent learning.

Overview

This project applies Deep Q-Networks (DQN) to the problem of managing a fleet of 100 bridge assets under budget constraints. The system learns maintenance policies that balance infrastructure health, lifecycle costs, and inter-departmental cooperation.

Key Features

  • Multi-Agent Coordination: Urban (20 bridges) and Rural (80 bridges) agents cooperate
  • Cross-Subsidy Budget Sharing: Flexible budget reallocation between departments
  • Unified Municipality Reward: Agents optimize shared objectives, not individual metrics
  • Realistic Constraints: Budget deficits, no carryover, diverse bridge ages (20-50 years)
  • GPU Acceleration: CUDA-enabled training for 2-5x speedup
  • Comprehensive Experiments: 4 budget scenarios from surplus to extreme deficit

System Architecture

graph TB
    subgraph Municipality["Municipality Management System"]
        Budget["Unified Budget Pool<br/>$18k-$70k/year"]
        
        subgraph Urban["Urban Agent"]
            U_State["State: 81D<br/>20 bridges x 4 features"]
            U_DQN["DQN Network<br/>81→512→1024→512→100"]
            U_Action["Actions: 20 x 5<br/>None/Light/Medium/Major/Replace"]
        end
        
        subgraph Rural["Rural Agent"]
            R_State["State: 10D<br/>Fleet statistics + budget"]
            R_DQN["DQN Network<br/>10→256→512→256→8"]
            R_Action["Strategies: 8<br/>Reactive/Preventive/Adaptive"]
        end
        
        Budget -->|"60%"| U_State
        Budget -->|"40%"| R_State
        
        U_State --> U_DQN
        U_DQN --> U_Action
        U_Action --> Env["Environment<br/>100 Bridges"]
        
        R_State --> R_DQN
        R_DQN --> R_Action
        R_Action --> Env
        
        Env --> Reward["Municipality Reward<br/>Urban + Rural"]
        Reward -->|"Cooperative Bonus<br/>+10% if both > 0"| U_DQN
        Reward --> R_DQN
    end
    
    style Budget fill:#FFD700
    style Reward fill:#90EE90
    style Env fill:#87CEEB
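The layer sizes in the diagram correspond to two feed-forward Q-networks. Below is a minimal PyTorch sketch of how they could be defined, assuming plain fully connected layers with ReLU activations; the class name and construction style are illustrative, not the repository's actual code.

```python
# Minimal sketch of the two Q-networks, assuming fully connected ReLU layers
# with the sizes from the diagram above. Names are illustrative.
import torch.nn as nn


class QNetwork(nn.Module):
    def __init__(self, dims):
        super().__init__()
        layers = []
        for i in range(len(dims) - 2):
            layers += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
        layers.append(nn.Linear(dims[-2], dims[-1]))  # raw Q-values, no activation
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


# Urban agent: 81-D state -> 100 Q-values (20 bridges x 5 maintenance actions)
urban_q = QNetwork([81, 512, 1024, 512, 100])
# Rural agent: 10-D state -> 8 Q-values (one per fleet-level strategy)
rural_q = QNetwork([10, 256, 512, 256, 8])
```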

DQN Learning Flow

The complete training pipeline with experience replay and target networks:

graph TB
    Start([Start Training]) --> Init[Initialize Environment<br/>and Agent Networks]
    Init --> Episode[Start Episode t]
    
    Episode --> Reset["Reset Environment<br/>Budget, Bridge States"]
    Reset --> Year[Year i = 1]
    
    Year --> GetState["Get State<br/>Urban: 81D | Rural: 10D"]
    GetState --> Explore{"ε-greedy<br/>Exploration?"}
    
    Explore -->|"rand < ε"| Random[Select Random<br/>Actions/Strategy]
    Explore -->|"rand ≥ ε"| Greedy["Select Greedy<br/>a = argmax Q(s)"]
    
    Random --> Execute[Execute Actions]
    Greedy --> Execute
    
    Execute --> ApplyActions["Apply Maintenance<br/>Update States, Spend Budget"]
    ApplyActions --> Degrade[Natural Degradation<br/>Age += 1]
    Degrade --> ComputeReward["Compute Rewards<br/>Urban & Rural"]
    
    ComputeReward --> Unify["Unified Reward<br/>R_muni = R_urban + R_rural"]
    Unify --> CoopCheck{"Both agents<br/>R > 0?"}
    CoopCheck -->|Yes| Bonus["Add Bonus<br/>R_muni += 0.1 × R_muni"]
    CoopCheck -->|No| Store
    Bonus --> Store["Store Experience<br/>(s, a, R_muni, s', done)"]
    
    Store --> BufferCheck{"Buffer Size<br/>≥ Batch?"}
    BufferCheck -->|No| NextYear
    BufferCheck -->|Yes| Sample[Sample Minibatch]
    
    Sample --> ComputeQ["Compute Q(s,a)<br/>from Policy Network"]
    ComputeQ --> ComputeTarget["Compute Target<br/>y = R + γ·max Q_target(s')"]
    ComputeTarget --> Loss["MSE Loss<br/>L = (Q - y)²"]
    Loss --> Backprop["Backpropagation<br/>Update θ_policy"]
    Backprop --> Clip[Gradient Clipping<br/>clip_norm = 10.0]
    Clip --> SyncCheck{"Step mod<br/>sync_freq = 0?"}
    
    SyncCheck -->|Yes| Sync["Sync Target Network<br/>θ_target ← θ_policy"]
    SyncCheck -->|No| NextYear
    Sync --> NextYear
    
    NextYear --> YearCheck{"Year i<br/>= 30?"}
    YearCheck -->|No| Year
    YearCheck -->|Yes| Decay["Decay ε<br/>ε = max(0.05, ε - decay)"]
    
    Decay --> EpisodeCheck{"Episode t<br/>= max_episodes?"}
    EpisodeCheck -->|No| Episode
    EpisodeCheck -->|Yes| Save[Save Models<br/>& Training Stats]
    
    Save --> End([Training Complete])
    
    style Start fill:#90EE90
    style End fill:#FFB6C1
    style Execute fill:#87CEEB
    style Backprop fill:#FFA07A
    style Bonus fill:#FFD700
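The loop above condenses to a short per-step update. The sketch below is illustrative only: `policy_net`, `target_net`, and the minibatch layout are placeholder names, and the values of γ and the target-sync frequency are assumptions; the clip norm of 10.0, the +10% cooperative bonus, and the ε floor of 0.05 come from the diagram.

```python
# Sketch of the ε-greedy / replay / target-network update from the flow above.
# `policy_net`, `target_net`, and the batch layout are placeholders; GAMMA and
# SYNC_FREQ are assumed values, not taken from the repository.
import random
import torch
import torch.nn.functional as F

GAMMA, CLIP_NORM, SYNC_FREQ, EPS_MIN = 0.99, 10.0, 1000, 0.05


def select_action(policy_net, state, epsilon, n_actions):
    """ε-greedy: random action with probability ε, otherwise a = argmax Q(s)."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(policy_net(state.unsqueeze(0)).argmax(dim=1).item())


def municipality_reward(r_urban, r_rural):
    """Unified reward R_muni = R_urban + R_rural, +10% bonus if both are positive."""
    r = r_urban + r_rural
    if r_urban > 0 and r_rural > 0:
        r += 0.1 * r
    return r


def dqn_update(policy_net, target_net, optimizer, batch, step):
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) from the policy network for the actions actually taken
    q_sa = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    # Target: y = R + γ · max_a' Q_target(s', a'), with no bootstrap at episode end
    with torch.no_grad():
        y = rewards + GAMMA * target_net(next_states).max(dim=1).values * (1 - dones)
    loss = F.mse_loss(q_sa, y)                      # L = (Q - y)²
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy_net.parameters(), CLIP_NORM)
    optimizer.step()
    if step % SYNC_FREQ == 0:                       # θ_target ← θ_policy
        target_net.load_state_dict(policy_net.state_dict())
    return loss.item()


def decay_epsilon(epsilon, decay):
    """End-of-episode decay with the floor from the diagram: ε = max(0.05, ε - decay)."""
    return max(EPS_MIN, epsilon - decay)
```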

Version History & Evolution

v0.1 - Single Bridge MVP (2024)

  • Goal: Proof of concept for DQN on bridge maintenance
  • Features:
    • 3-state MDP (Good/Fair/Poor)
    • 6 discrete actions based on NBI work types
    • Single bridge, 30-year horizon
    • GPU acceleration support
  • Key Learning: Successfully demonstrated DQN convergence

v0.2-0.3 - Enhanced Single Bridge (2024)

  • Improvements:
    • Refined reward function balancing health vs cost
    • Better hyperparameter tuning
    • Monte Carlo validation framework
    • Baseline comparisons (do-nothing, reactive)
  • Key Learning: Importance of reward engineering

v0.4 - Fleet Management (2025)

  • Scale-up: 1 bridge → 100 bridges
  • Architecture:
    • Urban Agent: 20 high-traffic bridges (individual management)
    • Rural Agent: 80 low-traffic bridges (strategy-based)
  • Challenges:
    • Urban bridges degraded continuously
    • Budget allocation inflexible
    • Agents competed for resources
  • Key Learning: Need for departmental cooperation

v0.5 - Cooperative Learning (2025-12-05)

  • Major Redesign: Cross-subsidy + unified rewards

  • New Features:

    1. Cross-Subsidy Budget Sharing (sketched in code below)

      • Unified budget pool with flexible allocation
      • Up to 30% of the budget can be transferred between departments
    2. Unified Municipality Reward

      • Both agents optimize same objective
      • 10% cooperative bonus when both succeed
    3. Diverse Bridge Ages

      • Realistic 20-50 year distribution
      • Age-dependent initialization
    4. Enhanced Urban Penalties

      • Degradation penalty: 10.0
      • Critical penalty: 30.0 (state < 6)
    5. Expanded Rural Strategies

      • Added: Preventive, Balanced, Adaptive
      • Total: 8 strategic options
  • Results: +7.1% total reward improvement
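A minimal sketch of the cross-subsidy allocation above, assuming the 60/40 base split shown in the architecture diagram and the 30% transfer cap; the function name and signature are hypothetical, not the repository's API.

```python
# Illustrative cross-subsidy allocation: a unified pool split 60/40 (per the
# architecture diagram), with at most 30% of a department's base allocation
# transferable to the other. Function and parameter names are hypothetical.
def allocate_budget(total_budget, transfer_fraction=0.0,
                    urban_share=0.6, max_transfer=0.3):
    """transfer_fraction > 0 shifts urban budget to rural; < 0 shifts rural to urban."""
    transfer_fraction = max(-max_transfer, min(max_transfer, transfer_fraction))
    urban = total_budget * urban_share
    rural = total_budget * (1.0 - urban_share)
    if transfer_fraction >= 0:
        shift = urban * transfer_fraction
        return urban - shift, rural + shift
    shift = rural * (-transfer_fraction)
    return urban + shift, rural - shift


# Example: a $56k pool with 10% of the urban allocation subsidizing rural bridges
urban_budget, rural_budget = allocate_budget(56_000, transfer_fraction=0.10)
```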

v0.5.1 - Budget Realism Experiments (2025-12-06)

Systematic exploration of budget constraints to find system limits:

| Experiment | Budget | Deficit | Episodes | Goal |
|------------|--------|---------|----------|------|
| Test | $70k | 0% | 100 | Initial validation |
| Long | $70k | 0% | 2000 | Long-term learning |
| Lack | $56k | 20% | 2000 | Realistic shortage |
| Alert | $37k | 47% | 2000 | Critical deficit |
| Empty | $18k | 74% | 2000 | System collapse |

Key Findings:

  • Long: 40.2% budget usage → unrealistic surplus
  • Lack: 60.5% usage, 89% cooperation → functional
  • Alert: [Running] Testing cooperation threshold
  • Empty: 64.0% usage, 1% cooperation → complete breakdown

Critical Threshold: The system remains functional at a 20% deficit but collapses at a 74% deficit; the 47% (Alert) run is still in progress.

Installation

Requirements

  • Python 3.12+
  • PyTorch 2.6+ (with CUDA 12.4+ for GPU)
  • NumPy, Matplotlib

Setup

# Clone repository
git clone https://github.com/yourusername/dql-bridge-maintenance.git
cd dql-bridge-maintenance

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install numpy matplotlib pyyaml

Usage

Training

# Quick test (100 episodes)
python train_fleet_v05.py --episodes 100 --output outputs_v051_test

# Long-term training (2000 episodes)
python train_fleet_v05.py --episodes 2000 --output outputs_v051_long

# Budget deficit scenario
python train_fleet_v05.py --episodes 2000 --output outputs_v051_lack

# CPU mode (no GPU)
python train_fleet_v05.py --episodes 100 --device cpu

Visualization

# Training curves
python visualize_fleet_v05.py --checkpoint outputs_v051_test/models/final_checkpoint.pt --output outputs_v051_test/plots

# Action analysis
python analyze_actions_v05.py --models_dir outputs_v051_test/models --output outputs_v051_test/plots

Experimental Results

Performance Comparison (2000 Episodes)

| Metric | Long ($70k) | Lack ($56k) | Empty ($18k) |
|--------|-------------|-------------|--------------|
| Municipality Reward | 7,184.86 | 6,446.76 (-10.3%) | 4,224.75 (-41.2%) |
| Urban Component | 1,322.63 | 581.44 (-56.0%) | -983.86 (negative!) |
| Rural Component | 5,295.99 | 5,416.89 (+2.3%) | 5,045.18 (-7.1%) |
| Average Cost | $587k | $666k (+13.5%) | $338k (-49.3%) |
| Budget Usage | 40.2% | 60.5% | 64.0% |
| Cooperation Rate | 100% | 89% | 1% (collapse) |

Training Performance

  • Hardware: NVIDIA GeForce RTX 4060 Ti (16GB)
  • Speed: ~1.9 seconds/episode
  • 100 episodes: ~2 minutes
  • 2000 episodes: ~60 minutes

Project Structure

dql-bridge-maintenance/
├── README.md                           # This file
├── Cross-subsidy_Lessons.md           # Experimental insights
├── requirements.txt
├── src/
│   └── fleet_environment_v05.py       # Core environment
├── train_fleet_v05.py                 # Training script
├── visualize_fleet_v05.py             # Visualization tools
├── analyze_actions_v05.py             # Action analysis
├── outputs_v051_*/                     # Experiment outputs
│   ├── models/
│   │   ├── checkpoint_ep*.pt
│   │   ├── urban_agent_final.pt
│   │   ├── rural_agent_final.pt
│   │   └── final_checkpoint.pt
│   └── plots/
│       ├── training_curves_v05.png
│       └── action_analysis_v05.png
└── 0_LogBAK/                          # Version archives
    ├── v0.1/, v0.2/, v0.3/, v0.4/
    └── README_v*.md

Key Insights

1. Long-term Role Specialization

  • Test (100ep) → Long (2000ep):
    • Urban contribution: -38.9% (2,167 → 1,323)
    • Rural contribution: +16.3% (4,555 → 5,296)
    • Rural becomes dominant, handling 73.7% of reward

2. Budget Realism Matters

  • Long: 40.2% usage → agents too conservative
  • Lack: 60.5% usage → realistic behavior
  • Empty: 64.0% usage but cooperation collapses

3. Cooperation Under Stress

  • Cooperation holds up to a 20% deficit (89% cooperation rate)
  • Catastrophic failure at a 74% deficit (1% cooperation rate)
  • The smaller department (Urban) fails first

4. Cross-Subsidy Value

  • Enables 89% cooperation under a 20% deficit
  • Without the unified pool, total failure would be likely
  • Urban reduces its own share to support system stability

5. Reward Design Limitation

  • Current design: the cooperative bonus requires both agents to earn positive rewards
  • Problem: under extreme deficits the Urban reward turns negative, so the bonus never triggers
  • Possible fix: reward relative improvement, or grant a partial bonus (one possible shape is sketched below)
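As a purely hypothetical illustration of the "partial bonus" fix mentioned above (not implemented in this repository), the cooperative bonus could shrink rather than vanish when one agent's reward is negative:

```python
# Hypothetical "partial bonus" variant of the unified reward: the cooperative
# bonus is reduced, not removed, when only the combined reward is positive.
def partial_bonus_reward(r_urban, r_rural, bonus=0.1):
    base = r_urban + r_rural
    if r_urban > 0 and r_rural > 0:
        return base * (1 + bonus)        # full bonus (current v0.5 design)
    if base > 0:
        return base * (1 + 0.5 * bonus)  # partial bonus: municipality still net positive
    return base                          # no bonus under an overall loss
```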

Future Directions

  1. Improved Reward Mechanism

    • Relative improvement bonuses
    • Partial cooperation rewards
    • Risk-adjusted metrics (CVaR)
  2. Dynamic Budget Allocation

    • Learn optimal allocation ratios
    • Demand forecasting
    • Multi-year planning
  3. Uncertainty Modeling

    • Budget fluctuations
    • Disaster events
    • Policy changes
  4. Explainability

    • Decision visualization
    • Strategy interpretation
    • Stakeholder communication
  5. Real-world Validation

    • Actual bridge data
    • Field testing
    • Human-AI collaboration

References

  • DQN: Mnih et al. (2015). Human-level control through deep reinforcement learning. Nature.
  • Multi-Agent RL: Lowe et al. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.
  • Bridge Management: AASHTO (2011). Manual for Bridge Evaluation.
  • Implementation: Lapan (2020). Deep Reinforcement Learning Hands-On (2nd ed.).

Citation

If you use this work, please cite:

@software{bridge_dqn_2025,
  author = {Your Name},
  title = {Bridge Fleet Management with Deep Reinforcement Learning},
  year = {2025},
  url = {https://github.com/yourusername/dql-bridge-maintenance}
}

License

MIT License - See LICENSE file for details.

Research and educational use encouraged.


Last Updated: 2025-12-06
Version: 0.5.1
Status: Active Development
Python: 3.12.10 | PyTorch: 2.6.0+cu124 | CUDA: 12.4