A deep reinforcement learning system for optimizing bridge maintenance decisions across municipal infrastructure fleets, implementing cross-subsidy budget sharing and cooperative multi-agent learning.
This project applies Deep Q-Networks (DQN) to maintenance planning for a fleet of 100 bridge assets under budget constraints. The system learns maintenance policies that balance infrastructure health, lifecycle cost, and inter-departmental cooperation.
- Multi-Agent Coordination: Urban (20 bridges) and Rural (80 bridges) agents cooperate
- Cross-Subsidy Budget Sharing: Flexible budget reallocation between departments
- Unified Municipality Reward: Agents optimize shared objectives, not individual metrics
- Realistic Constraints: Budget deficits, no carryover, diverse bridge ages (20-50 years)
- GPU Acceleration: CUDA-enabled training for 2-5x speedup
- Comprehensive Experiments: 4 budget scenarios from surplus to extreme deficit
graph TB
subgraph Municipality["Municipality Management System"]
Budget["Unified Budget Pool<br/>$18k-$70k/year"]
subgraph Urban["Urban Agent"]
U_State["State: 81D<br/>20 bridges x 4 features"]
U_DQN["DQN Network<br/>81→512→1024→512→100"]
U_Action["Actions: 20 x 5<br/>None/Light/Medium/Major/Replace"]
end
subgraph Rural["Rural Agent"]
R_State["State: 10D<br/>Fleet statistics + budget"]
R_DQN["DQN Network<br/>10→256→512→256→8"]
R_Action["Strategies: 8<br/>Reactive/Preventive/Adaptive"]
end
Budget -->|"60%"| U_State
Budget -->|"40%"| R_State
U_State --> U_DQN
U_DQN --> U_Action
U_Action --> Env["Environment<br/>100 Bridges"]
R_State --> R_DQN
R_DQN --> R_Action
R_Action --> Env
Env --> Reward["Municipality Reward<br/>Urban + Rural"]
Reward -->|"Cooperative Bonus<br/>+10% if both > 0"| U_DQN
Reward --> R_DQN
end
style Budget fill:#FFD700
style Reward fill:#90EE90
style Env fill:#87CEEB
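The layer sizes in the diagram map onto two small multilayer perceptrons. Below is a minimal PyTorch sketch; the class and variable names are illustrative rather than taken from the project code, and the assumption that the Urban head's 100 outputs correspond to 20 bridges × 5 actions follows from the diagram labels.

```python
import torch
import torch.nn as nn

class FleetDQN(nn.Module):
    """Plain MLP Q-network; hidden sizes follow the architecture diagram above."""
    def __init__(self, state_dim: int, action_dim: int, hidden: tuple):
        super().__init__()
        layers, prev = [], state_dim
        for width in hidden:
            layers += [nn.Linear(prev, width), nn.ReLU()]
            prev = width
        layers.append(nn.Linear(prev, action_dim))  # one Q-value per discrete output
        self.net = nn.Sequential(*layers)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Urban agent: 81-D state -> 100 outputs (assumed to be reshaped to 20 bridges x 5 actions)
urban_q = FleetDQN(state_dim=81, action_dim=100, hidden=(512, 1024, 512))
# Rural agent: 10-D state -> 8 fleet-level strategies
rural_q = FleetDQN(state_dim=10, action_dim=8, hidden=(256, 512, 256))
```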
The complete training pipeline with experience replay and target networks:
graph TB
Start([Start Training]) --> Init[Initialize Environment<br/>and Agent Networks]
Init --> Episode[Start Episode t]
Episode --> Reset["Reset Environment<br/>Budget, Bridge States"]
Reset --> Year[Year i = 1]
Year --> GetState["Get State<br/>Urban: 81D | Rural: 10D"]
GetState --> Explore{"ε-greedy<br/>Exploration?"}
Explore -->|"rand < ε"| Random[Select Random<br/>Actions/Strategy]
Explore -->|"rand ≥ ε"| Greedy["Select Greedy<br/>a = argmax Q(s)"]
Random --> Execute[Execute Actions]
Greedy --> Execute
Execute --> ApplyActions["Apply Maintenance<br/>Update States, Spend Budget"]
ApplyActions --> Degrade[Natural Degradation<br/>Age += 1]
Degrade --> ComputeReward["Compute Rewards<br/>Urban & Rural"]
ComputeReward --> Unify["Unified Reward<br/>R_muni = R_urban + R_rural"]
Unify --> CoopCheck{"Both agents<br/>R > 0?"}
CoopCheck -->|Yes| Bonus["Add Bonus<br/>R_muni += 0.1 × R_muni"]
CoopCheck -->|No| Store
Bonus --> Store["Store Experience<br/>(s, a, R_muni, s', done)"]
Store --> BufferCheck{"Buffer Size<br/>≥ Batch?"}
BufferCheck -->|No| NextYear
BufferCheck -->|Yes| Sample[Sample Minibatch]
Sample --> ComputeQ["Compute Q(s,a)<br/>from Policy Network"]
ComputeQ --> ComputeTarget["Compute Target<br/>y = R + γ·max Q_target(s')"]
ComputeTarget --> Loss["MSE Loss<br/>L = (Q - y)²"]
Loss --> Backprop["Backpropagation<br/>Update θ_policy"]
Backprop --> Clip[Gradient Clipping<br/>clip_norm = 10.0]
Clip --> SyncCheck{"Step mod<br/>sync_freq = 0?"}
SyncCheck -->|Yes| Sync["Sync Target Network<br/>θ_target ← θ_policy"]
SyncCheck -->|No| NextYear
Sync --> NextYear
NextYear --> YearCheck{"Year i<br/>= 30?"}
YearCheck -->|No| Year
YearCheck -->|Yes| Decay["Decay ε<br/>ε = max(0.05, ε - decay)"]
Decay --> EpisodeCheck{"Episode t<br/>= max_episodes?"}
EpisodeCheck -->|No| Episode
EpisodeCheck -->|Yes| Save[Save Models<br/>& Training Stats]
Save --> End([Training Complete])
style Start fill:#90EE90
style End fill:#FFB6C1
style Execute fill:#87CEEB
style Backprop fill:#FFA07A
style Bonus fill:#FFD700
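The inner loop of that flow is a standard DQN update: a TD target from the frozen target network, an MSE loss, and gradient clipping at clip_norm = 10.0 before the optimizer step. A self-contained sketch under those assumptions follows; the function names, batch layout, and discount factor value are illustrative, not the actual train_fleet_v05.py code.

```python
import random
import torch
import torch.nn.functional as F

def select_action(policy_net, state, epsilon, n_actions):
    """Epsilon-greedy branch from the flowchart: random action with probability epsilon."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return int(policy_net(state.unsqueeze(0)).argmax(dim=1).item())

def dqn_update(policy_net, target_net, optimizer, batch, gamma=0.99, clip_norm=10.0):
    """One replay update: y = R + gamma * max_a' Q_target(s', a'), MSE loss, clipped gradients."""
    states, actions, rewards, next_states, dones = batch  # tensors sampled from the replay buffer
    q_sa = policy_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s, a)
    with torch.no_grad():
        q_next = target_net(next_states).max(dim=1).values                 # max_a' Q_target(s', a')
        target = rewards + gamma * q_next * (1.0 - dones)                  # no bootstrap at episode end
    loss = F.mse_loss(q_sa, target)                                        # L = (Q - y)^2
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(policy_net.parameters(), clip_norm)     # gradient clipping
    optimizer.step()
    return loss.item()

# Every sync_freq gradient steps, copy the online weights into the target network:
#     target_net.load_state_dict(policy_net.state_dict())
```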
- Goal: Proof of concept for DQN on bridge maintenance
- Features:
  - 3-state MDP (Good/Fair/Poor)
  - 6 discrete actions based on NBI work types
  - Single bridge, 30-year horizon
  - GPU acceleration support
- Key Learning: Successfully demonstrated DQN convergence
- Improvements:
  - Refined reward function balancing health vs cost
  - Better hyperparameter tuning
  - Monte Carlo validation framework
  - Baseline comparisons (do-nothing, reactive)
- Key Learning: Importance of reward engineering
- Scale-up: 1 bridge → 100 bridges
- Architecture:
  - Urban Agent: 20 high-traffic bridges (individual management)
  - Rural Agent: 80 low-traffic bridges (strategy-based)
- Challenges:
  - Urban bridges degraded continuously
  - Budget allocation inflexible
  - Agents competed for resources
- Key Learning: Need for departmental cooperation
- Major Redesign: Cross-subsidy + unified rewards
- New Features:
  - Cross-Subsidy Budget Sharing
    - Unified budget pool with flexible allocation
    - Up to 30% budget transfer between departments
  - Unified Municipality Reward
    - Both agents optimize the same objective
    - 10% cooperative bonus when both succeed
  - Diverse Bridge Ages
    - Realistic 20-50 year distribution
    - Age-dependent initialization
  - Enhanced Urban Penalties
    - Degradation penalty: 10.0
    - Critical penalty: 30.0 (state < 6)
  - Expanded Rural Strategies
    - Added: Preventive, Balanced, Adaptive
    - Total: 8 strategic options
- Results: +7.1% total reward improvement
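A minimal sketch of the two central v0.5 mechanics described above: the cross-subsidy transfer and the unified reward with cooperative bonus. The function names, and the choice to cap the transfer at 30% of the sending department's allocation, are illustrative assumptions rather than the exact fleet_environment_v05.py logic.

```python
def cross_subsidy(urban_budget: float, rural_budget: float,
                  transfer_to_urban: float, max_share: float = 0.30):
    """Shift budget between departments; positive values move funds from Rural to Urban."""
    sender = rural_budget if transfer_to_urban >= 0 else urban_budget
    cap = max_share * sender                              # at most 30% of the sender's allocation
    transfer = max(-cap, min(cap, transfer_to_urban))
    return urban_budget + transfer, rural_budget - transfer

def municipality_reward(r_urban: float, r_rural: float, coop_bonus: float = 0.10) -> float:
    """Unified objective: both agents receive the sum, plus 10% when both components are positive."""
    r_muni = r_urban + r_rural
    if r_urban > 0 and r_rural > 0:
        r_muni += coop_bonus * r_muni                     # cooperative bonus
    return r_muni
```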
Systematic exploration of budget constraints to find system limits:
| Experiment | Annual Budget | Deficit | Episodes | Goal |
|---|---|---|---|---|
| Test | $70k | 0% | 100 | Initial validation |
| Long | $70k | 0% | 2000 | Long-term learning |
| Lack | $56k | 20% | 2000 | Realistic shortage |
| Alert | $37k | 47% | 2000 | Critical deficit |
| Empty | $18k | 74% | 2000 | System collapse |
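The deficit percentages follow from comparing each annual budget against the $70k full-need level, as in this small sketch (scenario and variable names are illustrative):

```python
# Deficit relative to the $70k full-need annual budget, matching the table above.
FULL_BUDGET = 70_000
SCENARIOS = {"test": 70_000, "long": 70_000, "lack": 56_000, "alert": 37_000, "empty": 18_000}

for name, budget in SCENARIOS.items():
    deficit = 1.0 - budget / FULL_BUDGET
    print(f"{name}: ${budget:,}/year -> {deficit:.0%} deficit")
# lack -> 20%, alert -> 47%, empty -> 74%
```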
Key Findings:
- Long: 40.2% budget usage → unrealistic surplus
- Lack: 60.5% usage, 89% cooperation → functional
- Alert: [Running] Testing cooperation threshold
- Empty: 64.0% usage, 1% cooperation → complete breakdown
Critical Threshold: System remains functional down to 20% deficit, collapses at 74% deficit.
- Python 3.12+
- PyTorch 2.6+ (with CUDA 12.4+ for GPU)
- NumPy, Matplotlib
# Clone repository
git clone https://github.com/yourusername/dql-bridge-maintenance.git
cd dql-bridge-maintenance
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install numpy matplotlib pyyaml
# Quick test (100 episodes)
python train_fleet_v05.py --episodes 100 --output outputs_v051_test
# Long-term training (2000 episodes)
python train_fleet_v05.py --episodes 2000 --output outputs_v051_long
# Budget deficit scenario
python train_fleet_v05.py --episodes 2000 --output outputs_v051_lack
# CPU mode (no GPU)
python train_fleet_v05.py --episodes 100 --device cpu
# Training curves
python visualize_fleet_v05.py --checkpoint outputs_v051_test/models/final_checkpoint.pt --output outputs_v051_test/plots
# Action analysis
python analyze_actions_v05.py --models_dir outputs_v051_test/models --output outputs_v051_test/plots
| Metric | Long ($70k) | Lack ($56k) | Empty ($18k) |
|---|---|---|---|
| Municipality Reward | 7,184.86 | 6,446.76 (-10.3%) | 4,224.75 (-41.2%) |
| Urban Component | 1,322.63 | 581.44 (-56.0%) | -983.86 (negative!) |
| Rural Component | 5,295.99 | 5,416.89 (+2.3%) | 5,045.18 (-7.1%) |
| Average Cost | $587k | $666k (+13.5%) | $338k (-49.3%) |
| Budget Usage | 40.2% | 60.5% | 64.0% |
| Cooperation Rate | 100% | 89% | 1% (collapse) |
- Hardware: NVIDIA GeForce RTX 4060 Ti (16GB)
- Speed: ~1.9 seconds/episode
- 100 episodes: ~2 minutes
- 2000 episodes: ~60 minutes
dql-bridge-maintenance/
├── README.md # This file
├── Cross-subsidy_Lessons.md # Experimental insights
├── requirements.txt
├── src/
│ └── fleet_environment_v05.py # Core environment
├── train_fleet_v05.py # Training script
├── visualize_fleet_v05.py # Visualization tools
├── analyze_actions_v05.py # Action analysis
├── outputs_v051_*/ # Experiment outputs
│ ├── models/
│ │ ├── checkpoint_ep*.pt
│ │ ├── urban_agent_final.pt
│ │ ├── rural_agent_final.pt
│ │ └── final_checkpoint.pt
│ └── plots/
│ ├── training_curves_v05.png
│ └── action_analysis_v05.png
└── 0_LogBAK/ # Version archives
├── v0.1/, v0.2/, v0.3/, v0.4/
└── README_v*.md
- Test (100ep) → Long (2000ep):
  - Urban contribution: -38.9% (2,167 → 1,323)
  - Rural contribution: +16.3% (4,555 → 5,296)
  - Rural becomes dominant, handling 73.7% of reward
- Long: 40.2% usage → agents too conservative
- Lack: 60.5% usage → realistic behavior
- Empty: 64.0% usage but cooperation collapses
- Flexible up to 20% deficit (89% cooperation)
- Catastrophic failure at 74% deficit (1% cooperation)
- Small departments (Urban) fail first
- Enables 89% cooperation under 20% deficit
- Without unified pool, likely total failure
- Urban self-reduces to support system stability
- Current: requires both agents positive for bonus
- Problem: extreme deficit makes Urban negative
- Solution: relative improvement or partial bonus
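As a speculative sketch of the "relative improvement" idea (not implemented in v0.5.1; the episode-over-episode comparison and the bonus form are assumptions):

```python
def relative_improvement_bonus(r_urban: float, r_rural: float,
                               prev_urban: float, prev_rural: float,
                               coop_bonus: float = 0.10) -> float:
    """Grant the cooperative bonus when both departments improve on their previous result,
    even if one is still negative under a severe deficit."""
    r_muni = r_urban + r_rural
    if r_urban > prev_urban and r_rural > prev_rural:
        r_muni += coop_bonus * abs(r_muni)   # keep the bonus non-negative even when the total is negative
    return r_muni
```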
- Improved Reward Mechanism
  - Relative improvement bonuses
  - Partial cooperation rewards
  - Risk-adjusted metrics (CVaR)
- Dynamic Budget Allocation
  - Learn optimal allocation ratios
  - Demand forecasting
  - Multi-year planning
- Uncertainty Modeling
  - Budget fluctuations
  - Disaster events
  - Policy changes
- Explainability
  - Decision visualization
  - Strategy interpretation
  - Stakeholder communication
- Real-world Validation
  - Actual bridge data
  - Field testing
  - Human-AI collaboration
- DQN: Mnih et al. (2015). Human-level control through deep reinforcement learning. Nature.
- Multi-Agent RL: Lowe et al. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments.
- Bridge Management: AASHTO (2011). Manual for Bridge Evaluation.
- Implementation: Lapan (2020). Deep Reinforcement Learning Hands-On (2nd ed.).
If you use this work, please cite:
@software{bridge_dqn_2025,
author = {Your Name},
title = {Bridge Fleet Management with Deep Reinforcement Learning},
year = {2025},
url = {https://github.com/yourusername/dql-bridge-maintenance}
}
MIT License - See LICENSE file for details.
Research and educational use encouraged.
Last Updated: 2025-12-06
Version: 0.5.1
Status: Active Development
Python: 3.12.10 | PyTorch: 2.6.0+cu124 | CUDA: 12.4