
Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs

License: MIT | IEEE RAL 2025

Official implementation of our IEEE RAL 2025 paper, "Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach".

Authors: Manav Vora, Ilan Shomorony, Melkior Ornik


Abstract

We study capacity- and budget-constrained multi-agent MDPs (CB-MA-MDPs), a class that captures many maintenance and scheduling tasks in which each agent can irreversibly fail and a planner must decide (i) when to apply a restorative action and (ii) which subset of agents to treat in parallel. The global budget limits the total number of restorations, while the capacity constraint bounds the number of simultaneous actions, turning naïve dynamic programming into a combinatorial search that scales exponentially with the number of agents.

We propose a two-stage solution that remains tractable for large systems:

  1. LSAP-based Grouping: Partitions agents into disjoint sets maximizing diversity in expected time-to-failure
  2. Meta-trained PPO: Solves each sub-MDP with transfer learning for rapid convergence

We validate our approach on industrial robot repair scheduling with limited technicians and budget. Results demonstrate that our method outperforms baselines in maximizing system uptime, particularly for large team sizes, with scalability confirmed for 1000+ agents.


Visual Results

Distribution of Survival Times
(100 robots, 30 technicians)

Budget Sensitivity
Performance across budget levels (100 robots, 30 technicians)

Computational Scalability
Computational time heatmap (log seconds)


Installation

Requirements

  • Python 3.8+
  • PyTorch (for PPO)
  • Stable-Baselines3
  • See requirements.txt for full dependencies

Setup

# Clone repository
git clone https://github.com/leadcatlab/RAL-2025---Capacity-and-Budget-Constrained-Multi-Agent-RL.git
cd RAL-2025---Capacity-and-Budget-Constrained-Multi-Agent-RL

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

Repository Structure

├── env/                              # Environment Implementation
│   ├── env_multi_repair.py          # CB-MA-MDP environment (Gymnasium)
│   ├── robot.py                     # Agent with Weibull degradation
│   ├── complete_components_data.csv # Real component failure data
│   └── all_robots_ttfs_*.npy       # Pre-computed ETTF statistics
│
└── PPO/                              # Algorithms and Experiments
    │
    ├── 🌟 PROPOSED METHOD (LSAP + Meta-PPO)
    │   ├── lsap_partitioning.py                 # LSAP grouping algorithm
    │   ├── lsap_partitioning_policy.py          # Main LSAP+PPO (stores results)
    │   ├── ppo_new.py                           # PPO training script
    │   ├── meta_ppo_policy.py                   # Meta-learning policy
    │   └── ppo_policy.py                        # Vanilla PPO evaluation
    │
    ├── 📊 BASELINE METHODS
    │   ├── baseline_partitioning_policy.py      # Random Partition + Meta-PPO (RP-PPO)
    │   ├── genetic_algorithm_2.py               # Genetic Algorithm (GA)
    │   ├── auction_method.py                    # Auction Heuristic
    │   ├── greedy_baseline.py                   # Greedy Heuristic
    │   ├── ilp.py                               # Integer Linear Programming (ILP)
    │   └── milp_partitioning_policy.py          # MILP Partition + Meta-PPO (MP-PPO)
    │
    └── 📈 COMPARISON & ANALYSIS
        ├── compare_repair_policies.py           # Main comparison (generates figures)
        └── compare_grouping.py                  # Grouping strategy comparison

Note: Only source code and essential data files are tracked. Generated results (.npy), models (.pth), and plots (.png) are excluded via .gitignore.


Running the Code

Step 1: Train Meta-PPO Policy

First, train the PPO policy that will be used by the partitioning methods:

cd PPO
python ppo_new.py

This trains the meta-PPO policy on diverse agent configurations.

Step 2: Run Proposed Method (LSAP + Meta-PPO)

cd PPO
python lsap_partitioning_policy.py

This will:

  1. Perform LSAP-based partitioning
  2. Apply the trained meta-PPO policy to each group
  3. Store results as .npy files (e.g., lsap_partitioning_policy_*_survival_times.npy)

Default configuration: 10 robots, 3 repairmen, budget=30
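
To quickly inspect the stored results, the snippet below loads any matching survival-time arrays and prints their means. This is a minimal sketch: the exact file names depend on the configuration, hence the glob pattern.

import glob
import numpy as np

# Load every survival-time array written by lsap_partitioning_policy.py
for path in glob.glob("lsap_partitioning_policy_*_survival_times.npy"):
    survival_times = np.load(path)
    print(path, "mean survival time:", survival_times.mean())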

Step 3: Run with Custom Parameters

cd PPO
python lsap_partitioning_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

Running Baselines

1. Vanilla PPO

cd PPO
python ppo_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

2. Random Partition + Meta-PPO (RP-PPO)

cd PPO
python baseline_partitioning_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

3. Genetic Algorithm (GA)

cd PPO
python genetic_algorithm_2.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

4. Auction Heuristic

cd PPO
python auction_method.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

5. Integer Linear Programming (ILP)

cd PPO
python ilp.py --num-robots 10 --num-repairmen 3 --repair-budget 30

Note: The ILP is only tractable for N ≤ 20 robots due to its computational complexity.

6. MILP Partition + Meta-PPO (MP-PPO)

cd PPO
python milp_partitioning_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

Generate Comparison Plots

After running experiments, generate comparison figures:

cd PPO
python compare_repair_policies.py

This reproduces the paper figures comparing all methods across different scales.


Method Overview

Two-Stage Approach

Stage 1: LSAP-based Partitioning

  • Compute expected time-to-failure (ETTF) for each agent using Weibull parameters
  • Formulate Linear Sum Assignment Problem (LSAP) to partition agents into r diverse groups
  • Allocate budget proportionally based on group ETTF
  • Complexity: O(N³) using Hungarian algorithm
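
The sketch below illustrates the core of this stage: the mean time-to-failure of a Weibull distribution, followed by repeated assignments with SciPy's linear_sum_assignment so that ETTF values are spread across the r groups. It is a hypothetical illustration only (lsap_groups is not the repository's function); the exact cost matrix and budget allocation are implemented in lsap_partitioning.py.

import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.special import gamma as gamma_fn

def ettf(shape, scale):
    """Expected time-to-failure: mean of a Weibull(shape, scale) distribution."""
    return scale * gamma_fn(1.0 + 1.0 / shape)

def lsap_groups(shapes, scales, r):
    """Partition agents into r groups, spreading ETTF values across groups (illustrative)."""
    values = np.array([ettf(k, lam) for k, lam in zip(shapes, scales)])
    order = np.argsort(values)              # agents sorted by ETTF
    groups = [[] for _ in range(r)]
    group_load = np.zeros(r)                 # running ETTF total per group
    for start in range(0, len(order), r):
        block = order[start:start + r]       # next r agents (one ETTF "band")
        # Cost of placing agent i into group g: the resulting group ETTF load
        cost = group_load[None, :] + values[block][:, None]
        rows, cols = linear_sum_assignment(cost)
        for i, g in zip(rows, cols):
            groups[g].append(int(block[i]))
            group_load[g] += values[block[i]]
    return groups

# Example: 10 agents with random Weibull parameters, 3 groups
rng = np.random.default_rng(0)
print(lsap_groups(rng.uniform(1.5, 3.0, 10), rng.uniform(50, 150, 10), r=3))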

Stage 2: Meta-trained PPO

  • Pre-train PPO policy on diverse synthetic agent configurations
  • Fine-tune policy for each group via transfer learning
  • Deploy policies independently per group in parallel
  • Complexity: O(r × PPO_training)
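
A minimal sketch of the pre-train/fine-tune pattern with Stable-Baselines3 is shown below. make_group_env is a hypothetical factory standing in for the construction of a group-level CB-MA-MDP environment (in this repository, see env/env_multi_repair.py); the actual training code lives in PPO/ppo_new.py and PPO/meta_ppo_policy.py.

from stable_baselines3 import PPO

# Hypothetical factory: builds a Gymnasium environment for one group of agents.
def make_group_env(group, budget):
    raise NotImplementedError("replace with the CB-MA-MDP environment for `group`")

# Stage 2a: meta-train a single PPO policy on a diverse agent configuration
meta_env = make_group_env(group=list(range(10)), budget=30)
meta_model = PPO("MlpPolicy", meta_env, learning_rate=3e-4, verbose=0)
meta_model.learn(total_timesteps=200_000)
meta_model.save("meta_ppo")

# Stage 2b: fine-tune the meta-policy on each group's sub-MDP (transfer learning)
groups = [[0, 3, 7], [1, 4, 8], [2, 5, 6, 9]]   # example output of Stage 1
budget_per_group = [10, 10, 10]                  # example proportional allocation
for g, group in enumerate(groups):
    env = make_group_env(group, budget=budget_per_group[g])
    model = PPO.load("meta_ppo", env=env)        # warm-start from the meta-policy
    model.learn(total_timesteps=20_000)          # short fine-tuning run
    model.save(f"group_{g}_ppo")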

Overall, the approach scales to 1000+ agents and outperforms the baselines while remaining computationally tractable.


Baseline Methods

Our paper compares LSAP + Meta-PPO against six baselines:

  1. Integer Linear Programming (ILP) – Exact integer linear program solved with Gurobi (optimal for small N)
  2. Vanilla PPO – Single PPO network trained on full CB-MA-MDP
  3. Genetic Algorithm (GA) – Rank selection, two-point crossover, bit-flip mutation
  4. Auction Heuristic – Agents bid based on failure risk; top-r bids receive repairs
  5. Random Partition + Meta-PPO (RP-PPO) – Random grouping followed by meta-PPO
  6. MILP Partition + Meta-PPO (MP-PPO) – Diversity-maximizing MILP with LP relaxation + rounding
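
As a concrete illustration of baseline 4 (the auction heuristic), the snippet below selects the top-r bidders by failure risk at each step. This is a simplified sketch under the assumption that lower health means higher risk; the actual bidding rule is implemented in PPO/auction_method.py.

import numpy as np

def auction_select(health, r, budget):
    """Pick up to r agents to repair: lower health => higher failure risk => higher bid."""
    if budget <= 0:
        return []
    bids = -np.asarray(health, dtype=float)         # riskier (lower-health) agents bid more
    winners = np.argsort(bids)[::-1][:min(r, budget)]
    return winners.tolist()

# Example: 6 agents, 2 repairmen, ample budget -> repairs the two lowest-health agents
print(auction_select(health=[80, 12, 55, 5, 90, 33], r=2, budget=10))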

Environment Details

CB-MA-MDP Formulation

State Space: Per-agent health h_i ∈ [0, 100], plus the remaining repair budget

Action Space:

  • Single repairman: Discrete(N+1) – which agent to repair or none
  • Multiple repairmen: MultiBinary(N) with constraint sum(action) ≤ r

Dynamics:

  • Weibull degradation: P(h' | h, a=0) ~ Weibull(shape, scale)
  • Repair: Probabilistic restoration to higher health values, governed by P(h' | h, a=1)

Objective: Maximize system survival time (timesteps until first agent failure)

Constraints:

  • Capacity: Maximum r simultaneous repairs per timestep
  • Budget: Total B repairs over horizon H
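
The spaces above translate into Gymnasium definitions roughly as follows. This is a sketch for the multi-repairman case, not the exact code in env/env_multi_repair.py; the values of N, r, and B are example assumptions.

import numpy as np
from gymnasium import spaces

N, r, B = 10, 3, 30   # agents, repair capacity, repair budget (example values)

# Observation: per-agent health in [0, 100] plus the remaining budget
observation_space = spaces.Box(
    low=np.array([0.0] * N + [0.0], dtype=np.float32),
    high=np.array([100.0] * N + [float(B)], dtype=np.float32),
)

# Action: one bit per agent, with at most r ones allowed per timestep
action_space = spaces.MultiBinary(N)

def action_is_valid(action, remaining_budget):
    """Capacity and budget constraints: at most r simultaneous repairs, never exceeding the budget."""
    n_repairs = int(np.sum(action))
    return n_repairs <= r and n_repairs <= remaining_budget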

Configuration

Environment Parameters

Key settings in env/env_multi_repair.py:

max_steps = 100              # Episode horizon
initial_health = 100         # Starting health for all agents
failure_threshold = 0        # Agent fails when health ≤ 0

PPO Hyperparameters

Settings in PPO/ppo_new.py:

learning_rate = 3e-4
n_steps = 2048
batch_size = 64
n_epochs = 10
gamma = 0.99
gae_lambda = 0.95
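
These settings correspond to the standard Stable-Baselines3 PPO arguments. A minimal sketch of how they are passed is shown below; env stands for a constructed CB-MA-MDP environment, and the training length is an illustrative assumption.

from stable_baselines3 import PPO

model = PPO(
    "MlpPolicy",
    env,                      # a constructed CB-MA-MDP environment
    learning_rate=3e-4,
    n_steps=2048,
    batch_size=64,
    n_epochs=10,
    gamma=0.99,
    gae_lambda=0.95,
)
model.learn(total_timesteps=1_000_000)   # training length is an assumption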

Citation

If you use this code or build upon this work, please cite:

@article{vora2025capacity,
  title={Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach},
  author={Vora, Manav and Shomorony, Ilan and Ornik, Melkior},
  journal={IEEE Robotics and Automation Letters},
  year={2025},
  publisher={IEEE}
}

License

This project is licensed under the MIT License - see the LICENSE file for details.


Contact

For questions or issues, please open an issue on this repository.


Acknowledgments

  • Component failure data from industrial maintenance datasets
  • Built with Gymnasium, Stable-Baselines3, and PyTorch
  • Supported by [funding agencies/institutions]
