Official implementation of our IEEE RAL 2025 paper
Authors: Manav Vora, Ilan Shomorony, Melkior Ornik
We study capacity- and budget-constrained multi-agent MDPs (CB-MA-MDPs), a class that captures many maintenance and scheduling tasks in which each agent can irreversibly fail and a planner must decide (i) when to apply a restorative action and (ii) which subset of agents to treat in parallel. The global budget limits the total number of restorations, while the capacity constraint bounds the number of simultaneous actions, turning naïve dynamic programming into a combinatorial search that scales exponentially with the number of agents.
We propose a two-stage solution that remains tractable for large systems:
- LSAP-based Grouping: Partitions agents into disjoint sets maximizing diversity in expected time-to-failure
- Meta-trained PPO: Solves each sub-MDP with transfer learning for rapid convergence
We validate our approach on industrial robot repair scheduling with limited technicians and budget. Results demonstrate that our method outperforms baselines in maximizing system uptime, particularly for large team sizes, with scalability confirmed for 1000+ agents.
Figures: distribution of survival times, performance across budget levels, and computational time heatmap.
- Python 3.8+
- PyTorch (for PPO)
- Stable-Baselines3
- See `requirements.txt` for full dependencies
# Clone repository
git clone https://github.com/yourusername/RAL-2025---Capacity-and-Budget-Constrained-Multi-Agent-RL.git
cd RAL-2025---Capacity-and-Budget-Constrained-Multi-Agent-RL
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt

├── env/ # Environment Implementation
│ ├── env_multi_repair.py # CB-MA-MDP environment (Gymnasium)
│ ├── robot.py # Agent with Weibull degradation
│ ├── complete_components_data.csv # Real component failure data
│ └── all_robots_ttfs_*.npy # Pre-computed ETTF statistics
│
└── PPO/ # Algorithms and Experiments
│
├── 🌟 PROPOSED METHOD (LSAP + Meta-PPO)
│ ├── lsap_partitioning.py # LSAP grouping algorithm
│ ├── lsap_partitioning_policy.py # Main LSAP+PPO (stores results)
│ ├── ppo_new.py # PPO training script
│ ├── meta_ppo_policy.py # Meta-learning policy
│ └── ppo_policy.py # Vanilla PPO evaluation
│
├── 📊 BASELINE METHODS
│ ├── baseline_partitioning_policy.py # Random Partition + Meta-PPO (RP-PPO)
│ ├── genetic_algorithm_2.py # Genetic Algorithm (GA)
│ ├── auction_method.py # Auction Heuristic
│ ├── greedy_baseline.py # Greedy Heuristic
│ ├── ilp.py # Integer Linear Programming (ILP)
│ └── milp_partitioning_policy.py # MILP Partition + Meta-PPO (MP-PPO)
│
└── 📈 COMPARISON & ANALYSIS
├── compare_repair_policies.py # Main comparison (generates figures)
└── compare_grouping.py # Grouping strategy comparison
Note: Only source code and essential data files are tracked. Generated results (.npy), models (.pth), and plots (.png) are excluded via .gitignore.
First, train the PPO policy that will be used by the partitioning methods:
cd PPO
python ppo_new.py

This trains the meta-PPO policy on diverse agent configurations.
cd PPO
python lsap_partitioning_policy.py

This will:
- Perform LSAP-based partitioning
- Apply the trained meta-PPO policy to each group
- Store results as `.npy` files (e.g., `lsap_partitioning_policy_*_survival_times.npy`)
Default configuration: 10 robots, 3 repairmen, budget=30
cd PPO
python lsap_partitioning_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

cd PPO
python ppo_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

cd PPO
python baseline_partitioning_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

cd PPO
python genetic_algorithm_2.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

cd PPO
python auction_method.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

cd PPO
python ilp.py --num-robots 10 --num-repairmen 3 --repair-budget 30

Note: The ILP is only tractable for N ≤ 20 robots due to computational complexity.
cd PPO
python milp_partitioning_policy.py --num-robots 100 --num-repairmen 30 --repair-budget 1000

After running experiments, generate comparison figures:
cd PPO
python compare_repair_policies.py

This reproduces the paper figures comparing all methods across different scales.
Stage 1: LSAP-based Partitioning
- Compute expected time-to-failure (ETTF) for each agent using Weibull parameters
- Formulate a Linear Sum Assignment Problem (LSAP) to partition agents into `r` diverse groups
- Allocate the budget proportionally based on group ETTF
- Complexity: O(N³) using the Hungarian algorithm (a minimal sketch follows below)
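A minimal, self-contained sketch of the grouping idea is below, assuming Weibull-distributed time-to-failure and using SciPy's Hungarian-algorithm solver; the cost construction and function names are illustrative and differ from the exact formulation in `lsap_partitioning.py`.

```python
# Hypothetical sketch of ETTF computation and LSAP-based grouping (the cost
# construction here is illustrative, not the exact formulation in
# lsap_partitioning.py). Agents are assigned to group "slots" with the
# Hungarian algorithm so that every group spans the full ETTF spectrum.
import numpy as np
from scipy.special import gamma
from scipy.optimize import linear_sum_assignment

def expected_ttf(shape, scale):
    # Mean of a Weibull(shape, scale) distribution.
    return scale * gamma(1.0 + 1.0 / shape)

def lsap_group(ettfs, num_groups):
    ettfs = np.asarray(ettfs, dtype=float)
    n = len(ettfs)                                   # assumes n divisible by num_groups
    rank = np.empty(n, dtype=int)
    rank[np.argsort(ettfs)] = np.arange(n)           # ETTF rank of each agent
    # Column j is a slot; slots in the same "band" share a target rank, and the
    # bands cover ranks 0, num_groups, 2*num_groups, ... across the population.
    target_rank = np.repeat(np.arange(n // num_groups) * num_groups, num_groups)
    cost = np.abs(rank[:, None] - target_rank[None, :])
    rows, cols = linear_sum_assignment(cost)         # O(n^3) Hungarian solve
    groups = [[] for _ in range(num_groups)]
    for agent, col in zip(rows, cols):
        groups[col % num_groups].append(int(agent))
    return groups

ettfs = expected_ttf(np.full(10, 1.5), np.linspace(50, 150, 10))
print(lsap_group(ettfs, num_groups=2))
```

Because each group receives one agent from every ETTF rank band, within-group ETTF diversity is high, which is the property the LSAP grouping targets.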
Stage 2: Meta-trained PPO
- Pre-train PPO policy on diverse synthetic agent configurations
- Fine-tune policy for each group via transfer learning
- Deploy policies independently per group in parallel
- Complexity: O(r × PPO_training); a fine-tuning sketch follows below
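A minimal sketch of the per-group fine-tuning step with Stable-Baselines3 is below; the checkpoint name and environment factory are assumptions, not the repository's actual interface (`ppo_new.py` and `meta_ppo_policy.py` handle this in full).

```python
# Minimal sketch of per-group fine-tuning with Stable-Baselines3, assuming a
# meta-trained checkpoint "meta_ppo.zip" and a per-group environment factory
# make_group_env(); both names are assumptions, not the repository's API.
from stable_baselines3 import PPO

def finetune_for_group(group_env, checkpoint="meta_ppo.zip", steps=50_000):
    model = PPO.load(checkpoint, env=group_env)   # warm-start from the meta policy
    model.learn(total_timesteps=steps)            # short fine-tuning run for this group
    return model

# Fine-tuned policies can then be deployed independently, one per group:
# group_models = [finetune_for_group(make_group_env(g)) for g in groups]
```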
Overall: Scales to 1000+ agents, outperforms baselines while remaining computationally tractable.
Our paper compares LSAP + Meta-PPO against six baselines:
- Integer Linear Programming (ILP) – Exact integer linear program solved with Gurobi (optimal for small N)
- Vanilla PPO – Single PPO network trained on full CB-MA-MDP
- Genetic Algorithm (GA) – Rank selection, two-point crossover, bit-flip mutation
- Auction Heuristic – Agents bid based on failure risk; top-r bids receive repairs (see the sketch after this list)
- Random Partition + Meta-PPO (RP-PPO) – Random grouping followed by meta-PPO
- MILP Partition + Meta-PPO (MP-PPO) – Diversity-maximizing MILP with LP relaxation + rounding
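For illustration, a toy sketch of the auction idea follows: each agent "bids" a health-based failure-risk proxy and the top-r bidders receive a repair, subject to the remaining budget. The names and the bid function are assumptions, not `auction_method.py`'s implementation.

```python
# Toy sketch of the auction heuristic idea (bids = health-based failure-risk
# proxy, top-r bids repaired); illustrative only, not auction_method.py's code.
import numpy as np

def auction_repairs(healths, num_repairmen, budget_left):
    bids = 1.0 - np.asarray(healths, dtype=float) / 100.0   # lower health -> higher bid
    k = min(num_repairmen, budget_left)
    winners = np.argsort(bids)[::-1][:k]                    # top-r bids win a repair
    action = np.zeros(len(healths), dtype=int)
    action[winners] = 1
    return action

print(auction_repairs([80, 15, 60, 5], num_repairmen=2, budget_left=10))  # repairs agents 1 and 3
```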
State Space: Per-agent health h_i ∈ [0, 100] + available budget
Action Space:
- Single repairman: Discrete(N+1) – which agent to repair, or none
- Multiple repairmen: MultiBinary(N) with the constraint sum(action) ≤ r (see the space sketch below)
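A minimal Gymnasium declaration of these spaces might look like the following; the names are illustrative rather than the exact attributes of `env_multi_repair.py`.

```python
# Minimal Gymnasium declaration of the observation and action spaces above;
# names are illustrative, not the exact attributes of env_multi_repair.py.
import numpy as np
from gymnasium import spaces

N, r = 10, 3  # number of agents, number of repairmen

observation_space = spaces.Box(
    low=np.zeros(N + 1, dtype=np.float32),
    high=np.array([100.0] * N + [np.inf], dtype=np.float32),  # N health values + remaining budget
)
single_repair_action_space = spaces.Discrete(N + 1)   # index N means "repair no one"
multi_repair_action_space = spaces.MultiBinary(N)     # a_i = 1 -> repair agent i
# The capacity constraint sum(action) <= r is enforced by the environment,
# e.g. by masking or truncating actions that request too many repairs.
```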
Dynamics:
- Weibull degradation: `P(h' | h, a=0) ~ Weibull(shape, scale)`
- Repair: probabilistic restoration to higher health values, `P(h' | h, a=1)`
Objective: Maximize system survival time (timesteps until first agent failure)
Constraints:
- Capacity: at most `r` simultaneous repairs per timestep
- Budget: at most `B` total repairs over the horizon `H` (a simplified step sketch follows below)
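The sketch below shows one simplified environment step under these dynamics and constraints, with hypothetical Weibull and repair parameters; it is illustrative only and not the logic of `env_multi_repair.py`.

```python
# Simplified sketch of one environment step under the dynamics and constraints
# above, with hypothetical Weibull/repair parameters; illustrative only, not
# the logic of env_multi_repair.py.
import numpy as np

rng = np.random.default_rng(0)

def step(healths, action, budget_left, r, shape=1.5, scale=8.0):
    healths = np.asarray(healths, dtype=float).copy()
    action = np.asarray(action, dtype=int).copy()
    # Capacity and budget: allow at most min(r, budget_left) repairs this step,
    # keeping the most degraded requested agents if the action asks for more.
    allowed = min(r, budget_left)
    if action.sum() > allowed:
        requested = np.flatnonzero(action)
        keep = requested[np.argsort(healths[requested])[:allowed]]
        action[:] = 0
        action[keep] = 1
    repaired = action == 1
    # Repair: probabilistic restoration toward full health.
    healths[repaired] = np.minimum(100.0, healths[repaired] + rng.uniform(50, 100, repaired.sum()))
    # Weibull degradation for agents that are not repaired this step.
    healths[~repaired] -= rng.weibull(shape, (~repaired).sum()) * scale
    budget_left -= int(action.sum())
    failed = bool((healths <= 0).any())   # survival time ends at the first failure
    return healths, budget_left, failed

h, b, done = step(np.full(4, 60.0), np.array([1, 0, 0, 1]), budget_left=5, r=2)
```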
Key settings in env/env_multi_repair.py:
max_steps = 100 # Episode horizon
initial_health = 100 # Starting health for all agents
failure_threshold = 0 # Agent fails when health ≤ 0

Settings in PPO/ppo_new.py:
learning_rate = 3e-4
n_steps = 2048
batch_size = 64
n_epochs = 10
gamma = 0.99
gae_lambda = 0.95

If you use this code or build upon this work, please cite:
@article{vora2025capacity,
title={Capacity-Aware Planning and Scheduling in Budget-Constrained Multi-Agent MDPs: A Meta-RL Approach},
author={Vora, Manav and Shomorony, Ilan and Ornik, Melkior},
journal={IEEE Robotics and Automation Letters},
year={2025},
publisher={IEEE}
}

This project is licensed under the MIT License - see the LICENSE file for details.
For questions or issues:
- Email: mkvora2@illinois.edu
- Paper: IEEE Xplore
- Issues: Open an issue on GitHub
- Component failure data from industrial maintenance datasets
- Built with Gymnasium, Stable-Baselines3, and PyTorch
- Supported by [funding agencies/institutions]


