# Federated Reinforcement Learning for Equitable Healthcare Allocation
**Official Code Repository for:** *Federated Reinforcement Learning for Equitable and Privacy-Preserving Healthcare Resource Allocation*
**Authors:** Ronald R. Mahomane & Johnny Mahlangu
**Status:** Preprint submitted to *The Lancet Digital Health* (Dec 2025)
## Overview

This repository contains the Proof of Concept (PoC) implementation of a Federated Reinforcement Learning (FRL) framework designed to optimize the allocation of scarce medical resources (specifically Factor VIII for hemophilia) across a network of hospitals.

Unlike traditional optimization methods that maximize aggregate utility (total QALYs) at the expense of vulnerable populations, this framework integrates Lagrangian dual optimization. This acts as an "Automated Ethicist," dynamically penalizing the AI agent when allocation inequity (measured by the Gini coefficient) exceeds a safety threshold.
### Key Capabilities

- **Federated Learning:** Trains across 10 simulated hospitals without sharing patient-level data (privacy-preserving).
- **Hub-and-Spoke Logistics:** Simulates patient referrals and inventory management between rural clinics and central hubs.
- **Attention Mechanism:** Uses a Transformer-based state encoder to prioritize patients based on relative cohort severity.
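The equity signal driving the framework is the Gini coefficient of the allocation vector. The repository's `utils/gini.py` provides a differentiable version for training; as a plain illustration of the metric itself (our own sketch, not the repo's implementation), a minimal version is:

```python
def gini(values):
    """Gini coefficient of a list of non-negative allocations.

    0.0 = perfect equality; values near 1.0 = one recipient gets everything.
    Illustrative (non-differentiable) version of the metric.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Mean-difference formulation via rank-weighted sums over the sorted values
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

For example, `gini([1, 1, 1, 1])` returns `0.0` (perfectly equal allocation), while `gini([0, 0, 0, 1])` returns `0.75` (one hospital receives everything).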
## Installation
```bash
pip install -r requirements.txt
```

## Usage
### 1. Run the Simulation

To train the federated agent with the equity constraint enabled:

```bash
python main.py --mode train --method fed_cpo --hospitals 10 --rounds 50 --fairness_threshold 0.15
```
### 2. Compare Baselines

To reproduce the "Price of Fairness" results (comparing against an unconstrained agent):

```bash
python main.py --mode train --method unconstrained_ppo
python main.py --mode evaluate --method static_heuristic
```

### 3. Visualize Results

Generate the QALY vs. Gini coefficient plots (as seen in Figure 3 of the paper):

```bash
python visualize_results.py --log_dir ./logs
```
## Code Spotlight: The "Automated Ethicist"

Implementing Section 3.3.2 (Lagrangian Dual Optimization).
Here is the simplified logic implemented in the training loop:

```python
class LagrangianPID:
    """
    The 'Automated Ethicist' mechanism.
    Dynamically adjusts the penalty (lambda) based on equity violations.
    """
    def __init__(self, target_gini=0.15, lr_lambda=0.01):
        self.lambda_param = 1.0  # Initial penalty weight
        self.target_gini = target_gini
        self.lr = lr_lambda

    def update(self, current_gini):
        """
        Dual gradient ascent step:
        If Gini > target: increase lambda (make inequity expensive).
        If Gini < target: decrease lambda (relax constraints).
        """
        violation = current_gini - self.target_gini
        # Update lambda (projected to be non-negative)
        self.lambda_param += self.lr * violation
        self.lambda_param = max(0.0, self.lambda_param)
        return self.lambda_param

    def apply_penalty(self, rewards, gini_batch):
        """
        Modifies the reward function seen by the PPO agent.
        """
        # Reward = Clinical_Utility - (Lambda * Equity_Violation)
        equity_penalty = self.lambda_param * max(0, gini_batch - self.target_gini)
        return rewards - equity_penalty
```
This ensures that if the agent attempts to "game" the system by neglecting rural patients to maximize aggregate scores (the "Mahlangu-Mahomane Effect"), the lambda parameter spikes, rendering that strategy mathematically suboptimal.
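The dynamics of that spike can be seen by replaying a stream of observed per-round Gini values through the dual update. The standalone toy below (the `replay_lambda` helper is ours, not part of the repo) mirrors the `update` step above:

```python
def replay_lambda(ginis, target_gini=0.15, lr_lambda=0.01, lam=1.0):
    """Replay a sequence of observed per-round Gini values through dual ascent."""
    for g in ginis:
        lam += lr_lambda * (g - target_gini)  # ascend on the constraint violation
        lam = max(0.0, lam)                   # project back to lambda >= 0
    return lam

# Sustained inequity (Gini 0.52 >> 0.15) drives the penalty weight up...
high = replay_lambda([0.52] * 100)  # 1.0 + 100 * 0.01 * 0.37 = 1.37
# ...while a compliant policy (Gini 0.10 < 0.15) lets it relax.
low = replay_lambda([0.10] * 100)   # 1.0 - 100 * 0.01 * 0.05 = 0.95
```

A policy that keeps inequity high therefore faces an ever-growing penalty, while a compliant one pays almost nothing.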
## Repository Structure

```text
federated-equity-rl/
├── agents/
│   ├── ppo_client.py           # Local Client Update (PPO)
│   ├── lagrangian.py           # The Dual Optimization Logic (Sec 3.3.2)
│   └── aggregator.py           # Secure Aggregation (FedAvg)
├── env/
│   ├── hemophilia_env.py       # OpenAI Gym Environment (Hub-and-Spoke)
│   └── patient_generator.py    # Synthetic Data Generation (Appendix C)
├── models/
│   └── transformer_policy.py   # Multi-Head Attention Policy
├── utils/
│   ├── gini.py                 # Differentiable Gini calculation
│   └── privacy.py              # Differential Privacy (Gaussian Mechanism)
├── main.py                     # Entry point
└── requirements.txt
```
## Reproduction Results

Running the simulation with the default seeds should yield results comparable to Table 1 in the manuscript:

| Metric            | Unconstrained PPO | Federated Equity-PPO (Ours) |
|-------------------|-------------------|-----------------------------|
| Total QALY Gain   | 1.25x (Baseline)  | 1.18x                       |
| Gini (Inequity)   | 0.52 (High)       | 0.18 (Low)                  |
| Rural Response    | 50 mins           | 25 mins                     |
| Price of Fairness | N/A               | 5.6%                        |
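The last row follows directly from the first: the "Price of Fairness" is the fraction of aggregate utility given up to satisfy the equity constraint. A quick sanity check (the helper name is ours, for illustration):

```python
def price_of_fairness(unconstrained_qaly, fair_qaly):
    """Relative aggregate utility sacrificed to satisfy the equity constraint."""
    return (unconstrained_qaly - fair_qaly) / unconstrained_qaly

# From the table: (1.25 - 1.18) / 1.25 = 0.056, i.e. the reported 5.6%
pof = price_of_fairness(1.25, 1.18)
```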
## Citation

If you use this code or the concepts in your research, please cite:

```bibtex
@article{mahomane2025federated,
  title={Federated Reinforcement Learning for Equitable and Privacy-Preserving Healthcare Resource Allocation},
  author={Mahomane, Ronald R. and Mahlangu, Johnny},
  journal={Preprint submitted to The Lancet Digital Health},
  year={2025},
  month={December}
}
```