# Federated Reinforcement Learning for Equitable Healthcare Allocation
**Official Code Repository for:** *Federated Reinforcement Learning for Equitable and Privacy-Preserving Healthcare Resource Allocation*
**Authors:** Ronald R. Mahomane & Johnny Mahlangu
**Status:** Preprint submitted to *The Lancet Digital Health* (Dec 2025)
## Overview

This repository contains the Proof of Concept (PoC) implementation of a Federated Reinforcement Learning (FRL) framework designed to optimize the allocation of scarce medical resources (specifically Factor VIII for hemophilia) across a network of hospitals.

Unlike traditional optimization methods that maximize aggregate utility (total QALYs) at the expense of vulnerable populations, this framework integrates Lagrangian dual optimization. This acts as an "Automated Ethicist," dynamically penalizing the AI agent when allocation inequity (measured by the Gini coefficient) exceeds a safety threshold.
### Key Capabilities

- **Federated Learning:** Trains across 10 simulated hospitals without sharing patient-level data (privacy-preserving).
- **Hub-and-Spoke Logistics:** Simulates patient referrals and inventory management between rural clinics and central hubs.
- **Attention Mechanism:** Uses a Transformer-based state encoder to prioritize patients based on relative cohort severity.
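The equity signal driving the framework is the Gini coefficient of the allocation vector. The repository's `utils/gini.py` provides a differentiable version for training; as a plain illustration of the metric itself (our own sketch, not the repo's implementation), a minimal version is:

```python
def gini(values):
    """Gini coefficient of a list of non-negative allocations.

    0.0 = perfect equality; values near 1.0 = one recipient gets everything.
    Illustrative (non-differentiable) version of the metric.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if total == 0:
        return 0.0
    # Mean-difference formulation via rank-weighted sums over the sorted values
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return (2.0 * weighted) / (n * total) - (n + 1.0) / n
```

For example, `gini([1, 1, 1, 1])` returns `0.0` (perfectly equal allocation), while `gini([0, 0, 0, 1])` returns `0.75` (one hospital receives everything).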
## Installation
```bash
pip install -r requirements.txt
```

## Usage
### 1. Run the Simulation

To train the federated agent with the equity constraint enabled:

```bash
python main.py --mode train --method fed_cpo --hospitals 10 --rounds 50 --fairness_threshold 0.15
```
### 2. Compare Baselines

To reproduce the "Price of Fairness" results (comparing against an unconstrained agent):

```bash
python main.py --mode train --method unconstrained_ppo
python main.py --mode evaluate --method static_heuristic
```

### 3. Visualize Results

Generate the QALY vs. Gini coefficient plots (as seen in Figure 3 of the paper):

```bash
python visualize_results.py --log_dir ./logs
```
## Code Spotlight: The "Automated Ethicist"

Implementing Section 3.3.2 (Lagrangian Dual Optimization).
Here is the simplified logic implemented in the training loop:

```python
class LagrangianPID:
    """
    The 'Automated Ethicist' mechanism.
    Dynamically adjusts the penalty (lambda) based on equity violations.
    """
    def __init__(self, target_gini=0.15, lr_lambda=0.01):
        self.lambda_param = 1.0  # Initial penalty weight
        self.target_gini = target_gini
        self.lr = lr_lambda

    def update(self, current_gini):
        """
        Dual gradient ascent step:
        If Gini > target: increase lambda (make inequity expensive).
        If Gini < target: decrease lambda (relax constraints).
        """
        violation = current_gini - self.target_gini
        # Update lambda (projected to be non-negative)
        self.lambda_param += self.lr * violation
        self.lambda_param = max(0.0, self.lambda_param)
        return self.lambda_param

    def apply_penalty(self, rewards, gini_batch):
        """
        Modifies the reward function seen by the PPO agent.
        """
        # Reward = Clinical_Utility - (Lambda * Equity_Violation)
        equity_penalty = self.lambda_param * max(0, gini_batch - self.target_gini)
        return rewards - equity_penalty
```
This ensures that if the agent attempts to "game" the system by neglecting rural patients to maximize aggregate scores (the "Mahlangu-Mahomane Effect"), the lambda parameter spikes, rendering that strategy mathematically suboptimal.
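The dynamics of that spike can be seen by replaying a stream of observed per-round Gini values through the dual update. The standalone toy below (the `replay_lambda` helper is ours, not part of the repo) mirrors the `update` step above:

```python
def replay_lambda(ginis, target_gini=0.15, lr_lambda=0.01, lam=1.0):
    """Replay a sequence of observed per-round Gini values through dual ascent."""
    for g in ginis:
        lam += lr_lambda * (g - target_gini)  # ascend on the constraint violation
        lam = max(0.0, lam)                   # project back to lambda >= 0
    return lam

# Sustained inequity (Gini 0.52 >> 0.15) drives the penalty weight up...
high = replay_lambda([0.52] * 100)  # 1.0 + 100 * 0.01 * 0.37 = 1.37
# ...while a compliant policy (Gini 0.10 < 0.15) lets it relax.
low = replay_lambda([0.10] * 100)   # 1.0 - 100 * 0.01 * 0.05 = 0.95
```

A policy that keeps inequity high therefore faces an ever-growing penalty, while a compliant one pays almost nothing.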
## Repository Structure

```text
federated-equity-rl/
├── agents/
│   ├── ppo_client.py           # Local Client Update (PPO)
│   ├── lagrangian.py           # The Dual Optimization Logic (Sec 3.3.2)
│   └── aggregator.py           # Secure Aggregation (FedAvg)
├── env/
│   ├── hemophilia_env.py       # OpenAI Gym Environment (Hub-and-Spoke)
│   └── patient_generator.py    # Synthetic Data Generation (Appendix C)
├── models/
│   └── transformer_policy.py   # Multi-Head Attention Policy
├── utils/
│   ├── gini.py                 # Differentiable Gini calculation
│   └── privacy.py              # Differential Privacy (Gaussian Mechanism)
├── main.py                     # Entry point
└── requirements.txt
```
## Reproduction Results

Running the simulation with the default seeds should yield results comparable to Table 1 in the manuscript:

| Metric            | Unconstrained PPO | Federated Equity-PPO (Ours) |
|-------------------|-------------------|-----------------------------|
| Total QALY Gain   | 1.25x (Baseline)  | 1.18x                       |
| Gini (Inequity)   | 0.52 (High)       | 0.18 (Low)                  |
| Rural Response    | 50 mins           | 25 mins                     |
| Price of Fairness | N/A               | 5.6%                        |
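The last row follows directly from the first: the "Price of Fairness" is the fraction of aggregate utility given up to satisfy the equity constraint. A quick sanity check (the helper name is ours, for illustration):

```python
def price_of_fairness(unconstrained_qaly, fair_qaly):
    """Relative aggregate utility sacrificed to satisfy the equity constraint."""
    return (unconstrained_qaly - fair_qaly) / unconstrained_qaly

# From the table: (1.25 - 1.18) / 1.25 = 0.056, i.e. the reported 5.6%
pof = price_of_fairness(1.25, 1.18)
```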
## Citation

If you use this code or the concepts in your research, please cite:

```bibtex
@article{mahomane2025federated,
  title={Federated Reinforcement Learning for Equitable and Privacy-Preserving Healthcare Resource Allocation},
  author={Mahomane, Ronald R. and Mahlangu, Johnny},
  journal={Preprint submitted to The Lancet Digital Health},
  year={2025},
  month={December}
}
```