This repository includes an offline RL algorithm, CORMPO, and a medical environment for RL evaluation. CORMPO addresses out-of-distribution (OOD) challenges in offline reinforcement learning by incorporating clinical domain knowledge and regularization techniques for safer policy optimization.
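The core idea behind density-regularized offline RL can be illustrated with a small sketch: fit a density model (here a Gaussian KDE) on the offline state-action data, then penalize the reward of any pair whose log-density falls below a threshold, steering the policy away from OOD regions. This is a minimal, self-contained illustration of the technique, not CORMPO's actual implementation; the function names, bandwidth, and threshold below are illustrative choices.

```python
import numpy as np

def kde_log_density(data, queries, bandwidth=0.5):
    """Log-density of a Gaussian KDE fit on `data`, evaluated at `queries`."""
    n, d = data.shape
    diffs = queries[:, None, :] - data[None, :, :]                   # (m, n, d)
    log_kernels = (-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2)
                   - 0.5 * d * np.log(2 * np.pi * bandwidth ** 2))   # (m, n)
    # log-mean-exp over the n kernel centers, numerically stable
    mx = log_kernels.max(axis=1, keepdims=True)
    return (mx.squeeze(1)
            + np.log(np.exp(log_kernels - mx).sum(axis=1))
            - np.log(n))

def penalized_reward(reward, log_density, lam=1.0, tau=-5.0):
    """Subtract a penalty when a state-action pair looks out-of-distribution."""
    return reward - lam * np.maximum(0.0, tau - log_density)

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))     # in-distribution (state, action) samples
in_dist = np.zeros((1, 2))           # query near the data
ood = np.full((1, 2), 8.0)           # query far from the data
ld_in = kde_log_density(data, in_dist)
ld_ood = kde_log_density(data, ood)
```

In-distribution pairs keep their reward untouched, while pairs far from the data support are penalized in proportion to how far their log-density falls below the threshold.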
Install all required dependencies:

```bash
pip install -r requirements.txt
```

See the README in the abiomed_env folder for environment implementation details and example scripts for using the environment.
Train CORMPO with the WS penalty on the noiseless synthetic dataset:

```bash
python cormpo/mbpo_kde/mopo.py --config cormpo/config/noiseless_synthetic/mbpo_kde_ws.yaml
```

Train CORMPO (without the WS penalty) on the noisy synthetic dataset:

```bash
python cormpo/mbpo_kde/mopo.py --config cormpo/config/noisy_synthetic/mbpo_kde.yaml
```

Evaluate a saved policy trained on the noisy synthetic dataset:
```bash
python cormpo/helpers/evaluate.py --config cormpo/config/evaluate/noisy/cormpo.yaml --policy_path "checkpoints/policy/noisy_synthetic/policy_abiomed.pth"
```

To evaluate the policy trained on the noiseless dataset, change the policy path to:

```bash
--policy_path "checkpoints/policy/noiseless_synthetic/policy_abiomed.pth"
```

- The implementation of MOPO and MBPO-KDE builds largely on this implementation of the MOPO algorithm: https://github.com/junming-yang/mopo
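The internals of evaluate.py are not shown here, but conceptually policy evaluation amounts to rolling out the saved policy in the environment and averaging episode returns. The sketch below illustrates that loop with stand-in names (ToyEnv, evaluate_policy) that are not part of this repository's API.

```python
import numpy as np

class ToyEnv:
    """Illustrative stand-in environment: 1-D state, fixed-length episodes."""
    def __init__(self, horizon=10):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.state = 0.0
        return self.state

    def step(self, action):
        self.t += 1
        self.state += action
        reward = -abs(self.state - 1.0)   # reward peaks when state tracks 1.0
        done = self.t >= self.horizon
        return self.state, reward, done

def evaluate_policy(env, policy, episodes=5):
    """Average undiscounted return of `policy` over several rollouts."""
    returns = []
    for _ in range(episodes):
        state, done, total = env.reset(), False, 0.0
        while not done:
            state, reward, done = env.step(policy(state))
            total += reward
        returns.append(total)
    return float(np.mean(returns))

# A policy that drives the state toward the reward peak scores higher
# than one that never acts.
good_score = evaluate_policy(ToyEnv(), lambda s: np.clip(1.0 - s, -1.0, 1.0))
idle_score = evaluate_policy(ToyEnv(), lambda s: 0.0)
```

The real script additionally loads the policy weights from `--policy_path` and reads evaluation settings from the YAML config, but the rollout-and-average structure is the same.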