Model-based reinforcement learning (MBRL) methods, particularly those based on model predictive control (MPC), use a learned environment model to plan actions before execution. MPC selects actions by optimizing a scoring function. Incorporating global information into the scoring function accelerates learning, but it also introduces variance. This project addresses that trade-off by proposing the sum of advantage functions (Sum-Advantage) as the scoring function, in contrast to the previously used sum of state-action values (Sum-Value). Our experiments on one Gym environment show that...
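To make the contrast concrete, here is a minimal, hypothetical sketch of the two scoring functions an MPC planner might use to rank candidate action sequences. The function names, the per-step `Q` and `V` estimates, and the trajectory format are illustrative assumptions, not the project's actual API in `main.py`.

```python
import numpy as np

def sum_value_score(q_values):
    """Sum-Value: score a trajectory by the sum of state-action values Q(s_t, a_t)."""
    return float(np.sum(q_values))

def sum_advantage_score(q_values, v_values):
    """Sum-Advantage: score by the sum of advantages A(s_t, a_t) = Q(s_t, a_t) - V(s_t).

    Subtracting the state-value baseline V(s_t) at each step reduces the
    variance of the score while preserving the ranking signal.
    """
    return float(np.sum(np.asarray(q_values) - np.asarray(v_values)))

# Toy example: two candidate trajectories with Q and V estimates per step.
q_a, v_a = [1.0, 2.0, 3.0], [0.5, 1.5, 2.5]
q_b, v_b = [2.0, 2.0, 2.0], [2.0, 2.0, 2.0]

# Sum-Value ties the candidates (6.0 vs 6.0); Sum-Advantage separates them.
best = max([("a", q_a, v_a), ("b", q_b, v_b)],
           key=lambda t: sum_advantage_score(t[1], t[2]))
print(best[0])  # "a": its summed advantage is 1.5 vs 0.0 for "b"
```

In an MPC loop, the planner would execute the first action of the best-scoring candidate sequence and replan at the next step.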
Follow these steps to set up the project on your local machine. You can install MuJoCo by following this guide.
Clone the repository and navigate to the code directory:

```shell
git clone https://github.com/re0078/advantage_sum_mbrl.git
cd advantage_sum_mbrl
```

Create a virtual environment:
```shell
virtualenv --python=<path-to-python-3.8.18> venv
source venv/bin/activate

# or, using conda
conda create --name venv python=3.8.18
conda activate venv
```

Install the requirements:
```shell
pip install -r requirements.txt --force-reinstall
```

Run the main experiment script:

```shell
python main.py --env_id Hopper-v3 --instance_number [inst_num] --scoring_method [advantage|value]
```

You can view the logs in `logs/<env_id>/<instance_number>/<scoring_method>/logs.txt`.
The rewards and the saved models are stored in `checkpoints/<env_id>/<instance_number>/<scoring_method>`.
You can modify hyperparameters in main.py for further exploration.
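When comparing the advantage and value runs, a moving average makes the noisy per-episode rewards easier to read. The sketch below is a generic helper with synthetic data; the actual on-disk format of the saved rewards in `checkpoints/` is not specified here, so loading them is left to the reader.

```python
import numpy as np

def smooth(rewards, window=10):
    """Moving average over episode rewards, for plotting noisy learning curves."""
    rewards = np.asarray(rewards, dtype=float)
    if len(rewards) < window:
        return rewards
    kernel = np.ones(window) / window
    return np.convolve(rewards, kernel, mode="valid")

# Toy usage with synthetic episode rewards (an improving but noisy run).
rng = np.random.default_rng(0)
noisy = np.linspace(0.0, 100.0, 200) + rng.normal(0.0, 5.0, 200)
curve = smooth(noisy, window=20)
print(len(curve))  # 181 points: 200 - 20 + 1
```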
- If you face any issues while cythonizing the mujoco_py module, you can use this command:

```shell
python3.8 -m pip install "cython<3"
```

- Make sure you set the `LD_LIBRARY_PATH` environment variable to point to your MuJoCo bin directory before running the main Python script:

```shell
export LD_LIBRARY_PATH=<path-to-mujoco>/.mujoco/mujoco210/bin
```

Contributors:

- Amir Noohian
- Alireza Isavand
- Reza Abdollahzadeh
We would like to express our gratitude to Prof. Machado for his advice and guidance throughout the development of this project.