❓ Question
Hi,
I’ve been adapting your code for PPO hyperparameter optimization for my custom environment and I have a question regarding the evaluation metric used.
In exp_manager.py, on line 810, I noticed that the optimization objective is defined using:
reward = eval_callback.last_mean_reward
This means that only the last evaluation is used to determine if the current trial is the best one. I was wondering if there’s a specific reason for this approach. Would you consider using:
reward = eval_callback.best_mean_reward
instead?
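To illustrate why the two can differ, here is a toy sketch. The reward values are made up, and the two attribute names simply mirror stable-baselines3's EvalCallback; the point is that a trial whose performance peaked early and then degraded is scored very differently by the two objectives:

```python
# Hypothetical mean rewards from four periodic evaluations of one trial.
eval_rewards = [50.0, 120.0, 95.0, 80.0]

# Objective currently used in exp_manager.py: only the final evaluation.
last_mean_reward = eval_rewards[-1]

# Proposed alternative: the best evaluation seen during the trial.
best_mean_reward = max(eval_rewards)

print(last_mean_reward)  # 80.0  -> a trial that peaked early is penalized
print(best_mean_reward)  # 120.0 -> the same trial is scored by its peak
```

With last_mean_reward the trial above scores 80.0, while best_mean_reward would report 120.0, so the choice of attribute changes which trial Optuna considers best.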
Checklist