Attention, everyone running experiments with versions of ARPO prior to 8.31!
I just fixed a bug that may have caused the tool to malfunction during deep search tasks in those earlier versions.
https://github.com/RUC-NLPIR/ARPO/blob/main/ARPO/scripts/config/ppo_trainer_dr.yaml
You just need to update this YAML file to the latest version; no other code changes are required.
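
If you would rather not re-clone, here is a minimal sketch of one way to pull just that file. It assumes GitHub's usual mapping from a `github.com/.../blob/...` link to `raw.githubusercontent.com`, and that you run it from the root of your ARPO checkout. The helper names `blob_to_raw` and `update_config` are my own, not part of ARPO.

```python
import urllib.request
from pathlib import Path

# The link from this post, unchanged.
BLOB_URL = (
    "https://github.com/RUC-NLPIR/ARPO/blob/main/"
    "ARPO/scripts/config/ppo_trainer_dr.yaml"
)


def blob_to_raw(url: str) -> str:
    """Turn a github.com 'blob' page URL into the matching raw-file URL.

    Assumption: the standard GitHub layout
    https://github.com/<owner>/<repo>/blob/<ref>/<path>.
    """
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    owner_repo, ref_and_path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{ref_and_path}"


def update_config(dest: Path, url: str = BLOB_URL) -> None:
    """Download the latest YAML to dest, keeping a .bak copy of the old file."""
    with urllib.request.urlopen(blob_to_raw(url), timeout=30) as resp:
        latest = resp.read()
    if dest.exists():
        # Keep the previous config so local edits are not lost.
        backup = dest.with_name(dest.name + ".bak")
        backup.write_bytes(dest.read_bytes())
    dest.write_bytes(latest)
```

For example, `update_config(Path("ARPO/scripts/config/ppo_trainer_dr.yaml"))` overwrites the local config and leaves the old one next to it as `ppo_trainer_dr.yaml.bak`. If your checkout tracks the upstream repository, a plain `git pull` should get you the same file.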