Description
Hi, thanks for open-sourcing your amazing work!
I have been trying to reproduce the RL fine-tuned results reported in the paper, but unfortunately, I am encountering some issues. Here is a brief overview of the steps I followed:
- Fine-tuned the actor model with CE loss for 10 epochs using `train_actor.sh` and the CodeT5-NTP model. This fine-tuned model gives results similar to the paper's (2.86 pass@5 compared to 2.90 in the paper).
- With some modifications to `generate.py`, generated 20 candidate samples per problem (following the sample files given in the repo) and greedy baseline codes for the training set with the CE fine-tuned model. The `result` key required for the corresponding `gen_solutions.json` and `baseline_solutions.json` was generated with this snippet (rough sketches of the sampling call and the `result` construction follow this list).
- Generated the token-level hidden states/critic scores with the released critic model through `generate_critic_scores.sh`.
- RL fine-tuned with the default hyperparameters in `train_actor_rl.sh`; the RL-finetuned model gives very degraded results (0.84 pass@5).
I would greatly appreciate any suggestions you may have on hyperparameter choices or other settings that could help me reproduce the RL-finetuned results accurately.
Many thanks!