Description
Hi, thanks for open-sourcing your amazing work!
I have been trying to reproduce the RL fine-tuned results reported in the paper, but unfortunately, I am encountering some issues. Here is a brief overview of the steps I followed:
- Fine-tuned the actor model with CE loss for 10 epochs using `train_actor.sh` and the CodeT5-NTP model. This fine-tuned model gives results similar to the paper's (2.86 pass@5 compared to 2.90 in the paper).
- With some modifications to `generate.py`, generated 20 candidate samples per problem (following the sample files given in the repo) and greedy baseline codes for the training set with the CE fine-tuned model. The `result` key required for the corresponding `gen_solutions.json` and `baseline_solutions.json` was generated with this snippet (rough sketches of the sampling call and the `result` construction follow this list).
- Generated the token-level hidden states/critic scores with the released critic model through `generate_critic_scores.sh`.
- RL fine-tuned with the default hyperparameters in `train_actor_rl.sh`; the RL-finetuned model gives very degraded results (0.84 pass@5).
I would greatly appreciate any suggestions you may have on hyperparameter choices or other settings that could help me reproduce the RL-finetuned results accurately.
Many thanks!