
Problems in reproducing the RL fine-tuned results #30


Description

@abhik1505040

Hi, thanks for open-sourcing your amazing work!

I have been trying to reproduce the RL fine-tuned results reported in the paper, but unfortunately, I am encountering some issues. Here is a brief overview of the steps I followed:

  • Fine-tuned the actor model with CE loss for 10 epochs using train_actor.sh, starting from the CodeT5-NTP checkpoint. This fine-tuned model closely matches the paper (2.86 pass@5 vs. 2.90 reported; pass@5 is computed as outlined after this list).

  • Modified generate.py to generate 20 candidate samples per problem (following the sample files given in the repo) as well as greedy baseline programs for the training set, using the CE fine-tuned model. The result key required for the corresponding gen_solutions.json and baseline_solutions.json was produced with this snippet, roughly along the lines of the sketch after this list.

  • Generated the token-level hidden states/critic scores with the released critic model via generate_critic_scores.sh.

  • Ran RL fine-tuning with the default hyperparameters in train_actor_rl.sh. The resulting RL fine-tuned model gives severely degraded results (0.84 pass@5).
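
To make the second step concrete, the solution files are built roughly along these lines. Note that run_unit_tests below is a hypothetical placeholder for the actual test runner, and all key names other than result are assumptions based on the sample files:

```python
import json

def run_unit_tests(problem_id, program):
    # Hypothetical placeholder for the repo's unit-test runner: returns one
    # entry per test case (pass/fail, plus error codes for compile/runtime
    # failures).
    raise NotImplementedError

def build_solution_file(generations, out_path):
    # generations: {problem_id: [candidate program strings]}
    # Key names other than "result" are guesses based on the sample files.
    solutions = {}
    for pid, programs in generations.items():
        solutions[str(pid)] = {
            "code": programs,
            "result": [run_unit_tests(pid, prog) for prog in programs],
        }
    with open(out_path, "w") as f:
        json.dump(solutions, f)
```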

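For reference, the pass@5 numbers above use the standard unbiased pass@k estimator (I assume this matches how the paper reports its results):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    # Unbiased pass@k estimator (Chen et al., 2021): n generated samples per
    # problem, c of which pass all unit tests.
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. np.mean([pass_at_k(20, c, 5) for c in num_correct_per_problem])
```
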
I would greatly appreciate any suggestions you may have on hyperparameter choices or other settings that could help me reproduce the RL fine-tuned results accurately.
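
For additional context, my rough understanding of the per-sample actor objective optimized by train_actor_rl.sh, based on my reading of the paper rather than the repo code, is: the return of the sampled program relative to the greedy baseline, with each token's log-probability weighted by the critic's score. In sketch form (all names are mine):

```python
import torch

def actor_rl_loss(token_logprobs: torch.Tensor,
                  critic_scores: torch.Tensor,
                  sample_return: float,
                  baseline_return: float) -> torch.Tensor:
    # token_logprobs: (T,) log p_theta(w_t | w_<t, D) for the sampled program
    # critic_scores:  (T,) token-level scores from the released critic
    # sample_return / baseline_return: unit-test rewards for the sampled and
    # greedy-baseline programs
    advantage = sample_return - baseline_return
    return -advantage * (critic_scores * token_logprobs).sum()
```

If my understanding of any of these pieces is off, that could well explain the degraded numbers.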

Many thanks!
