Critic Training pre-processing steps

Hello,

Thank you for open-sourcing the code for this great project!

We are using CodeRL as a really nice starting point for student projects, and we have a few questions to check our understanding.
In the "Critic Training" section, you write the following:

> We can train a critic model as a classifier that predicts the test outcomes of generated samples. For each training sample, we can follow the prior processes ([generating programs](https://github.com/salesforce/CodeRL/tree/main#generating-programs) and [running unit tests](https://github.com/salesforce/CodeRL/tree/main#running-unit-tests)) to obtain synthetic samples and their annotations of unit test outcomes. On average, we generate 20 programs per training sample (we provided some example generated programs in data/APPS/train/).
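To check that we understand the labeling step, here is how we currently picture it: a minimal sketch that collapses the per-test results of one generated program into a single outcome label for the critic. It assumes the APPS-style encoding of test results (`-2` for a compile error, `-1` for a runtime error, `True`/`False` per test case). The four class names and the helper `outcome_label` are our own hypothetical illustration, not taken from the CodeRL code.

```python
from enum import IntEnum
from typing import List, Union


# Hypothetical label set for the critic (our assumption): one class per coarse test outcome.
class Outcome(IntEnum):
    COMPILE_ERROR = 0
    RUNTIME_ERROR = 1
    FAILED_TEST = 2
    PASSED_TEST = 3


def outcome_label(results: List[Union[bool, int]]) -> Outcome:
    """Collapse the per-test results of one generated program into a single label.

    Assumes an APPS-style encoding: -2 = compile error, -1 = runtime error,
    True / False = the program passed / failed that test case.
    """
    if not results:
        raise ValueError("expected at least one test result")
    if -2 in results:
        return Outcome.COMPILE_ERROR
    if -1 in results:
        return Outcome.RUNTIME_ERROR
    if all(r is True for r in results):
        return Outcome.PASSED_TEST
    return Outcome.FAILED_TEST


# Example: label three synthetic samples from their (made-up) test outcomes.
print(outcome_label([True, True, True]).name)   # PASSED_TEST
print(outcome_label([True, False, True]).name)  # FAILED_TEST
print(outcome_label([-1]).name)                 # RUNTIME_ERROR
```

The compile-error check comes first because a program that does not compile cannot have meaningful runtime or pass/fail results.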

- You don't say so explicitly, but from context I assume you are using the [CodeT5-large-ntp-py](https://huggingface.co/Salesforce/codet5-large-ntp-py) model for this. Is that correct?
- What do you mean by "on average" 20 programs per training sample? The generation code does not allow for an "average" number of generated solutions; it always produces the specified number of outputs per instance.
- Related to that: when we look at the example outputs provided in [data/APPS/train/](https://github.com/salesforce/CodeRL/tree/main/data/APPS/train/), *all* of the solutions in the `gen_solutions.json` files look like "good" code, and some files contain fewer than `n=20` solutions. However, when we use the CodeT5-large-ntp-py model to generate solutions ourselves, there are always exactly `n` solutions. Sometimes the model outputs code, but often it produces no code at all and instead emits other text, such as repeated natural-language descriptions, e.g.:
```
print(gen_data['0']['code'][0])
�� the number of words that played the game.


ANSWER:


"""

class Solution(object):
    def reverse(self, n):
        """
        :type n: int
        :rtype: int
        """
        if n == 0:
            return -1
        l = list(bin(n))
        l.reverse()
        return sum(l)

if __name__ == '__main__':
    print Solution().reverse(int(raw_input()))

[...]

print(gen_data['0']['code'][2])
�� the answer.

ANSWER:

for all the test cases in the input, print answer for all the test cases in the order they appear.

for all the test cases in the input, print answer for all the test cases in the order they appear.

for all the test cases in the input, print answer for all the test cases in the order they appear.

for all the test cases in the input, print answer for all the test cases in the order they appear.
[...]
```
- Is there some post-processing going on that we are overlooking?
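Our observations would be consistent with a filtering step between generation and the released `gen_solutions.json` files, but we have not found one. Purely to illustrate what we mean, here is a hypothetical sketch (the helper names are ours) that drops outputs which do not parse as Python 3 and removes exact duplicates. Applied per problem, something like this would leave a variable number of programs, which would explain both "fewer than `n`" and an "on average" figure. It assumes the `gen_data[problem_id]['code']` layout shown above.

```python
import ast
from statistics import mean
from typing import Dict, List


def is_parsable_python(code: str) -> bool:
    """Return True if `code` parses as Python 3 (the code is not executed)."""
    try:
        ast.parse(code)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def filter_samples(samples: List[str]) -> List[str]:
    """Hypothetical post-processing: drop empty, unparsable, and duplicate outputs."""
    seen = set()
    kept = []
    for code in samples:
        key = code.strip()
        if not key or key in seen or not is_parsable_python(key):
            continue
        seen.add(key)
        kept.append(code)
    return kept


def average_kept(gen_data: Dict[str, Dict[str, List[str]]]) -> float:
    """Average number of programs left per problem after filtering."""
    return float(mean(len(filter_samples(entry["code"])) for entry in gen_data.values()))


# Toy data in the gen_data[problem_id]["code"] layout used above.
gen_data = {
    "0": {"code": ["print(1)", "print(1)", "for all the test cases in the input, print answer"]},
    "1": {"code": ["x = int(input())\nprint(x * 2)", "print(int(input()) + 1)", "print 'py2'"]},
}
print(average_kept(gen_data))  # 1.5
```

Note that a Python 3 parser also rejects Python 2 code such as the `print Solution().reverse(int(raw_input()))` line in the first sample above, so a filter like this would drop that output even though it is code.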


Critic Training pre-processing steps #47

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Critic Training pre-processing steps #47

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions