What is the training data format of commitpack-ft and oasst when fine-tuning codegeex2? #9

@sxthunder

In your paper, commitpack is trained with the following format:
Question: <commit_before>xxx<commit_msg>
Answer: <commit_after>xxx

However, codegeex2's vocabulary contains no special tokens such as <commit_before> or <commit_msg>. I downloaded the octogeex checkpoint and ran prediction with this format, but the answers were wrong.

Could you explain more specifically how you convert commitpack_ft and oasst into the fine-tuning data format?
(What is the input and what is the output?)
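
To make the question concrete, here is a minimal sketch of the format as quoted above, written as plain string templates. It is not the authors' actual preprocessing. The helper names (`build_commit_prompt`, `build_training_text`, `missing_markers`) are hypothetical, and placing the commit message right after `<commit_msg>` is an assumption, since the quoted format leaves that slot empty.

```python
# Marker strings exactly as written in the question above.
COMMIT_MARKERS = ("<commit_before>", "<commit_msg>", "<commit_after>")


def build_commit_prompt(before: str, message: str) -> str:
    """Build the model input (hypothetical helper).

    Assumption: the commit message follows <commit_msg>; the quoted
    format leaves that slot empty.
    """
    return (
        f"Question: <commit_before>{before}<commit_msg>{message}\n"
        "Answer: <commit_after>"
    )


def build_training_text(before: str, message: str, after: str) -> str:
    """Input plus expected output, i.e. the full training string."""
    return build_commit_prompt(before, message) + after


def missing_markers(vocab) -> list:
    """Return the markers that are not single entries in `vocab`.

    `vocab` is any container of token strings, for example the keys of
    a tokenizer's vocabulary dictionary.
    """
    return [m for m in COMMIT_MARKERS if m not in vocab]


if __name__ == "__main__":
    print(build_training_text("x = 1\n", "Rename x to count", "count = 1\n"))
    print(missing_markers({"Question", "Answer", "<commit_msg>"}))
```

If `missing_markers` reports all three markers for codegeex2's vocabulary, the tokenizer splits them into ordinary sub-tokens, so the fine-tuning data was presumably prepared in some other layout, which is what this issue asks about.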

Thanks
