In your paper, commitpack is formatted as follows for training:
Question: <commit_before>xxx<commit_msg>
Answer: <commit_after>xxx
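For reference, this is roughly how I am building the prompt from a commit record. It is only a sketch of my own reading of the format. Three things are my assumptions, not something the paper states: the commit message goes right after `<commit_msg>`, the prompt ends at `Answer: <commit_after>`, and the model should generate only the new file contents. The function name `build_example` and its parameters are hypothetical.

```python
def build_example(old_contents: str, message: str, new_contents: str) -> tuple[str, str]:
    """Return a (prompt, target) pair following my reading of the paper's format.

    Assumptions (mine, not confirmed by the paper):
    - the commit message is placed directly after <commit_msg>
    - "Question" and "Answer" are separated by a single newline
    - the prompt ends with "Answer: <commit_after>" and the target is the new code
    """
    prompt = (
        f"Question: <commit_before>{old_contents}<commit_msg>{message}\n"
        f"Answer: <commit_after>"
    )
    target = new_contents
    return prompt, target


prompt, target = build_example("a = 1\n", "Fix value", "a = 2\n")
print(prompt + target)
```

If the split between input and output (or the position of the message) is different in your training code, that could explain the wrong predictions I am seeing.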
But codegeex2's vocabulary has no special tokens such as <commit_before> or <commit_msg> added. I downloaded the octogeex checkpoint and ran prediction with this format, but the generated answers are wrong.
Could you explain more specifically how you converted commitpack_ft and oasst into the fine-tuning data format?
(What is the input and what is the output?)
Thanks