Unexpected generated sentences when using llama #36

@xiaxin1998

Hi,
I used the code in this repo to fine-tune an OpenLLaMA model. To reduce the fine-tuning time, I used only one prompt when generating the training, validation, and test sets on Beauty. I use random indexing and otherwise keep the original settings from your repo. When I evaluate, the generated output sequences are full of unexpected characters, like '(@*$^)(*Y(8'. Also, when I use the code to fine-tune other LLaMA-series models, the generated sentences are full of '!'.
Can anyone give a hint about this? Is this a tokenizer problem?

Thanks
