tokenizer.chat_template
中 special tokens 无法被 ChatGLMTokenizer 正确切分
#725
-
看到这里新增了对 举例: messages = [
{
"role": "user",
"content": "用户1"
},
{
"role": "assistant",
"content": "助手1"
},
{
"role": "user",
"content": "用户2"
}
]
model_inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True)
# [790, 30927, 30944, 2080, 30984, 30996, 30917, 404, 31002, 31007, 4865, 31007, 30994, 30910, 13, 30910, 32053, 30939, 31002, 31007, 530, 18971, 31007, 30994, 30910, 13, 30910, 42481, 30939, 31002, 31007, 4865, 31007, 30994, 30910, 13, 30910, 32053, 30943, 31002, 31007, 530, 18971, 31007, 30994]
model_input_tokens = tokenizer.convert_ids_to_tokens(model_inputs)
# ['▁[', 'g', 'M', 'AS', 'K', ']', 's', 'op', '<', '|', 'user', '|', '>', '▁', '<0x0A>', '▁', '用户', '1', '<', '|', 'ass', 'istant', '|', '>', '▁', '<0x0A>', '▁', '助手', '1', '<', '|', 'user', '|', '>', '▁', '<0x0A>', '▁', '用户', '2', '<', '|', 'ass', 'istant', '|', '>'] |
Beta Was this translation helpful? Give feedback.
Answered by
zRzRzRzRzRzRzR
Jan 16, 2024
Replies: 1 comment 8 replies
-
你没有防注入 |
Beta Was this translation helpful? Give feedback.
8 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
尝试这么写
对应的huggingface代码会进行更新
正常运行你将会得到这个反馈