In the smpo script, the chat template is applied to chosen and rejected on their own, without the prompt:
effective_llm_alignment/scripts/model_training/smpo.py
Lines 88 to 92 in a03cad4
    def apply_chat_templates(row):
        row["prompt"] = tokenizer.apply_chat_template(row["prompt"], tokenize=False)
        row["chosen"] = tokenizer.apply_chat_template(row["chosen"], tokenize=False)
        row["rejected"] = tokenizer.apply_chat_template(row["rejected"], tokenize=False)
        return row
If the dataset has the format
{"id": 0, "prompt": [{"content": "", "role": "user"}], "chosen": [{"content": "", "role": "assistant"}], "rejected": [{"content": "", "role": "assistant"}]}
then, for example, with the Qwen 2.5 tokenizer the system message is prepended to both chosen and rejected as well:
'prompt': '<|im_start|>system\n...<|im_end|>\n<|im_start|>user\n...<|im_end|>\n'
'chosen': '<|im_start|>system\n...<|im_end|>\n<|im_start|>assistant\n...<|im_end|>\n'
'rejected': '<|im_start|>system\n...<|im_end|>\n<|im_start|>assistant\n...<|im_end|>\n'
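The effect can be reproduced without downloading a tokenizer. The sketch below uses `render_chatml`, a hypothetical stand-in for `tokenizer.apply_chat_template(..., tokenize=False)` that mimics only one behaviour of the Qwen 2.5 template: it injects a default system block when the conversation does not start with a system message. The `SYSTEM` text and the sample messages are placeholders, not the real template contents.

```python
# Hypothetical stand-in for a ChatML-style chat template that, like Qwen 2.5,
# adds a default system message when the conversation has none.
DEFAULT_SYSTEM = "SYSTEM"  # placeholder, not the real default prompt


def render_chatml(messages, add_generation_prompt=False):
    parts = []
    if not messages or messages[0]["role"] != "system":
        parts.append(f"<|im_start|>system\n{DEFAULT_SYSTEM}<|im_end|>\n")
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


row = {
    "prompt": [{"role": "user", "content": "Hi"}],
    "chosen": [{"role": "assistant", "content": "Hello"}],
    "rejected": [{"role": "assistant", "content": "Go away"}],
}

# Same pattern as apply_chat_templates: each field is rendered in isolation,
# so each one gets its own system block.
rendered = {key: render_chatml(messages) for key, messages in row.items()}

for key, text in rendered.items():
    print(key, repr(text))
```

Because chosen and rejected are rendered as if they were complete conversations, each one starts with its own system block, which is exactly the duplication shown above.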
These strings are then passed to tokenize_row, where the system tokens end up in answer_input_ids:
effective_llm_alignment/src/trainers/smpo_trainer.py
Lines 342 to 360 in a03cad4
    def build_tokenized_answer(self, prompt, answer):
        """
        Llama tokenizer does satisfy `enc(a + b) = enc(a) + enc(b)`.
        It does ensure `enc(a + b) = enc(a) + enc(a + b)[len(enc(a)):]`.
        Reference:
            https://github.com/EleutherAI/lm-evaluation-harness/pull/531#issuecomment-1595586257
        """
        full_tokenized = self.processing_class(
            prompt + answer, add_special_tokens=False
        )
        prompt_input_ids = self.processing_class(prompt, add_special_tokens=False)[
            "input_ids"
        ]
        answer_input_ids = full_tokenized["input_ids"][len(prompt_input_ids) :]
        answer_attention_mask = full_tokenized["attention_mask"][
            len(prompt_input_ids) :
        ]
In other words, the model is also trained on these system tokens.
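One possible way around this, offered as a sketch and not as the repository's intended fix: render the prompt together with each answer, then cut the rendered prompt off the front, so the answer text contains only the assistant turn. `split_pair` and `render_chatml` are hypothetical names; `render_chatml` is a stand-in for `tokenizer.apply_chat_template(..., tokenize=False)` with a placeholder system text. The approach assumes the rendered prompt is a strict prefix of the rendered full conversation, which the code checks explicitly.

```python
# Hypothetical stand-in for a ChatML-style template with a default system block.
def render_chatml(messages):
    parts = []
    if not messages or messages[0]["role"] != "system":
        parts.append("<|im_start|>system\nSYSTEM<|im_end|>\n")  # placeholder text
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    return "".join(parts)


def split_pair(row, render):
    """Render prompt + answer jointly, then strip the rendered prompt prefix."""
    prompt_text = render(row["prompt"])
    out = {"prompt": prompt_text}
    for key in ("chosen", "rejected"):
        full_text = render(row["prompt"] + row[key])
        if not full_text.startswith(prompt_text):
            # Some templates rewrite earlier turns; fail loudly in that case.
            raise ValueError(f"rendered prompt is not a prefix of prompt + {key}")
        out[key] = full_text[len(prompt_text):]
    return out


row = {
    "prompt": [{"role": "user", "content": "Hi"}],
    "chosen": [{"role": "assistant", "content": "Hello"}],
    "rejected": [{"role": "assistant", "content": "Go away"}],
}

fixed = split_pair(row, render_chatml)
for key, text in fixed.items():
    print(key, repr(text))
```

With a real tokenizer, `render` could be `functools.partial(tokenizer.apply_chat_template, tokenize=False)`. Whether the prefix check holds depends on the specific chat template, so it is worth keeping the explicit error.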