@ehartford has solved this. Do you mind sharing how you did so?
I prepared a dataset to train a Qwen3-32B model with supervised fine-tuning (SFT) using full-parameter training. During dataset preparation, I realized the `<think>` token is masked in the loss calculation. Looking for the reason in the source code, I found the code that determines the labels of the tokens: the `find_turn` function in `src/axolotl/prompt_strategies/chat_template.py`, where `dummy_ids` and `full_ids` are the tokenized contents of `turns_with_empty` and `turns_with_content`, respectively.
Here `turns_with_empty` is the conversation rendered with the assistant content left empty, and `turns_with_content` is the rendering with the actual response.
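Both renderings can be reproduced directly from the tokenizer's chat template; a minimal sketch (the exact strings come from Qwen3's own template, and the toy conversation here is just for illustration):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-32B")
user_turn = [{"role": "user", "content": "Hello"}]

# dummy version: assistant content left empty
turns_with_empty = tokenizer.apply_chat_template(
    user_turn + [{"role": "assistant", "content": ""}], tokenize=False
)
# real version: assistant content filled in
turns_with_content = tokenizer.apply_chat_template(
    user_turn + [{"role": "assistant", "content": "Hi there!"}], tokenize=False
)

print(turns_with_empty)
print(turns_with_content)
```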
As the renderings show, Qwen3's chat template adds an empty `<think> </think>` to the dummy message. If we follow the code implementation in `find_turn`, the `<think>` token is therefore assigned the label `-100`. As a result, when we train a model with SFT, the `<think>` token is not included in the loss calculation.

After fine-tuning my model with SFT, the fine-tuned model no longer outputs the `<think>` token. Weirdly, this only happens in the full-parameter training setting. If I train the model with LoRA or QLoRA, the model can still output the `<think>` token, even though `<think>` is masked there as well. I would like to know if this is a feature or a bug.
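For what it's worth, the masking is easy to confirm on a preprocessed sample (a minimal check; `tokenized_dataset` here stands in for whatever axolotl produced during preprocessing, and it assumes the usual `input_ids`/`labels` columns):

```python
# Check which label the <think> token receives in one preprocessed sample.
think_id = tokenizer.convert_tokens_to_ids("<think>")
sample = tokenized_dataset[0]  # hypothetical handle to the preprocessed dataset

for tok_id, label in zip(sample["input_ids"], sample["labels"]):
    if tok_id == think_id:
        print("label for <think>:", label)  # -100 means excluded from the loss
```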