-
Hi, I'm trying to better understand how loss is calculated in multi-turn conversations when using Axolotl.
Assuming I set train_on_inputs: false, is the loss computed only on the assistant responses, while the full context is always passed as input throughout the conversation?
For example, given this dialogue:

user: [first user message]
assistant: hello
user: [second user message]
assistant: good

Would the first loss be calculated on "hello" with the input:

user: [first user message]

And the second loss on "good" with the input:

user: [first user message]
assistant: hello
user: [second user message]

Could you point me to where this loss logic is implemented in the case of multi-turn conversations?
Thanks in advance!
-
The loss logic is handled by the labels. For input tokens, the label token ids are set to -100, which prevents them from being included in the loss. In multi-turn conversations, the entire conversation is handled as a single mini-batch rather than multiple mini-batches, one for each assistant turn; this improves training efficiency.
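For illustration, here is a minimal sketch of the labeling scheme described above, not Axolotl's actual implementation: the whole conversation is concatenated into one sequence, and every non-assistant token gets a label of -100. The names `toy_encode` and `build_labels` are hypothetical, and a stand-in character-level tokenizer is used so the snippet runs on its own.

```python
IGNORE_INDEX = -100  # the default ignore_index of PyTorch's CrossEntropyLoss

def toy_encode(text):
    """Stand-in tokenizer: one 'token id' per character (illustration only)."""
    return [ord(c) for c in text]

def build_labels(conversation):
    """Concatenate all turns into one sequence; mask non-assistant turns."""
    input_ids, labels = [], []
    for turn in conversation:
        ids = toy_encode(turn["content"])
        input_ids.extend(ids)
        if turn["role"] == "assistant":
            labels.extend(ids)                        # contributes to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # excluded from the loss
    return input_ids, labels

conversation = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "how are you?"},
    {"role": "assistant", "content": "good"},
]
input_ids, labels = build_labels(conversation)
assert len(input_ids) == len(labels)
# labels holds -100 for both user turns and real ids for "hello" and "good",
# so one forward pass over the single sequence trains on both assistant turns.
```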
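And a quick check of why setting labels to -100 is sufficient: PyTorch's cross-entropy drops those positions by default, averaging only over the unmasked ones. The logits and labels here are made up purely for the demonstration.

```python
import torch
import torch.nn.functional as F

# Fabricated logits for 6 token positions over a 10-token vocabulary.
torch.manual_seed(0)
logits = torch.randn(6, 10)
labels = torch.tensor([-100, -100, 3, 7, -100, 2])  # user tokens masked out

# F.cross_entropy ignores label -100 by default (ignore_index=-100),
# so the mean is taken over the three unmasked positions only.
loss = F.cross_entropy(logits, labels)
mask = labels != -100
assert torch.allclose(loss, F.cross_entropy(logits[mask], labels[mask]))
```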