-
Hi, I'm trying to better understand how loss is calculated in multi-turn conversations when using Axolotl.
Assuming I set train_on_inputs: false, is the loss computed only on the assistant responses, while the full context is always passed as input throughout the conversation?
For example, given this dialogue:

user: [first user message]
assistant: hello
user: [second user message]
assistant: good

Would the first loss be calculated on "hello" with the input:

user: [first user message]

And the second loss on "good" with the input:

user: [first user message]
assistant: hello
user: [second user message]

Could you point me to where this loss logic is implemented in the case of multi-turn conversations?
Thanks in advance!
-
The loss logic is handled by the labels. For input tokens, the label token ids are set to -100, which prevents them from being included in the loss. In multi-turn conversations, the entire conversation is handled as a single mini-batch rather than multiple mini-batches, one for each assistant turn; this improves training efficiency.
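For illustration, here is a minimal sketch of the labeling scheme described above, not Axolotl's actual implementation: the whole conversation is concatenated into one sequence, and every non-assistant token gets a label of -100. The names `toy_encode` and `build_labels` are hypothetical, and a stand-in character-level tokenizer is used so the snippet runs on its own.

```python
IGNORE_INDEX = -100  # the default ignore_index of PyTorch's CrossEntropyLoss

def toy_encode(text):
    """Stand-in tokenizer: one 'token id' per character (illustration only)."""
    return [ord(c) for c in text]

def build_labels(conversation):
    """Concatenate all turns into one sequence; mask non-assistant turns."""
    input_ids, labels = [], []
    for turn in conversation:
        ids = toy_encode(turn["content"])
        input_ids.extend(ids)
        if turn["role"] == "assistant":
            labels.extend(ids)                        # contributes to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # excluded from the loss
    return input_ids, labels

conversation = [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "how are you?"},
    {"role": "assistant", "content": "good"},
]
input_ids, labels = build_labels(conversation)
assert len(input_ids) == len(labels)
# labels holds -100 for both user turns and real ids for "hello" and "good",
# so one forward pass over the single sequence trains on both assistant turns.
```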
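And a quick check of why setting labels to -100 is sufficient: PyTorch's cross-entropy drops those positions by default, averaging only over the unmasked ones. The logits and labels here are made up purely for the demonstration.

```python
import torch
import torch.nn.functional as F

# Fabricated logits for 6 token positions over a 10-token vocabulary.
torch.manual_seed(0)
logits = torch.randn(6, 10)
labels = torch.tensor([-100, -100, 3, 7, -100, 2])  # user tokens masked out

# F.cross_entropy ignores label -100 by default (ignore_index=-100),
# so the mean is taken over the three unmasked positions only.
loss = F.cross_entropy(logits, labels)
mask = labels != -100
assert torch.allclose(loss, F.cross_entropy(logits[mask], labels[mask]))
```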