
Conversation

philiproeleveld

What does this PR do?

Fixes #40984

Adds logits_to_keep to many (older) ForCausalLM models that inherit from GenerationMixin.
Also consistently renames the output variables to loss and logits, and removes per-model code for float casting and for moving labels to the logits' device in models where the shared loss function already handles this (e.g. gpt_neo).
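
For context, the pattern being added mirrors what the newer ForCausalLM implementations in transformers do: project only the trailing positions through the lm_head instead of the full sequence. A minimal sketch of that slicing logic, using illustrative tensor names and sizes rather than any specific model's internals:

    import torch

    # Sketch of the logits_to_keep slicing used by newer models: instead of
    # projecting every hidden state through the (large) lm_head, keep only
    # the trailing positions. logits_to_keep=0 keeps everything, since
    # slice(-0, None) == slice(0, None).
    hidden_states = torch.randn(2, 128, 768)   # (batch, seq_len, hidden) -- illustrative
    lm_head = torch.nn.Linear(768, 50257)      # hidden -> vocab projection

    logits_to_keep = 1  # generate() only needs the last token's logits
    slice_indices = slice(-logits_to_keep, None) if isinstance(logits_to_keep, int) else logits_to_keep
    logits = lm_head(hidden_states[:, slice_indices, :])
    print(logits.shape)  # torch.Size([2, 1, 50257])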

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR. @Rocketknight1

@philiproeleveld (Author)

I was wondering if I could also add this to the many seq2seq models that inherit from GenerationMixin, T5 for example. In theory they would benefit when the user provides a very long decoder_input_ids, but that's not really what they're designed for...
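
To make the potential benefit concrete, a rough back-of-the-envelope for the seq2seq case (all numbers are illustrative; the vocab size is roughly T5's):

    # Rough illustration of the saving for a seq2seq model given a very long
    # decoder prefix; batch/length are made up, vocab ~= T5's 32128.
    batch, dec_len, vocab = 8, 4096, 32128
    full_logits = batch * dec_len * vocab  # materialized without logits_to_keep
    kept_logits = batch * 1 * vocab        # with logits_to_keep=1
    print(full_logits // kept_logits)      # 4096x fewer logits to compute and store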


github-actions bot commented Oct 3, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: bart, bert, bert_generation, big_bird, bigbird_pegasus, biogpt, blenderbot, blenderbot_small, blip, bloom, camembert, chameleon, codegen, cpmant, ctrl, data2vec

@Rocketknight1 (Member) left a comment

Mostly looks good, with one comment that applies to a couple of models!

This is a fairly big change that standardizes a lot of older models with the modern API, so cc core maintainers @ArthurZucker @Cyrilvallez

-     vocab_size=self.config.vocab_size,
-     **kwargs,
- )
+ loss = self.loss_function(logits=logits, labels=labels, vocab_size=self.config.vocab_size, **kwargs)
@Rocketknight1 (Member) commented on this diff:

Some of these cases slightly change other behaviour (e.g. not moving labels to logits.device). Have you checked that this is equivalent?
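
For readers following the thread: the removed per-model code upcast logits to float and moved labels onto the logits' device before computing cross-entropy, and the question is whether the shared loss function reproduces that internally. A minimal sketch of the two paths side by side; the manual version restates the old in-model pattern, while the import path for the shared default is my assumption about where it currently lives in transformers:

    import torch
    import torch.nn.functional as F

    logits = torch.randn(2, 6, 50)
    labels = torch.randint(0, 50, (2, 6))

    # Old in-model pattern (e.g. gpt_neo): cast, move, shift, flatten.
    lm_logits = logits.float()
    moved = labels.to(lm_logits.device)  # the device move in question
    shift_logits = lm_logits[..., :-1, :].contiguous()
    shift_labels = moved[..., 1:].contiguous()
    manual = F.cross_entropy(shift_logits.view(-1, 50), shift_labels.view(-1))

    # New pattern: delegate to the shared causal-LM loss. Import path is an
    # assumption; it is expected to upcast and device-move internally.
    from transformers.loss.loss_utils import ForCausalLMLoss

    shared = ForCausalLMLoss(logits=logits, labels=labels, vocab_size=50)
    print(manual.item(), shared.item())  # expected to match on CPU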

Development

Successfully merging this pull request may close these issues.

Adding logits_to_keep to older models