Commit 88e3a3b

xgwang, xgw, and NathanHB authored
change tokenizer to pad to 'longest' sequence, instead of 'max_length' (#669)
Otherwise, the response length is always 1, which is unexpected.

Co-authored-by: xgw <[email protected]>
Co-authored-by: Nathan Habib <[email protected]>
1 parent: 5120c58 · commit: 88e3a3b

File tree

1 file changed: 1 addition, 1 deletion


src/lighteval/models/transformers/transformers_model.py

Lines changed: 1 addition & 1 deletion
@@ -578,7 +578,7 @@ def greedy_until(
         tokenized = self.tokenizer(
             context,
             truncation="longest_first",  # we truncate to the model max length if needed
-            padding="max_length",  # we pad to the longest sequence
+            padding="longest",  # we pad to the longest sequence
             return_tensors="pt",
             max_length=max_context_continuation_size_allowed,  # we always allow minimum one token of generation
             add_special_tokens=self.add_special_tokens,
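
Why this matters: max_length here is set to max_context_continuation_size_allowed, so padding="max_length" pads every prompt out to the full allowed window. The context then occupies all of it, leaving only the single guaranteed generation token. Below is a minimal sketch of the difference between the two padding modes, assuming a standard Hugging Face tokenizer; the gpt2 checkpoint, the prompts, and the window size of 32 are illustrative stand-ins, not taken from the commit.

# Sketch (not part of the commit): contrasts padding="max_length" with
# padding="longest" for a batched tokenizer call.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # gpt2 ships without a pad token

context = ["short prompt", "a somewhat longer prompt in the same batch"]
max_context_continuation_size_allowed = 32  # stand-in for the real value

# padding="max_length" pads every row out to max_length, so the context
# alone fills the whole allowed window.
to_max = tokenizer(
    context,
    truncation="longest_first",
    padding="max_length",
    max_length=max_context_continuation_size_allowed,
    return_tensors="pt",
)
print(to_max["input_ids"].shape)      # torch.Size([2, 32])

# padding="longest" pads only up to the longest row in the batch, leaving
# the rest of the window free for generated tokens.
to_longest = tokenizer(
    context,
    truncation="longest_first",
    padding="longest",
    max_length=max_context_continuation_size_allowed,
    return_tensors="pt",
)
print(to_longest["input_ids"].shape)  # e.g. torch.Size([2, 9])

With padding="longest", generation can extend the batch up to the model's context size instead of being squeezed down to the one guaranteed new token, which is the behavior the commit message describes.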
