Added litellm model config options and improved `_prepare_max_new_tokens` by rolshoven · Pull Request #967 · huggingface/lighteval

rolshoven · 2025-09-16T11:55:44Z

Background

See Issue #966:

Currently, a few things that can be very useful when running evaluations are not exposed through the LiteLLMModelConfig:

It would be nice to have a verbose flag, in case you want to debug something related to litellm

If you know the maximum context length of your model, it would be nice to set it directly instead of relying on the default length of 4096 that is currently hardcoded in the max_length property

Different APIs can differ in their robustness, and they might have different rate limits. It would be nice to configure the number of retries that are performed when calling the API, as well as the waiting time in between requests, and maybe a timeout if a request takes too long.

Additionally, it would be nice to apply the current strategy for the o1 model in _prepare_max_new_tokens to other reasoning models as well.

Changes in this PR

This PR introduces the following new options in the LiteLLMModelConfig:

"""
(...)
verbose (bool):
    Whether to enable verbose logging. Default is False.
max_model_length (int | None):
    Maximum context length for the model. If None, infers the model's default max length.
api_max_retry (int):
    Maximum number of retries for API requests. Default is 8.
api_retry_sleep (float):
    Initial sleep time (in seconds) between retries. Default is 1.0.
api_retry_multiplier (float):
    Multiplier for increasing sleep time between retries. Default is 2.0.
timeout (float | None):
    Request timeout in seconds. Default is None (no timeout).
(...)
"""
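To illustrate how the three retry options could interact, here is a minimal sketch of exponential backoff. It is not the code from this PR: the function name `call_with_retries` and the injectable `sleep` parameter are hypothetical, and it assumes that `api_max_retry` counts total attempts, which may differ from how lighteval interprets it.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    api_max_retry: int = 8,
    api_retry_sleep: float = 1.0,
    api_retry_multiplier: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not wait
) -> T:
    """Call `request`, retrying on any exception with exponential backoff.

    Hypothetical helper: `api_max_retry` is treated as the total number of attempts.
    """
    wait = api_retry_sleep
    for attempt in range(api_max_retry):
        try:
            return request()
        except Exception:
            # Re-raise the last error once the attempt budget is exhausted.
            if attempt == api_max_retry - 1:
                raise
            sleep(wait)
            # Grow the waiting time before the next attempt.
            wait *= api_retry_multiplier
    # Only reached when api_max_retry is zero or negative.
    raise ValueError("api_max_retry must be at least 1")
```

With the defaults listed above, the waits under this sketch would be 1, 2, 4, ... seconds, for at most seven sleeps before the eighth attempt's error is raised.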

The increase in the allowed number of tokens (see _prepare_max_new_tokens) is now calculated for all models that are recognized as reasoning models by litellm (as indicated by their supports_reasoning function). Instead of having hardcoded upper bounds, we use litellm's get_max_tokens helper function, or, if this fails, we query the maximum context length from different endpoints on OpenRouter. If the specified provider is present in that list, we get the information right from OpenRouter. Otherwise, we will choose the minimum context length among all OpenRouter providers to ensure that it works at least with all providers listed there. If this also fails, we will return the default context length of 4096, the same one as currently hardcoded.
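The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `resolve_max_length` and both lookup callables are hypothetical stand-ins (the first for litellm's get_max_tokens, the second for the OpenRouter query), injected so the logic can be read and tested without network access.

```python
from typing import Callable, Mapping, Optional

DEFAULT_MAX_LENGTH = 4096  # default context length named in the PR description


def resolve_max_length(
    model: str,
    provider: str,
    primary_lookup: Callable[[str], Optional[int]],
    provider_lengths: Callable[[str], Mapping[str, int]],
) -> int:
    """Resolve a context length using the fallback chain from the description.

    `primary_lookup` and `provider_lengths` are hypothetical stand-ins for the
    litellm helper and the OpenRouter endpoint listing, respectively.
    """
    # Step 1: try the primary lookup; any failure falls through to step 2.
    try:
        length = primary_lookup(model)
        if length:
            return length
    except Exception:
        pass

    # Step 2: fetch per-provider context lengths; a failure means no data.
    try:
        lengths = provider_lengths(model)
    except Exception:
        lengths = {}

    # Prefer the requested provider when it is listed.
    if provider in lengths:
        return lengths[provider]
    # Otherwise take the minimum so the value is valid for every listed provider.
    if lengths:
        return min(lengths.values())

    # Step 3: nothing worked, so use the default.
    return DEFAULT_MAX_LENGTH
```

Taking the minimum in the unlisted-provider case is the conservative choice: a request sized for the smallest context window cannot overflow any of the listed providers.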

In order to use the supports_reasoning function of litellm, I had to update the minimum required version of litellm in the pyproject.toml file to 1.66.0.

HuggingFaceDocBuilderDev · 2025-09-17T07:15:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

NathanHB · 2025-09-23T08:08:59Z

src/lighteval/models/endpoints/litellm_model.py

                    response = litellm.completion(**kwargs)
+                    content = response.choices[0].message.content
+
+                if content and "<think>" in content:


This is handled by the remove thinking tags option in the CLI.

Oh I see! Then I guess this happens outside the model classes, should I just remove that from litellm_model.py then?

Yeah! You can also make sure that when you use --remove_thinking_tags it works as expected :)

I removed it. However, I was unable to reproduce the case where the reasoning traces are in the output of the model, because the reasoning is actually saved under the reasonings attribute in the ModelResponse as defined on lines 365-374 here.

I did however verify (using a breakpoint in my debugging config) that remove_reasoning_tags is executed as part of _post_process_outputs in the Pipeline (by default --remove-reasoning-tags is set to True). So I think it is safe to remove the code that you mentioned, and to assume that the stripping of reasoning content should work if at some point there is actual reasoning content in the text attribute of ModelResponse.
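For readers unfamiliar with this kind of post-processing, here is a minimal sketch of stripping `<think>` blocks from a text output. It is not lighteval's remove_reasoning_tags implementation; the helper name `strip_think_blocks` is hypothetical, and it assumes reasoning is wrapped in well-formed `<think>...</think>` pairs.

```python
import re

# Non-greedy match so several separate <think> blocks are removed individually;
# DOTALL lets a block span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_think_blocks(text: str) -> str:
    """Remove every <think>...</think> block and trim surrounding whitespace.

    Hypothetical helper for illustration only.
    """
    return _THINK_BLOCK.sub("", text).strip()
```

Text without any `<think>` block passes through unchanged apart from whitespace trimming, which matches the observation above that the step is harmless when the reasoning lives in a separate attribute.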

NathanHB

Looks good! Only a few questions and then good to merge.

NathanHB reviewed Sep 16, 2025

NathanHB added the feature label Sep 17, 2025

NathanHB added enhancement and removed feature labels Sep 23, 2025

NathanHB reviewed Sep 23, 2025

NathanHB reviewed Sep 23, 2025

rolshoven and others added 3 commits September 26, 2025 09:55

Added litellm model config options and made increase of max_tokens for reasoning models more general (b488a84)

Updated type hint for timeout model config option (086bd3e)
Co-authored-by: Nathan Habib <30601243+NathanHB@users.noreply.github.com>

Removed redundant reasoning tag stripping from litellm model (3b6101d)

rolshoven force-pushed the litellm_model_changes branch from 1f36913 to 3b6101d Compare September 26, 2025 15:57

Merge branch 'main' into litellm_model_changes (f945c24)

NathanHB approved these changes Oct 6, 2025

NathanHB merged commit e98e463 into huggingface:main Oct 6, 2025
4 checks passed

NathanHB merged 4 commits into huggingface:main from rolshoven:litellm_model_changes
