[BUG] LiteLLM reports wrong output token count (max_tokens vs max_output_tokens) #8454

@fabb

Description

Problem

When using Sonnet 4.5 through LiteLLM on Google Vertex, the following error appears:

LiteLLM streaming error: 400 litellm.BadRequestError: VertexAIException BadRequestError - b'{"type":"error","error":{"type":"invalid_request_error","message":"max_tokens: 200000 > 64000, which is the maximum allowed number of output tokens for claude-sonnet-4-5-20250929"},"request_id":"req_vrtx_011CTeGWyomNL2s6LacBN6w5"}'. Received Model Group=claude-sonnet-4-5

The issue seems to be a confusion between max_tokens and max_output_tokens.

The problem is most likely in this line:

maxTokens: modelInfo.max_tokens || 8192,

I think this line should use max_output_tokens when it is available, and fall back to max_tokens only when it is not.
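
A minimal sketch of the proposed precedence, assuming the modelInfo fields mirror what LiteLLM reports (max_output_tokens alongside max_tokens); the interface and helper name here are made up for illustration and are not the actual Roo Code source:

```typescript
// Hypothetical helper illustrating the proposed fallback order.
interface LiteLLMModelInfo {
	// For some models this reflects the context window, not the output cap.
	max_tokens?: number
	// The per-request output-token ceiling, when the backend reports one.
	max_output_tokens?: number
}

function resolveMaxTokens(modelInfo: LiteLLMModelInfo): number {
	// Prefer the explicit output limit; fall back to max_tokens, then a safe default.
	return modelInfo.max_output_tokens ?? modelInfo.max_tokens ?? 8192
}

// Example: a model reporting a 200k context window but a 64k output cap.
console.log(resolveMaxTokens({ max_tokens: 200000, max_output_tokens: 64000 })) // 64000
```

With this ordering, a Vertex Sonnet 4.5 entry reporting max_tokens: 200000 and max_output_tokens: 64000 would cap requests at 64000 instead of tripping the provider's limit.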

Context

Anyone using Sonnet 4.5 via LiteLLM on Google Vertex.

Reproduction steps

  1. Add Sonnet 4.5 via LiteLLM backed by Google Vertex (a sample proxy config is sketched below)
  2. Send any prompt to the model
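
For reference, a minimal LiteLLM proxy config sketch for step 1; the project and location values are placeholders, not taken from the original report:

```yaml
model_list:
  - model_name: claude-sonnet-4-5
    litellm_params:
      model: vertex_ai/claude-sonnet-4-5
      vertex_project: my-gcp-project # placeholder
      vertex_location: us-east5      # placeholder
```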

Expected result

Prompts should complete successfully

Actual result

Prompts fail because the requests ask for 200k output tokens, while the model's maximum is 64k

App Version

3.28.14

API Provider

LiteLLM

Model Used

Sonnet 4.5 via Google Vertex

Metadata

Labels: Issue - In Progress (someone is actively working on this; should link to a PR soon), bug (something isn't working)

Status: Done