Token Limit Error When Using Custom Qwen Model #6252
Replies: 5 comments
-
I seem to be getting the same for Mistral codestral-latest as well:
-
This happens when the context window of the model you are using is not recognized, so it falls back to a default of 4095 tokens. This will be alleviated with the following change this week. I will also add the codestral and qwen models to the default config. For now, you can also specify the max context tokens via an agent/preset.
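As a sketch of the preset workaround: if you define model specs in `librechat.yaml`, the preset can pin the context window so the 4095-token fallback is never used. The field names below (`modelSpecs`, `preset`, `maxContextTokens`) and the endpoint/model values are assumptions based on the preset schema; check the LibreChat configuration docs for your version.

```yaml
# Hypothetical sketch, not the exact fix from the linked change.
# Endpoint name and model are placeholders for your own setup.
modelSpecs:
  list:
    - name: 'qwen-large-ctx'
      label: 'Qwen (32k context)'
      preset:
        endpoint: 'Qwen'          # your custom endpoint's name
        model: 'qwen3-coder'
        maxContextTokens: 32768   # overrides the 4095-token fallback
```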
-
I am also experiencing this issue in LibreChat when using LM Studio as a custom endpoint. The relevant part from my config:

```yaml
- name: 'LM Studio'
  apiKey: 'not-needed'
  baseURL: 'http://host.docker.internal:1234/v1'
  models:
    default: ['qwen3-30b-a3b-thinking-2507-mlx']
    fetch: true
  titleConvo: true
  titleModel: 'deepseek/deepseek-chat-v3-0324:free'
  modelDisplayLabel: 'LM Studio'
```

Just installed LibreChat yesterday via Docker.
-
I'm hitting this on qwen3-coder and gpt-oss as well, with a very small limit; I can't even fit a 450-line JS file. The error reads: "The latest message token count is too long, exceeding the token limit, or your token limit parameters are misconfigured, adversely affecting the context window. More info: 3873 / 3686." A window of 3686 tokens is very small.
-
Have there been any updates on this?
-
What happened?
When using the custom Qwen model, whether set up as a custom endpoint in YAML or included in the OpenAI-compatible models list, an error occurs when starting a new session and sending a "hi" message. The error message indicates that the token count of the latest message exceeds the limit: "Token count for the latest message is too long, exceeding the limit (4959 / 4095)." This issue did not exist in the previous version released approximately three months ago and only began occurring after the recent update yesterday.
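The failing check can be illustrated with a minimal sketch. This is not LibreChat's actual code; the function name and signature are invented, and only the numbers (4959 sent, 4095 fallback) come from the error above.

```python
# Minimal sketch of a limit check producing a "4959 / 4095"-style error.
# 4095 is the fallback context window assumed for unrecognized models.
FALLBACK_CONTEXT_TOKENS = 4095

def check_token_limit(message_tokens, max_context_tokens=None):
    """Return 'ok' or an error string, mimicking the reported message."""
    limit = max_context_tokens or FALLBACK_CONTEXT_TOKENS
    if message_tokens > limit:
        return (f"Token count for the latest message is too long, "
                f"exceeding the limit ({message_tokens} / {limit}).")
    return "ok"

print(check_token_limit(4959))          # exceeds the 4095 fallback
print(check_token_limit(4959, 32768))   # passes once the real window is set
```

With the real context window configured, the same 4959-token message fits comfortably, which is why overriding the fallback resolves the error.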
Please let me know if you need any modifications or additional details!
Here is my console's error:
Here is my yaml:
Version Information
```
ghcr.io/danny-avila/librechat-dev               latest  59660cbf7aa2  19 hours ago  872MB
ghcr.io/danny-avila/librechat-rag-api-dev-lite  latest  6550e7ddf180  42 hours ago  1.3GB
```
Steps to Reproduce
What browsers are you seeing the problem on?
No response
Relevant log output
Screenshots