
[BUG] GLM 4.6 Turbo via Chutes doesn't work because of incorrect max output token count #8821

@enerage

Description


Problem (one or two sentences)

GLM 4.6 Turbo via Chutes doesn't work because of an incorrect max output token count. The model's max output tokens should be capped at roughly 20% of its ~200k context window (about 40k tokens) so that requests fit within the limit and the model starts working correctly.
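
Concretely, the GLM 4.6 Turbo entry would need a `maxTokens` value of about 40k instead of effectively inheriting the full context window. A minimal sketch of what that could look like, assuming the Chutes model list follows the same per-model metadata record shape as other providers; the interface, field names, and model ID string here are illustrative, not the exact ones in the codebase:

```typescript
// Illustrative only: assumes Chutes models are declared in a record of
// per-model metadata; the exact file, types, and model ID may differ.
interface ModelInfo {
  maxTokens: number;       // maximum completion tokens to request
  contextWindow: number;   // total tokens (prompt + completion) the model accepts
  supportsImages: boolean;
  supportsPromptCache: boolean;
}

const chutesModels: Record<string, ModelInfo> = {
  "GLM-4.6-turbo": {
    contextWindow: 202_752, // from the provider error message below
    maxTokens: 40_960,      // ~20% of the context window, per the suggestion above
    supportsImages: false,
    supportsPromptCache: false,
  },
};
```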

Context (who is affected and when)

Everyone who tries to use GLM 4.6 Turbo via the Chutes provider.

Reproduction steps

  1. Create a new API configuration with the GLM 4.6 Turbo model via the Chutes provider
  2. Send a sample message
  3. Observe an error similar to the following (see the sketch after this list for why the numbers add up this way): "Requested token count exceeds the model's maximum context length of 202752 tokens. You requested a total of 233093 tokens: 30341 tokens from the input messages and 202752 tokens for the completion. Please reduce the number of tokens in the input messages or the completion to fit within the limit."
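
The error shows the completion budget defaulting to the model's full context window (202752 tokens), so the prompt plus the completion can never fit. A minimal sketch of the arithmetic and of clamping the completion to a per-model cap; the clamp formula and variable names are illustrative assumptions, not Roo Code's actual request handler:

```typescript
// Numbers taken from the error message above.
const contextWindow = 202_752;
const inputTokens = 30_341;

// Current behaviour: the completion budget defaults to the full context window,
// so the total request is 30_341 + 202_752 = 233_093 tokens > 202_752 -> rejected.
const requestedToday = inputTokens + contextWindow;

// Proposed behaviour: cap the model's max output at ~20% of the context window
// (about 40k tokens) and clamp it so prompt + completion stay inside the window.
const modelMaxTokens = 40_960; // assumed cap, per the 20% suggestion above
const maxCompletion = Math.min(modelMaxTokens, contextWindow - inputTokens);

console.log(requestedToday, maxCompletion); // 233093 40960
```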

Expected result

The model responds to messages correctly.

Actual result

The request fails with an error because the requested token count exceeds the model's maximum context length.

Variations tried (optional)

No response

App Version

3.29.2

API Provider (optional)

Chutes AI

Model Used (optional)

GLM-4.6-turbo

Roo Code Task Links (optional)

No response

Relevant logs or errors (optional)
