
Chat with content: hitting TPM limits very fast, even with quite basic static content (R markdown) - can chunking be used #295

@FMKerckhof

Description


Hello,

I have deployed "chat with content" and am trying to use it on (quite minimal) R markdown reports. I have configured it to use OpenAI and tried several models (gpt-4.1-mini, gpt-5-mini, ...), but I always receive an error like this:

```
{'error': {'message': 'Request too large for gpt-4.1-long-context in organization <my-org> on tokens per min (TPM): Limit 500000, Requested <number larger than 500k>. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
```
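For context, this is roughly how I estimated the token footprint of one of the reports. It uses tiktoken; the file path and the assumption that the full rendered file is sent as context are my own:

```python
# Rough token estimate for one rendered report, assuming the whole file
# is sent to the model as context. tiktoken is OpenAI's tokenizer library;
# "report.html" is just an illustrative path.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o/4.1 family

with open("report.html", encoding="utf-8") as f:
    text = f.read()

print(f"~{len(enc.encode(text)):,} tokens")
```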

I am a bit surprised by the number of tokens that even relatively basic R markdown reports take up, but I am mainly wondering:

  • Why, regardless of what I specify under CHATLAS_CHAT_ARGS, do I always see gpt-4.1-long-context in the error message?
  • Is there a way to implement some kind of RAG "chunking" to avoid exceeding rate limits? (A sketch of what I mean follows this list.)
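To be concrete about the second point, this is the kind of retrieval step I have in mind. The chunk size, embedding model, and helper functions are my own illustration, not anything "chat with content" actually exposes:

```python
# Minimal RAG-style sketch: split the report into overlapping chunks,
# embed the question plus all chunks, and keep only the top-k most
# similar chunks so each chat request stays far below the 500k TPM cap.
# All names and parameters here are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with some overlap between them."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def top_chunks(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the question."""
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=[question] + chunks,
    )
    vecs = np.array([d.embedding for d in resp.data])
    # text-embedding-3 vectors are unit-normalised, so a dot product
    # is equivalent to cosine similarity.
    scores = vecs[1:] @ vecs[0]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

Only the selected chunks plus the question would then go into the chat prompt, instead of the entire report.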

Thanks in advance,

FM Kerckhof
