
Chat with content: hitting TPM limits very fast, even with quite basic static content (R markdown) - can chunking be used #295

@FMKerckhof

Description


Hello,

I have deployed "chat with content" and am trying to use it on (quite minimal) R markdown reports. I have configured it to use OpenAI and tried several models (gpt-4.1-mini, gpt-5-mini, ...), but I always receive an error like this:

```
{'error': {'message': 'Request too large for gpt-4.1-long-context in organization <my-org> on tokens per min (TPM): Limit 500000, Requested <number larger than 500k>. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
```
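For context, this is roughly how I estimated the token footprint of one of the reports. It uses tiktoken; the file path and the assumption that the full rendered file is sent as context are my own:

```python
# Rough token estimate for one rendered report, assuming the whole file
# is sent to the model as context. tiktoken is OpenAI's tokenizer library;
# "report.html" is just an illustrative path.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")  # encoding used by the gpt-4o/4.1 family

with open("report.html", encoding="utf-8") as f:
    text = f.read()

print(f"~{len(enc.encode(text)):,} tokens")
```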

I am a bit surprised by the number of tokens that even relatively basic R markdown reports take up, but I am mainly wondering:

  • Why, regardless of what I specify under CHATLAS_CHAT_ARGS, do I always see gpt-4.1-long-context in the error message?
  • Is there a way to implement some kind of RAG "chunking" to avoid exceeding rate limits? (A sketch of what I mean follows this list.)
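To be concrete about the second point, this is the kind of retrieval step I have in mind. The chunk size, embedding model, and helper functions are my own illustration, not anything "chat with content" actually exposes:

```python
# Minimal RAG-style sketch: split the report into overlapping chunks,
# embed the question plus all chunks, and keep only the top-k most
# similar chunks so each chat request stays far below the 500k TPM cap.
# All names and parameters here are illustrative.
import numpy as np
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks with some overlap between them."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def top_chunks(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Return the k chunks most similar to the question."""
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=[question] + chunks,
    )
    vecs = np.array([d.embedding for d in resp.data])
    # text-embedding-3 vectors are unit-normalised, so a dot product
    # is equivalent to cosine similarity.
    scores = vecs[1:] @ vecs[0]
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]
```

Only the selected chunks plus the question would then go into the chat prompt, instead of the entire report.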

Thanks in advance,

FM Kerckhof
