Dear all,
I have deployed "chat with content" and am trying to use it on (quite minimal) R Markdown reports. I have set "chat with content" up to use OpenAI and tried multiple iterations with gpt-4.1-mini, gpt-5-mini, ..., but I always receive this error:

```
Request too large for gpt-4.1-long-context in organization <my-org> on tokens per min (TPM): Limit 500000, Requested <number larger than 500k>. The input or output tokens must be reduced in order to run successfully. Visit https://platform.openai.com/account/rate-limits to learn more.', 'type': 'tokens', 'param': None, 'code': 'rate_limit_exceeded'}}
```
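For reference, this is roughly how I have the chat configured (anonymized, and shown in Python for illustration; I am assuming CHATLAS_CHAT_ARGS is read as a JSON object of chat constructor arguments, which is my understanding of how chatlas's ChatAuto() works):

```python
import os

# Roughly my setup (anonymized). Assumption: the provider is read from
# CHATLAS_CHAT_PROVIDER and a JSON object of constructor arguments
# (including the model) from CHATLAS_CHAT_ARGS.
os.environ["CHATLAS_CHAT_PROVIDER"] = "openai"
os.environ["CHATLAS_CHAT_ARGS"] = '{"model": "gpt-4.1-mini"}'
```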
I am a bit surprised by how many tokens even relatively basic R Markdown reports take up, but I am wondering:
- Why, regardless of what I specify under CHATLAS_CHAT_ARGS, do I always see gpt-4.1-long-context in the error message?
- Is there a way to implement some kind of RAG "chunking" to avoid exceeding rate limits? (A rough sketch of what I have in mind is below.)
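By "chunking" I mean something along these lines (a rough, library-agnostic sketch with placeholder sizes; the embedding/retrieval step that would pick the relevant chunks per question is omitted):

```python
def chunk_text(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    """Split the rendered report into overlapping chunks so that only a
    few relevant chunks (rather than the whole document) are sent per
    request, keeping each request well under the TPM limit."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # small overlap so context isn't cut mid-thought
    return chunks
```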
Thanks in advance,
FM Kerckhof