Using GPT-4o with RAG results in error "type": "INPUT_LENGTH", "info": "NaN / 127500" #7708

paulfields · 2025-06-03T15:47:36Z

paulfields
Jun 3, 2025

What happened?

I've enabled RAG in my local deployment using a FastAPI service in a custom main.py script under rag-api/app that implements an /embed endpoint using my own deployment of Azure OpenAI’s text-embedding-ada-002. It inserting embeddings from a short document I've created directly into postgres and matching metadata into MongoDB. I see the indexes as available resources on the right hand side of Librechat's UI.

I have also enabled my own deployment of Azure OpenAI GPT-4o which I'm able to use for chat, BUT when I drag on of my indexes into chat prompt and execute a (short) prompt I get the following error in the UI

"The latest message token count is too long, exceeding the token limit, or your token limit parameters are misconfigured, adversely affecting the context window. More info: NaN 127500. . .

In the log output I get the following

error: [AskController] Error handling request { "type": "INPUT_LENGTH", "info": "NaN / 127500" }
error: [handleAbortError] AI response error; aborting request: { "type": "INPUT_LENGTH", "info": "NaN / 127500" }

As noted both my embedded content and prompt messages are very short, and I've not found a way to specify the input_length or content_length params in a way that resolves this error.

Version Information

I'm running LibreChat in docker using a version I downloaded from ghcr.io/danny-avila/librechat-dev:latest.

newlibrechat-rag_api latest 944580fb3e4c 17 hours ago 465MB
ghcr.io/danny-avila/librechat-dev latest d27883cb774d 4 days ago 1.77GB

Steps to Reproduce

Setting up these conditions is a log process that involves numerous edits to config files, deploying models externally to Librechat etc. Please let me know if you have specific questions on the setup and I'm happy to answer.

Please also refer to the screenshot which helps describe the issue.

What browsers are you seeing the problem on?

Chrome, Microsoft Edge

Relevant log output

******************************************************
*****Here's the content from debug-2025-06-03.log*****
******************************************************
2025-06-03T15:41:59.549Z debug: [AskController]
{
  text: "What is special about \"Four score and seven years ago?\"",
  conversationId: null,
  endpoint: "azureOpenAI",
  resendFiles: true,
    modelOptions.model: "gpt-4o",
  modelsConfig: "exists",
  attachments: [object Promise],
}
2025-06-03T15:41:59.594Z debug: [BaseClient] Loading history:
{
  conversationId: "4575a07c-afa2-4831-b943-b0d6ad1a7c36",
  parentMessageId: "00000000-0000-0000-0000-000000000000",
}
2025-06-03T15:41:59.637Z debug: [BaseClient] Context Count (1/2)
{
  remainingContextTokens: 127497,
  maxContextTokens: 127500,
}
2025-06-03T15:41:59.638Z debug: [BaseClient] Difference between original payload (1) and context (0): 1
2025-06-03T15:41:59.638Z warn: Prompt token count exceeds max token count (NaN / 127500).
2025-06-03T15:41:59.687Z error: [AskController] Error handling request { "type": "INPUT_LENGTH", "info": "NaN / 127500" }
2025-06-03T15:41:59.690Z error: [handleAbortError] AI response error; aborting request: { "type": "INPUT_LENGTH", "info": "NaN / 127500" }
2025-06-03T15:41:59.737Z debug: [AskController] Performing cleanup
2025-06-03T15:41:59.738Z debug: [AskController] Cleaning up abort controller
2025-06-03T15:41:59.739Z debug: [AskController] Cleanup completed

********************************************************
*****Here's the content from error02025-06-03.log*******
********************************************************
{"level":"error","message":"[AskController] Error handling request { \"type\": \"INPUT_LENGTH\", \"info\": \"NaN / 127500\" }","stack":"Error: { \"type\": \"INPUT_LENGTH\", \"info\": \"NaN / 127500\" }\n    at OpenAIClient.handleContextStrategy (/app/api/app/clients/BaseClient.js:493:13)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async OpenAIClient.buildMessages (/app/api/app/clients/OpenAIClient.js:494:61)\n    at async OpenAIClient.sendMessage (/app/api/app/clients/BaseClient.js:605:9)\n    at async AskController (/app/api/server/controllers/AskController.js:187:20)\n    at async /app/api/server/routes/ask/openAI.js:23:5"}
{"level":"error","message":"[handleAbortError] AI response error; aborting request: { \"type\": \"INPUT_LENGTH\", \"info\": \"NaN / 127500\" }","stack":"Error: { \"type\": \"INPUT_LENGTH\", \"info\": \"NaN / 127500\" }\n    at OpenAIClient.handleContextStrategy (/app/api/app/clients/BaseClient.js:493:13)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async OpenAIClient.buildMessages (/app/api/app/clients/OpenAIClient.js:494:61)\n    at async OpenAIClient.sendMessage (/app/api/app/clients/BaseClient.js:605:9)\n    at async AskController (/app/api/server/controllers/AskController.js:187:20)\n    at async /app/api/server/routes/ask/openAI.js:23:5"}

Screenshots

Code of Conduct

I agree to follow this project's Code of Conduct

paulfields · 2025-06-03T15:49:21Z

paulfields
Jun 3, 2025
Author

Correction for "Steps to Reproduce"

Setting up these conditions is a long process that involves numerous edits to config files

0 replies

danny-avila · 2025-06-03T15:56:48Z

danny-avila
Jun 3, 2025
Maintainer

@paulfields I'm not able to reproduce this issue, do you mind sharing the file?

1 reply

paulfields Jun 3, 2025
Author

Yes - thanks for your quick response. The file is attached below. Nothing special as far as I can tell. I would be interested in any guidance you can provide on alternate ways to add indexes (not just presets). The only way I've been able to do it so far is by:

Customizing librechat.yaml by adding an OpenAI endpoint with both a chat model (GPT-4o) and an embedding model (text-embedding-ada-002) and
Adding RAG embedding support by configuring rag_api in docker-compose.override.yml to point to my Azure embedding model, and
Adding my own main.py FastAPI app that:
- Accepts uploaded files or JSON
- Embeds them via Azure OpenAI
- Stores vectors into your PostgreSQL documents table
- Adds tracking metadata into MongoDB

Is this the correct / optimal way of going about this?

paulfields · 2025-06-03T15:58:22Z

paulfields
Jun 3, 2025
Author

gettysburg.txt

0 replies

paulfields · 2025-06-03T17:23:18Z

paulfields
Jun 3, 2025
Author

Update - I deployed GPT-35-turbo just to see if there was anything model-specific going on and got the same issue.

0 replies

danny-avila · 2025-06-03T17:30:25Z

danny-avila
Jun 3, 2025
Maintainer

interesting, still no issue

can you try updating to make sure everything is up to date? It shouldn't be the issue though. I will look into what could possibly be making it "NaN" as that is the only outlier

Updating Instructions (docker):
https://www.librechat.ai/docs/local/docker#update-librechat

1 reply

paulfields Jun 3, 2025
Author

Sorry if this is a dumb question, but is adding a file ("attachment") using an agent (versus just uploading a preset file) actually performs RAG or is it just adding the information to the context of the current chat? I know that the custom RAG solution I used via ./rag-api/main.py creates an index in my postgres db named "vector" but am interested if the "attachment" approach does the same thing -- and just for curiosity if I can verify by looking at the embeddings somewhere. The answer to this will be important for determining our direction forward in developing solutions for our clients. Thanks!

paulfields · 2025-06-03T21:12:13Z

paulfields
Jun 3, 2025
Author

Hi Danny - I followed the instructions in https://www.librechat.ai/docs/local/docker#update-librechat and rebuilt everything, except for the modified config (e.g. yml, env, yaml) files that were not changed by the git pull command, but still seeing the error (see attached).

I wonder if it could be something with how my ./rag-api/app/main.py is making entries to postgres and mongodb. I am assuming you are using a main.py to create the embeddings for RAG, so if you think that's a plausible cause would you mind sharing the insert statements from your main.py script?

Alternately is there something in my config files I should be checking for that could contribute to such an error being thrown?

Appreciate your support.

0 replies

paulfields · 2025-06-06T23:23:18Z

paulfields
Jun 6, 2025
Author

I was able to get past the NaN error by modifying BaseClient.js and OpenAIClient.js. Maybe I'm not thinking correctly but I didn't see a solution that fit my desired use case in the RAG setup instructions. .

Now that I have the ability to upload and create embeddings through a customized ./rag-api/app/main.py, and have it present in the documents set on the right side tools menu in the UI, I'm not able to get the chat to recognize it. See the following image. Any idea what might be going wrong?

0 replies

Uh oh!

Using GPT-4o with RAG results in error "type": "INPUT_LENGTH", "info": "NaN / 127500" #7708

Uh oh!

paulfields Jun 3, 2025

What happened?

Version Information

Steps to Reproduce

What browsers are you seeing the problem on?

Relevant log output

Screenshots

Code of Conduct

Replies: 7 comments · 2 replies

Uh oh!

paulfields Jun 3, 2025 Author

Uh oh!

Uh oh!

danny-avila Jun 3, 2025 Maintainer

Uh oh!

Uh oh!

paulfields Jun 3, 2025 Author

Uh oh!

paulfields Jun 3, 2025 Author

Uh oh!

paulfields Jun 3, 2025 Author

Uh oh!

danny-avila Jun 3, 2025 Maintainer

Uh oh!

Uh oh!

paulfields Jun 3, 2025 Author

Uh oh!

paulfields Jun 3, 2025 Author

Uh oh!

paulfields Jun 6, 2025 Author

paulfields
Jun 3, 2025

Replies: 7 comments 2 replies

paulfields
Jun 3, 2025
Author

danny-avila
Jun 3, 2025
Maintainer

paulfields Jun 3, 2025
Author

paulfields
Jun 3, 2025
Author

paulfields
Jun 3, 2025
Author

danny-avila
Jun 3, 2025
Maintainer

paulfields Jun 3, 2025
Author

paulfields
Jun 3, 2025
Author

paulfields
Jun 6, 2025
Author