How to optimize the cost of each Ask? #556
Replies: 2 comments
+1
One way to reduce the tokens used is raising the minimum relevance threshold. By increasing that value, you can exclude chunks with lower relevance. If you're using only vector search with cosine similarity, not hybrid search or custom search algorithms, you could start with a value like 0.75 and see if that helps. Note that as you increase the value, the RAG prompt will get smaller and smaller, giving the LLM less and less grounding information to answer your questions.
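To illustrate what raising that threshold does, here is a minimal sketch of relevance filtering with cosine similarity. The 0.75 cutoff comes from the suggestion above; the chunk shapes and function names are illustrative assumptions, not Kernel Memory's internals.

```python
# Sketch: drop retrieved chunks below a cosine-similarity threshold before
# they are packed into the RAG prompt. Chunk format and names are
# hypothetical, for illustration only.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_chunks(query_vec, chunks, min_relevance=0.75):
    """Keep only chunks whose similarity to the query meets the threshold,
    highest-scoring first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [text for score, text in sorted(scored, reverse=True)
            if score >= min_relevance]

chunks = [
    ("highly relevant passage", [0.9, 0.1]),   # similarity ~0.99, kept
    ("loosely related passage", [0.5, 0.5]),   # similarity ~0.71, dropped
    ("unrelated passage", [0.0, 1.0]),         # similarity 0.0, dropped
]
kept = filter_chunks([1.0, 0.0], chunks)
```

Fewer surviving chunks means a shorter prompt, and the prompt is where most of the per-request tokens go.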
Context / Scenario
• I have noticed that each request I make to the `/ask` endpoint costs about 2 cents using the OpenAI API.
• It uses a large number of tokens as context.
• Compared to lighter setups, such as querying ChromaDB directly, I found that Kernel Memory uses a noticeably higher number of tokens.
Question
• Do you have any tips on how to reduce the cost?
• Can I limit the number of context tokens sent to the LLM, or limit the number of relevant partitions returned?
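The two limits being asked about can be sketched as follows: cap the number of partitions taken, and cap the total context tokens. The function name, parameters, and the 4-characters-per-token estimate are illustrative assumptions, not Kernel Memory's actual API.

```python
# Sketch of the two cost levers: a partition-count cap and a token budget.
# All names are hypothetical; token counts use a crude chars/4 estimate.
def build_context(partitions, max_partitions=3, max_tokens=1000):
    """Take top-ranked partitions until either limit would be exceeded."""
    context, used_tokens = [], 0
    for text in partitions[:max_partitions]:
        est_tokens = len(text) // 4  # rough token estimate
        if used_tokens + est_tokens > max_tokens:
            break  # token budget reached, stop adding partitions
        context.append(text)
        used_tokens += est_tokens
    return context

# Four 400-char partitions (~100 tokens each); the 250-token budget
# admits only the first two.
parts = ["a" * 400, "b" * 400, "c" * 400, "d" * 400]
ctx = build_context(parts, max_partitions=3, max_tokens=250)
```

Either lever shrinks the prompt and therefore the per-request cost, at the price of giving the model less grounding material.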