How to optimize the cost of each Ask? #556
Replies: 2 comments
+1
One way to reduce the tokens used is raising the minimum relevance threshold. By increasing that value, you can exclude chunks with lower relevance. If you're using only vector search with cosine similarity, not hybrid search or custom search algorithms, you could start with a value like 0.75 and see if that helps. Note that as you increase the value, the RAG prompt will get smaller and smaller, giving the LLM less and less grounding information to answer your questions.
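To illustrate what raising that threshold does, here is a minimal sketch of relevance filtering with cosine similarity. The 0.75 cutoff comes from the suggestion above; the chunk shapes and function names are illustrative assumptions, not Kernel Memory's internals.

```python
# Sketch: drop retrieved chunks below a cosine-similarity threshold before
# they are packed into the RAG prompt. Chunk format and names are
# hypothetical, for illustration only.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def filter_chunks(query_vec, chunks, min_relevance=0.75):
    """Keep only chunks whose similarity to the query meets the threshold,
    highest-scoring first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in chunks]
    return [text for score, text in sorted(scored, reverse=True)
            if score >= min_relevance]

chunks = [
    ("highly relevant passage", [0.9, 0.1]),   # similarity ~0.99, kept
    ("loosely related passage", [0.5, 0.5]),   # similarity ~0.71, dropped
    ("unrelated passage", [0.0, 1.0]),         # similarity 0.0, dropped
]
kept = filter_chunks([1.0, 0.0], chunks)
```

Fewer surviving chunks means a shorter prompt, and the prompt is where most of the per-request tokens go.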
Context / Scenario
• I have noticed that each request I make to the `/ask` endpoint costs about 2 cents using the OpenAI API.
• It uses a large number of tokens as context.
• Compared to lighter setups, such as querying ChromaDB directly, I found that Kernel Memory uses a noticeably higher number of tokens.
Question
• Do you have any tips on how to reduce the cost?
• Can I limit the number of context tokens sent to the LLM, or limit the number of relevant partitions returned?
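The two limits being asked about can be sketched as follows: cap the number of partitions taken, and cap the total context tokens. The function name, parameters, and the 4-characters-per-token estimate are illustrative assumptions, not Kernel Memory's actual API.

```python
# Sketch of the two cost levers: a partition-count cap and a token budget.
# All names are hypothetical; token counts use a crude chars/4 estimate.
def build_context(partitions, max_partitions=3, max_tokens=1000):
    """Take top-ranked partitions until either limit would be exceeded."""
    context, used_tokens = [], 0
    for text in partitions[:max_partitions]:
        est_tokens = len(text) // 4  # rough token estimate
        if used_tokens + est_tokens > max_tokens:
            break  # token budget reached, stop adding partitions
        context.append(text)
        used_tokens += est_tokens
    return context

# Four 400-char partitions (~100 tokens each); the 250-token budget
# admits only the first two.
parts = ["a" * 400, "b" * 400, "c" * 400, "d" * 400]
ctx = build_context(parts, max_partitions=3, max_tokens=250)
```

Either lever shrinks the prompt and therefore the per-request cost, at the price of giving the model less grounding material.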