articles/ai-services/openai/concepts/use-your-data.md
## Token usage estimation for Azure OpenAI On Your Data
Azure OpenAI On Your Data is a Retrieval Augmented Generation (RAG) service that leverages both a search service (such as Azure AI Search) and generation (Azure OpenAI models) to let users get answers to their questions based on provided data.
As part of this RAG pipeline, there are three high-level steps (sketched in code after the list):
1. Reformulate the user query into a list of search intents. This is done by making a call to the model with a prompt that includes instructions, the user question, and conversation history. Let's call this an *intent prompt*.
2. For each intent, multiple document chunks are retrieved from the search service. After filtering out irrelevant chunks based on the user-specified strictness threshold, and reranking and aggregating the chunks based on internal logic, a user-specified number of document chunks is chosen.
3. These document chunks, along with the user question, conversation history, role information, and instructions, are sent to the model to generate the final model response. Let's call this the *generation prompt*.
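
To make the flow concrete, the following is a minimal, hypothetical sketch of the three steps in Python. The stub functions and prompt strings are illustrative assumptions only; the actual prompts, strictness filtering, and reranking logic are internal to the service.

```python
from typing import List

def call_model(prompt: str) -> str:
    """Stub standing in for a chat-completions call to an Azure OpenAI model."""
    return "stubbed model output for: " + prompt[:40]

def search_chunks(intent: str) -> List[str]:
    """Stub standing in for a retrieval call to a search service such as Azure AI Search."""
    return [f"document chunk {i} about '{intent}'" for i in range(10)]

def answer(user_question: str, history: List[str], role_information: str,
           top_n_documents: int = 5) -> str:
    # Step 1: the *intent prompt* -- instructions, conversation history, and the question.
    intent_prompt = "\n".join(
        ["Rewrite the question as a list of search intents.", *history, user_question])
    intents = call_model(intent_prompt).split(";")

    # Step 2: retrieve chunks per intent. The strictness-based filtering and
    # reranking are internal to the service; simple truncation stands in for them.
    chunks: List[str] = []
    for intent in intents:
        chunks.extend(search_chunks(intent))
    chunks = chunks[:top_n_documents]

    # Step 3: the *generation prompt* -- instructions, role information,
    # retrieved chunks, conversation history, and the question.
    generation_prompt = "\n".join(
        ["Answer using only the documents below.", role_information,
         *chunks, *history, user_question])
    return call_model(generation_prompt)

print(answer("What does my plan cover?", ["user: hi", "assistant: Hello!"],
             "You are a helpful benefits assistant."))
```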
In total, two calls are made to the model:
* For the intent step, the token estimate for the *intent prompt* includes the tokens for the user question, the conversation history, and the instructions sent to the model for intent generation.
* For the generation step, the token estimate for the *generation prompt* includes the tokens for the user question, the conversation history, the retrieved list of document chunks, the role information, and the instructions sent to the model for generation.
The model-generated output tokens (both the intents and the final response) also need to be taken into account in the total token estimate.
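
As a rough illustration, these per-call estimates can be computed with a tokenizer such as `tiktoken`. Everything below except the estimation arithmetic is an assumed placeholder; in particular, the service's internal prompt templates add overhead that this sketch does not capture.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by the GPT-3.5/GPT-4 model family

def tokens(text: str) -> int:
    return len(enc.encode(text))

# Illustrative component texts; the service's internal prompt templates are not public.
user_question = "What does my plan cover?"
history = ["user: hi", "assistant: Hello! How can I help?"]
intent_instructions = "Rewrite the question as a list of search intents."
generation_instructions = "Answer using only the retrieved documents."
role_information = "You are a helpful benefits assistant."
retrieved_chunks = ["chunk 1: the plan covers ...", "chunk 2: exclusions ..."]

# Call 1: the intent prompt (instructions + question + conversation history).
intent_prompt_tokens = sum(map(tokens, [intent_instructions, user_question, *history]))

# Call 2: the generation prompt (instructions + question + history + chunks + role info).
generation_prompt_tokens = sum(map(tokens, [
    generation_instructions, user_question, *history, *retrieved_chunks, role_information]))

# Output tokens from both calls count toward the total as well.
intent_output_tokens = tokens("plan coverage; benefits")   # assumed intent output
generation_output_tokens = tokens("Your plan covers ...")  # assumed final response

total_tokens = (intent_prompt_tokens + intent_output_tokens
                + generation_prompt_tokens + generation_output_tokens)
print(total_tokens)
```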