
Commit 92a00f7

fixes
1 parent 5a8c577 commit 92a00f7

1 file changed

articles/ai-services/openai/concepts/use-your-data.md

Lines changed: 7 additions & 7 deletions
@@ -421,17 +421,17 @@ As part of this RAG pipeline, there are three steps at a high-level:
1. Reformulate the user query into a list of search intents. This is done by making a call to the model with a prompt that includes instructions, the user question, and conversation history. Let's call this an *intent prompt*.

-1. For each intent, multiple document chunks are retrieved from the search service. After filtering out irrelevant chunks based on the user-specified threshold of strictness and reranking/aggregating the chunks based on internal logic, the user-specified number document chunks are chosen.
+1. For each intent, multiple document chunks are retrieved from the search service. After filtering out irrelevant chunks based on the user-specified threshold of strictness and reranking/aggregating the chunks based on internal logic, the user-specified number of document chunks are chosen.

3. These document chunks, along with the user question, conversation history, role information, and instructions are sent to the model to generate the final model response. Let's call this the *generation prompt*.

-In total, there are two calls made to GPT:
+In total, there are two calls made to the model:

-* For the intent, the token estimate for the *intent prompt* includes those for the user question, conversation history and the instructions sent to the model for intent generation.
+* For processing the intent: The token estimate for the *intent prompt* includes those for the user question, conversation history and the instructions sent to the model for intent generation.

-* For generation, the token estimate for the *generation prompt* includes those for the user question, conversation history, the retrieved list of document chunks, role information and the instructions sent to it for generation.
+* For generating the response: The token estimate for the *generation prompt* includes those for the user question, conversation history, the retrieved list of document chunks, role information and the instructions sent to it for generation.

-The model generated output tokens (both intents and response) need to be taken into account for total token estimation. Summing up all the four columns gives the average total tokens used for generating a response.
+The model generated output tokens (both intents and response) need to be taken into account for total token estimation. Summing up all the four columns below gives the average total tokens used for generating a response.

| Model | Generation prompt token count | Intent prompt token count | Response token count | Intent token count |
|--|--|--|--|--|
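
To make the accounting concrete, here is a minimal sketch of how the four columns add up to the per-response total. The helper below is illustrative only; the function and argument names are not part of any Azure OpenAI API. It simply sums the input and output tokens of the two calls described above.

```python
# Illustrative only: a hypothetical helper that sums the four token counts
# from the table above. Not part of the Azure OpenAI API.
def estimate_total_tokens(
    intent_prompt_tokens: int,      # call 1 input: instructions + user question + history
    intent_tokens: int,             # call 1 output: the generated search intents
    generation_prompt_tokens: int,  # call 2 input: instructions + question + history + chunks + role info
    response_tokens: int,           # call 2 output: the final model response
) -> int:
    """Average total tokens per response = sum of all four columns."""
    return intent_prompt_tokens + intent_tokens + generation_prompt_tokens + response_tokens
```
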
@@ -454,7 +454,7 @@ And the following [parameters](#runtime-parameters).
|Number of retrieved documents | 5 |
|Strictness | 3 |
|Chunk size | 1024 |
-|Limit responses to ingested data | True |
+|Limit responses to ingested data? | True |

These estimates will vary based on the values set for the above parameters. For example, if the number of retrieved documents is set to 10 and strictness is set to 1, the token count will go up. If returned responses aren't limited to the ingested data, there are fewer instructions given to the model and the number of tokens will go down.
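
As a rough illustration of that effect, and assuming the upper-bound relationship given later in this section (retrieved chunks multiplied by chunk size), raising the number of retrieved documents from 5 to 10 at a chunk size of 1,024 doubles the worst-case chunk token count:

```python
# Back-of-the-envelope upper bound on retrieved-chunk tokens; actual counts
# are lower after strictness filtering and truncation.
chunk_size = 1024

print(5 * chunk_size)   # 5120 tokens with 5 retrieved documents
print(10 * chunk_size)  # 10240 tokens with 10 retrieved documents
```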

@@ -471,7 +471,7 @@ The table above shows the total number of tokens available for each model type.
-* The meta prompt (MP): if you limit responses from the model to the grounding data content (`inScope=True` in the API), the maximum number of tokens is 4,036 tokens. Otherwise (for example if `inScope=False`) the maximum is 3,444 tokens. This number is variable depending on the token length of the user question and conversation history. This estimate includes the base prompt and the query rewriting prompts for retrieval.
+* The meta prompt: if you limit responses from the model to the grounding data content (`inScope=True` in the API), the maximum number of tokens is 4,036 tokens. Otherwise (for example if `inScope=False`) the maximum is 3,444 tokens. This number is variable depending on the token length of the user question and conversation history. This estimate includes the base prompt and the query rewriting prompts for retrieval.
* User question and history: Variable but capped at 2,000 tokens.
* Retrieved documents (chunks): The number of tokens used by the retrieved document chunks depends on multiple factors. The upper bound for this is the number of retrieved document chunks multiplied by the chunk size. It will, however, be truncated based on the tokens available for the specific model being used after counting the rest of the fields.
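
Putting those numbers together, the following sketch estimates how many tokens remain for retrieved chunks. It is a simplified assumption, not the service's actual truncation logic; `total_available_tokens` stands in for whatever the table above lists for your model.

```python
# Simplified sketch of the truncation described above, not the service's
# actual logic. The constants (4,036 / 3,444 / 2,000) come from this section;
# everything else is a placeholder.
def chunk_token_budget(total_available_tokens: int, in_scope: bool,
                       question_and_history_tokens: int,
                       num_chunks: int, chunk_size: int) -> int:
    meta_prompt = 4036 if in_scope else 3444          # maximum meta prompt size
    history = min(question_and_history_tokens, 2000)  # user question + history, capped at 2,000
    upper_bound = num_chunks * chunk_size             # e.g. 5 * 1024 = 5120
    remaining = total_available_tokens - meta_prompt - history
    return min(upper_bound, max(remaining, 0))        # truncated to what's left
```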