
Commit 5a8c577

committed
updating token estimation
1 parent 6922b90 commit 5a8c577

File tree

1 file changed

+23 -9 lines changed


articles/ai-services/openai/concepts/use-your-data.md

Lines changed: 23 additions & 9 deletions
@@ -431,7 +431,7 @@ In total, there are two calls made to GPT:

* For generation, the token estimate for the *generation prompt* includes those for the user question, conversation history, the retrieved list of document chunks, role information, and the instructions sent to it for generation.

-The model generated output tokens (both intents and response) need to be taken into account for total token estimation.
+The model generated output tokens (both intents and response) need to be taken into account for total token estimation. Summing up all four columns gives the average total tokens used for generating a response.

| Model | Generation prompt token count | Intent prompt token count | Response token count | Intent token count |
|--|--|--|--|--|
@@ -440,17 +440,32 @@ The model generated output tokens (both intents and response) need to be taken i
| gpt-4-1106-preview | 4538 | 811 | 119 | 27 |
| gpt-35-turbo-1106 | 4854 | 1372 | 110 | 26 |
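As the sentence above notes, summing the four columns approximates the average total tokens used per response. A minimal sketch of that arithmetic for the two rows shown in this hunk:

```python
# Average total tokens per response = generation prompt + intent prompt
# + response + intent tokens, using the averages from the table above.
measurements = {
    # model: (generation prompt, intent prompt, response, intent)
    "gpt-4-1106-preview": (4538, 811, 119, 27),
    "gpt-35-turbo-1106": (4854, 1372, 110, 26),
}

for model, columns in measurements.items():
    print(f"{model}: ~{sum(columns)} average total tokens per response")
```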

+The above numbers are based on testing on a data set with:

+* 5 conversations
+* 250 questions
+* 10 average tokens per question
+* 4 conversational turns per conversation on average

+And the following [parameters](#runtime-parameters):

+|Setting |Value |
+|---------|---------|
+|Number of retrieved documents | 5 |
+|Strictness | 3 |
+|Chunk size | 1024 |
+|Limit responses to ingested data | True |

+These estimates will vary based on the values set for the above parameters. For example, if the number of retrieved documents is set to 10 and strictness is set to 1, the token count will go up. If returned responses aren't limited to the ingested data, there are fewer instructions given to the model and the number of tokens will go down.
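To make the effect of those settings concrete, here's an illustrative ceiling, assuming each retrieved document corresponds to one chunk of at most the configured chunk size (lower strictness tends to admit more chunks rather than larger ones):

```python
# Illustrative ceiling on tokens contributed by retrieved content:
# at most `retrieved_documents` chunks of `chunk_size` tokens each.
def retrieved_content_ceiling(retrieved_documents: int, chunk_size: int) -> int:
    return retrieved_documents * chunk_size

print(retrieved_content_ceiling(5, 1024))   # settings above: up to ~5,120 tokens
print(retrieved_content_ceiling(10, 1024))  # 10 documents: up to ~10,240 tokens
```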

-<!--
-| Model | Max tokens for system message | Max tokens for model response |
-|--|--|--|
-| GPT-35-0301 | 400 | 1500 |
-| GPT-35-0613-16K | 1000 | 3200 |
-| GPT-4-0613-8K | 400 | 1500 |
-| GPT-4-0613-32K | 2000 | 6400 |
+The estimates also depend on the nature of the documents and the questions being asked. For example, open-ended questions are likely to produce longer responses. Similarly, a longer system message adds tokens to every prompt, and a long conversation history lengthens the prompt further.

+| Model | Total available tokens | Max tokens for system message | Max tokens for model response |
+|--|--|--|--|
+| GPT-35-0301 | 8000 | 400 | 1500 |
+| GPT-35-0613-16K | 16000 | 1000 | 3200 |
+| GPT-4-0613-8K | 8000 | 400 | 1500 |
+| GPT-4-0613-32K | 32000 | 2000 | 6400 |
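As a rough illustration of how these limits combine, subtracting the system-message and response allowances from the total gives what's left over for the question, conversation history, retrieved chunks, and instructions:

```python
# model: (total available tokens, max system message tokens, max response tokens)
limits = {
    "GPT-35-0301": (8000, 400, 1500),
    "GPT-35-0613-16K": (16000, 1000, 3200),
    "GPT-4-0613-8K": (8000, 400, 1500),
    "GPT-4-0613-32K": (32000, 2000, 6400),
}

for model, (total, system_max, response_max) in limits.items():
    remaining = total - system_max - response_max
    print(f"{model}: {remaining} tokens left for the rest of the prompt")
```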

The table above shows the total number of tokens available for each model type. It also determines the maximum number of tokens that can be used for the [system message](#system-message) and the model response. Additionally, the following also consume tokens:

@@ -475,7 +490,6 @@ class TokenEstimator(object):
token_output = TokenEstimator.estimate_tokens(input_text)
```

--->
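The body of `TokenEstimator` sits above this hunk and isn't shown here. A minimal sketch of such an estimator built on `tiktoken` (the encoding name is an assumption, not necessarily what the article uses):

```python
import tiktoken

class TokenEstimator(object):
    # Assumed encoding; the article's estimator may use a different tokenizer.
    ENCODER = tiktoken.get_encoding("cl100k_base")

    @classmethod
    def estimate_tokens(cls, text: str) -> int:
        return len(cls.ENCODER.encode(text))

# Mirrors the usage line shown above; `input_text` here is just sample data.
input_text = "Example user question about the ingested documents."
token_output = TokenEstimator.estimate_tokens(input_text)
print(token_output)
```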

## Troubleshooting
