articles/ai-services/openai/concepts/use-your-data.md

In total, there are two calls made to GPT:

* For generation, the token estimate for the *generation prompt* includes the tokens for the user question, the conversation history, the retrieved list of document chunks, role information, and the instructions sent to the model for generation (see the sketch below).
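
To make that composition concrete, here's a minimal sketch of the component sum using the open-source `tiktoken` tokenizer. The component values are hypothetical placeholders; the actual prompt assembly is internal to the service.

```python
import tiktoken

# cl100k_base is the tokenizer used by the GPT-3.5 Turbo and GPT-4 model families.
encoding = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    return len(encoding.encode(text))

# Hypothetical placeholders for the components of the generation prompt.
user_question = "What plans are available?"
conversation_history = ["user: Hi", "assistant: Hello! How can I help?"]
retrieved_chunks = ["<text of document chunk 1>", "<text of document chunk 2>"]
role_info_and_instructions = "<system message and generation instructions>"

# The generation-prompt estimate is roughly the sum of its components' tokens.
prompt_token_estimate = sum(
    count_tokens(part)
    for part in [user_question, *conversation_history, *retrieved_chunks, role_info_and_instructions]
)
print(prompt_token_estimate)
```
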
The model-generated output tokens (both intent and response) need to be taken into account for the total token estimate. Summing all four columns gives the average total tokens used for generating a response.

| gpt-4-1106-preview | 4538 | 811 | 119 | 27 |
| gpt-35-turbo-1106 | 4854 | 1372 | 110 | 26 |
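
As a worked example from the rows above, summing the four numeric columns gives 4538 + 811 + 119 + 27 = 5,495 average total tokens per response for gpt-4-1106-preview, and 4854 + 1372 + 110 + 26 = 6,362 for gpt-35-turbo-1106.
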
The preceding numbers are based on testing with a data set that has:

* 5 conversations
* 250 questions
* 10 average tokens per question
* 4 conversational turns per conversation, on average

The tests also used the following [parameters](#runtime-parameters):

|Setting |Value |
|---------|---------|
|Number of retrieved documents | 5 |
|Strictness | 3 |
|Chunk size | 1024 |
|Limit responses to ingested data | True |

These estimates will vary based on the values set for these parameters. For example, if the number of retrieved documents is set to 10 and strictness is set to 1, the token count goes up. If returned responses aren't limited to the ingested data, there are fewer instructions given to the model and the number of tokens goes down.
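
As a reference point, here's a minimal sketch of how these settings might appear on a chat completions request with the `openai` Python SDK. The endpoint, key, deployment, and index names are placeholders, and the exact `data_sources` schema depends on the API version you target.

```python
from openai import AzureOpenAI

# Placeholder resource names and keys.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-api-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="<your-deployment-name>",
    messages=[{"role": "user", "content": "What plans are available?"}],
    extra_body={
        "data_sources": [
            {
                "type": "azure_search",
                "parameters": {
                    "endpoint": "https://<your-search-resource>.search.windows.net",
                    "index_name": "<your-index>",
                    "authentication": {"type": "api_key", "key": "<your-search-key>"},
                    "top_n_documents": 5,  # number of retrieved documents
                    "strictness": 3,
                    "in_scope": True,  # limit responses to ingested data
                    # Chunk size is configured at ingestion time, not per request.
                },
            }
        ]
    },
)
print(response.choices[0].message.content)
```
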
The estimates also depend on the nature of the documents and the questions being asked. For example, if the questions are open-ended, the responses are likely to be longer. Similarly, a longer system message contributes to a longer prompt that consumes more tokens, and a long conversation history lengthens the prompt further.

| Model | Total available tokens | Max tokens for system message | Max tokens for model response |
|--|--|--|--|
| GPT-35-0301 | 8000 | 400 | 1500 |
| GPT-35-0613-16K | 16000 | 1000 | 3200 |
| GPT-4-0613-8K | 8000 | 400 | 1500 |
| GPT-4-0613-32K | 32000 | 2000 | 6400 |
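
For example, with GPT-4-0613-8K, a rough budget check based on the limits above (a sketch; the service performs its own accounting):

```python
# Token budget for GPT-4-0613-8K, taken from the table above.
TOTAL_AVAILABLE = 8000
MAX_SYSTEM_MESSAGE = 400
MAX_MODEL_RESPONSE = 1500

# Whatever remains is shared by the user question, conversation history,
# retrieved document chunks, and other instructions.
remaining_for_prompt = TOTAL_AVAILABLE - MAX_SYSTEM_MESSAGE - MAX_MODEL_RESPONSE
print(remaining_for_prompt)  # 6100
```
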
The preceding table shows the total number of tokens available for each model type, along with the maximum number of tokens that can be used for the [system message](#system-message) and the model response. Additionally, the following also consume tokens: