
Commit 74ddc89

committed: update

1 parent 4f4e81c commit 74ddc89

File tree: 1 file changed (+17 / -17 lines changed)


articles/ai-services/openai/how-to/predicted-outputs.md

Lines changed: 17 additions & 17 deletions
@@ -14,7 +14,7 @@ recommendations: false
 
 # Predicted outputs
 
-Predicted outputs can improve model response latency for chat completions calls where minimal changes are needed to a larger body of text. If you are asking the model to provide a response where a large portion of the expected response is already known, predicted outputs can significantly reduce the latency of this request. This capability is particularly well-suited for coding scenarios, including autocomplete, error detection, and real-time editing, where speed and responsiveness are critical for developers and end-users. Rather than have the model regenerate all the text from scratch, you can indicate to the model that most of the response is already known by passing the known text to the `prediction` parameter.
+Predicted outputs can improve model response latency for chat completions calls where minimal changes are needed to a larger body of text. If you're asking the model to provide a response where a large portion of the expected response is already known, predicted outputs can significantly reduce the latency of this request. This capability is particularly well-suited for coding scenarios, including autocomplete, error detection, and real-time editing, where speed and responsiveness are critical for developers and end-users. Rather than have the model regenerate all the text from scratch, you can indicate to the model that most of the response is already known by passing the known text to the `prediction` parameter.
 
 ## Model support
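The updated intro above refers to passing known text to the `prediction` parameter. Here's a minimal sketch of what that request looks like with the `openai` Python client; the endpoint, key, API version, and deployment name below are placeholders, not values from this commit:

```python
from openai import AzureOpenAI  # assumes a recent openai client with `prediction` support

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",  # placeholder
    api_key="YOUR-API-KEY",                                         # placeholder
    api_version="2025-01-01-preview",                               # placeholder API version
)

known_text = 'print("FizzBuzz")'  # text you expect back largely unchanged

completion = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder: your unique model deployment name
    messages=[
        {"role": "user", "content": f"Replace FizzBuzz with MSFTBuzz:\n{known_text}"}
    ],
    prediction={"type": "content", "content": known_text},  # pass the known text here
)
print(completion.choices[0].message.content)
```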

@@ -29,26 +29,26 @@ Predicted outputs can improve model response latency for chat completions calls
 
 ## Unsupported features
 
-Predicted outputs is currently text-only. These features cannot be used in conjunction with the `prediction` parameter and predicted outputs.
+Predicted outputs is currently text-only. These features can't be used in conjunction with the `prediction` parameter and predicted outputs.
 
-- Function calling
+- Tools/Function calling
 - audio models/inputs and outputs
-- `n` values higher than 1
+- `n` values higher than `1`
 - `logprobs`
-- `presence_penalty` values greater than 0
-- `frequency_penalty` values greater than 0
+- `presence_penalty` values greater than `0`
+- `frequency_penalty` values greater than `0`
 - `max_completion_tokens`
 
 > [!NOTE]
 > The predicted outputs feature is currently unavailable for models in the South East Asia region.
 
 ## Getting started
 
-To demonstrate the basics of predicted outputs we'll start by asking a model to refactor the code from the common programming `FizzBuzz` problem to replace the instance of `FizzBuzz` with `MSFTBuzz`. We'll pass our example code to the model in two places. First as part of a user message in the `messages` array/list, and a second time as part of the content of the new `prediction` parameter.
+To demonstrate the basics of predicted outputs, we'll start by asking a model to refactor the code from the common programming `FizzBuzz` problem to replace the instance of `FizzBuzz` with `MSFTBuzz`. We'll pass our example code to the model in two places. First as part of a user message in the `messages` array/list, and a second time as part of the content of the new `prediction` parameter.
 
 # [Python (Microsoft Entra ID)](#tab/python-secure)
 
-You may need to upgrade your OpenAI client library to access the `prediction` parameter.
+You might need to upgrade your OpenAI client library to access the `prediction` parameter.
 
 ```cmd
 pip install openai --upgrade
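The hunk ends at the `pip install` step, and the body of the Entra ID example is elided from the diff. For reference, keyless client setup with the `azure-identity` package generally looks like the following sketch; the endpoint and API version are placeholders:

```python
from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# Exchange an Entra ID credential for bearer tokens scoped to Azure Cognitive Services.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(), "https://cognitiveservices.azure.com/.default"
)

client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE-NAME.openai.azure.com/",  # placeholder
    azure_ad_token_provider=token_provider,  # keyless auth in place of api_key
    api_version="2025-01-01-preview",        # placeholder API version
)
```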
@@ -110,7 +110,7 @@ print(completion.model_dump_json(indent=2))
 
 # [Python (key-based auth)](#tab/python)
 
-You may need to upgrade your OpenAI client library to access the `prediction` parameter.
+You might need to upgrade your OpenAI client library to access the `prediction` parameter.
 
 ```cmd
 pip install openai --upgrade
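The key-based variant's client setup is likewise elided from the diff. A minimal sketch, assuming the endpoint and key are supplied through environment variables; the API version is a placeholder:

```python
import os
from openai import AzureOpenAI

# Key-based auth variant of the same client.
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),  # for example, https://YOUR-RESOURCE-NAME.openai.azure.com/
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-01-01-preview",  # placeholder API version
)
```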
@@ -278,20 +278,20 @@ Notice in the output the new response parameters for `accepted_prediction_tokens
 }
 ```
 
-The `accepted_prediction_tokens` help reduce model response latency, but any `rejected_prediction_tokens` have the same cost implication as additional output tokens generated by the model. For this reason, while predicted outputs can improve model response times, it can result in greater costs. You will need to evaluate and balance the increased model performance against the potential increases in cost.
+The `accepted_prediction_tokens` help reduce model response latency, but any `rejected_prediction_tokens` have the same cost implication as additional output tokens generated by the model. For this reason, while predicted outputs can improve model response times, it can result in greater costs. You'll need to evaluate and balance the increased model performance against the potential increases in cost.
 
-It is also important to understand, that using predictive outputs does not guarantee a reduction in latency. A large request with a greater percentage of rejected prediction tokens than accepted prediction tokens could result in an increase in model response latency, rather than a decrease.
+It's also important to understand that using predicted outputs doesn't guarantee a reduction in latency. A large request with a greater percentage of rejected prediction tokens than accepted prediction tokens could result in an increase in model response latency, rather than a decrease.
 
 > [!NOTE]
-> Unlike [prompt caching](./prompt-caching.md) which only works when a set minimum number of initial tokens at the beginning of a request are identical, predicted outputs is not constrained by token location. Even if your response text contains new output that will be returned prior to the predicted output, `accepted_prediction_tokens` can still occur.
+> Unlike [prompt caching](./prompt-caching.md), which only works when a set minimum number of initial tokens at the beginning of a request are identical, predicted outputs isn't constrained by token location. Even if your response text contains new output that will be returned prior to the predicted output, `accepted_prediction_tokens` can still occur.
 
 ## Streaming
 
-Predicted outputs performance boost is often most obvious if you are returning your responses with streaming enabled.
+The predicted outputs performance boost is often most obvious if you're returning your responses with streaming enabled.
 
 # [Python (Microsoft Entra ID)](#tab/python-secure)
 
-You may need to upgrade your OpenAI client library to access the `prediction` parameter.
+You might need to upgrade your OpenAI client library to access the `prediction` parameter.
 
 ```cmd
 pip install openai --upgrade
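The hunk header above references the new `accepted_prediction_tokens` response parameter. As a sketch of how that accounting can be read off a non-streaming response object (assuming `completion` came from `client.chat.completions.create(...)`):

```python
# Sketch: inspect prediction-token accounting on a non-streaming response.
details = completion.usage.completion_tokens_details

print(f"accepted_prediction_tokens: {details.accepted_prediction_tokens}")
print(f"rejected_prediction_tokens: {details.rejected_prediction_tokens}")

# Rejected prediction tokens bill like ordinary output tokens, so a low
# acceptance rate can raise cost without reducing latency.
if details.rejected_prediction_tokens > details.accepted_prediction_tokens:
    print("More prediction tokens were rejected than accepted.")
```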
@@ -331,7 +331,7 @@ with code, and with no markdown formatting.
 
 
 completion = client.chat.completions.create(
-    model="gpt-4o-mini",
+    model="gpt-4o-mini", # replace with your unique model deployment name
     messages=[
         {
             "role": "user",
@@ -357,7 +357,7 @@ for chunk in completion:
 
 # [Python (key-based auth)](#tab/python)
 
-You may need to upgrade your OpenAI client library to access the `prediction` parameter.
+You might need to upgrade your OpenAI client library to access the `prediction` parameter.
 
 ```cmd
 pip install openai --upgrade
@@ -392,7 +392,7 @@ with code, and with no markdown formatting.
 
 
 completion = client.chat.completions.create(
-    model="gpt-4o-mini",
+    model="gpt-4o-mini", # replace with your unique model deployment name
     messages=[
         {
             "role": "user",
