# Predicted outputs
Predicted outputs can improve model response latency for chat completions calls where minimal changes are needed to a larger body of text. If you're asking the model to provide a response where a large portion of the expected response is already known, predicted outputs can significantly reduce the latency of this request. This capability is particularly well-suited for coding scenarios, including autocomplete, error detection, and real-time editing, where speed and responsiveness are critical for developers and end-users. Rather than have the model regenerate all the text from scratch, you can indicate to the model that most of the response is already known by passing the known text to the `prediction` parameter.
## Model support
## Unsupported features
Predicted outputs is currently text-only. The following features can't be used in conjunction with the `prediction` parameter and predicted outputs:
- Tools/Function calling
- Audio models/inputs and outputs
- `n` values higher than `1`
- `logprobs`
- `presence_penalty` values greater than `0`
- `frequency_penalty` values greater than `0`
- `max_completion_tokens`
> [!NOTE]
> The predicted outputs feature is currently unavailable for models in the Southeast Asia region.
## Getting started
To demonstrate the basics of predicted outputs, we'll start by asking a model to refactor the code from the common programming `FizzBuzz` problem to replace the instance of `FizzBuzz` with `MSFTBuzz`. We'll pass our example code to the model in two places: first as part of a user message in the `messages` array/list, and a second time as part of the content of the new `prediction` parameter.
You might need to upgrade your OpenAI client library to access the `prediction` parameter.
```cmd
pip install openai --upgrade
```
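
The request itself is easiest to see in code. The following is a minimal sketch rather than a definitive sample: it assumes the `AzureOpenAI` client from the `openai` Python package, placeholder environment variables for your endpoint and key, an illustrative API version, and a hypothetical `gpt-4o` deployment name.

```python
import os

from openai import AzureOpenAI  # requires a recent version of the openai package

# Placeholder endpoint, key, API version, and deployment name -- replace with your own values.
client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    api_version="2025-01-01-preview",
)

code = """
for number in range(1, 101):
    if number % 3 == 0 and number % 5 == 0:
        print("FizzBuzz")
    elif number % 3 == 0:
        print("Fizz")
    elif number % 5 == 0:
        print("Buzz")
    else:
        print(number)
"""

completion = client.chat.completions.create(
    model="gpt-4o",  # replace with the name of your model deployment
    messages=[
        {
            "role": "user",
            "content": "Replace FizzBuzz with MSFTBuzz. Respond only with code: " + code,
        }
    ],
    # The expected response text is passed a second time via the prediction parameter.
    prediction={"type": "content", "content": code},
)

print(completion.choices[0].message.content)
print(completion.usage)  # includes accepted_prediction_tokens and rejected_prediction_tokens
```

The same code string does double duty here: once as part of the instruction in `messages`, and once as the prediction content that the model can accept or reject token by token.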
Notice in the output the new response parameters for `accepted_prediction_tokens` and `rejected_prediction_tokens`.
The `accepted_prediction_tokens` help reduce model response latency, but any `rejected_prediction_tokens` have the same cost implication as additional output tokens generated by the model. For this reason, while predicted outputs can improve model response times, it can result in greater costs. You'll need to evaluate and balance the increased model performance against the potential increases in cost.
It's also important to understand that using predicted outputs doesn't guarantee a reduction in latency. A large request with a greater percentage of rejected prediction tokens than accepted prediction tokens could result in an increase in model response latency, rather than a decrease.
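
One way to evaluate that trade-off is to log how much of your prediction was accepted versus rejected per request. The following is a small sketch, assuming the `completion` object from the earlier example and that the usage object exposes these counts under `completion_tokens_details` (as in recent versions of the `openai` package):

```python
# Sketch: compare accepted vs. rejected prediction tokens for a single request.
# Assumes `completion` comes from the earlier example and that a recent openai package
# version exposes these counts under usage.completion_tokens_details.
details = completion.usage.completion_tokens_details
print(f"accepted_prediction_tokens: {details.accepted_prediction_tokens}")
print(f"rejected_prediction_tokens: {details.rejected_prediction_tokens}")
```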
> [!NOTE]
> Unlike [prompt caching](./prompt-caching.md), which only works when a set minimum number of initial tokens at the beginning of a request are identical, predicted outputs isn't constrained by token location. Even if your response text contains new output that will be returned prior to the predicted output, `accepted_prediction_tokens` can still occur.
## Streaming
The predicted outputs performance boost is often most obvious if you're returning your responses with streaming enabled.
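
As a rough sketch, reusing the hypothetical `client` and `code` values from the earlier example, the same request with streaming enabled might look like this:

```python
# Same request as before, but streamed; tokens print as they arrive.
stream = client.chat.completions.create(
    model="gpt-4o",  # replace with the name of your model deployment
    messages=[
        {
            "role": "user",
            "content": "Replace FizzBuzz with MSFTBuzz. Respond only with code: " + code,
        }
    ],
    prediction={"type": "content", "content": code},
    stream=True,
)

for chunk in stream:
    # Guard against empty choices and None deltas, which can occur in streamed responses.
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
print()
```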