Commit 19fdeec

fix per PR review feedback

1 parent 71cef84 commit 19fdeec

File tree

6 files changed

+7
-7
lines changed


articles/ai-foundry/model-inference/concepts/endpoints.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -42,7 +42,7 @@ The Azure AI inference endpoint allows customers to use a single endpoint with t
 
 You can see the endpoint URL and credentials in the **Overview** section:
 
-:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="An screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::
+:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::
 
 ### Routing
 
```

articles/ai-foundry/model-inference/concepts/model-versions.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -57,6 +57,6 @@ Azure works closely with model providers to release new model versions. When a n
 
 New model versions might result in a new model ID being published. For example, `Llama-3.3-70B-Instruct`, `Meta-Llama-3.1-70B-Instruct`, and `Meta-Llama-3-70B-Instruct`. In some cases, all the model versions might be available in the same API version. In other cases, you might also need to adjust the API version used to consume the model in case the API contract has changed from one model to another.
 
-## Next Step
+## Related content
 
 - [Learn more about working with Azure OpenAI models](../../../ai-services/openai/how-to/working-with-models.md)
```
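The paragraph in the hunk above notes that a new model generation can mean both a new model ID and, when the contract changes, a different API version. A minimal sketch of pinning both at the call site, with a hypothetical helper and placeholder endpoint/version values (nothing here is a confirmed contract):

```python
# Hypothetical helper: pin one model ID and one api-version per request, so
# switching from Meta-Llama-3.1-70B-Instruct to Llama-3.3-70B-Instruct is an
# explicit, reviewable change. Endpoint and api-version are placeholders.

def build_request(endpoint: str, model_id: str, api_version: str) -> dict:
    """Assemble URL and body for a chat call pinned to one model version."""
    return {
        "url": f"{endpoint}/chat/completions?api-version={api_version}",
        "body": {
            "model": model_id,  # e.g. one of the IDs listed in the text above
            "messages": [{"role": "user", "content": "Hello"}],
        },
    }

req = build_request(
    "https://example.services.ai.azure.com/models",  # placeholder endpoint
    "Llama-3.3-70B-Instruct",
    "2024-05-01-preview",  # placeholder api-version
)
print(req["url"])
```

Keeping the version pair in one place makes the "adjust the API version used to consume the model" step a one-line change.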

articles/ai-foundry/model-inference/faq.yml

Lines changed: 1 addition & 1 deletion

```diff
@@ -73,7 +73,7 @@ sections:
       - question: |
           Does Azure AI model inference support custom API headers? We append custom headers to our API requests and are seeing HTTP 431 failure errors.
         answer: |
-          Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. We noticed some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. In future API versions, we no longer pass through custom headers. We recommend customers not depend on custom headers in future system architectures.
+          Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. We noticed some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. In future API versions, we no longer pass through custom headers. We recommend that you don't depend on custom headers in future system architectures.
   - name: Pricing and Billing
     questions:
      - question: |
```
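The 10-custom-header limit in that FAQ answer can be sketched as a client-side guard. The guard function and the set of "standard" header names are assumptions for illustration; only the limit of 10 and the HTTP 431 outcome come from the text:

```python
# Hypothetical client-side guard for the limit described in the FAQ answer:
# refuse to send more than 10 non-standard headers rather than let the
# service reject the request with HTTP 431.

STANDARD_HEADERS = {"authorization", "api-key", "content-type", "accept"}  # assumed set
MAX_CUSTOM_HEADERS = 10  # limit stated in the FAQ answer

def check_custom_headers(headers: dict) -> None:
    custom = [k for k in headers if k.lower() not in STANDARD_HEADERS]
    if len(custom) > MAX_CUSTOM_HEADERS:
        raise ValueError(
            f"{len(custom)} custom headers exceeds the limit of "
            f"{MAX_CUSTOM_HEADERS}; the service would answer HTTP 431"
        )
```

Since future API versions stop passing custom headers through, treating them as strictly optional metadata, as the answer recommends, avoids both failure modes.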

articles/ai-foundry/model-inference/includes/use-chat-completions/rest.md

Lines changed: 2 additions & 2 deletions

````diff
@@ -28,7 +28,7 @@ To use chat completion models in your application, you need:
 
 ## Use chat completions
 
-To use the text embeddings, use the route `/chat/completions` along with you credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
+To use the text embeddings, use the route `/chat/completions` along with your credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
 
 ```http
 POST /chat/completions
````
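The hunk above describes two equivalent authentication styles for the `/chat/completions` route: an `api-key` header, or `Authorization: Bearer <key>`. A minimal sketch assembling both forms; the endpoint, key, and deployment name are placeholders, and nothing is actually sent:

```python
# Sketch only: assemble the pieces of a /chat/completions call with either
# credential style the diff mentions. All concrete values are placeholders.

API_KEY = "<key>"                                             # placeholder
ENDPOINT = "https://<resource>.services.ai.azure.com/models"  # placeholder

def auth_headers(key: str, use_bearer: bool = False) -> dict:
    # `api-key: <key>`, or the equivalent `Authorization: Bearer <key>` form
    if use_bearer:
        return {"Authorization": f"Bearer {key}"}
    return {"api-key": key}

url = f"{ENDPOINT}/chat/completions"
payload = {
    "messages": [
        {"role": "user", "content": "How many languages are in the world?"},
    ],
}
print(url)
```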
```diff
@@ -554,7 +554,7 @@ Some models can reason across text and images and generate text completions base
 To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):
 
 > [!TIP]
-> You will need to construct the data URL using an scripting or programming language. This tutorial use [this sample image](../../../../ai-studio/media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
+> You will need to construct the data URL using a scripting or programming language. This tutorial use [this sample image](../../../../ai-studio/media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `data:image/jpg;base64,0xABCDFGHIJKLMNOPQRSTUVWXYZ...`.
 
 Visualize the image:
```
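The tip in that hunk says to construct the data URL with a scripting or programming language. One way to do it, assuming the image bytes are already in hand (the three-byte JPEG magic prefix below is a stand-in for a real file):

```python
# Build a data URL per the MDN format the tip links to:
# data:<mediatype>;base64,<encoded bytes>
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/jpeg") -> str:
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{encoded}"

# Tiny stand-in for real image bytes: the JPEG magic-number prefix.
print(to_data_url(b"\xff\xd8\xff"))  # → data:image/jpeg;base64,/9j/
```

For a real file, replace the stand-in bytes with `open("image.jpg", "rb").read()`.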

articles/ai-foundry/model-inference/includes/use-embeddings/rest.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -28,7 +28,7 @@ To use embedding models in your application, you need:
 
 ## Use embeddings
 
-To use the text embeddings, use the route `/embeddings` along with you credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
+To use the text embeddings, use the route `/embeddings` along with your credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
 
 ```http
 POST /embeddings
````
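The `/embeddings` route in the hunk above pairs with the same `api-key` credential. A sketch of assembling that request; the endpoint and key are placeholders, and the `input` body field mirrors the common embeddings request shape rather than a confirmed contract:

```python
# Sketch only: assemble an /embeddings request. Endpoint and key are
# placeholders; the request is built but never sent.

def embeddings_request(endpoint: str, key: str, texts: list[str]) -> dict:
    return {
        "url": f"{endpoint}/embeddings",
        "headers": {"api-key": key, "Content-Type": "application/json"},
        "body": {"input": texts},  # assumed field name for the input texts
    }

req = embeddings_request(
    "https://<resource>.services.ai.azure.com/models",  # placeholder
    "<key>",                                            # placeholder
    ["The ultimate answer to the question of life"],
)
print(req["url"])
```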

articles/ai-foundry/model-inference/quotas-limits.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -45,7 +45,7 @@ The following sections provide you with a quick guide to the default quotas and
 
 ## Usage tiers
 
-Global Standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer's inference requests. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variabilities in response latency.
+Global Standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer's inference requests. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variability in response latency.
 
 The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
 
```

0 commit comments
