Commit 19fdeec

fix per PR review feedback
1 parent 71cef84 commit 19fdeec

6 files changed, with 7 additions and 7 deletions

articles/ai-foundry/model-inference/concepts/endpoints.md

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ The Azure AI inference endpoint allows customers to use a single endpoint with t
 You can see the endpoint URL and credentials in the **Overview** section:

-:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="An screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::
+:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::

 ### Routing

articles/ai-foundry/model-inference/concepts/model-versions.md

Lines changed: 1 addition & 1 deletion
@@ -57,6 +57,6 @@ Azure works closely with model providers to release new model versions. When a n
 New model versions might result in a new model ID being published. For example, `Llama-3.3-70B-Instruct`, `Meta-Llama-3.1-70B-Instruct`, and `Meta-Llama-3-70B-Instruct`. In some cases, all the model versions might be available in the same API version. In other cases, you might also need to adjust the API version used to consume the model in case the API contract has changed from one model to another.

-## Next Step
+## Related content

 - [Learn more about working with Azure OpenAI models](../../../ai-services/openai/how-to/working-with-models.md)

articles/ai-foundry/model-inference/faq.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -73,7 +73,7 @@ sections:
 - question: |
 Does Azure AI model inference support custom API headers? We append custom headers to our API requests and are seeing HTTP 431 failure errors.
 answer: |
-Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. We noticed some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. In future API versions, we no longer pass through custom headers. We recommend customers not depend on custom headers in future system architectures.
+Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. We noticed some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. In future API versions, we no longer pass through custom headers. We recommend that you don't depend on custom headers in future system architectures.
 - name: Pricing and Billing
 questions:
 - question: |

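The limit described in this FAQ answer can be checked on the client before a request is sent. The following is a minimal Python sketch and not part of this commit. The name `check_custom_header_count`, the `STANDARD_HEADERS` set, and the rule for deciding which headers count as custom are assumptions made for illustration; the FAQ states only the limit of 10 and the HTTP 431 result.

```python
MAX_CUSTOM_HEADERS = 10  # limit quoted in the FAQ answer

# Assumption: headers in this set are not counted as "custom".
# The FAQ does not define how the service classifies headers.
STANDARD_HEADERS = {"api-key", "authorization", "content-type", "accept", "user-agent"}


def check_custom_header_count(headers, limit=MAX_CUSTOM_HEADERS):
    """Return the number of custom headers, or raise if it exceeds the limit."""
    custom = [name for name in headers if name.lower() not in STANDARD_HEADERS]
    if len(custom) > limit:
        raise ValueError(
            f"{len(custom)} custom headers exceed the limit of {limit}; "
            "the service may respond with HTTP 431"
        )
    return len(custom)
```

Because the answer also says that future API versions stop passing custom headers through, a check like this is only a stopgap; carrying the same metadata elsewhere is the more durable choice.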
articles/ai-foundry/model-inference/includes/use-chat-completions/rest.md

Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,7 @@ To use chat completion models in your application, you need:
 ## Use chat completions

-To use the text embeddings, use the route `/chat/completions` along with you credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
+To use the text embeddings, use the route `/chat/completions` along with your credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.

 ```http
 POST /chat/completions
@@ -554,7 +554,7 @@ Some models can reason across text and images and generate text completions base
 To see this capability, download an image and encode the information as `base64` string. The resulting data should be inside of a [data URL](https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URLs):

 > [!TIP]
-> You will need to construct the data URL using an scripting or programming language. This tutorial use [this sample image](../../../../ai-studio/media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `...`.
+> You will need to construct the data URL using a scripting or programming language. This tutorial use [this sample image](../../../../ai-studio/media/how-to/sdks/small-language-models-chart-example.jpg) in JPEG format. A data URL has a format as follows: `...`.

 Visualize the image:

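The tip in this hunk asks the reader to build the data URL in a scripting or programming language. A minimal Python sketch of that step follows; it is not part of this commit. It uses the general `data:<media type>;base64,<data>` form, which is not necessarily the exact string the tutorial elides, and the helper name `to_data_url` is hypothetical.

```python
import base64


def to_data_url(data: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw bytes as a base64 data URL with the given media type."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Usage, assuming the sample image was saved locally under a name of your choice:
#   from pathlib import Path
#   url = to_data_url(Path("chart.jpg").read_bytes())
```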
articles/ai-foundry/model-inference/includes/use-embeddings/rest.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ To use embedding models in your application, you need:
 ## Use embeddings

-To use the text embeddings, use the route `/embeddings` along with you credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.
+To use the text embeddings, use the route `/embeddings` along with your credential indicated in `api-key`. `Authorization` header is also supported with the format `Bearer <key>`.

 ```http
 POST /embeddings

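The sentence corrected in this hunk names two ways to pass the credential: the `api-key` header, or an `Authorization` header in the form `Bearer <key>`. A minimal Python sketch of both follows; it is not part of this commit. The helper name `auth_headers` and its `use_bearer` flag are hypothetical; only the two header formats come from the documentation text.

```python
def auth_headers(key: str, use_bearer: bool = False) -> dict:
    """Build the credential header in either of the two documented formats."""
    if use_bearer:
        # Format quoted in the docs: Authorization: Bearer <key>
        return {"Authorization": f"Bearer {key}"}
    # Default format quoted in the docs: api-key: <key>
    return {"api-key": key}
```

The returned dictionary can be merged into the headers of a `POST /embeddings` or `POST /chat/completions` request made with any HTTP client.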
articles/ai-foundry/model-inference/quotas-limits.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ The following sections provide you with a quick guide to the default quotas and
 ## Usage tiers

-Global Standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer's inference requests. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variability in response latency.
+Global Standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer's inference requests. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see more variabilities in response latency.

 The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
