
Commit 77ca26a

use entra id for authentication

1 parent 6e3ffe2 · commit 77ca26a

18 files changed: +317 −4 lines changed

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 4 additions & 4 deletions
```diff
@@ -152,8 +152,8 @@ The [Provisioned-Managed Utilization V2 metric](../how-to/monitoring.md#azure-op
 The 429 response isn't an error, but instead part of the design for telling users that a given deployment is fully utilized at a point in time. By providing a fast-fail response, you have control over how to handle these situations in a way that best fits your application requirements.
 
 The `retry-after-ms` and `retry-after` headers in the response tell you the time to wait before the next call will be accepted. How you choose to handle this response depends on your application requirements. Here are some considerations:
-- You can consider redirecting the traffic to other models, deployments, or experiences. This option is the lowest-latency solution because the action can be taken as soon as you receive the 429 signal. For ideas on how to effectively implement this pattern see this [community post](https://github.com/Azure/aoai-apim).
-- If you're okay with longer per-call latencies, implement client-side retry logic. This option gives you the highest amount of throughput per PTU. The Azure OpenAI client libraries include built-in capabilities for handling retries.
+- You can consider redirecting the traffic to other models, deployments, or experiences. This option is the lowest-latency solution because the action can be taken as soon as you receive the 429 signal. For ideas on how to effectively implement this pattern see this [community post](https://github.com/Azure/aoai-apim).
+- If you're okay with longer per-call latencies, implement client-side retry logic. This option gives you the highest amount of throughput per PTU. The Azure OpenAI client libraries include built-in capabilities for handling retries.
 
 #### How does the service decide when to send a 429?
 
```
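As a minimal sketch of the client-side retry option described in the diff above, the helper below honors the `retry-after-ms` and `retry-after` headers on a 429 response. The `send` callable and the exponential-backoff fallback are illustrative assumptions, not part of the service contract, and the Azure OpenAI client libraries already provide this behavior built in:

```python
import time
from typing import Mapping, Optional


def wait_seconds(headers: Mapping[str, str]) -> Optional[float]:
    """Prefer retry-after-ms (milliseconds); fall back to retry-after (seconds)."""
    if "retry-after-ms" in headers:
        return float(headers["retry-after-ms"]) / 1000.0
    if "retry-after" in headers:
        return float(headers["retry-after"])
    return None


def call_with_retry(send, max_attempts: int = 5):
    """send() is any callable returning (status_code, headers, body).

    Retries on 429, sleeping for the server-suggested interval when present,
    otherwise using a capped exponential backoff (an illustrative fallback).
    """
    status, headers, body = None, {}, None
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, body
        delay = wait_seconds(headers)
        if delay is None:
            delay = min(2.0 ** attempt, 30.0)
        time.sleep(delay)
    return status, body
```

Redirecting to another deployment on the first 429, rather than sleeping, would trade this retry loop's throughput for lower latency.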
```diff
@@ -168,13 +168,13 @@ For provisioned deployments, we use a variation of the leaky bucket algorithm to
 
    b. Otherwise, the service estimates the incremental change to utilization required to serve the request by combining prompt tokens and the specified `max_tokens` in the call. For requests that include at least 1024 cached tokens, the cached tokens are subtracted from the prompt token value. A customer can receive up to a 100% discount on their prompt tokens depending on the size of their cached tokens. If the `max_tokens` parameter is not specified, the service estimates a value. This estimation can lead to lower concurrency than expected when the number of actual generated tokens is small. For highest concurrency, ensure that the `max_tokens` value is as close as possible to the true generation size.
 
-3. When a request finishes, we now know the actual compute cost for the call. To ensure an accurate accounting, we correct the utilization using the following logic:
+1. When a request finishes, we now know the actual compute cost for the call. To ensure an accurate accounting, we correct the utilization using the following logic:
 
    a. If the actual > estimated, then the difference is added to the deployment's utilization
 
    b. If the actual < estimated, then the difference is subtracted.
 
-4. The overall utilization is decremented down at a continuous rate based on the number of PTUs deployed.
+1. The overall utilization is decremented down at a continuous rate based on the number of PTUs deployed.
 
 > [!NOTE]
 > Calls are accepted until utilization reaches 100%. Bursts just over 100% may be permitted in short periods, but over time, your traffic is capped at 100% utilization.
```
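The accounting steps in this hunk (estimate from `prompt_tokens` + `max_tokens` with a cached-token discount, correct to actual cost on completion, drain continuously) can be sketched as a toy leaky-bucket model. All class, method, and parameter names here are illustrative; the service's real bookkeeping is not public:

```python
def estimate(prompt_tokens: int, max_tokens: int, cached_tokens: int = 0) -> int:
    """Estimated utilization increment for a call.

    Cached tokens are subtracted from the prompt token count only when the
    request includes at least 1024 of them, per the diff above.
    """
    billed_prompt = prompt_tokens - cached_tokens if cached_tokens >= 1024 else prompt_tokens
    return max(billed_prompt, 0) + max_tokens


class Bucket:
    """Toy leaky bucket: level drains at a constant rate (stand-in for PTUs)."""

    def __init__(self, capacity: float, drain_per_sec: float):
        self.capacity = capacity      # level at which new calls get a 429
        self.drain = drain_per_sec    # continuous decrement rate
        self.level = 0.0              # estimated outstanding work, in tokens
        self.t = 0.0                  # last time the level was updated

    def _advance(self, now: float) -> None:
        # Step 4/1: utilization decays continuously over elapsed time.
        self.level = max(self.level - (now - self.t) * self.drain, 0.0)
        self.t = now

    def try_accept(self, now, prompt_tokens, max_tokens, cached_tokens=0):
        """Return the estimated cost if accepted, or None (a 429) if full."""
        self._advance(now)
        if self.level >= self.capacity:
            return None
        cost = estimate(prompt_tokens, max_tokens, cached_tokens)
        self.level += cost
        return cost

    def finish(self, now: float, estimated: float, actual: float) -> None:
        # Step 3/1: correct the estimate with the actual compute cost
        # (adds when actual > estimated, subtracts when actual < estimated).
        self._advance(now)
        self.level = max(self.level + (actual - estimated), 0.0)
```

The model also shows why an unspecified or oversized `max_tokens` hurts concurrency: the bucket fills with the estimate up front and only gives the difference back when the call finishes.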

articles/ai-services/translator/entra-id-authentication.md

Lines changed: 313 additions & 0 deletions
Large diffs are not rendered by default.

articles/ai-services/translator/entra/identity-platform/quickstart-register-app

Whitespace-only changes.
7 binary image files added (not rendered): 95.7 KB, 55.9 KB, 34.3 KB, 170 KB, 166 KB, 48.4 KB, 169 KB
