Commit 714f160

Merge pull request #272348 from mrbullwinkle/mrb_04_16_2024_monitoring_v3

[Azure OpenAI] Metric update

2 parents: a328e31 + 6eaa8a8

File tree

1 file changed: +4 -2 lines

1 file changed

+4
-2
lines changed

articles/ai-services/openai/how-to/monitoring.md

Lines changed: 4 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: mbullwin
 ms.service: azure-ai-openai
 ms.topic: how-to
 ms.custom: subject-monitoring
-ms.date: 03/29/2024
+ms.date: 04/16/2024
 ---
 
 # Monitoring Azure OpenAI Service
@@ -60,7 +60,9 @@ The following table summarizes the current subset of metrics available in Azure
 | `Processed FineTuned Training Hours` | Usage | Sum | Number of training hours processed on an Azure OpenAI fine-tuned model. | `ApiName`, `ModelDeploymentName`, `ModelName`, `Region` |
 | `Processed Inference Tokens` | Usage | Sum | Number of inference tokens processed by an Azure OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs. | `ApiName`, `ModelDeploymentName`, `ModelName`, `Region` |
 | `Processed Prompt Tokens` | Usage | Sum | Total number of prompt tokens (input) processed on an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs. | `ApiName`, `ModelDeploymentName`, `ModelName`, `Region` |
-| `Provision-managed Utilization V2` | Usage | Average | Provision-managed utilization is the utilization percentage for a given provisioned-managed deployment. Calculated as (PTUs consumed / PTUs deployed) * 100. When utilization is at or above 100%, calls are throttled and return a 429 error code. | `ModelDeploymentName`, `ModelName`, `ModelVersion`, `Region`, `StreamType` |
+| `Provision-managed Utilization V2` | HTTP | Average | Provision-managed utilization is the utilization percentage for a given provisioned-managed deployment. Calculated as (PTUs consumed / PTUs deployed) * 100. When utilization is at or above 100%, calls are throttled and return a 429 error code. | `ModelDeploymentName`, `ModelName`, `ModelVersion`, `Region`, `StreamType` |
+| `Prompt Token Cache Match Rate` | HTTP | Average | **Provisioned-managed only**. The prompt token cache hit ratio expressed as a percentage. | `ModelDeploymentName`, `ModelVersion`, `ModelName`, `Region` |
+| `Time to Response` | HTTP | Average | Recommended latency (responsiveness) measure for streaming requests. **Applies to PTU and PTU-managed deployments**. This metric does not apply to standard pay-go deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as the prompt size increases and/or the cache hit size decreases. Note: this metric is an approximation, as measured latency is heavily dependent on multiple factors, including concurrent calls and the overall workload pattern. In addition, it does not account for any client-side latency that may exist between your client and the API endpoint. Refer to your own logging for optimal latency tracking. | `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 
 ## Configure diagnostic settings
 
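The reclassified `Provision-managed Utilization V2` metric can also be polled programmatically, for example to shed or queue load before the 429 throttling threshold described in the table. Below is a minimal sketch using the `azure-monitor-query` Python SDK; the resource ID and the metric ID string `AzureOpenAIProvisionedManagedUtilizationV2` are assumptions (the table lists display names, not metric IDs), so confirm the exact ID in the Azure portal metrics browser.

```python
# Minimal sketch, not a definitive implementation. Assumes the
# azure-monitor-query and azure-identity packages; the resource ID and the
# metric ID "AzureOpenAIProvisionedManagedUtilizationV2" are placeholders.
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Placeholder resource ID for an Azure OpenAI (Cognitive Services) account.
RESOURCE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<azure-openai-resource>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Average provisioned-managed utilization over the last hour, 5-minute grain.
response = client.query_resource(
    RESOURCE_ID,
    metric_names=["AzureOpenAIProvisionedManagedUtilizationV2"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.AVERAGE],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            # At or above 100% utilization, new calls are throttled with 429s.
            print(f"{point.time_stamp}  utilization={point.average}%")
```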
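Because the gateway-measured `Time to Response` metric excludes client-side network latency, the table's note recommends keeping your own logs. Here is a minimal sketch, assuming the `openai` Python package (v1.x) with placeholder endpoint, key, and deployment values, that records the client-observed time to first token for a streaming call:

```python
# Minimal sketch, not a definitive implementation. Assumes the openai Python
# package (v1.x); endpoint, API key, API version, and deployment name are
# placeholders for your own PTU or PTU-managed deployment.
import time

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<azure-openai-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-02-01",
)

start = time.monotonic()
stream = client.chat.completions.create(
    model="<deployment-name>",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
)

first_token_s = None
for chunk in stream:
    # Some streamed chunks (for example, content filter results) carry no
    # choices; wait for the first chunk that does.
    if first_token_s is None and chunk.choices:
        first_token_s = time.monotonic() - start
        print(f"client-observed time to first token: {first_token_s:.3f}s")
```

Comparing this client-side number against the `Time to Response` metric helps separate gateway/model latency from network latency between your client and the API endpoint.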