articles/ai-services/openai/how-to/monitoring.md
4 additions & 2 deletions
@@ -6,7 +6,7 @@ ms.author: mbullwin
 ms.service: azure-ai-openai
 ms.topic: how-to
 ms.custom: subject-monitoring
-ms.date: 03/29/2024
+ms.date: 04/16/2024
 ---

 # Monitoring Azure OpenAI Service
@@ -60,7 +60,9 @@ The following table summarizes the current subset of metrics available in Azure
 |`Processed FineTuned Training Hours`| Usage |Sum| Number of training hours processed on an Azure OpenAI fine-tuned model. |`ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
 |`Processed Inference Tokens`| Usage | Sum| Number of inference tokens processed by an Azure OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
 |`Processed Prompt Tokens`| Usage | Sum | Total number of prompt tokens (input) processed on an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
-|`Provision-managed Utilization V2`| Usage | Average | Provision-managed utilization is the utilization percentage for a given provisioned-managed deployment. Calculated as (PTUs consumed/PTUs deployed)*100. When utilization is at or above 100%, calls are throttled and return a 429 error code. |`ModelDeploymentName`, `ModelName`, `ModelVersion`, `Region`, `StreamType`|
+|`Provision-managed Utilization V2`| HTTP | Average | Provision-managed utilization is the utilization percentage for a given provisioned-managed deployment. Calculated as (PTUs consumed/PTUs deployed)*100. When utilization is at or above 100%, calls are throttled and return a 429 error code. |`ModelDeploymentName`, `ModelName`, `ModelVersion`, `Region`, `StreamType`|
+|`Prompt Token Cache Match Rate`| HTTP | Average |**Provisioned-managed only**. The prompt token cache hit ratio expressed as a percentage. |`ModelDeploymentName`, `ModelVersion`, `ModelName`, `Region`|
+|`Time to Response`| HTTP | Average | Recommended latency (responsiveness) measure for streaming requests. **Applies to PTU and PTU-managed deployments**. This metric does not apply to standard pay-go deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as the prompt size increases and/or the cache hit rate decreases. Note: this metric is an approximation, as measured latency is heavily dependent on multiple factors, including concurrent calls and overall workload pattern. In addition, it doesn't account for any client-side latency that may exist between your client and the API endpoint. Refer to your own logging for optimal latency tracking.|`ModelDeploymentName`, `ModelName`, `ModelVersion`|
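
For context on the `Provision-managed Utilization V2` row above: with 300 PTUs consumed on a 250-PTU deployment, utilization is (300/250)*100 = 120%, so new calls are throttled with 429 errors. Below is a minimal sketch of polling this metric from Azure Monitor, assuming the `azure-identity` and `azure-monitor-query` packages; the metric name string, resource ID placeholders, and the 95% alert threshold are illustrative assumptions, so confirm the exact metric ID in the portal's metric picker for your resource.

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Full Azure resource ID of the Azure OpenAI account (placeholder values).
RESOURCE_ID = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<account-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Metric name is an assumption -- verify the ID shown in the Azure portal.
response = client.query_resource(
    RESOURCE_ID,
    metric_names=["AzureOpenAIProvisionedManagedUtilizationV2"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=1),
    aggregations=[MetricAggregationType.AVERAGE],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            # At or above 100%, the service throttles and returns 429s,
            # so flag anything approaching that ceiling.
            if point.average is not None and point.average >= 95:
                print(f"{point.timestamp}: {point.average:.1f}% utilization")
```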
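Because `Time to Response` is measured at the API gateway and excludes client-side latency, the table recommends your own logging for latency tracking. Here is a minimal sketch of logging client-side time to first token with the `openai` Python SDK (v1+); the deployment name, API version, and environment variable names are illustrative assumptions.

```python
import os
import time

from openai import AzureOpenAI  # requires openai>=1.0

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",  # illustrative API version
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="my-gpt-4-deployment",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Say hello."}],
    stream=True,
)

first_token_s = None
for chunk in stream:
    # Some chunks (for example, content filter results) carry no delta text.
    if first_token_s is None and chunk.choices and chunk.choices[0].delta.content:
        first_token_s = time.perf_counter() - start

if first_token_s is not None:
    print(f"Client-side time to first token: {first_token_s:.3f}s")
```

Comparing this client-side number against the gateway-reported `Time to Response` helps isolate network overhead between your client and the API endpoint.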