Commit 7000a49

Merge pull request #280751 from mrbullwinkle/mrb_07_12_2024_monitoring
[Azure OpenAI] Monitoring
2 parents 2a2422e + 64cf818

1 file changed

articles/ai-services/openai/how-to/monitoring.md

Lines changed: 3 additions & 1 deletion
@@ -6,7 +6,7 @@ ms.author: mbullwin
 ms.service: azure-ai-openai
 ms.topic: how-to
 ms.custom: subject-monitoring
-ms.date: 04/16/2024
+ms.date: 07/12/2024
 ---

 # Monitoring Azure OpenAI Service
@@ -56,6 +56,7 @@ The following table summarizes the current subset of metrics available in Azure
 |Metric|Category|Aggregation|Description|Dimensions|
 |---|---|---|---|---|
 |`Azure OpenAI Requests`|HTTP|Count|Total number of calls made to the Azure OpenAI API over a period of time. Applies to PayGo, PTU, and PTU-managed SKUs.| `ApiName`, `ModelDeploymentName`,`ModelName`,`ModelVersion`, `OperationName`, `Region`, `StatusCode`, `StreamType`|
+| `Active Tokens` | Usage | Sum | Total tokens minus cached tokens over a period of time. Applies to PTU and PTU-managed deployments. Use this metric to understand your TPS- or TPM-based utilization for PTUs and compare it to your benchmarks for target TPS or TPM for your scenarios. | `ModelDeploymentName`,`ModelName`,`ModelVersion` |
 | `Generated Completion Tokens` | Usage | Sum | Number of generated tokens (output) from an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs. | `ApiName`, `ModelDeploymentName`,`ModelName`, `Region`|
 | `Processed FineTuned Training Hours` | Usage |Sum| Number of training hours processed on an Azure OpenAI fine-tuned model. | `ApiName`, `ModelDeploymentName`,`ModelName`, `Region`|
 | `Processed Inference Tokens` | Usage | Sum| Number of inference tokens processed by an Azure OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`,`ModelName`, `Region`|
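(Not part of the commit: as a rough illustration of how the metrics in this table can be consumed, the sketch below queries the `Azure OpenAI Requests` metric with the `azure-monitor-query` Python package. The resource ID is a placeholder, and the metric name string mirrors the table's display name; the programmatic metric name may differ, so verify it against your resource's metric definitions.)

```python
# Minimal sketch: pull the "Azure OpenAI Requests" metric for an
# Azure OpenAI resource.
#   pip install azure-monitor-query azure-identity
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Placeholder resource ID -- substitute your own Azure OpenAI resource.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<resource-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Count of API calls over the last day, in hourly buckets. The metric
# name here is the table's display name (an assumption to verify).
response = client.query_resource(
    resource_id,
    metric_names=["Azure OpenAI Requests"],
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=[MetricAggregationType.COUNT],
)

for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.count)
```

The same call, with `aggregations=[MetricAggregationType.TOTAL]` and the appropriate metric name, would apply to the Usage-category metrics above, which aggregate by Sum.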
@@ -64,6 +65,7 @@ The following table summarizes the current subset of metrics available in Azure
 |`Prompt Token Cache Match Rate` | HTTP | Average | **Provisioned-managed only**. The prompt token cache hit ratio expressed as a percentage. | `ModelDeploymentName`, `ModelVersion`, `ModelName`, `Region`|
 |`Time to Response` | HTTP | Average | Recommended latency (responsiveness) measure for streaming requests. **Applies to PTU and PTU-managed deployments**. This metric does not apply to standard pay-as-you-go deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as the prompt size increases and/or the cache hit size decreases. Note: this metric is an approximation, as measured latency is heavily dependent on multiple factors, including concurrent calls and overall workload pattern. In addition, it does not account for any client-side latency that may exist between your client and the API endpoint. Refer to your own logging for optimal latency tracking.| `ModelDeploymentName`, `ModelName`, `ModelVersion` |

+
 ## Configure diagnostic settings

 All of the metrics are exportable with [diagnostic settings in Azure Monitor](/azure/azure-monitor/essentials/diagnostic-settings). To analyze logs and metrics data with Azure Monitor Log Analytics queries, you need to configure diagnostic settings for your Azure OpenAI resource and your Log Analytics workspace.
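(Also not part of the commit: a minimal sketch of creating such a diagnostic setting programmatically with the `azure-mgmt-monitor` Python package. All IDs are placeholders, and the category names `AllMetrics` and `allLogs` follow common Azure Monitor conventions; verify them against the categories your resource actually exposes and your API version before relying on them.)

```python
# Minimal sketch: route a resource's metrics and logs to a
# Log Analytics workspace.
#   pip install azure-mgmt-monitor azure-identity
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "<subscription-id>"  # placeholder

# Placeholder IDs for the Azure OpenAI resource and the workspace.
resource_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<resource-name>"
)
workspace_id = (
    f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace-name>"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# "AllMetrics" and the "allLogs" category group are assumptions here;
# older API versions may require listing explicit log categories instead.
client.diagnostic_settings.create_or_update(
    resource_uri=resource_id,
    name="send-to-log-analytics",
    parameters={
        "workspace_id": workspace_id,
        "metrics": [{"category": "AllMetrics", "enabled": True}],
        "logs": [{"category_group": "allLogs", "enabled": True}],
    },
)
```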
