|`Azure OpenAI Requests`| HTTP | Count | Total number of calls made to the Azure OpenAI API over a period of time. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`, `ModelName`, `ModelVersion`, `OperationName`, `Region`, `StatusCode`, `StreamType`|
|`Active Tokens`| Usage | Sum | Total tokens minus cached tokens over a period of time. Applies to PTU and PTU-managed deployments. Use this metric to understand your TPS- or TPM-based utilization for PTUs and compare it to your benchmarks for target TPS or TPM for your scenarios.|`ModelDeploymentName`, `ModelName`, `ModelVersion`|
|`Generated Completion Tokens`| Usage | Sum | Number of generated tokens (output) from an Azure OpenAI model. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
|`Processed FineTuned Training Hours`| Usage | Sum | Number of training hours processed on an Azure OpenAI fine-tuned model.|`ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
|`Processed Inference Tokens`| Usage | Sum | Number of inference tokens processed by an Azure OpenAI model. Calculated as prompt tokens (input) + generated tokens. Applies to PayGo, PTU, and PTU-managed SKUs.|`ApiName`, `ModelDeploymentName`, `ModelName`, `Region`|
|`Prompt Token Cache Match Rate`| HTTP | Average |**Provisioned-managed only**. The prompt token cache hit ratio expressed as a percentage.|`ModelDeploymentName`, `ModelVersion`, `ModelName`, `Region`|
|`Time to Response`| HTTP | Average | Recommended latency (responsiveness) measure for streaming requests. **Applies to PTU and PTU-managed deployments.** This metric does not apply to standard pay-go deployments. Calculated as the time taken for the first response to appear after a user sends a prompt, as measured by the API gateway. This number increases as prompt size increases and/or cache hit rate decreases. Note: this metric is an approximation, as measured latency is heavily dependent on multiple factors, including concurrent calls and overall workload pattern. In addition, it does not account for any client-side latency that may exist between your client and the API endpoint. Refer to your own logging for optimal latency tracking.|`ModelDeploymentName`, `ModelName`, `ModelVersion`|
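
Once these metrics are flowing, you can also retrieve them programmatically through the Azure Monitor metrics API. The following is a minimal sketch using the `azure-monitor-query` Python package; the resource ID is a placeholder, and the metric name strings are assumed to match the display names in the table above (verify the exact metric IDs exposed by your resource):

```python
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient, MetricAggregationType

# Placeholder resource ID -- substitute your Azure OpenAI resource.
resource_uri = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<openai-resource>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Query the last hour at 5-minute granularity. Metric names below are
# assumed from the display names in the table above; confirm the metric
# IDs for your resource in the Azure portal before relying on them.
response = client.query_resource(
    resource_uri,
    metric_names=["Azure OpenAI Requests", "Processed Inference Tokens"],
    timespan=timedelta(hours=1),
    granularity=timedelta(minutes=5),
    aggregations=[MetricAggregationType.COUNT, MetricAggregationType.TOTAL],
)

# Each metric returns one time series per requested dimension combination;
# print whichever aggregation the metric supports (Count vs. Sum).
for metric in response.metrics:
    for series in metric.timeseries:
        for point in series.data:
            value = point.total if point.total is not None else point.count
            print(f"{metric.name} @ {point.timestamp}: {value}")
```
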
## Configure diagnostic settings
All of the metrics are exportable with [diagnostic settings in Azure Monitor](/azure/azure-monitor/essentials/diagnostic-settings). To analyze logs and metrics data with Azure Monitor Log Analytics queries, you need to configure diagnostic settings for your Azure OpenAI resource and your Log Analytics workspace.
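
Diagnostic settings can be created in the portal, with the Azure CLI, or programmatically. As a sketch of the programmatic route, here is how the `azure-mgmt-monitor` Python package could route all platform metrics and resource logs from an Azure OpenAI resource to a Log Analytics workspace. The subscription ID, resource IDs, and setting name are placeholders, and the example assumes the resource supports the `allLogs` category group:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.monitor import MonitorManagementClient

subscription_id = "<subscription-id>"  # placeholder

# Placeholder resource IDs -- substitute your own.
openai_resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<openai-resource>"
)
workspace_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.OperationalInsights/workspaces/<workspace>"
)

client = MonitorManagementClient(DefaultAzureCredential(), subscription_id)

# Create (or update) a diagnostic setting that sends all metrics and all
# resource logs to the Log Analytics workspace.
client.diagnostic_settings.create_or_update(
    resource_uri=openai_resource_id,
    name="openai-to-log-analytics",  # arbitrary setting name
    parameters={
        "workspace_id": workspace_id,
        "metrics": [{"category": "AllMetrics", "enabled": True}],
        "logs": [{"category_group": "allLogs", "enabled": True}],
    },
)
```

After the setting is in place, allow some time for data to land in the workspace before querying it with Log Analytics.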