`articles/ai-services/translator/text-translation/reference/v4/reference-overview.md`

Metrics allow you to view the translator usage and availability information in the Azure portal.

:::image type="content" source="../../../media/azure-portal-metrics-v4.png" alt-text="Screenshot of HTTP request metrics in the Azure portal.":::

#### Metrics terminology

* **PTU**: provisioned throughput units
* **TPS**: transactions per second
* **TPM**: tokens per minute

The following tables list available metrics with descriptions of how they're used to monitor **Translator resource** API calls.
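
If you prefer to discover these metrics programmatically, the `azure-monitor-query` Python package can enumerate every metric definition a resource exposes, including the dimensions each metric supports. This is a minimal sketch, assuming a placeholder resource ID that you'd replace with your own Translator resource:

```python
# pip install azure-monitor-query azure-identity
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

# Placeholder resource ID; substitute your own subscription,
# resource group, and account name.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<account-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Print each metric's name and the dimensions it can be split by.
for definition in client.list_metric_definitions(resource_id):
    print(definition.name, definition.dimensions)
```
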
#### Translator resource HTTP requests

| Metrics | Description |
|:----|:-----|
|`AzureOpenAIAvailabilityRate`|Availability percentage with the following calculation:<br>`(Total Calls - Server Errors) / Total Calls`. Server Errors include any HTTP response >= 500. (A worked sketch of this calculation follows the table.)|
|`AzureOpenAIRequests`|Number of calls made to the Azure OpenAI API over a period of time. Applies to `PTU`, `PTU`-managed, and Pay-as-you-go deployments. To break down API requests, you can add a filter or apply splitting by the following dimensions: <br> `ModelDeploymentName`, `ModelName`, `ModelVersion`, `StatusCode` (successful, client errors, server errors), `StreamType` (streaming vs nonstreaming requests), and `Operation`.|
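
To illustrate dimension splitting and the availability formula above, here's a hedged sketch using the `azure-monitor-query` Python package. It queries `AzureOpenAIRequests` split by `StatusCode`, then recomputes the availability percentage from the split series; the resource ID is a placeholder, and the exact `StatusCode` value strings should be confirmed in the portal.

```python
# pip install azure-monitor-query azure-identity
from datetime import timedelta

from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricAggregationType, MetricsQueryClient

# Placeholder resource ID; substitute your own.
resource_id = (
    "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
    "/providers/Microsoft.CognitiveServices/accounts/<account-name>"
)

client = MetricsQueryClient(DefaultAzureCredential())

# Query the last 24 hours of AzureOpenAIRequests, split by StatusCode.
# The filter "StatusCode eq '*'" asks Azure Monitor to return one time
# series per StatusCode value instead of a single aggregate series.
result = client.query_resource(
    resource_id,
    metric_names=["AzureOpenAIRequests"],
    timespan=timedelta(days=1),
    granularity=timedelta(hours=1),
    aggregations=[MetricAggregationType.TOTAL],
    filter="StatusCode eq '*'",
)

total_calls = 0.0
server_errors = 0.0
for metric in result.metrics:
    for series in metric.timeseries:
        # Dimension values ride along in the series metadata.
        status = series.metadata_values.get("StatusCode", "")
        count = sum(point.total or 0 for point in series.data)
        total_calls += count
        # Assumed dimension value name, per the table above;
        # verify the exact string in the Azure portal.
        if status == "server errors":
            server_errors += count

# Availability percentage per the formula above:
# (Total Calls - Server Errors) / Total Calls
if total_calls:
    rate = 100 * (total_calls - server_errors) / total_calls
    print(f"Availability: {rate:.2f}%")
```
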
#### Azure OpenAI usage

| Metrics | Description |
|:----|:-----|
|`ActiveTokens`|Total tokens minus cached tokens over a period of time. Applies to `PTU` and `PTU`-managed deployments. Use this metric to understand your `TPS`- or `TPM`-based utilization for `PTU`s and compare your benchmarks for target `TPS` or `TPM` for your scenarios. <br> To break down API requests, you can add a filter or apply splitting by the following dimensions: `ModelDeploymentName`, `ModelName`, `ModelVersion`.|
|`GeneratedTokens`|Number of tokens generated (output) from an OpenAI model. Applies to `PTU`, `PTU`-managed, and Pay-as-you-go deployments. To break down this metric, you can add a filter or apply splitting by the following dimensions:<br>`ModelDeploymentName` or `ModelName`.|
|`FineTunedTrainingHours`|Number of training hours processed on an OpenAI fine-tuned model.|
|`TokenTransaction`|Number of inference tokens processed on an OpenAI model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to `PTU`, `PTU`-managed, and Pay-as-you-go deployments. To break down this metric, you can add a filter or apply splitting by the following dimensions:<br>`ModelDeploymentName` or `ModelName`.|
|`AzureOpenAIContextTokensCacheMatchRate`|Percentage of prompt tokens that hit the cache. Applies to `PTU` and `PTU`-managed deployments.|
|`AzureOpenAIProvisionedManagedUtilizationV2`|Utilization percentage for a provisioned-managed deployment, calculated as (`PTU`s consumed / `PTU`s deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 is returned. To break down this metric, you can add a filter or apply splitting by the following dimensions: `ModelDeploymentName`, `ModelName`, `ModelVersion`, and `StreamType` (streaming vs nonstreaming requests).|
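
To make the utilization math concrete, here's a small sketch of the formula above. It's pure arithmetic with made-up numbers, not an SDK call:

```python
# Provisioned-managed utilization per the formula above:
#   utilization % = (PTUs consumed / PTUs deployed) x 100
def utilization_percent(ptus_consumed: float, ptus_deployed: float) -> float:
    return 100.0 * ptus_consumed / ptus_deployed

# Example (made-up values): 95 of 100 deployed PTUs consumed.
print(utilization_percent(95, 100))   # 95.0 -> requests accepted

# At or above 100% utilization, the service throttles and returns
# HTTP 429, so a caller should back off and retry.
print(utilization_percent(120, 100))  # 120.0 -> expect 429 responses
```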