Commit caaef81

Merge pull request #5723 from msakande/fix-serverless-rename-bugs
fix terminology
2 parents cbc7752 + c72fc73 commit caaef81

2 files changed: 7 additions, 7 deletions

articles/ai-foundry/model-inference/concepts/deployment-types.md

Lines changed: 4 additions & 4 deletions
@@ -98,13 +98,13 @@ Data zone provisioned deployments are available in the same Azure AI Foundry res

 Data zone batch deployments provide all the same functionality as global batch deployments while allowing you to leverage Azure global infrastructure to dynamically route traffic to only data centers within the Microsoft defined data zone with the best availability for each request.

-## Serverless API
+## Standard

-**SKU name in code:** `Serverless API`
+**SKU name in code:** `Standard`

-Serverless API deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.
+Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.

-Serverless API deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
+Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.

 ## Provisioned

articles/ai-foundry/model-inference/how-to/monitor-models.md

Lines changed: 3 additions & 3 deletions
@@ -158,9 +158,9 @@ The following categories of metrics are available:

 | Metric | Internal name | Unit | Aggregation | Dimensions |
 |--------|---------------|------|-------------|------------|
-| **Input Tokens**<br /><br /> Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed and serverless API deployments. | `InputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
-| **Output Tokens**<br /><br /> Number of tokens generated (output) from a model. Applies to PTU, PTU-managed and serverless API deployments. | `OutputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
-| **Total Tokens**<br /><br /> Number of inference tokens processed on a model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed and serverless API deployments. | `TotalTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
+| **Input Tokens**<br /><br /> Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed and standard deployments. | `InputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
+| **Output Tokens**<br /><br /> Number of tokens generated (output) from a model. Applies to PTU, PTU-managed and standard deployments. | `OutputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
+| **Total Tokens**<br /><br /> Number of inference tokens processed on a model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed and standard deployments. | `TotalTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 | **Tokens Cache Match Rate**<br /><br /> Percentage of prompt tokens that hit the cache. Applies to PTU and PTU-managed deployments. | `TokensCacheMatchRate` | Percentage | Average | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 | **Provisioned Utilization**<br /><br /> Utilization % for a provisoned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 returned. | `TokensCacheMatchRate ` | Percentage | Average | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 | **Provisioned Consumed Tokens**<br /><br /> Total tokens minus cached tokens over a period of time. Applies to PTU and PTU-managed deployments. | `ProvisionedConsumedTokens` | Count | Total (Sum) | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
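
The arithmetic in the metric descriptions above (Total Tokens, Provisioned Consumed Tokens, and Provisioned Utilization with its 100% throttling threshold) can be sketched roughly as follows. This is only an illustration of the formulas as the table words them. The function names are hypothetical and are not part of any Azure SDK, and the real metric values are computed by the service, not by client code.

```python
# Illustrative helpers only: names are hypothetical, not an Azure API.

def total_tokens(input_tokens: int, output_tokens: int) -> int:
    """Total Tokens: prompt tokens (input) plus generated tokens (output)."""
    return input_tokens + output_tokens


def provisioned_consumed_tokens(total: int, cached: int) -> int:
    """Provisioned Consumed Tokens: total tokens minus cached tokens."""
    return total - cached


def provisioned_utilization(ptus_consumed: float, ptus_deployed: float) -> float:
    """Provisioned Utilization: (PTUs consumed / PTUs deployed) x 100."""
    if ptus_deployed <= 0:
        raise ValueError("ptus_deployed must be positive")
    return 100.0 * ptus_consumed / ptus_deployed


def is_throttled(utilization_pct: float) -> bool:
    """Per the table, calls are throttled (HTTP 429) at or above 100% utilization."""
    return utilization_pct >= 100.0


if __name__ == "__main__":
    total = total_tokens(input_tokens=120, output_tokens=30)
    print(total)
    print(provisioned_consumed_tokens(total, cached=40))
    utilization = provisioned_utilization(ptus_consumed=45, ptus_deployed=50)
    print(utilization, is_throttled(utilization))
```

The guard on `ptus_deployed` is an assumption added here to avoid division by zero; the source table does not say how the service handles that case.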
