Merge pull request #5723 from msakande/fix-serverless-rename-bugs

prmerger-automator[bot] · web-flow · commit caaef8141a43 · 2025-06-25T21:42:47.000Z
fix terminology
diff --git a/articles/ai-foundry/model-inference/concepts/deployment-types.md b/articles/ai-foundry/model-inference/concepts/deployment-types.md
@@ -98,13 +98,13 @@ Data zone provisioned deployments are available in the same Azure AI Foundry res
 
 Data zone batch deployments provide all the same functionality as global batch deployments while allowing you to leverage Azure global infrastructure to dynamically route traffic to only data centers within the Microsoft defined data zone with the best availability for each request. 
 
-## Serverless API
+## Standard
 
-**SKU name in code:** `Serverless API`
+**SKU name in code:** `Standard`
 
-Serverless API deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.  
+Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region as well as throughput may be limited.  
 
-Serverless API deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
+Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
 
 ## Provisioned
 
diff --git a/articles/ai-foundry/model-inference/how-to/monitor-models.md b/articles/ai-foundry/model-inference/how-to/monitor-models.md
@@ -158,9 +158,9 @@ The following categories of metrics are available:
 
 | Metric | Internal name | Unit | Aggregation | Dimensions |
 |--------|---------------|------|-------------|------------|
-| **Input Tokens**<br /><br /> Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed and serverless API deployments. | `InputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
-| **Output Tokens**<br /><br /> Number of tokens generated (output) from a model. Applies to PTU, PTU-managed and serverless API deployments. | `OutputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
-| **Total Tokens**<br /><br /> Number of inference tokens processed on a model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed and serverless API deployments. | `TotalTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
+| **Input Tokens**<br /><br /> Number of prompt tokens processed (input) on a model. Applies to PTU, PTU-managed and standard deployments. | `InputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
+| **Output Tokens**<br /><br /> Number of tokens generated (output) from a model. Applies to PTU, PTU-managed and standard deployments. | `OutputTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
+| **Total Tokens**<br /><br /> Number of inference tokens processed on a model. Calculated as prompt tokens (input) plus generated tokens (output). Applies to PTU, PTU-managed and standard deployments. | `TotalTokens` | Count | Total (Sum) | `ApiName`, `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 | **Tokens Cache Match Rate**<br /><br /> Percentage of prompt tokens that hit the cache. Applies to PTU and PTU-managed deployments. | `TokensCacheMatchRate` | Percentage | Average | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 | **Provisioned Utilization**<br /><br /> Utilization % for a provisioned-managed deployment, calculated as (PTUs consumed / PTUs deployed) x 100. When utilization is greater than or equal to 100%, calls are throttled and error code 429 returned. | `TokensCacheMatchRate ` | Percentage | Average | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
 | **Provisioned Consumed Tokens**<br /><br /> Total tokens minus cached tokens over a period of time. Applies to PTU and PTU-managed deployments. | `ProvisionedConsumedTokens` | Count | Total (Sum) | `Region`, `ModelDeploymentName`, `ModelName`, `ModelVersion` |
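
Reviewer note: the arithmetic described in the metrics table above (total tokens as input plus output, and provisioned utilization as PTUs consumed over PTUs deployed, times 100) can be sketched as follows. This is only an illustrative sketch, not how the service computes its metrics; the names `TokenUsage`, `provisioned_utilization`, and `is_throttled` are hypothetical and do not come from any Azure SDK.

```python
from dataclasses import dataclass


@dataclass
class TokenUsage:
    """Hypothetical container mirroring the InputTokens / OutputTokens metrics."""
    input_tokens: int
    output_tokens: int

    @property
    def total_tokens(self) -> int:
        # Total Tokens = prompt tokens (input) + generated tokens (output)
        return self.input_tokens + self.output_tokens


def provisioned_utilization(ptus_consumed: float, ptus_deployed: float) -> float:
    """Return utilization as a percentage: (PTUs consumed / PTUs deployed) x 100."""
    if ptus_deployed <= 0:
        raise ValueError("ptus_deployed must be positive")
    return ptus_consumed / ptus_deployed * 100


def is_throttled(utilization_pct: float) -> bool:
    # Per the table, calls are throttled (HTTP 429) at or above 100% utilization.
    return utilization_pct >= 100.0


if __name__ == "__main__":
    usage = TokenUsage(input_tokens=120, output_tokens=30)
    print(usage.total_tokens)
    pct = provisioned_utilization(ptus_consumed=25, ptus_deployed=50)
    print(pct, is_throttled(pct))
```

The throttle check uses `>=` so that exactly 100% utilization counts as throttled, matching the "greater than or equal to 100%" wording in the table.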