Commit be36a78

committed
Learn Editor: Update provisioned-throughput.md
1 parent 912d9c7 commit be36a78

File tree: 1 file changed (+1, −0 lines changed)

articles/ai-services/openai/concepts/provisioned-throughput.md

Lines changed: 1 addition & 0 deletions
@@ -35,6 +35,7 @@ An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model.
 | Latency | Max latency constrained from the model. Overall latency is a factor of call shape. |
 | Utilization | Provisioned-managed Utilization V2 measure provided in Azure Monitor. |
 | Estimating size | Provided calculator in the studio & benchmarking script. |
+| Prompt caching | For supported models, we discount up to 100% of cached input tokens. |


 ## How much throughput per PTU you get for each model
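The added table row says cached input tokens can be discounted by up to 100% on supported models. As a minimal sketch of what that billing arithmetic implies (the function name, parameters, and the idea of a variable discount rate are illustrative assumptions, not part of the Azure docs), the billed input tokens could be computed as:

```python
def billed_input_tokens(total_input_tokens: int,
                        cached_input_tokens: int,
                        cache_discount: float = 1.0) -> float:
    """Hypothetical illustration of a prompt-caching discount.

    cache_discount=1.0 models the "up to 100%" case: cached input
    tokens are fully discounted and only uncached tokens are billed.
    """
    if not 0.0 <= cache_discount <= 1.0:
        raise ValueError("cache_discount must be between 0 and 1")
    if cached_input_tokens > total_input_tokens:
        raise ValueError("cached tokens cannot exceed total input tokens")
    uncached = total_input_tokens - cached_input_tokens
    # Uncached tokens bill at full rate; cached tokens bill at the
    # discounted rate (zero when the discount is 100%).
    return uncached + cached_input_tokens * (1.0 - cache_discount)

# e.g. a 2,048-token prompt where 1,024 tokens hit the cache at a full discount
print(billed_input_tokens(2048, 1024, cache_discount=1.0))  # 1024.0
```

With a partial discount the cached portion still bills at a reduced rate, e.g. `billed_input_tokens(2048, 1024, cache_discount=0.5)` yields 1536.0.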

0 commit comments