1 parent 912d9c7 commit be36a78
articles/ai-services/openai/concepts/provisioned-throughput.md
@@ -35,6 +35,7 @@ An Azure OpenAI Deployment is a unit of management for a specific OpenAI Model.
 | Latency | Max latency constrained from the model. Overall latency is a factor of call shape. |
 | Utilization | Provisioned-managed Utilization V2 measure provided in Azure Monitor. |
 | Estimating size | Provided calculator in the studio & benchmarking script. |
+| Prompt caching | For supported models, we discount up to 100% of cached input tokens. |

 ## How much throughput per PTU you get for each model