articles/ai-foundry/concepts/content-filtering.md (+1 -1)

@@ -26,7 +26,7 @@ author: PatrickFarley
  The content filtering system is powered by [Azure AI Content Safety](../../ai-services/content-safety/overview.md), and it works by running both the prompt input and completion output through a set of classification models designed to detect and prevent the output of harmful content. Variations in API configurations and application design might affect completions and thus filtering behavior.

- With Azure OpenAI model deployments, you can use the default content filter or create your own content filter (described later on). Models available through **serverless APIs** have content filtering enabled by default. To learn more about the default content filter enabled for serverless APIs, see [Guardrails & controls for Azure Direct Models in the model catalog](model-catalog-content-safety.md).
+ With Azure OpenAI model deployments, you can use the default content filter or create your own content filter (described later on). Models available through **standard deployments** have content filtering enabled by default. To learn more about the default content filter enabled for standard deployments, see [Content safety for models curated by Azure AI in the model catalog](model-catalog-content-safety.md).
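The filtering flow this paragraph describes is observable from client code: Azure OpenAI responses carry per-category filter annotations alongside the completion. Below is a minimal sketch, assuming the `openai` Python package (v1+), endpoint and key in environment variables, and a hypothetical deployment name; the annotation fields (`prompt_filter_results`, `content_filter_results`) arrive as extra fields on the response payload rather than as typed SDK attributes.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="my-gpt-4o-deployment",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Hello!"}],
)

# Azure attaches content-filter annotations as extra fields on the payload;
# model_dump() surfaces fields the SDK's typed models don't declare.
payload = response.model_dump()
for result in payload.get("prompt_filter_results", []):
    print("prompt filters:", result.get("content_filter_results"))
for choice in payload.get("choices", []):
    print("completion filters:", choice.get("content_filter_results"))
```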
articles/ai-foundry/concepts/deployments-overview.md (+2 -2)

@@ -20,7 +20,7 @@ The model catalog in Azure AI Foundry portal is the hub to discover and use a wi
  Deployment options vary depending on the model offering:

  * **Azure OpenAI in Azure AI Foundry Models:** The latest OpenAI models that have enterprise features from Azure with flexible billing options.
- * **Standard deployment:** These models don't require compute quota from your subscription and are billed per token in a pay-as-you-go fashion.
+ * **Standard deployment:** These models don't require compute quota from your subscription and are billed per token in a serverless pay-per-token offer.
  * **Open and custom models:** The model catalog offers access to a large variety of models across modalities, including models of open access. You can host open models in your own subscription with a managed infrastructure, virtual machines, and the number of instances for capacity management.

  Azure AI Foundry offers four different deployment options:

@@ -39,7 +39,7 @@ Azure AI Foundry offers four different deployment options:
  | Deployment instructions |[Deploy to Azure OpenAI](../how-to/deploy-models-openai.md)|[Deploy to Foundry Models](../model-inference/how-to/create-model-deployments.md)|[Deploy to Standard deployment](../how-to/deploy-models-serverless.md)|[Deploy to Managed compute](../how-to/deploy-models-managed.md)|

- <sup>1</sup> A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in pay-as-you-go. After you delete the endpoint, no further charges accrue.
+ <sup>1</sup> A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in standard deployment. After you delete the endpoint, no further charges accrue.

  <sup>2</sup> Billing is on a per-minute basis, depending on the product tier and the number of instances used in the deployment since the moment of creation. After you delete the endpoint, no further charges accrue.
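Since the updated bullet describes per-token billing, a quick back-of-the-envelope helper makes the billing model concrete. This is a sketch with placeholder prices, not actual Azure rates; check the Azure pricing page for real numbers.

```python
# Placeholder rates -- NOT actual Azure prices.
PRICE_PER_1M_INPUT = 0.50   # USD per 1M input tokens (placeholder)
PRICE_PER_1M_OUTPUT = 1.50  # USD per 1M output tokens (placeholder)

def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request under pay-per-token billing."""
    return (input_tokens / 1_000_000) * PRICE_PER_1M_INPUT + \
           (output_tokens / 1_000_000) * PRICE_PER_1M_OUTPUT

# An 800-input / 200-output call under the placeholder rates:
print(f"${estimate_request_cost(800, 200):.6f}")
```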
articles/ai-foundry/concepts/fine-tuning-overview.md (+4 -4)

@@ -84,15 +84,15 @@ It's important to call out that fine-tuning is heavily dependent on the quality
  ## Supported models for fine-tuning

  Now that you know when to use fine-tuning for your use case, you can go to Azure AI Foundry to find models available to fine-tune.
- For some models in the model catalog, fine-tuning is available by using a serverless API, or a managed compute (preview), or both.
+ For some models in the model catalog, fine-tuning is available by using a standard deployment, or a managed compute (preview), or both.

- Fine-tuning is available in specific Azure regions for some models that are deployed via serverless APIs. To fine-tune such models, a user must have a hub/project in the region where the model is available for fine-tuning. See [Region availability for models in serverless API endpoints](../how-to/deploy-models-serverless-availability.md) for detailed information.
+ Fine-tuning is available in specific Azure regions for some models that are deployed via standard deployments. To fine-tune such models, a user must have a hub/project in the region where the model is available for fine-tuning. See [Region availability for models in standard deployment](../how-to/deploy-models-serverless-availability.md) for detailed information.

  For more information on fine-tuning using a managed compute (preview), see [Fine-tune models using managed compute (preview)](../how-to/fine-tune-managed-compute.md).

  For details about Azure OpenAI in Azure AI Foundry Models that are available for fine-tuning, see the [Azure OpenAI in Foundry Models documentation](../../ai-services/openai/concepts/models.md#fine-tuning-models) or the [Azure OpenAI models table](#fine-tuning-azure-openai-models) later in this guide.

- For the Azure OpenAI Service models that you can fine tune, supported regions for fine-tuning include North Central US, Sweden Central, and more.
+ For the Azure OpenAI Service models that you can fine tune, supported regions for fine-tuning include North Central US, Sweden Central, and more.

  ### Fine-tuning Azure OpenAI models
@@ -102,5 +102,5 @@ For the Azure OpenAI Service models that you can fine tune, supported regions f
  - [Fine-tune models using managed compute (preview)](../how-to/fine-tune-managed-compute.md)
  - [Fine-tune an Azure OpenAI model in Azure AI Foundry portal](../../ai-services/openai/how-to/fine-tuning.md?context=/azure/ai-studio/context/context)
- - [Fine-tune models using serverless API](../how-to/fine-tune-serverless.md)
+ - [Fine-tune models using standard deployment](../how-to/fine-tune-serverless.md)
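For the standard-deployment fine-tuning path the renamed link points to, the job lifecycle is: upload a JSONL training file, create a fine-tuning job, poll until it finishes. Below is a minimal sketch against Azure OpenAI using the `openai` Python package; the file path and base model name are illustrative assumptions, not values from this PR.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

# Upload JSONL training data (chat-format examples), then start the job.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),  # illustrative path
    purpose="fine-tune",
)
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-35-turbo-0613",  # illustrative base model; check regional availability
)
print(job.id, job.status)
# Poll client.fine_tuning.jobs.retrieve(job.id) until the job completes.
```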
- | Region | East US/East US2 |[Serverless APIs](../how-to/model-catalog-overview.md#serverless-api-pay-per-token-billing) and [Azure OpenAI](/azure/ai-services/openai/overview)|
- | Tokens per minute (TPM) rate limit | 30k (180 RPM based on Azure OpenAI) for non-reasoning and 100k for reasoning models <br> N/A (serverless APIs) | For Azure OpenAI models, selection is available for users with rate limit ranges based on deployment type (standard, global, global standard, and so on). <br> For serverless APIs, this setting is abstracted. |
- | Number of requests | Two requests in a trial for every hour (24 trials per day) |Serverless APIs, Azure OpenAI |
- | Number of trials/runs | 14 days with 24 trials per day for 336 runs |Serverless APIs, Azure OpenAI |
- | Number of tokens processed (moderate) | 80:20 ratio for input to output tokens, that is, 800 input tokens to 200 output tokens. |Serverless APIs, Azure OpenAI |
- | Number of concurrent requests | One (requests are sent sequentially one after the other) |Serverless APIs, Azure OpenAI |
- | Data | Synthetic (input prompts prepared from static text) |Serverless APIs, Azure OpenAI |
- | Region | East US/East US2 |Serverless APIs and Azure OpenAI |
+ | Region | East US/East US2 |[Standard deployments](../how-to/model-catalog-overview.md#standard-deployment-pay-per-token-offer-billing) and [Azure OpenAI](/azure/ai-services/openai/overview)|
+ | Tokens per minute (TPM) rate limit | 30k (180 RPM based on Azure OpenAI) for non-reasoning and 100k for reasoning models <br> N/A (standard deployments) | For Azure OpenAI models, selection is available for users with rate limit ranges based on deployment type (standard, global, global standard, and so on). <br> For standard deployments, this setting is abstracted. |
+ | Number of requests | Two requests in a trial for every hour (24 trials per day) |Standard deployments, Azure OpenAI |
+ | Number of trials/runs | 14 days with 24 trials per day for 336 runs |Standard deployments, Azure OpenAI |
+ | Number of tokens processed (moderate) | 80:20 ratio for input to output tokens, that is, 800 input tokens to 200 output tokens. |Standard deployments, Azure OpenAI |
+ | Number of concurrent requests | One (requests are sent sequentially one after the other) |Standard deployments, Azure OpenAI |
+ | Data | Synthetic (input prompts prepared from static text) |Standard deployments, Azure OpenAI |
+ | Region | East US/East US2 |Standard deployments and Azure OpenAI |
  | Deployment type | Standard | Applicable only for Azure OpenAI |
- | Streaming | True | Applies to serverless APIs and Azure OpenAI. For models deployed via [managed compute](../how-to/model-catalog-overview.md#managed-compute), or for endpoints where streaming is not supported, TTFT is represented as the P50 of the latency metric. |
+ | Streaming | True | Applies to standard deployments and Azure OpenAI. For models deployed via [managed compute](../how-to/model-catalog-overview.md#managed-compute), or for endpoints where streaming is not supported, TTFT is represented as the P50 of the latency metric. |
  | SKU | Standard_NC24ads_A100_v4 (24 cores, 220GB RAM, 64GB storage) | Applicable only for Managed Compute (to estimate the cost and perf metrics) |
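The Streaming row above defines TTFT against a streaming response. One way to observe that number from a client is sketched below, again using the `openai` Python package with a hypothetical deployment name; on Azure the first streamed chunk can carry only filter annotations, hence the guard on empty `choices`.

```python
import os
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

start = time.perf_counter()
stream = client.chat.completions.create(
    model="my-deployment",  # hypothetical deployment name
    messages=[{"role": "user", "content": "Summarize the benchmark setup."}],
    stream=True,
)
for chunk in stream:
    # Skip annotation-only chunks; stop at the first content token.
    if chunk.choices and chunk.choices[0].delta.content:
        print(f"TTFT: {time.perf_counter() - start:.3f}s")
        break
```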
The performance of LLMs and SLMs is assessed across the following metrics:
@@ -111,14 +111,14 @@ For performance metrics like latency or throughput, the time to first token and
  ### Cost

- Cost calculations are estimates for using an LLM or SLM model endpoint hosted on the Azure AI platform. Azure AI supports displaying the cost of serverless APIs and Azure OpenAI models. Because these costs are subject to change, we refresh our cost calculations on a regular cadence.
+ Cost calculations are estimates for using an LLM or SLM model endpoint hosted on the Azure AI platform. Azure AI supports displaying the cost of standard deployments and Azure OpenAI models. Because these costs are subject to change, we refresh our cost calculations on a regular cadence.

  The cost of LLMs and SLMs is assessed across the following metrics:
  | Metric | Description |
  |--------|-------------|
- | Cost per input tokens | Cost for serverless API deployment for 1 million input tokens |
- | Cost per output tokens | Cost for serverless API deployment for 1 million output tokens |
+ | Cost per input tokens | Cost for standard deployment for 1 million input tokens |
+ | Cost per output tokens | Cost for standard deployment for 1 million output tokens |
  | Estimated cost | Sum of cost per input tokens and cost per output tokens, assuming a 3:1 ratio of input to output tokens. |
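Reading the Estimated cost row as a blended price at a 3:1 input-to-output token ratio (an interpretation; the table doesn't spell out the ratio's direction), the metric reduces to a weighted average of the two per-million-token prices. A sketch with placeholder prices:

```python
def estimated_cost(price_in_per_1m: float, price_out_per_1m: float) -> float:
    """Blend input and output prices assuming 3 input tokens per output token."""
    return 0.75 * price_in_per_1m + 0.25 * price_out_per_1m

# Placeholder rates, not actual Azure prices: $0.50/1M in, $1.50/1M out
print(estimated_cost(0.50, 1.50))  # -> 0.75 USD per 1M blended tokens
```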