Commit 29e6e91

Merge branch 'MicrosoftDocs:main' into heidist-uuf
2 parents e155da2 + 82c8bad

12 files changed (+55, -60 lines)

articles/ai-foundry/model-inference/concepts/deployment-types.md

Lines changed: 17 additions & 25 deletions
@@ -13,43 +13,35 @@ ms.custom: ignite-2024, github-universe-2024
 
 # Deployment types in Azure AI model inference
 
-Azure AI model inference in Azure AI services provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.
+Azure AI model inference makes models available using the *model deployment* concept in Azure AI Services resources. *Model deployments* are also Azure resources and, when created, they give access to a given model under certain configurations. Such configuration includes the infrastructure required to process the requests.
 
-All deployments can perform the exact same inference operations, however the billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:
+Azure AI model inference provides customers with choices on the hosting structure that fits their business and usage patterns. Those options are translated to different deployment types (or SKUs) that are available at model deployment time in the Azure AI Services resource.
 
-- **Data residency needs**: global vs. regional resources
-- **Call volume**: standard vs. provisioned
+:::image type="content" source="../media/add-model-deployments/models-deploy-deployment-type.png" alt-text="Screenshot showing how to customize the deployment type for a given model deployment." lightbox="../media/add-model-deployments/models-deploy-deployment-type.png":::
 
-Deployment types support varies by model and model provider. You can see which deployment type (SKU) each model supports in the [Models section](models.md).
+Different model providers offer different deployment SKUs that you can select from. When selecting a deployment type, consider your **data residency needs** and **call volume/capacity** requirements.
 
-## Global versus regional deployment types
+## Deployment types for Azure OpenAI models
 
-For standard and provisioned deployments, you have an option of two types of configurations within your resource – **global** or **regional**. Global standard is the recommended starting point.
+The service offers two main types of deployments: **standard** and **provisioned**. For a given deployment type, customers can align their workloads with their data processing requirements by choosing an Azure geography (`Standard` or `Provisioned-Managed`), Microsoft-specified data zone (`DataZone-Standard` or `DataZone Provisioned-Managed`), or Global (`Global-Standard` or `Global Provisioned-Managed`) processing option.
 
-Global deployments leverage Azure's global infrastructure, dynamically route customer traffic to the data center with best availability for the customer's inference requests. This means you get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
+To learn more about deployment options for Azure OpenAI models, see the [Azure OpenAI documentation](../../../ai-services/openai/how-to/deployment-types.md).
 
-Our global deployments are the first location for all new models and features. Customers with large throughput requirements should consider our provisioned deployment offering.
+## Deployment types for Models-as-a-Service models
 
-## Standard
+Models from third-party model providers with pay-as-you-go billing (collectively called Models-as-a-Service) are made available in Azure AI model inference under **standard** deployments with a Global processing option (`Global-Standard`).
 
-Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region and throughput may be limited.
+Models-as-a-Service also offers regional deployment options under [Serverless API endpoints](../../../ai-studio/how-to/deploy-models-serverless.md) in Azure AI Foundry. Prompts and outputs are processed within the geography specified during deployment. However, those deployments can't be accessed using the Azure AI model inference endpoint in Azure AI Services.
 
-Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
+### Global-Standard
 
-Only Azure OpenAI models support this deployment type.
+Global deployments leverage Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources. Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure location. Learn more about [data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
 
-## Global standard
+## Control deployment options
 
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.
+Administrators can control which model deployment types are available to their users by using Azure Policies. Learn more in [How to control AI model deployment with custom policies](../../../ai-studio/how-to/custom-policy-model-deployment.md).
 
-Customers with high consistent volume may experience greater latency variability. The threshold is set per model. For applications that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput if available.
+## Related content
 
-## Global provisioned
-
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.
-
-Only Azure OpenAI models support this deployment type.
-
-## Next steps
-
-- [Quotas & limits](../quotas-limits.md)
+- [Quotas & limits](../quotas-limits.md)
+- [Data privacy and security for Models-as-a-Service models](../../../ai-studio/how-to/concept-data-privacy.md)
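
The new article describes picking a deployment type (SKU) at model deployment time. As a concrete illustration, here is a minimal sketch of creating a deployment with the `azure-mgmt-cognitiveservices` Python SDK, where the SKU name selects the deployment type; the subscription, resource names, model format, and version are placeholders, not values taken from this commit:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

# The SKU name is where the deployment type is chosen, for example
# "GlobalStandard" for the Global-Standard option described above.
poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<ai-services-resource>",
    deployment_name="my-deployment",
    deployment=Deployment(
        properties=DeploymentProperties(
            model=DeploymentModel(
                format="Mistral AI",        # placeholder model provider format
                name="Mistral-large-2407",  # placeholder model name
                version="1",                # placeholder model version
            ),
        ),
        sku=Sku(name="GlobalStandard", capacity=1),
    ),
)
print(poller.result().sku.name)
```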

articles/ai-foundry/model-inference/includes/code-create-chat-client-entra.md

Lines changed: 5 additions & 5 deletions
@@ -23,11 +23,11 @@ Then, you can use the package to consume the model. The following example shows
 ```python
 import os
 from azure.ai.inference import ChatCompletionsClient
-from azure.identity import AzureDefaultCredential
+from azure.identity import DefaultAzureCredential
 
 model = ChatCompletionsClient(
     endpoint="https://<resource>.services.ai.azure.com/models",
-    credential=AzureDefaultCredential(),
+    credential=DefaultAzureCredential(),
     model="mistral-large-2407",
 )
 ```
@@ -45,11 +45,11 @@ Then, you can use the package to consume the model. The following example shows
 ```javascript
 import ModelClient from "@azure-rest/ai-inference";
 import { isUnexpected } from "@azure-rest/ai-inference";
-import { AzureDefaultCredential } from "@azure/identity";
+import { DefaultAzureCredential } from "@azure/identity";
 
 const client = new ModelClient(
     "https://<resource>.services.ai.azure.com/models",
     new AzureDefaultCredential(),
-    new AzureDefaultCredential(),
+    new DefaultAzureCredential(),
     "mistral-large-2407"
 );
 ```
@@ -81,7 +81,7 @@ Then, you can use the package to consume the model. The following example shows
 ```csharp
 ChatCompletionsClient client = new ChatCompletionsClient(
     new Uri("https://<resource>.services.ai.azure.com/models"),
-    new AzureDefaultCredential(includeInteractiveCredentials: true),
+    new DefaultAzureCredential(includeInteractiveCredentials: true),
     "mistral-large-2407"
 );
 ```
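
With the credential class corrected to `DefaultAzureCredential`, the Python client can be exercised as in this brief sketch; the prompt text is illustrative, and `model` is the `ChatCompletionsClient` from the corrected example above:

```python
from azure.ai.inference.models import SystemMessage, UserMessage

# 'model' is the ChatCompletionsClient created in the corrected example above.
response = model.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize what DefaultAzureCredential does."),
    ]
)
print(response.choices[0].message.content)
```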

articles/ai-foundry/model-inference/includes/code-create-embeddings-client.md

Lines changed: 4 additions & 4 deletions
@@ -36,11 +36,11 @@ If you are using an endpoint with support for Entra ID, you can create your client
 ```python
 import os
 from azure.ai.inference import EmbeddingsClient
-from azure.identity import AzureDefaultCredential
+from azure.identity import DefaultAzureCredential
 
 client = EmbeddingsClient(
     endpoint="https://<resource>.services.ai.azure.com/models",
-    credential=AzureDefaultCredential(),
+    credential=DefaultAzureCredential(),
 )
 ```
 
@@ -72,11 +72,11 @@ For endpoints with support for Microsoft Entra ID, you can create your client as follows
 ```javascript
 import ModelClient from "@azure-rest/ai-inference";
 import { isUnexpected } from "@azure-rest/ai-inference";
-import { AzureDefaultCredential } from "@azure/identity";
+import { DefaultAzureCredential } from "@azure/identity";
 
 const client = new ModelClient(
     "https://<resource>.services.ai.azure.com/models",
-    new AzureDefaultCredential()
+    new DefaultAzureCredential()
 );
 ```
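
As a quick check of the corrected client, here is a sketch of requesting embeddings with the Python `EmbeddingsClient` from the diff; the input strings and model name are assumptions for illustration:

```python
# 'client' is the EmbeddingsClient created in the corrected example above.
response = client.embed(
    input=["first phrase to embed", "second phrase to embed"],
    model="<embeddings-deployment>",  # placeholder deployment/model name
)
for item in response.data:
    print(item.index, len(item.embedding))
```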

articles/ai-foundry/model-inference/includes/configure-project-connection/portal.md

Lines changed: 6 additions & 2 deletions
@@ -10,7 +10,11 @@ zone_pivot_groups: azure-ai-models-deployment
 
 [!INCLUDE [Header](intro.md)]
 
-* An AI project connected to your Azure AI Services resource. You can follow the steps at [Configure Azure AI model inference service in my project](../../how-to/configure-project-connection.md) in Azure AI Foundry.
+* An AI project resource.
+
+* The feature **Deploy models to Azure AI model inference service** turned on.
+
+    :::image type="content" source="../../media/quickstart-ai-project/ai-project-inference-endpoint.gif" alt-text="An animation showing how to turn on the Deploy models to Azure AI model inference service feature in Azure AI Foundry portal." lightbox="../../media/quickstart-ai-project/ai-project-inference-endpoint.gif":::
 
 ## Add a connection
 
@@ -50,4 +54,4 @@ You can see the model deployments available in the connected resource by following these steps
 
 5. The details page shows information about the specific deployment. If you want to test the model, you can use the option **Open in playground**.
 
-6. The Azure AI Foundry playground is displayed, where you can interact with the given model.
+6. The Azure AI Foundry playground is displayed, where you can interact with the given model.

articles/ai-services/language-service/custom-named-entity-recognition/includes/use-pre-existing-resource.md

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ Make sure to enable **Custom text classification / Custom Named Entity Recognition**
 5. Select **Apply**.
 
 >[!Important]
-> * Make sure that your **Language resource** has **storage blob data contributor** role assigned on the storage account you are connecting.
+> Make sure that the user making changes has the **Storage Blob Data Contributor** role assigned to them.
 
 ### Add required roles
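
For readers who script their setup, granting that role might look like the following sketch with the `azure-mgmt-authorization` Python SDK. The scope and principal ID are placeholders, and the GUID is assumed to be the documented built-in role definition ID for Storage Blob Data Contributor:

```python
import uuid

from azure.identity import DefaultAzureCredential
from azure.mgmt.authorization import AuthorizationManagementClient
from azure.mgmt.authorization.models import RoleAssignmentCreateParameters

subscription_id = "<subscription-id>"  # placeholder
# Scope of the storage account you are connecting (placeholder path).
scope = (
    f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
    "/providers/Microsoft.Storage/storageAccounts/<storage-account>"
)
# Assumed built-in role ID for Storage Blob Data Contributor.
role_definition_id = (
    f"/subscriptions/{subscription_id}/providers/Microsoft.Authorization"
    "/roleDefinitions/ba92f5b4-2d11-453d-a403-e96b0029c9fe"
)

client = AuthorizationManagementClient(DefaultAzureCredential(), subscription_id)
client.role_assignments.create(
    scope,
    str(uuid.uuid4()),  # role assignment names are GUIDs
    RoleAssignmentCreateParameters(
        role_definition_id=role_definition_id,
        principal_id="<user-object-id>",  # the user making changes
    ),
)
```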

articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 5 additions & 5 deletions
@@ -75,16 +75,16 @@ A single character difference in the first 1,024 tokens will result in a cache miss
 
 ## What is cached?
 
-The o1-series models are text only and don't support system messages, images, tool use/function calling, or structured outputs. This limits the efficacy of prompt caching for these models to the user/assistant portions of the messages array which are less likely to have an identical 1024 token prefix.
+o1-series model feature support varies by model. For more details, see our dedicated [reasoning models guide](./reasoning.md).
 
 Prompt caching is supported for:
 
 |**Caching supported**|**Description**|**Supported models**|
 |--------|--------|--------|
-| **Messages** | The complete messages array: system, user, and assistant content | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) |
-| **Images** | Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests. | `gpt-4o`<br/>`gpt-4o-mini` |
-| **Tool use** | Both the messages array and tool definitions. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) |
-| **Structured outputs** | Structured output schema is appended as a prefix to the system message. | `gpt-4o`<br/>`gpt-4o-mini` |
+| **Messages** | The complete messages array: system, developer, user, and assistant content | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`o1` (version 2024-12-17) |
+| **Images** | Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`o1` (version 2024-12-17) |
+| **Tool use** | Both the messages array and tool definitions. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17)<br/>`o1` (version 2024-12-17) |
+| **Structured outputs** | Structured output schema is appended as a prefix to the system message. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`o1` (version 2024-12-17) |
 
 To improve the likelihood of cache hits occurring, you should structure your requests such that repetitive content occurs at the beginning of the messages array.
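
The guidance in the last context line lends itself to a short sketch: keep the static, reusable portion of the prompt identical and first, and append the per-request content at the end. The prompt text below is illustrative only:

```python
# Static instructions go first so repeated requests share an identical
# prefix (at least 1,024 tokens) that is eligible for caching.
STATIC_SYSTEM_PROMPT = (
    "You are a support agent for Contoso.\n"
    + "Policy details: ...\n" * 200  # padding to exceed the 1,024-token minimum
)

def build_messages(user_question: str) -> list[dict]:
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # identical across requests
        {"role": "user", "content": user_question},           # varies per request
    ]
```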

articles/ai-services/translator/firewalls.md

Lines changed: 4 additions & 4 deletions
@@ -1,19 +1,19 @@
 ---
-title: Translate behind firewalls - Translator
+title: Use Azure AI Translator to translate behind firewalls
 titleSuffix: Azure AI services
 description: Azure AI Translator can translate behind firewalls using either domain-name or IP filtering.
 #services: cognitive-services
 author: laujan
 manager: nitinme
 ms.service: azure-ai-translator
 ms.topic: conceptual
-ms.date: 07/09/2024
+ms.date: 01/27/2025
 ms.author: lajanuar
 ---
 
-# Use Translator behind firewalls
+# Use Azure AI Translator behind firewalls
 
-Translator can translate behind firewalls using either [Domain-name](/azure/firewall/dns-settings#dns-proxy-configuration) or [IP filtering](#configure-firewall). Domain-name filtering is the preferred method.
+Azure AI Translator can translate behind firewalls using either [Domain-name](/azure/firewall/dns-settings#dns-proxy-configuration) or [IP filtering](#configure-firewall). Domain-name filtering is the preferred method.
 
 If you still require IP filtering, you can get the [IP addresses details using service tag](/azure/virtual-network/service-tags-overview#discover-service-tags-by-using-downloadable-json-files). Translator is under the **CognitiveServicesManagement** service tag.
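
If you go the IP-filtering route, one way to pull the relevant prefixes is to parse the downloadable service-tags file mentioned above. A minimal sketch, assuming the weekly public-cloud JSON was saved locally as `ServiceTags_Public.json`:

```python
import json

# Load the downloaded service-tags file (file name is an assumption).
with open("ServiceTags_Public.json") as f:
    tags = json.load(f)

# Translator is listed under the CognitiveServicesManagement service tag.
management = next(
    tag for tag in tags["values"] if tag["name"] == "CognitiveServicesManagement"
)
for prefix in management["properties"]["addressPrefixes"]:
    print(prefix)
```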

articles/ai-services/translator/whats-new.md

Lines changed: 3 additions & 4 deletions
@@ -1,13 +1,12 @@
 ---
 title: What's new in Azure AI Translator?
 titleSuffix: Azure AI services
-description: Learn of the latest changes to the Translator Service API.
+description: Learn about the latest changes to the Azure AI Translator Service API.
 author: laujan
 manager: nitinme
 ms.service: azure-ai-translator
-ms.custom: build-2023
 ms.topic: overview
-ms.date: 06/19/2024
+ms.date: 01/27/2025
 ms.author: lajanuar
 ---
 <!-- markdownlint-disable MD024 -->
@@ -20,7 +19,7 @@ Bookmark this page to stay up to date with release notes, feature enhancements,
 
 Translator is a language service that enables users to translate text and documents, helps entities expand their global outreach, and supports preservation of at-risk and endangered languages.
 
-Translator service supports language translation for more than 100 languages. If your language community is interested in partnering with Microsoft to add your language to Translator, contact us via the [Translator community partner onboarding form](https://forms.office.com/pages/responsepage.aspx?id=v4j5cvGGr0GRqy180BHbR-riVR3Xj0tOnIRdZOALbM9UOU1aMlNaWFJOOE5YODhRR1FWVzY0QzU1OS4u).
+Azure AI Translator service supports language translation for more than 100 languages. If your language community is interested in partnering with Microsoft to add your language to Translator, contact us via the [Translator community partner onboarding form](https://forms.office.com/pages/responsepage.aspx?id=v4j5cvGGr0GRqy180BHbR-riVR3Xj0tOnIRdZOALbM9UOU1aMlNaWFJOOE5YODhRR1FWVzY0QzU1OS4u).
 
 ## May 2024

articles/ai-studio/.openpublishing.redirection.ai-studio.json

Lines changed: 1 addition & 1 deletion
@@ -207,7 +207,7 @@
     },
     {
      "source_path_from_root": "/articles/ai-studio/ai-services/how-to/content-safety.md",
-     "redirect_url": "/azure/ai-foundry/model-inference/how-to/configure-content-safety",
+     "redirect_url": "/azure/ai-foundry/model-inference/how-to/configure-content-filters",
      "redirect_document_id": false
     },
     {

articles/ai-studio/how-to/create-azure-ai-resource.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ For hubs that use CMK encryption mode, you can update the encryption key to a new key
 
 ### Update Azure Application Insights and Azure Container Registry
 
-To use custom environments for Prompt Flow, you're required to configure an Azure Container Registry for your hub. To use Azure Application Insights for Prompt Flow deployments, a configured Azure Application Insights resource is required for your hub. Updating the workspace-attached Azure Container Registry or ApplicationInsights resources may break lineage of previous jobs, deployed inference endpoints, or your ability to rerun earlier jobs in the workspace.
+To use custom environments for Prompt Flow, you're required to configure an Azure Container Registry for your hub. To use Azure Application Insights for Prompt Flow deployments, a configured Azure Application Insights resource is required for your hub. Updating the workspace-attached Azure Container Registry or Application Insights resources may break lineage of previous jobs, deployed inference endpoints, or your ability to rerun earlier jobs in the workspace. After association with an Azure AI Foundry hub, Azure Container Registry and Application Insights resources can't be disassociated (set to null).
 
 You can use the Azure portal, Azure SDK/CLI options, or the infrastructure-as-code templates to update both Azure Application Insights and Azure Container Registry for the hub.
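
As one concrete Azure SDK route, updating the two attached resources might look like this `azure-ai-ml` sketch; the names and resource IDs are placeholders, and `update_dependent_resources` is assumed to be the flag that acknowledges the lineage break described above:

```python
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Workspace
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
)

# Hubs are workspaces under the hood; point the hub at the new resources.
hub = Workspace(
    name="<hub-name>",
    application_insights=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Insights/components/<app-insights>"
    ),
    container_registry=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.ContainerRegistry/registries/<registry>"
    ),
)
ml_client.workspaces.begin_update(hub, update_dependent_resources=True).result()
```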
