MicrosoftDocs
diff --git a/‎.openpublishing.redirection.json
Lines changed: 5 additions & 0 deletions b/‎.openpublishing.redirection.json
Lines changed: 5 additions & 0 deletions
diff --git a/‎articles/ai-foundry/model-inference/concepts/deployment-types.md
Lines changed: 19 additions & 26 deletions b/‎articles/ai-foundry/model-inference/concepts/deployment-types.md
Lines changed: 19 additions & 26 deletions
diff --git a/‎articles/ai-foundry/model-inference/concepts/endpoints.md
Lines changed: 7 additions & 3 deletions b/‎articles/ai-foundry/model-inference/concepts/endpoints.md
Lines changed: 7 additions & 3 deletions
diff --git a/‎articles/ai-foundry/model-inference/faq.yml
Lines changed: 1 addition & 1 deletion b/‎articles/ai-foundry/model-inference/faq.yml
Lines changed: 1 addition & 1 deletion
diff --git a/‎articles/ai-foundry/model-inference/how-to/configure-content-filters.md
Lines changed: 2 additions & 2 deletions b/‎articles/ai-foundry/model-inference/how-to/configure-content-filters.md
Lines changed: 2 additions & 2 deletions
diff --git a/‎articles/ai-foundry/model-inference/how-to/configure-entra-id.md
Lines changed: 32 additions & 0 deletions b/‎articles/ai-foundry/model-inference/how-to/configure-entra-id.md
Lines changed: 32 additions & 0 deletions
diff --git a/‎articles/ai-foundry/model-inference/how-to/configure-project-connection.md
Lines changed: 1 addition & 1 deletion b/‎articles/ai-foundry/model-inference/how-to/configure-project-connection.md
Lines changed: 1 addition & 1 deletion
diff --git a/‎articles/ai-foundry/model-inference/how-to/create-model-deployments.md
Lines changed: 1 addition & 1 deletion b/‎articles/ai-foundry/model-inference/how-to/create-model-deployments.md
Lines changed: 1 addition & 1 deletion
diff --git a/‎articles/ai-foundry/model-inference/how-to/github/create-model-deployments.md
Lines changed: 1 addition & 1 deletion b/‎articles/ai-foundry/model-inference/how-to/github/create-model-deployments.md
Lines changed: 1 addition & 1 deletion
diff --git a/‎articles/ai-foundry/model-inference/how-to/inference.md
Lines changed: 2 additions & 2 deletions b/‎articles/ai-foundry/model-inference/how-to/inference.md
Lines changed: 2 additions & 2 deletions
@@ -15,6 +15,11 @@
       "redirect_url": "/azure/search/search-how-to-dotnet-sdk",
       "redirect_document_id": false
     },
+    {
+      "source_path_from_root": "/articles/ai-services/agents/how-to/tools/overview.md",
+      "redirect_url": "/azure/ai-services/agents/overview",
+      "redirect_document_id": false
+    },
     {
       "source_path_from_root": "/articles/search/search-howto-index-csv-blobs.md",
       "redirect_url": "/azure/search/search-how-to-index-csv-blobs",
 
@@ -2,7 +2,7 @@
 title: Understanding deployment types in Azure AI model inference
 titleSuffix: Azure AI Foundry
 description: Learn how to use deployment types in Azure AI model deployments
-author: mrbullwinkle
+author: santiagxf
 manager: nitinme
 ms.service: azure-ai-model-inference
 ms.topic: how-to
@@ -13,43 +13,36 @@ ms.custom: ignite-2024, github-universe-2024
 
 # Deployment types in Azure AI model inference
 
-Azure AI model inference in Azure AI services provides customers with choices on the hosting structure that fits their business and usage patterns. The service offers two main types of deployment: **standard** and **provisioned**. Standard is offered with a global deployment option, routing traffic globally to provide higher throughput. Provisioned is also offered with a global deployment option, allowing customers to purchase and deploy provisioned throughput units across Azure global infrastructure.
+Azure AI model inference makes models available using the *model deployment* concept in Azure AI Services resources. *Model deployments* are also Azure resources and, when created, they give access to a given model under certain configurations. Such configuration includes the infrastructure require to process the requests. 
 
-All deployments can perform the exact same inference operations, however the billing, scale, and performance are substantially different. As part of your solution design, you need to make two key decisions:
+Azure AI model inference provides customers with choices on the hosting structure that fits their business and usage patterns. Those options are translated to different deployments types (or SKUs) that are available at model deployment time in the Azure AI Services resource.
 
-- **Data residency needs**: global vs. regional resources  
-- **Call volume**: standard vs. provisioned
+:::image type="content" source="../media/add-model-deployments/models-deploy-deployment-type.png" alt-text="Screenshot showing how to customize the deployment type for a given model deployment." lightbox="../media/add-model-deployments/models-deploy-deployment-type.png":::
 
-Deployment types support varies by model and model provider. You can see which deployment type (SKU) each model supports in the [Models section](models.md). 
+Different model providers offer different deployments SKUs that you can select from. When selecting a deployment type, consider your **data residency needs** and **call volume/capacity** requirements.
 
-## Global versus regional deployment types
+## Deployment types for Azure OpenAI models
 
-For standard and provisioned deployments, you have an option of two types of configurations within your resource – **global** or **regional**. Global standard is the recommended starting point. 
+The service offers two main types of deployments: **standard** and **provisioned**. For a given deployment type, customers can align their workloads with their data processing requirements by choosing an Azure geography (`Standard` or `Provisioned-Managed`), Microsoft specified data zone (`DataZone-Standard` or `DataZone Provisioned-Managed`), or Global (`Global-Standard` or `Global Provisioned-Managed`) processing options.
 
-Global deployments leverage Azure's global infrastructure, dynamically route customer traffic to the data center with best availability for the customer's inference requests. This means you get the highest initial throughput limits and best model availability with Global while still providing our uptime SLA and low latency. For high volume workloads above the specified usage tiers on standard and global standard, you may experience increased latency variation. For customers that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput.
+To learn more about deployment options for Azure OpenAI models see [Azure OpenAI documentation](../../../ai-services/openai/how-to/deployment-types.md).
 
-Our global deployments are the first location for all new models and features. Customers with large throughput requirements should consider our provisioned deployment offering.
+## Deployment types for Models-as-a-Service models
 
-## Standard
+Models from third-party model providers with pay-as-you-go billing (collectively called Models-as-a-Service), makes models available in Azure AI model inference under **standard** deployments with a Global processing option (`Global-Standard`). 
 
-Standard deployments provide a pay-per-call billing model on the chosen model. Provides the fastest way to get started as you only pay for what you consume. Models available in each region and throughput may be limited.  
+### Global-Standard
 
-Standard deployments are optimized for low to medium volume workloads with high burstiness. Customers with high consistent volume may experience greater latency variability.
+Global deployments leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources. Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure location. Learn more about [data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
 
-Only Azure OpenAI models support this deployment type.
+> [!NOTE]
+> Models-as-a-Service offers regional deployment options under [Serverless API endpoints](../../../ai-studio/how-to/deploy-models-serverless.md) in Azure AI Foundry. Prompts and outputs are processed within the geography specified during deployment. However, those deployments can't be accessed using the Azure AI model inference endpoint in Azure AI Services.
 
-## Global standard
+## Control deployment options
 
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request.  Global standard provides the highest default quota and eliminates the need to load balance across multiple resources.  
+Administrators can control which model deployment types are available to their users by using Azure Policies. Learn more about [How to control AI model deployment with custom policies](../../../ai-studio/how-to/custom-policy-model-deployment.md).
 
-Customers with high consistent volume may experience greater latency variability. The threshold is set per model. For applications that require the lower latency variance at large workload usage, we recommend purchasing provisioned throughput if available.
+## Related content
 
-## Global provisioned
-
-Global deployments are available in the same Azure AI services resources as non-global deployment types but allow you to leverage Azure's global infrastructure to dynamically route traffic to the data center with best availability for each request. Global provisioned deployments provide reserved model processing capacity for high and predictable throughput using Azure global infrastructure.
-
-Only Azure OpenAI models support this deployment type.
-
-## Next steps
-
-- [Quotas & limits](../quotas-limits.md)
+- [Quotas & limits](../quotas-limits.md)
+- [Data privacy, and security for Models-as-a-Service models](../../../ai-studio/how-to/concept-data-privacy.md)
@@ -2,7 +2,7 @@
 title: Model inference endpoint in Azure AI services
 titleSuffix: Azure AI Foundry
 description: Learn about the model inference endpoint in Azure AI services
-author: mrbullwinkle
+author: santiagxf
 manager: nitinme
 ms.service: azure-ai-model-inference
 ms.topic: how-to
@@ -38,7 +38,11 @@ To learn more about how to create deployments see [Add and configure model deplo
 
 ## Azure AI inference endpoint
 
-The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md) which all the models in Azure AI model inference support.
+The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](.././reference/reference-model-inference-api.md) which all the models in Azure AI model inference support. It support the following modalidities:
+
+* Text embeddings
+* Image embeddings
+* Chat completions
 
 You can see the endpoint URL and credentials in the **Overview** section:
 
@@ -84,4 +88,4 @@ The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)
 ## Next steps
 
 - [Models](models.md)
-- [Deployment types](deployment-types.md)
+- [Deployment types](deployment-types.md)
@@ -5,7 +5,7 @@ metadata:
   description: Get answers to the most popular questions about Azure AI model inference
   #services: cognitive-services
   manager: nitinme
-  ms.service: azure-ai-models
+  ms.service: azure-ai-model-inference
   ms.topic: faq
   ms.date: 1/21/2025
   ms.author: fasantia
 
@@ -6,8 +6,8 @@ manager: nitinme
 ms.service: azure-ai-model-inference
 ms.topic: how-to
 ms.date: 1/21/2025
-author: mrbullwinkle
-ms.author: mbullwin
+author: santiagxf
+ms.author: fasantia 
 recommendations: false
 ms.custom: ignite-2024, github-universe-2024
 zone_pivot_groups: azure-ai-models-deployment
 
@@ -0,0 +1,32 @@
+---
+title: Configure key-less authentication with Microsoft Entra ID
+titleSuffix: Azure AI Foundry
+description: Learn how to configure key-less authorization to use Azure AI model inference with Microsoft Entra ID.
+ms.service: azure-ai-model-inference
+ms.topic: how-to
+ms.date: 10/01/2024
+ms.custom: ignite-2024, github-universe-2024
+manager: nitinme
+author: santiagxf
+ms.author: fasantia 
+recommendations: false
+zone_pivot_groups: azure-ai-models-deployment
+---
+
+# Configure key-less authentication with Microsoft Entra ID
+
+::: zone pivot="ai-foundry-portal"
+[!INCLUDE [portal](../includes/configure-entra-id/portal.md)]
+::: zone-end
+
+::: zone pivot="programming-language-cli"
+[!INCLUDE [cli](../includes/configure-entra-id/cli.md)]
+::: zone-end
+
+::: zone pivot="programming-language-bicep"
+[!INCLUDE [bicep](../includes/configure-entra-id/bicep.md)]
+::: zone-end
+
+## Next steps
+
+* [Develop applications using Azure AI model inference service in Azure AI services](../supported-languages.md)
@@ -7,7 +7,7 @@ ms.topic: how-to
 ms.date: 1/21/2025
 ms.custom: ignite-2024, github-universe-2024
 manager: nitinme
-author: mrbullwinkle
+author: santiagxf
 ms.author: fasantia 
 recommendations: false
 zone_pivot_groups: azure-ai-models-deployment
 
@@ -7,7 +7,7 @@ ms.topic: how-to
 ms.date: 1/21/2025
 ms.custom: ignite-2024, github-universe-2024
 manager: nitinme
-author: mrbullwinkle
+author: santiagxf
 ms.author: fasantia 
 recommendations: false
 zone_pivot_groups: azure-ai-models-deployment
 
@@ -7,7 +7,7 @@ ms.topic: how-to
 ms.date: 1/21/2025
 ms.custom: ignite-2024, github-universe-2024
 manager: nitinme
-author: mrbullwinkle
+author: santiagxf
 ms.author: fasantia 
 recommendations: false
 ---
 
@@ -26,9 +26,9 @@ Azure AI services expose multiple endpoints depending on the type of work you're
 > * Azure AI model inference endpoint
 > * Azure OpenAI endpoint
 
-The **Azure AI inference endpoint** allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI model inference API](../../../ai-studio/reference/reference-model-inference-api.md).
+The **Azure AI inference endpoint** (usually with the form `https://<resource-name>.services.ai.azure.com/models`) allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. All the models support this capability. This endpoint follows the [Azure AI model inference API](.././reference/reference-model-inference-api.md). 
 
-**Azure OpenAI** models deployed to AI services also support the Azure OpenAI API. This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
+**Azure OpenAI** models deployed to AI services also support the Azure OpenAI API (usually with the form `https://<resource-name>.openai.azure.com`). This endpoint exposes the full capabilities of OpenAI models and supports more features like assistants, threads, files, and batch inference.
 
 To learn more about how to apply the **Azure OpenAI endpoint** see [Azure OpenAI service documentation](../../../ai-services/openai/overview.md).