
Commit 65217cb

authored
Merge branch 'MicrosoftDocs:main' into heidist-freshness
2 parents 4972d28 + 2fb87d6 commit 65217cb

File tree

481 files changed

+15719
-3794
lines changed


.openpublishing.publish.config.json

Lines changed: 6 additions & 0 deletions
```diff
@@ -176,6 +176,12 @@
       "branch": "main",
       "branch_mapping": {}
     },
+    {
+      "path_to_root": "azureai-model-inference-bicep",
+      "url": "https://github.com/Azure-Samples/azureai-model-inference-bicep",
+      "branch": "main",
+      "branch_mapping": {}
+    },
     {
       "path_to_root": "azure-docs-pr-policy-includes",
       "url": "https://github.com/MicrosoftDocs/azure-docs-pr",
```
Lines changed: 11 additions & 0 deletions
```diff
@@ -0,0 +1,11 @@
+- name: Azure
+  tocHref: /azure/
+  topicHref: /azure/index
+  items:
+  - name: AI Foundry
+    tocHref: /azure/ai-foundry/
+    topicHref: /azure/ai-studio/index
+    items:
+    - name: Model Inference
+      tocHref: /azure/ai-foundry/model-inference/
+      topicHref: /azure/ai-foundry/model-inference/index
```

articles/ai-foundry/model-inference/concepts/content-filter.md

Lines changed: 309 additions & 0 deletions
Lines changed: 81 additions & 0 deletions
---
title: Default content safety policies for Azure AI Model Inference
titleSuffix: Azure AI Foundry
description: Learn about the default content safety policies that Azure AI Model Inference uses to flag content.
author: PatrickFarley
ms.author: fasantia
ms.service: azure-ai-model-inference
ms.topic: conceptual
ms.date: 1/21/2025
manager: nitinme
---

# Default content safety policies for Azure AI Model Inference

Azure AI model inference includes default safety policies applied to all models, excluding Azure OpenAI Whisper. These configurations provide you with a responsible experience by default.

Default safety aims to mitigate risks such as hate and fairness, sexual, violence, self-harm, protected material content, and user prompt injection attacks. To learn more about content filtering, read [our documentation describing categories and severity levels](content-filter.md).

This document describes the default configuration.

> [!TIP]
> By default, all model deployments use the default configuration. However, you can configure content filtering per model deployment as explained in [Configuring content filtering](../how-to/configure-content-filters.md).

## Text models

Text models in Azure AI model inference can take in and generate both text and code. These models apply Azure's text content filtering models to detect and prevent harmful content. This system works on both prompts and completions.

| Risk Category                            | Prompt/Completion       | Severity Threshold |
|------------------------------------------|-------------------------|--------------------|
| Hate and Fairness                        | Prompts and Completions | Medium             |
| Violence                                 | Prompts and Completions | Medium             |
| Sexual                                   | Prompts and Completions | Medium             |
| Self-Harm                                | Prompts and Completions | Medium             |
| User prompt injection attack (Jailbreak) | Prompts                 | N/A                |
| Protected Material – Text                | Completions             | N/A                |
| Protected Material – Code                | Completions             | N/A                |
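As an illustration only (this is not an Azure SDK API), the default text-model policy in the table above can be encoded as a small lookup. The category names and thresholds come from the table; the helper function and its names are hypothetical:

```python
# Hypothetical sketch encoding the default text-model policy table above.
# Category keys and thresholds mirror the table; nothing here is SDK code.

SEVERITIES = ["safe", "low", "medium", "high"]

DEFAULT_TEXT_POLICY = {
    # category: (applies to, blocking threshold; None = detection only (N/A))
    "hate_and_fairness":       ({"prompt", "completion"}, "medium"),
    "violence":                ({"prompt", "completion"}, "medium"),
    "sexual":                  ({"prompt", "completion"}, "medium"),
    "self_harm":               ({"prompt", "completion"}, "medium"),
    "jailbreak":               ({"prompt"}, None),
    "protected_material_text": ({"completion"}, None),
    "protected_material_code": ({"completion"}, None),
}

def is_blocked(category: str, target: str, severity: str) -> bool:
    """Return True if content at `severity` in `category` would be filtered
    under the default policy for the given target ('prompt' or 'completion')."""
    applies_to, threshold = DEFAULT_TEXT_POLICY[category]
    if target not in applies_to or threshold is None:
        return False
    return SEVERITIES.index(severity) >= SEVERITIES.index(threshold)
```

For example, medium-severity violence in a prompt is blocked, while low-severity is not; jailbreak detection has no severity threshold.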
## Vision and chat with vision models

Vision models can take both text and images at the same time as part of the input. Default content filtering capabilities vary per model and provider.

### Azure OpenAI: GPT-4o and GPT-4 Turbo

| Risk Category                                                       | Prompt/Completion       | Severity Threshold |
|---------------------------------------------------------------------|-------------------------|--------------------|
| Hate and Fairness                                                   | Prompts and Completions | Medium             |
| Violence                                                            | Prompts and Completions | Medium             |
| Sexual                                                              | Prompts and Completions | Medium             |
| Self-Harm                                                           | Prompts and Completions | Medium             |
| Identification of Individuals and Inference of Sensitive Attributes | Prompts                 | N/A                |
| User prompt injection attack (Jailbreak)                            | Prompts                 | N/A                |

### Azure OpenAI: DALL-E 3 and DALL-E 2

| Risk Category                                  | Prompt/Completion       | Severity Threshold |
|------------------------------------------------|-------------------------|--------------------|
| Hate and Fairness                              | Prompts and Completions | Low                |
| Violence                                       | Prompts and Completions | Low                |
| Sexual                                         | Prompts and Completions | Low                |
| Self-Harm                                      | Prompts and Completions | Low                |
| Content Credentials                            | Completions             | N/A                |
| Deceptive Generation of Political Candidates   | Prompts                 | N/A                |
| Depictions of Public Figures                   | Prompts                 | N/A                |
| User prompt injection attack (Jailbreak)       | Prompts                 | N/A                |
| Protected Material – Art and Studio Characters | Prompts                 | N/A                |
| Profanity                                      | Prompts                 | N/A                |

In addition to the previous safety configurations, Azure OpenAI DALL-E also comes with [prompt transformation](../../../ai-services/openai/concepts/prompt-transformation.md) by default. This transformation occurs on all prompts to enhance the safety of your original prompt, specifically in the risk categories of diversity, deceptive generation of political candidates, depictions of public figures, protected material, and others.

### Meta: Llama-3.2-11B-Vision-Instruct and Llama-3.2-90B-Vision-Instruct

Content filters apply only to text prompts and completions. Images aren't subject to content moderation.

### Microsoft: Phi-3.5-vision-instruct

Content filters apply only to text prompts and completions. Images aren't subject to content moderation.

## Next steps

* [Configure content filters in Azure AI Model Inference](../how-to/configure-content-filters.md)
Lines changed: 48 additions & 0 deletions
---
title: Understanding deployment types in Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn how to use deployment types in Azure AI model deployments
author: santiagxf
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: fasantia
ms.custom: ignite-2024, github-universe-2024
---

# Deployment types in Azure AI model inference

Azure AI model inference makes models available using the *model deployment* concept in Azure AI Services resources. *Model deployments* are also Azure resources and, when created, they give access to a given model under certain configurations. Such configuration includes the infrastructure required to process the requests.

Azure AI model inference provides customers with choices on the hosting structure that fits their business and usage patterns. Those options are translated to different deployment types (or SKUs) that are available at model deployment time in the Azure AI Services resource.

:::image type="content" source="../media/add-model-deployments/models-deploy-deployment-type.png" alt-text="Screenshot showing how to customize the deployment type for a given model deployment." lightbox="../media/add-model-deployments/models-deploy-deployment-type.png":::

Different model providers offer different deployment SKUs that you can select from. When selecting a deployment type, consider your **data residency needs** and **call volume/capacity** requirements.

## Deployment types for Azure OpenAI models

The service offers two main types of deployments: **standard** and **provisioned**. For a given deployment type, customers can align their workloads with their data processing requirements by choosing an Azure geography (`Standard` or `Provisioned-Managed`), Microsoft specified data zone (`DataZone-Standard` or `DataZone Provisioned-Managed`), or Global (`Global-Standard` or `Global Provisioned-Managed`) processing option.

To learn more about deployment options for Azure OpenAI models, see the [Azure OpenAI documentation](../../../ai-services/openai/how-to/deployment-types.md).

## Deployment types for Models-as-a-Service models

Models from third-party model providers with pay-as-you-go billing (collectively called Models-as-a-Service) are available in Azure AI model inference under **standard** deployments with a Global processing option (`Global-Standard`).

### Global-Standard

Global deployments use Azure's global infrastructure to dynamically route traffic to the data center with the best availability for each request. Global standard provides the highest default quota and eliminates the need to load balance across multiple resources. Data stored at rest remains in the designated Azure geography, while data may be processed for inferencing in any Azure location. Learn more about [data residency](https://azure.microsoft.com/explore/global-infrastructure/data-residency/).
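To make the SKU choice concrete, here's a minimal sketch of the request body a deployment carries at the resource level, following the shape of the `Microsoft.CognitiveServices/accounts/deployments` ARM resource. The model and SKU values below are illustrative examples, not recommendations, and the helper function is hypothetical:

```python
# Hypothetical sketch: build a deployment request body selecting a
# deployment type (SKU) such as 'GlobalStandard'. Property names follow
# the Microsoft.CognitiveServices ARM schema; values are examples only.

def deployment_body(model_format: str, model_name: str, model_version: str,
                    sku_name: str = "GlobalStandard", capacity: int = 1) -> dict:
    """Assemble the sku/properties payload for a model deployment."""
    return {
        "sku": {"name": sku_name, "capacity": capacity},
        "properties": {
            "model": {
                "format": model_format,
                "name": model_name,
                "version": model_version,
            }
        },
    }

# Example: a Models-as-a-Service model under the Global-Standard SKU.
body = deployment_body("Mistral AI", "Mistral-large", "2407")
```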
> [!NOTE]
> Models-as-a-Service offers regional deployment options under [Serverless API endpoints](../../../ai-studio/how-to/deploy-models-serverless.md) in Azure AI Foundry. Prompts and outputs are processed within the geography specified during deployment. However, those deployments can't be accessed using the Azure AI model inference endpoint in Azure AI Services.

## Control deployment options

Administrators can control which model deployment types are available to their users by using Azure Policies. Learn more about [How to control AI model deployment with custom policies](../../../ai-studio/how-to/custom-policy-model-deployment.md).

## Related content

- [Quotas & limits](../quotas-limits.md)
- [Data privacy and security for Models-as-a-Service models](../../../ai-studio/how-to/concept-data-privacy.md)
Lines changed: 91 additions & 0 deletions
---
title: Model inference endpoint in Azure AI services
titleSuffix: Azure AI Foundry
description: Learn about the model inference endpoint in Azure AI services
author: santiagxf
manager: nitinme
ms.service: azure-ai-model-inference
ms.topic: how-to
ms.date: 1/21/2025
ms.author: fasantia
ms.custom: ignite-2024, github-universe-2024
---

# Model inference endpoint in Azure AI Services

Azure AI model inference in Azure AI services allows customers to consume the most powerful models from flagship model providers using a single endpoint and credentials. This means that you can switch between models and consume them from your application without changing a single line of code.

This article explains how models are organized inside of the service and how to use the inference endpoint to invoke them.

## Deployments

Azure AI model inference makes models available using the **deployment** concept. **Deployments** are a way to give a model a name under certain configurations. Then, you can invoke such a model configuration by indicating its name on your requests.

Deployments capture:

> [!div class="checklist"]
> * A model name
> * A model version
> * A provisioning/capacity type<sup>1</sup>
> * A content filtering configuration<sup>1</sup>
> * A rate limiting configuration<sup>1</sup>

<sup>1</sup> Configurations may vary depending on the selected model.

An Azure AI services resource can have as many model deployments as needed, and they don't incur cost unless inference is performed for those models. Deployments are Azure resources and hence they're subject to Azure policies.

To learn more about how to create deployments, see [Add and configure model deployments](../how-to/create-model-deployments.md).

## Azure AI inference endpoint

The Azure AI inference endpoint allows customers to use a single endpoint with the same authentication and schema to generate inference for the deployed models in the resource. This endpoint follows the [Azure AI model inference API](.././reference/reference-model-inference-api.md), which all the models in Azure AI model inference support. It supports the following modalities:

* Text embeddings
* Image embeddings
* Chat completions

You can see the endpoint URL and credentials in the **Overview** section:

:::image type="content" source="../media/overview/overview-endpoint-and-key.png" alt-text="Screenshot showing how to get the URL and key associated with the resource." lightbox="../media/overview/overview-endpoint-and-key.png":::

### Routing

The inference endpoint routes requests to a given deployment by matching the parameter `name` inside of the request to the name of the deployment. This means that *deployments work as an alias of a given model under certain configurations*. This flexibility allows you to deploy a given model multiple times in the service but under different configurations if needed.

:::image type="content" source="../media/endpoint/endpoint-routing.png" alt-text="An illustration showing how routing works for a Meta-llama-3.2-8b-instruct model by indicating such name in the parameter 'model' inside of the payload request." lightbox="../media/endpoint/endpoint-routing.png":::

For example, if you create a deployment named `Mistral-large`, then such deployment can be invoked as:

[!INCLUDE [code-create-chat-client](../includes/code-create-chat-client.md)]

[!INCLUDE [code-create-chat-completion](../includes/code-create-chat-completion.md)]

> [!TIP]
> Deployment routing isn't case sensitive.
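As a sketch of how this routing behaves (this is illustrative, not SDK code), the request body carries the deployment name, and the endpoint matches it against the resource's deployments ignoring case:

```python
# Illustrative sketch of deployment routing; not Azure SDK code.
# The request body names the target deployment, and the router matches
# that name against the resource's deployments case-insensitively.

def chat_request(deployment_name, prompt):
    """Build a chat-completions payload targeting a named deployment."""
    return {
        "model": deployment_name,
        "messages": [{"role": "user", "content": prompt}],
    }

def resolve_deployment(requested, deployment_names):
    """Mimic the router: find the deployment matching `requested`, ignoring case."""
    for name in deployment_names:
        if name.lower() == requested.lower():
            return name
    return None

# 'mistral-large' resolves to the 'Mistral-large' deployment.
payload = chat_request("mistral-large", "Say hello.")
target = resolve_deployment(payload["model"],
                            ["Mistral-large", "Phi-3.5-vision-instruct"])
```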
### SDKs

The Azure AI model inference endpoint is supported by multiple SDKs, including the **Azure AI Inference SDK**, the **Azure AI Foundry SDK**, and the **Azure OpenAI SDK**, which are available in multiple languages. Multiple integrations are also supported in popular frameworks like LangChain, LangGraph, Llama-Index, Semantic Kernel, and AG2. See [supported programming languages and SDKs](../supported-languages.md) for details.

## Azure OpenAI inference endpoint

Azure OpenAI models deployed to AI services also support the Azure OpenAI API. This API exposes the full capabilities of OpenAI models and supports additional features like assistants, threads, files, and batch inference.

Azure OpenAI inference endpoints work at the deployment level, and each deployment has its own associated URL. However, the same authentication mechanism can be used to consume them. Learn more in the reference page for the [Azure OpenAI API](../../../ai-services/openai/reference.md).

:::image type="content" source="../media/endpoint/endpoint-openai.png" alt-text="An illustration showing how Azure OpenAI deployments contain a single URL for each deployment." lightbox="../media/endpoint/endpoint-openai.png":::

Each deployment has a URL that is the concatenation of the **Azure OpenAI** base URL and the route `/deployments/<model-deployment-name>`.

> [!IMPORTANT]
> There's no routing mechanism for the Azure OpenAI endpoint, as each URL is exclusive for each model deployment.
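The URL shape described above can be sketched as follows; the resource and deployment names are placeholders, and the helper is illustrative:

```python
# Illustrative sketch of the per-deployment URL: the Azure OpenAI base URL
# concatenated with /deployments/<model-deployment-name>. Names below are
# placeholders, not real resources.

def azure_openai_deployment_url(base_url, deployment_name):
    """Concatenate the base URL with the deployment route."""
    return f"{base_url.rstrip('/')}/deployments/{deployment_name}"

url = azure_openai_deployment_url(
    "https://<resource>.openai.azure.com/openai",  # placeholder base URL
    "gpt-4o-deployment",                           # placeholder deployment name
)
```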
### SDKs

The Azure OpenAI endpoint is supported by the **OpenAI SDK (`AzureOpenAI` class)** and **Azure OpenAI SDKs**, which are available in multiple languages. See [supported languages](../supported-languages.md#azure-openai-models) for details.

## Next steps

- [Models](models.md)
- [Deployment types](deployment-types.md)
Lines changed: 62 additions & 0 deletions
---
title: Model versions in Azure AI model inference
titleSuffix: Azure AI Foundry
description: Learn about model versions in Azure AI model inference.
ms.service: azure-ai-model-inference
ms.topic: conceptual
ms.custom: ignite-2024, github-universe-2024
ms.date: 1/21/2025
manager: nitinme
author: santiagxf
ms.author: fasantia
recommendations: false
---

# Model versions in Azure AI model inference

Azure AI services are committed to providing the best generative AI models for customers. As part of this commitment, Azure AI services regularly release new model versions to incorporate the latest features and improvements from key model providers in the industry.

## How model versions work

We want to make it easy for customers to stay up to date as models improve. Customers can choose to start with a particular version and stay on it, or to automatically update as new versions are released.

We distinguish two different versions when working with models:

* The version of the model itself.
* The version of the API used to consume a model deployment.

The version of a model is decided when you deploy it. You can choose an update policy, which can include the following options:

* Deployments set to a specific version, or without an upgrade policy, require a manual upgrade if a new version is released. When the model is retired, those deployments stop working.

* Deployments set to **Auto-update to default** automatically update to use the new default version.

* Deployments set to **Upgrade when expired** automatically update when their current version is retired.

> [!NOTE]
> Update policies are configured per deployment and **vary** by model and provider.

The API version indicates the contract that you use to interface with the model in code. When using REST APIs, you indicate the API version using the query parameter `api-version`. Azure SDK versions are usually paired with specific API versions, but you can indicate the API version you want to use. A given model deployment might support multiple API versions. The release of a new model version might not require you to upgrade to a new API version, as is the case when there's an update to the model's weights.
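For illustration, attaching the `api-version` query parameter to a REST request URL can be sketched as below. The endpoint URL and version value are placeholders; check the API reference for the versions your deployment actually supports:

```python
# Illustrative sketch: append the api-version query parameter that the
# REST APIs expect. The URL and version string are placeholders.
from urllib.parse import urlencode

def with_api_version(url, api_version):
    """Append the api-version query parameter to an endpoint URL."""
    sep = "&" if "?" in url else "?"
    return url + sep + urlencode({"api-version": api_version})

url = with_api_version(
    "https://<resource>.services.ai.azure.com/models/chat/completions",
    "2024-05-01-preview",  # example value only
)
```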
## Azure OpenAI model updates

Azure works closely with OpenAI to release new model versions. When a new version of a model is released, you can immediately test it in new deployments. Azure announces when new model versions are released, and notifies customers at least two weeks before a new version becomes the default version of the model. Azure also maintains the previous major version of the model until its retirement date, so you can switch back to it if desired.

### What you need to know about Azure OpenAI model version upgrades

As a customer of Azure OpenAI models, you might notice some changes in model behavior and compatibility after a version upgrade. These changes might affect your applications and workflows that rely on the models. Here are some tips to help you prepare for version upgrades and minimize the impact:

* Read [what's new](../../../ai-services/openai/whats-new.md) and [models](../../../ai-services/openai/concepts/models.md) to understand the changes and new features.
* Read the documentation on [model deployments](../../../ai-services/openai/how-to/create-resource.md) and [version upgrades](../../../ai-services/openai/how-to/working-with-models.md) to understand how to work with model versions.
* Test your applications and workflows with the new model version after release.
* Update your code and configuration to use the new features and capabilities of the new model version.

## Non-Microsoft model updates

Azure works closely with model providers to release new model versions. When a new version of a model is released, you can immediately test it in new deployments. Azure also maintains the previous major version of the model until its retirement date, so you can switch back to it if desired.

New model versions might result in a new model ID being published. For example, `Llama-3.3-70B-Instruct`, `Meta-Llama-3.1-70B-Instruct`, and `Meta-Llama-3-70B-Instruct`. In some cases, all the model versions might be available in the same API version. In other cases, you might need to adjust the API version used to consume the model if the API contract has changed from one model to another.

## Related content

- [Learn more about working with Azure OpenAI models](../../../ai-services/openai/how-to/working-with-models.md)
