Commit 5b9caaf

name changes models as a service
1 parent 3b6964b

9 files changed: +13 -13 lines changed

articles/ai-foundry/concepts/deployments-overview.md
Lines changed: 2 additions & 2 deletions

@@ -20,7 +20,7 @@ The model catalog in Azure AI Foundry portal is the hub to discover and use a wi
 Deployment options vary depending on the model offering:
 
 * **Azure OpenAI in Azure AI Foundry Models:** The latest OpenAI models that have enterprise features from Azure with flexible billing options.
-* **Standard deployment:** These models don't require compute quota from your subscription and are billed per token in a pay-as-you-go fashion.
+* **Standard deployment:** These models don't require compute quota from your subscription and are billed per token in a serverless pay-per-token offer.
 * **Open and custom models:** The model catalog offers access to a large variety of models across modalities, including models of open access. You can host open models in your own subscription with a managed infrastructure, virtual machines, and the number of instances for capacity management.
 
 Azure AI Foundry offers four different deployment options:

@@ -39,7 +39,7 @@ Azure AI Foundry offers four different deployment options:
 | Billing bases | Token usage & [provisioned throughput units](../../ai-services/openai/concepts/provisioned-throughput.md) | Token usage | Token usage<sup>1</sup> | Compute core hours<sup>2</sup> |
 | Deployment instructions | [Deploy to Azure OpenAI](../how-to/deploy-models-openai.md) | [Deploy to Azure AI model inference](../model-inference/how-to/create-model-deployments.md) | [Deploy to Standard deployment](../how-to/deploy-models-serverless.md) | [Deploy to Managed compute](../how-to/deploy-models-managed.md) |
 
-<sup>1</sup> A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in pay-as-you-go. After you delete the endpoint, no further charges accrue.
+<sup>1</sup> A minimal endpoint infrastructure is billed per minute. You aren't billed for the infrastructure that hosts the model in standard deployment. After you delete the endpoint, no further charges accrue.
 
 <sup>2</sup> Billing is on a per-minute basis, depending on the product tier and the number of instances used in the deployment since the moment of creation. After you delete the endpoint, no further charges accrue.

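The per-token billing this file describes is straightforward to estimate. A minimal sketch, assuming hypothetical per-1,000-token prices; actual rates are shown in the Azure AI Foundry portal before you deploy:

```python
# Back-of-the-envelope cost estimate for a serverless pay-per-token offer.
# Prices below are hypothetical placeholders, not real Azure rates.
INPUT_PRICE_PER_1K = 0.003   # USD per 1,000 input tokens (assumed)
OUTPUT_PRICE_PER_1K = 0.009  # USD per 1,000 output tokens (assumed)

def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD charge for a single inference request."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K \
         + (output_tokens / 1000) * OUTPUT_PRICE_PER_1K

# Example: a 1,200-token prompt with a 350-token completion.
print(f"${estimate_request_cost(1200, 350):.4f}")  # ~ $0.0068
```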
articles/ai-foundry/concepts/models-featured.md
Lines changed: 2 additions & 2 deletions

@@ -250,8 +250,8 @@ See [the Microsoft model collection in Azure AI Foundry portal](https://ai.azure
 
 Mistral AI offers two categories of models, namely:
 
-- _Premium models_: These include Mistral Large, Mistral Small, Mistral-OCR-2503, and Ministral 3B models, and are available as standard deployments with pay-as-you-go token-based billing.
-- _Open models_: These include Mistral-small-2503, Codestral, and Mistral Nemo (that are available as standard deployments with pay-as-you-go token-based billing), and [Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01](../how-to/deploy-models-mistral-open.md) (that are available to download and run on self-hosted managed endpoints).
+- _Premium models_: These include Mistral Large, Mistral Small, Mistral-OCR-2503, and Ministral 3B models, and are available as standard deployments with a serverless pay-per-token offer.
+- _Open models_: These include Mistral-small-2503, Codestral, and Mistral Nemo (that are available as standard deployments with a serverless pay-per-token offer), and [Mixtral-8x7B-Instruct-v01, Mixtral-8x7B-v01, Mistral-7B-Instruct-v01, and Mistral-7B-v01](../how-to/deploy-models-mistral-open.md) (that are available to download and run on self-hosted managed endpoints).
 
 
 | Model | Type | Capabilities |

articles/ai-foundry/faq.yml
Lines changed: 1 addition & 1 deletion

@@ -53,7 +53,7 @@ sections:
   - question: |
       What is the billing model for standard deployments?
    answer: |
-      Azure AI Foundry offers pay-as-you-go inference APIs and hosted fine-tuning for [Llama 2 family models](how-to/deploy-models-llama.md). Currently, there's no extra charge for Azure AI Foundry outside of typical AI services and other Azure resource charges.
+      Azure AI Foundry offers standard deployment models and hosted fine-tuning for [Llama 2 family models](how-to/deploy-models-llama.md). Currently, there's no extra charge for Azure AI Foundry outside of typical AI services and other Azure resource charges.
   - question: |
       Can all models be secured with content filtering?
    answer: |

articles/ai-foundry/how-to/concept-data-privacy.md
Lines changed: 1 addition & 1 deletion

@@ -41,7 +41,7 @@ Although containers for **Curated by Azure AI** models are scanned for vulnerabi
 
 ## Generation of inferencing outputs as a standard deployment
 
-When you deploy a model from the model catalog (base or fine-tuned) by using standard deployments with pay-as-you-go billing for inferencing, an API is provisioned. The API gives you access to the model that the Azure Machine Learning service hosts and manages. Learn more about standard deployments in [Model catalog and collections](./model-catalog-overview.md).
+When you deploy a model from the model catalog (base or fine-tuned) by using standard deployments with a serverless pay-per-token offer for inferencing, an API is provisioned. The API gives you access to the model that the Azure Machine Learning service hosts and manages. Learn more about standard deployments in [Model catalog and collections](./model-catalog-overview.md).
 
 The model processes your input prompts and generates outputs based on its functionality, as described in the model details. Your use of the model (along with the provider's accountability for the model and its outputs) is subject to the license terms for the model. Microsoft provides and manages the hosting infrastructure and API endpoint. The models hosted in this *standard deployment* scenario are subject to Azure data, privacy, and security commitments. [Learn more about Azure compliance offerings applicable to Azure AI Foundry](https://servicetrust.microsoft.com/DocumentPage/7adf2d9e-d7b5-4e71-bad8-713e6a183cf3).

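As background for the change above: once a standard deployment provisions its API, calls look like ordinary chat-completion requests. A minimal sketch using the `azure-ai-inference` package, assuming placeholder endpoint and key values copied from the deployment's details page:

```python
# Minimal sketch: call the API provisioned for a standard deployment.
# The endpoint URL and key are placeholders; use your deployment's values.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.<region>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="In one sentence, what is a standard deployment?"),
    ],
)
print(response.choices[0].message.content)
```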
articles/ai-foundry/how-to/configure-private-link.md
Lines changed: 1 addition & 1 deletion

@@ -374,7 +374,7 @@ If you need to configure custom DNS server without DNS forwarding, use the foll
 * `<instance-name>-22.<region>.instances.azureml.ms` - Only used by the `az ml compute connect-ssh` command to connect to computers in a managed virtual network. Not needed if you aren't using a managed network or SSH connections.
 
 * `<managed online endpoint name>.<region>.inference.ml.azure.com` - Used by managed online endpoints
-* `models.ai.azure.com` - Used for deploying Models as a Service
+* `models.ai.azure.com` - Used for standard deployments
 
 To find the private IP addresses for your A records, see the [Azure Machine Learning custom DNS](/azure/machine-learning/how-to-custom-dns#find-the-ip-addresses) article.

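After creating the A records, it can help to confirm from a machine inside the virtual network that `models.ai.azure.com` resolves to your private A record rather than a public address. A quick standard-library check; the expected IP below is a placeholder for your record's value:

```python
# Sanity check from inside the virtual network: does models.ai.azure.com
# resolve to your private DNS A record? Replace the placeholder IP.
import socket

EXPECTED_PRIVATE_IP = "10.0.0.5"  # placeholder: your A record's private IP

resolved_ips = {info[4][0] for info in socket.getaddrinfo("models.ai.azure.com", 443)}
print("resolved:", resolved_ips)
if EXPECTED_PRIVATE_IP not in resolved_ips:
    print("warning: FQDN did not resolve to the expected private address")
```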
articles/ai-foundry/how-to/model-catalog-overview.md
Lines changed: 1 addition & 1 deletion

@@ -82,7 +82,7 @@ To view a list of supported models for standard deployment or Managed Compute, g
 
 <!-- docutune:enable -->
 
-:::image type="content" source="../media/explore/platform-service-cycle.png" alt-text="Diagram that shows models as a service and the service cycle of managed computes." lightbox="../media/explore/platform-service-cycle.png":::
+:::image type="content" source="../media/explore/platform-service-cycle.png" alt-text="Diagram that shows a standard deployment model and the service cycle of managed computes." lightbox="../media/explore/platform-service-cycle.png":::
 
 ## Model lifecycle: deprecation and retirement
 AI models evolve fast, and when a new version or a new model with updated capabilities in the same model family become available, older models may be retired in the AI Foundry model catalog. To allow for a smooth transition to a newer model version, some models provide users with the option to enable automatic updates. To learn more about the model lifecycle of different models, upcoming model retirement dates, and suggested replacement models and versions, see:

articles/ai-foundry/model-inference/how-to/quickstart-ai-project.md
Lines changed: 2 additions & 2 deletions

@@ -16,7 +16,7 @@ recommendations: false
 
 If you already have an AI project in Azure AI Foundry, the model catalog deploys models from third-party model providers as stand-alone endpoints in your project by default. Each model deployment has its own set of URI and credentials to access it. On the other hand, Azure OpenAI models are deployed to Azure AI Services resource or to the Azure OpenAI Service resource.
 
-You can change this behavior and deploy both types of models to Azure AI Foundry resources (formerly known Azure AI Services). Once configured, **deployments of Models as a Service models supporting pay-as-you-go billing happen to the connected Azure AI Services resource** instead to the project itself, giving you a single set of endpoint and credential to access all the models deployed in Azure AI Foundry. You can manage Azure OpenAI and third-party model providers models in the same way.
+You can change this behavior and deploy both types of models to Azure AI Foundry resources (formerly known as Azure AI Services). Once configured, **standard deployments of models happen to the connected Azure AI Services resource** instead of to the project itself, giving you a single endpoint and credential to access all the models deployed in Azure AI Foundry. You can manage Azure OpenAI and third-party providers' models in the same way.
 
 Additionally, deploying models to Azure AI Foundry Models brings the extra benefits of:

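Once deployments land on the connected Azure AI Services resource as described above, one endpoint and credential serves every model, and the model is selected per request. A minimal sketch using the `azure-ai-inference` package; the resource URL, key, and model names are placeholders:

```python
# Sketch: one connected Azure AI Services resource serves several model
# deployments; the `model` parameter routes each request. Names are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-resource>.services.ai.azure.com/models",
    credential=AzureKeyCredential("<your-api-key>"),
)

for model_name in ("mistral-large-2407", "Phi-3.5-mini-instruct"):
    response = client.complete(
        model=model_name,  # same endpoint and credential, different deployment
        messages=[UserMessage(content="Say hello in five words or fewer.")],
    )
    print(f"{model_name}: {response.choices[0].message.content}")
```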
@@ -188,7 +188,7 @@ For each model deployed as standard deployments, follow these steps:
 
 Consider the following limitations when configuring your project to use Azure AI model inference:
 
-* Only models supporting pay-as-you-go billing (Models as a Service) are available for deployment to Azure AI model inference. Models requiring compute quota from your subscription (Managed Compute), including custom models, can only be deployed within a given project as Managed Online Endpoints and continue to be accessible using their own set of endpoint URI and credentials.
+* Only models supporting pay-as-you-go billing (standard deployment) are available for deployment to Azure AI model inference. Models requiring compute quota from your subscription (Managed Compute), including custom models, can only be deployed within a given project as Managed Online Endpoints and continue to be accessible using their own endpoint URI and credentials.
 * Models available as both pay-as-you-go billing and managed compute offerings are, by default, deployed to Azure AI model inference in Azure AI services resources. Azure AI Foundry portal doesn't offer a way to deploy them to Managed Online Endpoints. You have to turn off the feature mentioned at [Configure the project to use Azure AI model inference](#configure-the-project-to-use-azure-ai-model-inference) or use the Azure CLI/Azure ML SDK/ARM templates to perform the deployment.
 
 ## Next steps

articles/machine-learning/concept-model-catalog.md
Lines changed: 2 additions & 2 deletions

@@ -47,7 +47,7 @@ Model Catalog offers two distinct ways to deploy models from the catalog for yo
 
 Features | Managed compute | standard deployment (pay-as-you-go)
 --|--|--
-Deployment experience and billing | Model weights are deployed to dedicated Virtual Machines with managed online endpoints. The managed online endpoint, which can have one or more deployments, makes available a REST API for inference. You're billed for the Virtual Machine core hours used by the deployments. | Access to models is through a deployment that provisions an API to access the model. The API provides access to the model hosted in a central GPU pool, managed by Microsoft, for inference. This mode of access is referred to as "Models as a Service". You're billed for inputs and outputs to the APIs, typically in tokens; pricing information is provided before you deploy.
+Deployment experience and billing | Model weights are deployed to dedicated Virtual Machines with managed online endpoints. The managed online endpoint, which can have one or more deployments, makes available a REST API for inference. You're billed for the Virtual Machine core hours used by the deployments. | Access to models is through a deployment that provisions an API to access the model. The API provides access to the model hosted in a central GPU pool, managed by Microsoft, for inference. This mode of access is referred to as "standard deployment". You're billed for inputs and outputs to the APIs, typically in tokens; pricing information is provided before you deploy.
 API authentication | Keys and Microsoft Entra ID authentication. [Learn more.](concept-endpoints-online-auth.md) | Keys only.
 Content safety | Use Azure Content Safety service APIs. | Azure AI Content Safety filters are available integrated with inference APIs. Azure AI Content Safety filters may be billed separately.
 Network isolation | Managed Virtual Network with Online Endpoints. [Learn more.](how-to-network-isolation-model-catalog.md) |
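The API authentication row distinguishes the offerings: managed compute accepts keys and Microsoft Entra ID, while standard deployment accepts keys only. A minimal sketch of constructing a client with each credential type, assuming the `azure-ai-inference` and `azure-identity` packages and placeholder endpoints; whether a given endpoint accepts Entra ID tokens depends on the offering, per the table:

```python
# Sketch of the two auth modes from the table. Endpoints are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.core.credentials import AzureKeyCredential
from azure.identity import DefaultAzureCredential

# Key-based authentication: the only option for standard deployments.
key_client = ChatCompletionsClient(
    endpoint="https://<endpoint>.<region>.inference.ml.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

# Microsoft Entra ID authentication: supported by managed compute
# (managed online endpoints), per the table above.
entra_client = ChatCompletionsClient(
    endpoint="https://<endpoint>.<region>.inference.ml.azure.com",
    credential=DefaultAzureCredential(),
)
```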
@@ -64,7 +64,7 @@ Phi-3 family models | Phi-3-mini-4k-Instruct <br> Phi-3-mini-128k-Instruct <br>
 Nixtla | Not available | TimeGEN-1
 Other models | Available | Not available
 
-:::image type="content" source="./media/concept-model-catalog/platform-service-cycle.png" alt-text="A diagram showing models as a service and Real time end points service cycle." lightbox="media/concept-model-catalog/platform-service-cycle.png":::
+:::image type="content" source="./media/concept-model-catalog/platform-service-cycle.png" alt-text="A diagram showing standard deployment and real-time endpoints service cycle." lightbox="media/concept-model-catalog/platform-service-cycle.png":::
 
 ## Managed compute

articles/machine-learning/how-to-custom-dns.md
Lines changed: 1 addition & 1 deletion

@@ -143,7 +143,7 @@ The following FQDNs are for Microsoft Azure operated by 21Vianet regions:
 
 * `<instance-name>-22.<region>.instances.azureml.cn` - Only used by the `az ml compute connect-ssh` command to connect to computes in a private virtual network. Not needed if you aren't using a managed network or SSH connections.
 * `<managed online endpoint name>.<region>.inference.ml.azure.cn` - Used by managed online endpoints
-* `models.ai.azure.com` - Used for deploying Models as a Service
+* `models.ai.azure.com` - Used for standard deployments
 
 #### Azure US Government
