Commit dda95b6

Merge pull request #3703 from msakande/freshness-deploy-managed-compute
freshness- deploy-models-managed.md
2 parents 0690837 + 7a2725c commit dda95b6

File tree

1 file changed: +110 -112 lines changed

articles/ai-foundry/how-to/deploy-models-managed.md

Lines changed: 110 additions & 112 deletions
@@ -7,7 +7,7 @@ ms.service: azure-ai-foundry
ms.custom:
- build-2024
ms.topic: how-to
ms.date: 03/24/2025
ms.reviewer: fasantia
reviewer: santiagxf
ms.author: mopeakande
@@ -39,121 +39,119 @@ You can deploy managed compute models using the Azure Machine Learning SDK, but
## Deploy the model
1. Install the Azure Machine Learning SDK.

    ```bash
    pip install azure-ai-ml
    pip install azure-identity
    ```
1. Authenticate with Azure Machine Learning and create a client object. Replace the placeholders with your subscription ID, resource group name, and Azure AI Foundry project name.

    ```python
    from azure.ai.ml import MLClient
    from azure.identity import InteractiveBrowserCredential

    workspace_ml_client = MLClient(
        credential=InteractiveBrowserCredential(),
        subscription_id="your subscription ID goes here",
        resource_group_name="your resource group name goes here",
        workspace_name="your project name goes here",
    )
    ```
1. Create an endpoint. For the managed compute deployment option, you need to create an endpoint before you create a model deployment. Think of an endpoint as a container that can house multiple model deployments. Endpoint names must be unique in a region, so in this example, use a timestamp to create a unique endpoint name.

    ```python
    import time
    from azure.ai.ml.entities import (
        ManagedOnlineEndpoint,
        ManagedOnlineDeployment,
        ProbeSettings,
    )

    # Append a timestamp to make the endpoint name unique
    timestamp = int(time.time())
    online_endpoint_name = "customize your endpoint name here" + str(timestamp)

    # Create an online endpoint
    endpoint = ManagedOnlineEndpoint(
        name=online_endpoint_name,
        auth_mode="key",
    )
    workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).wait()
    ```
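Because endpoint names must be unique within a region, the timestamp suffix above is one simple way to avoid collisions across repeated runs. A minimal sketch of the same idea as a reusable helper (`make_unique_endpoint_name` is illustrative, not part of the Azure ML SDK):

```python
import time

def make_unique_endpoint_name(prefix: str) -> str:
    """Append a Unix timestamp so repeated runs yield distinct names.

    Illustrative helper; not part of the Azure ML SDK.
    """
    return f"{prefix}-{int(time.time())}"

name_a = make_unique_endpoint_name("my-qa-endpoint")
time.sleep(1)  # one second later, the suffix changes
name_b = make_unique_endpoint_name("my-qa-endpoint")
print(name_a != name_b)  # → True
```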
1. Create a deployment. Replace the model ID in the next code with the model ID that you copied from the details page of the model you selected in the [Get the model ID](#get-the-model-id) section.

    ```python
    model_name = "azureml://registries/azureml/models/deepset-roberta-base-squad2/versions/16"

    demo_deployment = ManagedOnlineDeployment(
        name="demo",
        endpoint_name=online_endpoint_name,
        model=model_name,
        instance_type="Standard_DS3_v2",
        instance_count=2,
        liveness_probe=ProbeSettings(
            failure_threshold=30,
            success_threshold=1,
            timeout=2,
            period=10,
            initial_delay=1000,
        ),
        readiness_probe=ProbeSettings(
            failure_threshold=10,
            success_threshold=1,
            timeout=10,
            period=10,
            initial_delay=1000,
        ),
    )
    workspace_ml_client.online_deployments.begin_create_or_update(demo_deployment).wait()
    endpoint.traffic = {"demo": 100}
    workspace_ml_client.online_endpoints.begin_create_or_update(endpoint).result()
    ```
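The `endpoint.traffic = {"demo": 100}` line routes all requests to the single `demo` deployment. Because an endpoint can house several deployments, traffic is assigned as percentages that must not exceed 100 in total. A minimal sketch of that rule (`validate_traffic` is an illustrative helper, not an SDK function):

```python
def validate_traffic(traffic):
    """Check a traffic split: each value is a percentage, total at most 100.

    Illustrative helper; not part of the Azure ML SDK.
    """
    for deployment, pct in traffic.items():
        if not 0 <= pct <= 100:
            raise ValueError(f"{deployment}: {pct}% is out of range")
    total = sum(traffic.values())
    if total > 100:
        raise ValueError(f"split totals {total}%, which exceeds 100%")
    return total

print(validate_traffic({"demo": 100}))               # → 100
print(validate_traffic({"demo": 90, "canary": 10}))  # → 100, e.g. a canary-style split
```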

## Inference the deployment
1. You need sample JSON data to test inferencing. Create `sample_score.json` with the following example.

    ```json
    {
      "inputs": {
        "question": [
          "Where do I live?",
          "Where do I live?",
          "What's my name?",
          "Which name is also used to describe the Amazon rainforest in English?"
        ],
        "context": [
          "My name is Wolfgang and I live in Berlin",
          "My name is Sarah and I live in London",
          "My name is Clara and I live in Berkeley.",
          "The Amazon rainforest (Portuguese: Floresta Amaz\u00f4nica or Amaz\u00f4nia; Spanish: Selva Amaz\u00f3nica, Amazon\u00eda or usually Amazonia; French: For\u00eat amazonienne; Dutch: Amazoneregenwoud), also known in English as Amazonia or the Amazon Jungle, is a moist broadleaf forest that covers most of the Amazon basin of South America. This basin encompasses 7,000,000 square kilometres (2,700,000 sq mi), of which 5,500,000 square kilometres (2,100,000 sq mi) are covered by the rainforest. This region includes territory belonging to nine nations. The majority of the forest is contained within Brazil, with 60% of the rainforest, followed by Peru with 13%, Colombia with 10%, and with minor amounts in Venezuela, Ecuador, Bolivia, Guyana, Suriname and French Guiana. States or departments in four nations contain \"Amazonas\" in their names. The Amazon represents over half of the planet's remaining rainforests, and comprises the largest and most biodiverse tract of tropical rainforest in the world, with an estimated 390 billion individual trees divided into 16,000 species."
        ]
      }
    }
    ```
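If you'd rather generate the scoring file from code than paste JSON by hand, the payload can be built with the standard library. This sketch uses a shortened two-question version of the example; note the parallel lists, where each question is scored against the context at the same index:

```python
import json

# Build the same payload shape as sample_score.json: parallel "question"
# and "context" lists, paired by index.
payload = {
    "inputs": {
        "question": ["Where do I live?", "What's my name?"],
        "context": [
            "My name is Wolfgang and I live in Berlin",
            "My name is Clara and I live in Berkeley.",
        ],
    }
}
# The two lists must be the same length, one context per question.
assert len(payload["inputs"]["question"]) == len(payload["inputs"]["context"])

with open("sample_score.json", "w", encoding="utf-8") as f:
    json.dump(payload, f, indent=2)
```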
1. Inference with `sample_score.json`. Change the location of the scoring file in the next code, based on where you saved your sample JSON file.

    ```python
    import json

    scoring_file = "./sample_score.json"
    response = workspace_ml_client.online_endpoints.invoke(
        endpoint_name=online_endpoint_name,
        deployment_name="demo",
        request_file=scoring_file,
    )
    response_json = json.loads(response)
    print(json.dumps(response_json, indent=2))
    ```
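The shape of the parsed response depends on the model. For an extractive question-answering model such as `deepset-roberta-base-squad2`, a common shape is one answer object per question/context pair, each carrying an answer span and a confidence score. The field names below are an assumption made for illustration, so inspect your deployment's actual output first:

```python
import json

# A hypothetical response in the shape described above; your deployment's
# actual schema may differ, so check the printed response_json first.
response = '[{"answer": "Berlin", "score": 0.97}, {"answer": "Clara", "score": 0.94}]'

for item in json.loads(response):
    print(f"{item['answer']} (score {item['score']:.2f})")
```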
## Configure autoscaling

To configure autoscaling for deployments, go to the Azure portal, locate the Azure resource of type `Machine learning online deployment` in the resource group of the AI project, and use the **Scaling** menu under **Settings**. For more information on autoscaling, see [Autoscale online endpoints](/azure/machine-learning/how-to-autoscale-endpoints) in the Azure Machine Learning documentation.

## Delete the deployment endpoint

@@ -163,7 +161,7 @@ To delete deployments in Azure AI Foundry portal, select the **Delete** button o

To deploy and perform inferencing with real-time endpoints, you consume virtual machine (VM) core quota that is assigned to your subscription on a per-region basis. When you sign up for Azure AI Foundry, you receive a default VM quota for several VM families available in the region. You can continue to create deployments until you reach your quota limit. Once that happens, you can request a quota increase.
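As a rough check, a deployment consumes `instance_count` times the core count of its VM size against the regional quota. Assuming the commonly cited figure of 4 vCPU cores for `Standard_DS3_v2` (verify against your region's SKU specs), the two-instance deployment above would consume 8 cores:

```python
cores_per_instance = 4  # assumed: Standard_DS3_v2 has 4 vCPU cores
instance_count = 2      # as in the deployment example above
print(cores_per_instance * instance_count)  # → 8 cores counted against quota
```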

## Related content

- Learn more about what you can do in [Azure AI Foundry](../what-is-ai-foundry.md)
- Get answers to frequently asked questions in the [Azure AI FAQ article](../faq.yml)
