# Deploy a flow to online endpoint for real-time inference with CLI
In this article, you learn to deploy your flow to a [managed online endpoint](../concept-endpoints-online.md#managed-online-endpoints-vs-kubernetes-online-endpoints) or a [Kubernetes online endpoint](../concept-endpoints-online.md#managed-online-endpoints-vs-kubernetes-online-endpoints) for use in real-time inferencing with Azure Machine Learning v2 CLI.
Before you begin, make sure that you've tested your flow properly and feel confident that it's ready to be deployed to production. To learn more about testing your flow, see [test your flow](how-to-bulk-test-evaluate-flow.md). After testing your flow, you learn how to create a managed online endpoint and deployment, and how to use the endpoint for real-time inferencing.
- This article covers how to use the CLI experience.
- The Python SDK isn't covered in this article. See the GitHub sample notebook instead. To use the Python SDK, you must have the Python SDK v2 for Azure Machine Learning. To learn more, see [Install the Python SDK v2 for Azure Machine Learning](/python/api/overview/azure/ai-ml-readme).
> [!IMPORTANT]
- The Azure CLI and the Azure Machine Learning extension to the Azure CLI. For more information, see [Install, set up, and use the CLI (v2)](../how-to-configure-cli.md).
- An Azure Machine Learning workspace. If you don't have one, use the steps in the [Quickstart: Create workspace resources article](../quickstart-create-resources.md) to create one.
- Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the Owner or Contributor role for the Azure Machine Learning workspace, or a custom role that allows "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/". If you use the studio to create or manage online endpoints and deployments, you need the additional permission "Microsoft.Resources/deployments/write" from the resource group owner. For more information, see [Manage access to an Azure Machine Learning workspace](../how-to-assign-roles.md).
> [!NOTE]
> A managed online endpoint only supports a managed virtual network. If your workspace is in a custom virtual network, you can deploy to a Kubernetes online endpoint, or [deploy to other platforms such as Docker](https://microsoft.github.io/promptflow/how-to-guides/deploy-a-flow/index.html).
### Virtual machine quota allocation for deployment
For managed online endpoints, Azure Machine Learning reserves 20% of your compute resources for performing upgrades. Therefore, if you request a given number of instances in a deployment, you must have a quota for `ceil(1.2 * number of instances requested for deployment) * number of cores for the VM SKU` available to avoid getting an error. For example, if you request 10 instances of a Standard_DS3_v2 VM (which comes with four cores) in a deployment, you should have a quota for 48 cores (12 instances x 4 cores) available. To view your usage and request quota increases, see [View your usage and quotas in the Azure portal](../how-to-manage-quotas.md#view-your-usage-and-quotas-in-the-azure-portal).
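As a quick check of that formula, a small helper can compute the required quota. This is a sketch; the instance count and per-SKU core count are example inputs:

```python
import math

def required_quota_cores(instances: int, cores_per_vm: int) -> int:
    """Cores of quota needed, given the 20% of instances reserved for upgrades."""
    return math.ceil(1.2 * instances) * cores_per_vm

# 10 instances of Standard_DS3_v2 (4 cores each) need 48 cores of quota.
print(required_quota_cores(10, 4))
```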
## Get the flow ready for deployment
Each flow has a folder that contains the code/prompts, definition, and other artifacts of the flow. If you developed your flow with the UI, you can download the flow folder from the flow details page. If you developed your flow with the CLI or SDK, you should have the flow folder already.
This article uses the [sample flow "basic-chat"](https://github.com/Azure/azureml-examples/tree/main/cli/generative-ai/promptflow/basic-chat) as an example to deploy to an Azure Machine Learning managed online endpoint.
> [!IMPORTANT]
>
The following is a model definition example for a chat flow.
> [!NOTE]
> If your flow isn't a chat flow, then you don't need to add these `properties`.
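For illustration, a model specification for the basic-chat sample might look like the following sketch. The `properties` keys and values here are assumptions based on the sample flow; verify them against the model YAML in the sample repository.

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: basic-chat-model
path: <path-to-your-flow-folder>
description: Register the basic-chat flow folder as a model.
properties:
  # Assumed chat-flow properties; confirm the exact keys for your flow.
  azureml.promptflow.source_flow_id: basic-chat
  azureml.promptflow.mode: chat
  azureml.promptflow.chat_input: question
  azureml.promptflow.chat_output: answer
```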
| Key | Description |
|--|--|
| `$schema` | (Optional) The YAML schema. To see all available options in the YAML file, you can view the schema in the preceding code snippet in a browser. |
| `name` | The name of the endpoint. |
| `auth_mode` | Use `key` for key-based authentication. Use `aml_token` for Azure Machine Learning token-based authentication. To get the most recent token, use the `az ml online-endpoint get-credentials` command. |
| `property: enforce_access_to_default_secret_stores` (preview) | - By default, the endpoint uses a system-assigned identity. This property only works for system-assigned identity. <br> - This property means that if you have the connection secrets reader permission, the endpoint's system-assigned identity is automatically assigned the Azure Machine Learning Workspace Connection Secrets Reader role of the workspace, so that the endpoint can access connections correctly when performing inferencing. <br> - By default, this property is `disabled`. |

If you create a Kubernetes online endpoint, you need to specify the following attributes:
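For example, a Kubernetes online endpoint definition typically points at the attached Kubernetes compute. The compute name below is a placeholder:

```yaml
compute: azureml:<your-kubernetes-compute-name>
```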
By default, when you create an online endpoint, a system-assigned managed identity is automatically generated for you. You can also specify an existing user-assigned managed identity for the endpoint.
If you want to use a user-assigned identity, you can specify the following attributes in the `endpoint.yaml`:
```yaml
identity:
  # Attributes below assumed from the managed online endpoint YAML schema.
  type: user_assigned
  user_assigned_identities:
    - resource_id: <user-assigned-identity-resource-id>
```
| Attribute | Description |
|--|--|
| Environment | The environment to host the model and code. It contains: <br> - `image`<br> - `inference_config`: used to build a serving container for online deployments, including `liveness_route`, `readiness_route`, and `scoring_route`. |
| Instance type | The VM size to use for the deployment. For the list of supported sizes, see [Managed online endpoints SKU list](../reference-managed-online-endpoints-vm-sku-list.md). |
| Instance count | The number of instances to use for the deployment. Base the value on the workload you expect. For high availability, we recommend that you set the value to at least `3`. We reserve an extra 20% for performing upgrades. For more information, see [limits for online endpoints](../how-to-manage-quotas.md#azure-machine-learning-online-endpoints-and-batch-endpoints). |
| Environment variables | The following environment variables need to be set for endpoints deployed from a flow: <br> - (required) `PRT_CONFIG_OVERRIDE`: for pulling connections from the workspace. <br> - (optional) `PROMPTFLOW_RESPONSE_INCLUDED_FIELDS`: when there are multiple fields in the response, this environment variable filters the fields to expose in the response. <br> For example, if there are two flow outputs, "answer" and "context", and you only want "answer" in the endpoint response, you can set this environment variable to `'["answer"]'`. |
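As a sketch, the filtering variable from the table sits under `environment_variables` in the deployment YAML (assuming a flow output named `answer`):

```yaml
environment_variables:
  # Expose only the "answer" field in the endpoint response.
  PROMPTFLOW_RESPONSE_INCLUDED_FIELDS: '["answer"]'
```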
> [!IMPORTANT]
>
> If your flow folder has a `requirements.txt` file which contains the dependencies needed to execute the flow, you need to follow the [deploy with a custom environment steps](#deploy-with-a-custom-environment) to build the custom environment including the dependencies.
If you create a Kubernetes online deployment, you need to specify the following attributes:
| Attribute | Description |
|--|--|
> [!TIP]
>
> If you prefer not to block your CLI console, you can add the flag `--no-wait` to the command. However, this stops the interactive display of the deployment status.
> [!IMPORTANT]
>
> The `--all-traffic` flag in the previous `az ml online-deployment create` command allocates 100% of the endpoint traffic to the newly created blue deployment. Though this is helpful for development and testing purposes, for production you might want to open traffic to the new deployment through an explicit command. For example, `az ml online-endpoint update -n $ENDPOINT_NAME --traffic "blue=100"`.
### Check status of the endpoint and deployment
```yaml
environment_variables:
  my_connection: <override_connection_name>
```
If you want to override a specific field of the connection, you can override it by adding environment variables with the naming pattern `<connection_name>_<field_name>`. For example, if your flow uses a connection named `my_connection` with a configuration key called `chat_deployment_name`, the serving backend attempts to retrieve `chat_deployment_name` from the environment variable `MY_CONNECTION_CHAT_DEPLOYMENT_NAME` by default. If the environment variable isn't set, it uses the original value from the flow definition.
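The field override described above can be sketched in the deployment YAML like this (the deployment name value is a placeholder):

```yaml
environment_variables:
  # Overrides the chat_deployment_name field of the my_connection connection.
  MY_CONNECTION_CHAT_DEPLOYMENT_NAME: <new-chat-deployment-name>
```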
**Option 2**: override by referring to an asset
### Deploy with a custom environment
This section shows you how to use a Docker build context to specify the environment for your deployment, assuming you have knowledge of [Docker](https://www.docker.com/) and [Azure Machine Learning environments](../concept-environments.md).
1. In your local environment, create a folder named `image_build_with_reqirements` that contains the following files:
```
|--image_build_with_reqirements
|  |--requirements.txt
|  |--Dockerfile
```
- The `requirements.txt` file should be inherited from the flow folder, where it's used to track the dependencies of the flow.
- The `Dockerfile` content is similar to the following text:
```
FROM mcr.microsoft.com/azureml/promptflow/promptflow-runtime:latest
# Assumed completion: install the dependencies tracked in requirements.txt
COPY ./requirements.txt .
RUN pip install -r requirements.txt
```
### Configure concurrency for deployment
When you deploy your flow to an online deployment, there are two environment variables that you can configure for concurrency: `PROMPTFLOW_WORKER_NUM` and `PROMPTFLOW_WORKER_THREADS`. You also need to set the `max_concurrent_requests_per_instance` parameter.
The following is an example of how to configure these settings in the `deployment.yaml` file.
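A sketch of such a configuration, with illustrative values (tune the worker and request numbers to your VM SKU and workload):

```yaml
request_settings:
  max_concurrent_requests_per_instance: 10
environment_variables:
  PROMPTFLOW_WORKER_NUM: 4      # worker processes per container
  PROMPTFLOW_WORKER_THREADS: 1  # threads per worker process
```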
> [!NOTE]
> If you only set `app_insights_enabled: true` but your workspace doesn't have a linked Application Insights resource, your deployment won't fail, but no data is collected.
> If you specify both `app_insights_enabled: true` and the preceding environment variable at the same time, the tracing data and metrics are sent to the workspace-linked Application Insights resource. Hence, if you want to specify a different Application Insights resource, you only need to keep the environment variable.
## Common errors
>
> The 300,000 ms timeout _only works for managed online deployments from prompt flow_. The maximum for a non-prompt flow managed online endpoint is 180 seconds.
>
> You need to make sure that you've added properties for your model as follows (either in an inline model specification in the deployment YAML or in a standalone model specification YAML) to indicate that this is a deployment from prompt flow.
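The timeout discussed above lives under `request_settings` in the deployment YAML; a minimal sketch:

```yaml
request_settings:
  request_timeout_ms: 300000
```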