## Single versus multi-model endpoints
Azure ML supports deploying single or multiple models behind a single endpoint.
Multi-model endpoints use a shared container to host multiple models. This helps to reduce overhead costs, improves utilization, and enables you to chain modules together into ensembles. Models you specify in your deployment script are mounted and made available on the disk of the serving container; you can load them into memory on demand and score based on the specific model being requested at scoring time.
For an end-to-end (E2E) example that shows how to use multiple models behind a single containerized endpoint, see [this example](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment/deploy-multi-model).
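To make the on-demand loading pattern concrete, the following is a minimal sketch of an entry script for a multi-model deployment. It assumes the registered models are mounted under the directory given by the `AZUREML_MODEL_DIR` environment variable and that each request names the model it wants; the model file names and request format are illustrative, not values from this article.

```python
import json
import os

import joblib  # assumes the models were saved with joblib; adjust to your serializer

_loaded_models = {}  # cache so each model is only read from disk once


def init():
    # AZUREML_MODEL_DIR points at the folder where Azure ML mounts registered models.
    global model_dir
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")


def _get_model(name):
    # Lazily load the requested model from the mounted disk and keep it in memory.
    if name not in _loaded_models:
        _loaded_models[name] = joblib.load(os.path.join(model_dir, name, "model.pkl"))
    return _loaded_models[name]


def run(raw_data):
    # Example request body: {"model": "credit-model", "data": [[...feature values...]]}
    payload = json.loads(raw_data)
    model = _get_model(payload["model"])
    return model.predict(payload["data"]).tolist()
```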
## Prepare to deploy
To deploy the model as a service, you need the following components:
* **Define inference environment**. This environment encapsulates the dependencies required to run your model for inference.
* **Define scoring code**. This script accepts requests, scores the requests by using the model, and returns the results.
* **Define inference configuration**. The inference configuration specifies the environment configuration, entry script, and other components needed to run the model as a service. A minimal sketch of how these pieces fit together follows this list.
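The sketch below shows one way the three components can come together with the Python SDK (`azureml-core`). The file names (`inference-env.yml`, `score.py`) and the environment name are placeholders, not values from this article.

```python
from azureml.core import Environment
from azureml.core.model import InferenceConfig

# Inference environment built from a Conda dependencies file (see the YAML example below).
env = Environment.from_conda_specification(name="my-inference-env",
                                           file_path="inference-env.yml")

# Inference configuration: ties the scoring script to the environment it runs in.
inference_config = InferenceConfig(entry_script="score.py", environment=env)
```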
Once you have the necessary components, you can profile the service that will be created as a result of deploying your model to understand its CPU and memory requirements.
### 1. Define inference environment
An inference configuration describes how to set up the web service containing your model. It's used later, when you deploy the model.
Inference configuration uses Azure Machine Learning environments to define the software dependencies needed for your deployment. Environments allow you to create, manage, and reuse the software dependencies required for training and deployment. You can create an environment from custom dependency files or use one of the curated Azure Machine Learning environments. The following YAML is an example of a Conda dependencies file for inference. Note that you must indicate `azureml-defaults` with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service. If you want to use automatic schema generation, your entry script must also import the `inference-schema` packages.
```YAML
name: project_environment
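# The rest of this file is truncated in this excerpt. The lines below are a
# representative completion based on the requirements described above
# (azureml-defaults >= 1.0.45 as a pip dependency, plus inference-schema for
# automatic schema generation). Adjust the Python version and packages to your model.
dependencies:
  - python=3.6.2
  - scikit-learn
  - pip:
    - azureml-defaults>=1.0.45
    - inference-schema[numpy-support]
```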
The following types are currently supported for automatic schema generation:

* `pandas`
* `numpy`
* `pyspark`
* Standard Python object
To use schema generation, include the open-source `inference-schema` package in your dependencies file. For more information on this package, see [https://github.com/Azure/InferenceSchema](https://github.com/Azure/InferenceSchema). Define the input and output sample formats in the `input_sample` and `output_sample` variables, which represent the request and response formats for the web service. Use these samples in the input and output function decorators on the `run()` function. The following scikit-learn example uses schema generation.
##### Example entry script
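The entry script itself is not included in this excerpt. The block below is a minimal sketch of what a scikit-learn entry script with schema generation typically looks like; the model file name, sample values, and expected input shape are placeholders rather than values from this article.

```python
import os

import joblib
import numpy as np
from inference_schema.schema_decorators import input_schema, output_schema
from inference_schema.parameter_types.numpy_parameter_type import NumpyParameterType


def init():
    # Called once when the service starts: load the registered model from disk.
    global model
    model_path = os.path.join(os.getenv("AZUREML_MODEL_DIR", "."), "sklearn_model.pkl")
    model = joblib.load(model_path)


# Sample request and response used to generate the web service schema.
input_sample = np.array([[1.0, 2.0, 3.0, 4.0]])
output_sample = np.array([0])


@input_schema("data", NumpyParameterType(input_sample))
@output_schema(NumpyParameterType(output_sample))
def run(data):
    # Called for every request: score the incoming data with the loaded model.
    try:
        result = model.predict(data)
        return result.tolist()
    except Exception as e:
        return str(e)
```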
Once you have registered your model and prepared the other components necessary for its deployment, you can determine the CPU and memory the deployed service will need. Profiling tests the service that runs your model and returns information such as the CPU usage, memory usage, and response latency. It also provides a recommendation for the CPU and memory based on resource usage.
In order to profile your model, you will need:
* A registered model.
* An inference configuration based on your entry script and inference environment definition.
* A single-column tabular dataset, where each row contains a string representing sample request data.
> [!IMPORTANT]
> At this point, we only support profiling of services that expect their request data to be a string, for example: string serialized JSON, text, string serialized image, and so on. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.
Below is an example of how you can construct an input dataset to profile a service that expects its incoming request data to contain serialized JSON. In this case, we created a dataset based on one hundred instances of the same request data content. In real-world scenarios, we suggest that you use larger datasets containing various inputs, especially if your model resource usage or behavior is input dependent.
```python
import json
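# The original example is truncated in this excerpt. The lines below are an
# illustrative completion: they write a file that repeats the same serialized
# JSON request one hundred times and register it as a single-column tabular
# dataset. The workspace object `ws` and the sample payload are assumptions.
from azureml.core import Datastore, Dataset

sample_request = json.dumps({"data": [[1.0, 2.0, 3.0, 4.0]]})

# Write one serialized request per line, repeated 100 times.
file_name = "sample_request_data.txt"
with open(file_name, "w") as f:
    f.write("\n".join([sample_request] * 100))

# Upload the file to the workspace's default datastore and register it as a
# tabular dataset with a single string column.
datastore = Datastore.get_default(ws)
datastore.upload_files([file_name], target_path="sample_request_data")
input_dataset = Dataset.Tabular.from_delimited_files(
    path=[(datastore, "sample_request_data/" + file_name)],
    separator="\n",
    header=False)
input_dataset = input_dataset.register(workspace=ws,
                                       name="sample_request_data",
                                       create_new_version=True)
```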
The following command demonstrates how to profile a model by using the CLI:

```azurecli
az ml model profile -g <resource-group-name> -w <workspace-name> --inference-config-file <path-to-inf-config.json> -m <model-id> --idi <input-dataset-id> -n <unique-name>
```
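If you work from the Python SDK instead, the profiling call looks roughly like the following sketch. It assumes the v1 `azureml-core` SDK, an existing workspace object `ws`, a registered `model`, the `inference_config` defined earlier, and the registered input dataset; the profile name is a placeholder.

```python
from azureml.core import Dataset
from azureml.core.model import Model

# Retrieve the registered single-column dataset of sample requests.
input_dataset = Dataset.get_by_name(ws, name="sample_request_data")

# Run profiling and wait for the resource recommendation.
profile = Model.profile(ws, "my-model-profile", [model], inference_config,
                        input_dataset=input_dataset)
profile.wait_for_completion(True)
details = profile.get_details()  # includes the recommended CPU and memory
print(details)
```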
> [!TIP]
> To persist the information returned by profiling, use tags or properties for the model. Using tags or properties stores the data with the model in the model registry. The following examples demonstrate adding a new tag containing the `requestedCpu` and `requestedMemoryInGb` information:
> ```azurecli
> az ml model profile -g <resource-group-name> -w <workspace-name> --i <model-id> --add-tag requestedCpu=1 --add-tag requestedMemoryInGb=0.5
> ```
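> With the Python SDK, a roughly equivalent step might look like the following sketch, assuming `model` is the registered `Model` object and `details` holds the dictionary returned by `profile.get_details()`:
> ```python
> # Copy the recommended values onto the registered model so they are stored with it.
> model.add_tags({"requestedCpu": str(details["requestedCpu"]),
>                 "requestedMemoryInGb": str(details["requestedMemoryInGb"])})
> ```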
## Deploy to target
Deployment uses the inference configuration and deployment configuration to deploy the models. The deployment process is similar regardless of the compute target. Deploying to AKS is slightly different because you must provide a reference to the AKS cluster.
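As a rough illustration of that flow, the v1 SDK deployment call might look like the sketch below. The service name and the ACI deployment configuration are placeholders; for AKS you would instead build an `AksWebservice` deployment configuration and pass the AKS compute target as `deployment_target`.

```python
from azureml.core.model import Model
from azureml.core.webservice import AciWebservice

# Deployment configuration: how much compute the service gets (ACI shown here).
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)

# Deploy the registered model with the inference configuration defined earlier.
service = Model.deploy(ws, "my-service", [model],
                       inference_config=inference_config,
                       deployment_config=deployment_config)
service.wait_for_deployment(show_output=True)
print(service.scoring_uri)
```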