
Commit 4a0cbd9

Merge pull request #108780 from Blackmist/tags-for-profile
adding a note and doing some grammar clean up
2 parents 0f88376 + 8934fef commit 4a0cbd9

1 file changed (+19, -7)


articles/machine-learning/how-to-deploy-and-where.md

Lines changed: 19 additions & 7 deletions
@@ -163,25 +163,25 @@ For more information on working with models trained outside Azure Machine Learni
## Single versus multi-model endpoints
Azure ML supports deploying single or multiple models behind a single endpoint.

- Multi-model endpoints use a shared container to host multiple models. This helps to reduce overhead costs, improves utilization and enables you to chain modules together into ensembles. Models you specify in your deployment script are mounted and made available on the disk of the serving container - you can load them into memory on demand and score based on the specific model being requested at scoring time.
+ Multi-model endpoints use a shared container to host multiple models. This helps to reduce overhead costs, improves utilization, and enables you to chain modules together into ensembles. Models you specify in your deployment script are mounted and made available on the disk of the serving container - you can load them into memory on demand and score based on the specific model being requested at scoring time.

- For an E2E example which shows how to use multiple models behind a single containerized endpoint, see [this example](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment/deploy-multi-model)
+ For an end-to-end example that shows how to use multiple models behind a single containerized endpoint, see [this example](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/deployment/deploy-multi-model).
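
As an illustration of the on-demand loading pattern described above, here is a minimal entry-script sketch that serves two models from one endpoint. The model names, the `model.pkl` file layout, and the use of joblib are illustrative placeholders rather than details from the linked example.

```python
import json
import os

import joblib

# Cache of models loaded on demand, keyed by registered model name.
_models = {}


def _get_model(name):
    # Placeholder folder/file layout; the real paths depend on how the models were registered.
    if name not in _models:
        model_dir = os.getenv("AZUREML_MODEL_DIR")
        _models[name] = joblib.load(os.path.join(model_dir, name, "model.pkl"))
    return _models[name]


def init():
    # Nothing to do up front; models are loaded lazily the first time a request asks for them.
    pass


def run(raw_data):
    # The request names the model to score with, so one endpoint can serve several models.
    payload = json.loads(raw_data)
    model = _get_model(payload["model"])
    return model.predict(payload["data"]).tolist()
```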

## Prepare to deploy

To deploy the model as a service, you need the following components:

* **Define inference environment**. This environment encapsulates the dependencies required to run your model for inference.
* **Define scoring code**. This script accepts requests, scores the requests by using the model, and returns the results.
- * **Define inference configuration**. The inference configuration specifies the the environment configuration, entry script, and other components needed to run the model as a service.
+ * **Define inference configuration**. The inference configuration specifies the environment configuration, entry script, and other components needed to run the model as a service.

Once you have the necessary components, you can profile the service that will be created as a result of deploying your model to understand its CPU and memory requirements.
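
As a rough sketch of how these components fit together with the azureml-core (SDK v1) classes, an environment built from a Conda dependencies file and an entry script are combined into an `InferenceConfig`; the file names `myenv.yml` and `score.py` are placeholders:

```python
from azureml.core import Environment
from azureml.core.model import InferenceConfig

# Build the inference environment from a Conda dependencies file (placeholder file name).
env = Environment.from_conda_specification(name="inference-env", file_path="myenv.yml")

# Tie the entry script and environment together; this configuration is used at deployment time.
inference_config = InferenceConfig(entry_script="score.py", environment=env)
```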

### 1. Define inference environment

An inference configuration describes how to set up the web-service containing your model. It's used later, when you deploy the model.

- Inference configuration uses Azure Machine Learning environments to define the software dependencies needed for your deployment. Environments allow you to create, manage, and reuse the software dependencies required for training and deployment. You can create an environment from custom dependency files or use one of the curated Azure Machine Learning environments. The following YAML is an example of a Conda dependencies file for inference. Please note that you must indicate azureml-defaults with verion >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service. If you want to use automatic schema generation, your entry script must also import the `inference-schema` packages.
+ Inference configuration uses Azure Machine Learning environments to define the software dependencies needed for your deployment. Environments allow you to create, manage, and reuse the software dependencies required for training and deployment. You can create an environment from custom dependency files or use one of the curated Azure Machine Learning environments. The following YAML is an example of a Conda dependencies file for inference. Note that you must indicate azureml-defaults with version >= 1.0.45 as a pip dependency, because it contains the functionality needed to host the model as a web service. If you want to use automatic schema generation, your entry script must also import the `inference-schema` packages.

```YAML
name: project_environment
@@ -273,7 +273,7 @@ These types are currently supported:
* `pyspark`
* Standard Python object

- To use schema generation, include the open source `inference-schema` package in your dependencies file. For more information on this package, see [https://github.com/Azure/InferenceSchema](https://github.com/Azure/InferenceSchema). Define the input and output sample formats in the `input_sample` and `output_sample` variables, which represent the request and response formats for the web service. Use these samples in the input and output function decorators on the `run()` function. The following scikit-learn example uses schema generation.
+ To use schema generation, include the open-source `inference-schema` package in your dependencies file. For more information on this package, see [https://github.com/Azure/InferenceSchema](https://github.com/Azure/InferenceSchema). Define the input and output sample formats in the `input_sample` and `output_sample` variables, which represent the request and response formats for the web service. Use these samples in the input and output function decorators on the `run()` function. The following scikit-learn example uses schema generation.

##### Example entry script

@@ -419,15 +419,15 @@ For information on using a custom Docker image with an inference configuration,

Once you have registered your model and prepared the other components necessary for its deployment, you can determine the CPU and memory the deployed service will need. Profiling tests the service that runs your model and returns information such as the CPU usage, memory usage, and response latency. It also provides a recommendation for the CPU and memory based on resource usage.

- In order to profile your model you will need:
+ In order to profile your model, you will need:
* A registered model.
* An inference configuration based on your entry script and inference environment definition.
* A single column tabular dataset, where each row contains a string representing sample request data.

> [!IMPORTANT]
> At this point we only support profiling of services that expect their request data to be a string, for example: string serialized json, text, string serialized image, etc. The content of each row of the dataset (string) will be put into the body of the HTTP request and sent to the service encapsulating the model for scoring.

- Below is an example of how you can construct an input dataset to profile a service which expects its incoming request data to contain serialized json. In this case we created a dataset based one hundred instances of the same request data content. In real world scenarios we suggest that you use larger datasets containing various inputs, especially if your model resource usage/behavior is input dependent.
+ Below is an example of how you can construct an input dataset to profile a service that expects its incoming request data to contain serialized json. In this case, we created a dataset based on one hundred instances of the same request data content. In real-world scenarios, we suggest that you use larger datasets containing various inputs, especially if your model resource usage/behavior is input dependent.

```python
import json
@@ -491,6 +491,18 @@ The following command demonstrates how to profile a model by using the CLI:
az ml model profile -g <resource-group-name> -w <workspace-name> --inference-config-file <path-to-inf-config.json> -m <model-id> --idi <input-dataset-id> -n <unique-name>
```
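
The profiling run can also be launched from the Python SDK. The following is a rough sketch assuming the `Model.profile` API in azureml-core (SDK v1); the workspace, model, dataset, and file names are placeholders, and `details` is the same dictionary referenced in the tip that follows:

```python
from azureml.core import Dataset, Environment, Workspace
from azureml.core.model import InferenceConfig, Model

ws = Workspace.from_config()

# Placeholders: substitute your registered model, Conda file, entry script, and dataset names.
model = Model(ws, name="my-model")
env = Environment.from_conda_specification(name="inference-env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)
input_dataset = Dataset.get_by_name(ws, name="sample-request-data")

# Run profiling; the returned details include the recommended CPU and memory for the service.
profile = Model.profile(ws, "my-model-profile", [model], inference_config,
                        input_dataset=input_dataset)
profile.wait_for_completion(True)
details = profile.get_details()
print(details["requestedCpu"], details["requestedMemoryInGb"])
```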

+ > [!TIP]
+ > To persist the information returned by profiling, use tags or properties for the model. Using tags or properties stores the data with the model in the model registry. The following examples demonstrate adding a new tag containing the `requestedCpu` and `requestedMemoryInGb` information:
+ >
+ > ```python
+ > model.add_tags({'requestedCpu': details['requestedCpu'],
+ >                 'requestedMemoryInGb': details['requestedMemoryInGb']})
+ > ```
+ >
+ > ```azurecli-interactive
+ > az ml model profile -g <resource-group-name> -w <workspace-name> --i <model-id> --add-tag requestedCpu=1 --add-tag requestedMemoryInGb=0.5
+ > ```

## Deploy to target

Deployment uses the inference configuration and deployment configuration to deploy the models. The deployment process is similar regardless of the compute target. Deploying to AKS is slightly different because you must provide a reference to the AKS cluster.
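
As a rough sketch of that call with the azureml-core (SDK v1) classes, the following deploys a registered model to Azure Container Instances; the model, service, and file names are placeholders, and the CPU and memory values mirror the profiling example above:

```python
from azureml.core import Environment, Workspace
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

ws = Workspace.from_config()

# Placeholders: substitute your registered model, Conda file, and entry script.
model = Model(ws, name="my-model")
env = Environment.from_conda_specification(name="inference-env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# Deployment configuration for the compute target; here ACI, sized from the profiling results.
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=0.5)

service = Model.deploy(ws, "my-service", [model], inference_config, deployment_config)
service.wait_for_deployment(show_output=True)
print(service.state)
```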
