
Commit 2f2c9b7

Merge pull request #233911 from santiagxf/santiagxf/azureml-batch-huggingface
Update how-to-nlp-processing-batch.md
2 parents 9553a2b + 6497c1e commit 2f2c9b7

File tree

1 file changed (+34, -15 lines changed)


articles/machine-learning/how-to-nlp-processing-batch.md

Lines changed: 34 additions & 15 deletions
@@ -17,7 +17,7 @@ ms.custom: devplatv2

 [!INCLUDE [cli v2](../../includes/machine-learning-dev-v2.md)]

-Batch Endpoints can be used for processing tabular data, but also any other file type like text. Those deployments are supported in both MLflow and custom models. In this tutorial we will learn how to deploy a model that can perform text summarization of long sequences of text using a model from HuggingFace.
+Batch Endpoints can be used for processing tabular data that contain text. Those deployments are supported in both MLflow and custom models. In this tutorial we will learn how to deploy a model that can perform text summarization of long sequences of text using a model from HuggingFace.

 ## About this sample

@@ -27,13 +27,26 @@ The model we are going to work with was built using the popular library transfor
 * It is trained for summarization of text in English.
 * We are going to use Torch as a backend.

-The information in this article is based on code samples contained in the [azureml-examples](https://github.com/azure/azureml-examples) repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo and then change directories to the `cli/endpoints/batch/deploy-models/huggingface-text-summarization` if you are using the Azure CLI or `sdk/python/endpoints/batch/deploy-models/huggingface-text-summarization` if you are using our SDK for Python.
+The information in this article is based on code samples contained in the [azureml-examples](https://github.com/azure/azureml-examples) repository. To run the commands locally without having to copy/paste YAML and other files, clone the repo and then change directories to the [`cli/endpoints/batch/deploy-models/huggingface-text-summarization`](https://github.com/azure/azureml-examples/tree/main/cli/endpoints/batch/deploy-models/huggingface-text-summarization) if you are using the Azure CLI or [`sdk/python/endpoints/batch/deploy-models/huggingface-text-summarization`](https://github.com/azure/azureml-examples/tree/main/sdk/python/endpoints/batch/deploy-models/huggingface-text-summarization) if you are using our SDK for Python.
+
+# [Azure CLI](#tab/cli)

 ```azurecli
 git clone https://github.com/Azure/azureml-examples --depth 1
 cd azureml-examples/cli/endpoints/batch/deploy-models/huggingface-text-summarization
 ```

+# [Python](#tab/python)
+
+In a Jupyter notebook:
+
+```python
+!git clone https://github.com/Azure/azureml-examples --depth 1
+!cd azureml-examples/sdk/python/endpoints/batch/deploy-models/huggingface-text-summarization
+```
+
+---
+
 ### Follow along in Jupyter Notebooks

 You can follow along this sample in a Jupyter Notebook. In the cloned repository, open the notebook: [text-summarization-batch.ipynb](https://github.com/Azure/azureml-examples/blob/main/sdk/python/endpoints/batch/deploy-models/huggingface-text-summarization/text-summarization-batch.ipynb).
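One caveat about the Python tab added in this hunk: in a Jupyter notebook, every `!` command runs in its own throwaway subshell, so `!cd …` does not change the kernel's working directory. The `%cd` magic (or `os.chdir` in plain Python) is what actually persists; a minimal sketch of the difference, using a temporary directory as a stand-in for the cloned repo path:

```python
import os
import subprocess
import tempfile

target = tempfile.mkdtemp()  # stand-in for azureml-examples/sdk/python/...

# Equivalent of `!cd`: the chdir happens in a child shell and is lost.
subprocess.run(["sh", "-c", f"cd {target}"], check=True)
before = os.getcwd()  # unchanged

# Equivalent of `%cd`: changes the directory of the kernel process itself.
os.chdir(target)
after = os.getcwd()  # now inside `target`
```

So after cloning, use `%cd azureml-examples/sdk/python/endpoints/batch/deploy-models/huggingface-text-summarization` rather than `!cd` if subsequent cells rely on relative paths.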
@@ -46,7 +59,7 @@ You can follow along this sample in a Jupyter Notebook. In the cloned repository

 First, let's connect to Azure Machine Learning workspace where we're going to work on.

-# [Azure CLI](#tab/azure-cli)
+# [Azure CLI](#tab/cli)

 ```azurecli
 az account set --subscription <subscription>
@@ -80,7 +93,13 @@ ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group,

 ### Registering the model

-Due to the size of the model, it hasn't been included in this repository. Instead, you can generate a local copy with the following code. A local copy of the model will be placed at `model`. We will use it during the course of this tutorial.
+Due to the size of the model, it hasn't been included in this repository. Instead, you can download a copy from the HuggingFace model's hub. You need the packages `transformers` and `torch` installed in the environment you are using.
+
+```python
+%pip install transformers torch
+```
+
+Use the following code to download the model to a folder `model`:

 ```python
 from transformers import pipeline
@@ -99,7 +118,7 @@ MODEL_NAME='bart-text-summarization'
 az ml model create --name $MODEL_NAME --path "model"
 ```

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 ```python
 model_name = 'bart-text-summarization'
@@ -115,7 +134,7 @@ We are going to create a batch endpoint named `text-summarization-batch` where t

 1. Decide on the name of the endpoint. The name of the endpoint will end-up in the URI associated with your endpoint. Because of that, __batch endpoint names need to be unique within an Azure region__. For example, there can be only one batch endpoint with the name `mybatchendpoint` in `westus2`.

-# [Azure CLI](#tab/azure-cli)
+# [Azure CLI](#tab/cli)

 In this case, let's place the name of the endpoint in a variable so we can easily reference it later.
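Because a badly chosen name only fails once the create call reaches the service, it can be worth sanity-checking the candidate locally first. The rules encoded below (start with a letter; then letters, digits, and hyphens; 3-32 characters total) are an approximation of Azure ML's endpoint naming constraints, not an official validator:

```python
import re

# Approximate naming rules (an assumption, not the service's validator):
# first character a letter, then letters/digits/hyphens, 3-32 chars total.
ENDPOINT_NAME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9-]{2,31}$")

def looks_like_valid_endpoint_name(name: str) -> bool:
    """Cheap local sanity check before calling the service."""
    return bool(ENDPOINT_NAME_RE.match(name))
```

The service remains the source of truth: a name that passes this check can still be rejected if it's already taken in the target region.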

@@ -133,7 +152,7 @@ We are going to create a batch endpoint named `text-summarization-batch` where t

 1. Configure your batch endpoint

-# [Azure CLI](#tab/azure-cli)
+# [Azure CLI](#tab/cli)

 The following YAML file defines a batch endpoint:

@@ -156,7 +175,7 @@ We are going to create a batch endpoint named `text-summarization-batch` where t

 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/huggingface-text-summarization/deploy-and-run.sh" ID="create_batch_endpoint" :::

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 ```python
 ml_client.batch_endpoints.begin_create_or_update(endpoint)
@@ -199,7 +218,7 @@ Let's create the deployment that will host the model:

 :::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/huggingface-text-summarization/deployment.yml" range="7-10" :::

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 Let's get a reference to the environment:

@@ -217,7 +236,7 @@ Let's create the deployment that will host the model:

 1. Each deployment runs on compute clusters. They support both [Azure Machine Learning Compute clusters (AmlCompute)](./how-to-create-attach-compute-cluster.md) or [Kubernetes clusters](./how-to-attach-kubernetes-anywhere.md). In this example, our model can benefit from GPU acceleration, which is why we will use a GPU cluster.

-# [Azure CLI](#tab/azure-cli)
+# [Azure CLI](#tab/cli)

 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/huggingface-text-summarization/deploy-and-run.sh" ID="create_compute" :::

@@ -253,7 +272,7 @@ Let's create the deployment that will host the model:

 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/huggingface-text-summarization/deploy-and-run.sh" ID="create_batch_deployment_set_default" :::

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 To create a new deployment with the indicated environment and scoring script use the following code:

@@ -298,7 +317,7 @@ Let's create the deployment that will host the model:
 az ml batch-endpoint update --name $ENDPOINT_NAME --set defaults.deployment_name=$DEPLOYMENT_NAME
 ```

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 ```python
 endpoint.defaults.deployment_name = deployment.name
@@ -321,7 +340,7 @@ For testing our endpoint, we are going to use a sample of the dataset [BillSum:
 > [!NOTE]
 > The utility `jq` may not be installed on every installation. You can get instructions in [this link](https://stedolan.github.io/jq/download/).

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 ```python
 input = Input(type=AssetTypes.URI_FOLDER, path="data")
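The `data` folder referenced by that `Input` holds the files the endpoint will score. If you want a quick smoke test with your own sample instead of the BillSum files shipped in the repo, something like the following builds one; the single `text` column is an assumption about what the scoring script expects, so check the repo's sample data for the exact schema:

```python
import csv
import pathlib

# Hypothetical sample input folder for a batch scoring smoke test.
data_dir = pathlib.Path("data")
data_dir.mkdir(exist_ok=True)

rows = [
    {"text": "A long bill about appropriations for the fiscal year..."},
    {"text": "A long bill about infrastructure funding..."},
]
with open(data_dir / "sample.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["text"])
    writer.writeheader()
    writer.writerows(rows)
```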
@@ -341,7 +360,7 @@ For testing our endpoint, we are going to use a sample of the dataset [BillSum:

 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/huggingface-text-summarization/deploy-and-run.sh" ID="show_job_in_studio" :::

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 ```python
 ml_client.jobs.get(job.name)
@@ -357,7 +376,7 @@ For testing our endpoint, we are going to use a sample of the dataset [BillSum:
 az ml job download --name $JOB_NAME --output-name score --download-path .
 ```

-# [Python](#tab/sdk)
+# [Python](#tab/python)

 ```python
 ml_client.jobs.download(name=job.name, output_name='score', download_path='./')
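By default, batch scoring results land in a file named `predictions.csv` inside the job's `score` output (the exact layout depends on the scoring script, so treat both the file name and the column order here as assumptions). Once downloaded, a minimal sketch for reading it back; the block writes a tiny stand-in file so it is self-contained, whereas in practice the file comes from the `az ml job download` / `ml_client.jobs.download` calls above:

```python
import csv
import pathlib

# Stand-in for the file the batch job produces; in practice this is
# downloaded into ./score by the job-download step shown above.
score_dir = pathlib.Path("score")
score_dir.mkdir(exist_ok=True)
(score_dir / "predictions.csv").write_text(
    "sample.csv,A short summary of the bill.\n", encoding="utf-8"
)

# Each row pairs the source file with the generated summary
# (column layout is an assumption; verify against your scoring script).
with open(score_dir / "predictions.csv", newline="", encoding="utf-8") as f:
    predictions = [row for row in csv.reader(f)]
```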
