
Commit 4295a1b

Update how-to-nlp-processing-batch.md
1 parent 3be7500


articles/machine-learning/how-to-nlp-processing-batch.md

Lines changed: 4 additions & 7 deletions
@@ -21,7 +21,7 @@ Batch Endpoints can be used for processing tabular data, but also any other file

## About this sample

- The model we are going to work with was built using the popular library transformers from HuggingFace along with [a pre-trained model from Facebook with the BART architecture](https://huggingface.co/facebook/bart-large-cnn). It was introduced in the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation](https://arxiv.org/abs/1910.13461). This model has the following constrains that are important to keep in mind for deployment:
+ The model we are going to work with was built using the popular `transformers` library from HuggingFace along with [a pre-trained model from Facebook with the BART architecture](https://huggingface.co/facebook/bart-large-cnn). It was introduced in the paper [BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation](https://arxiv.org/abs/1910.13461). This model has the following constraints, which are important to keep in mind for deployment:

* It can work with sequences of up to 1024 tokens.
* It is trained for summarization of text in English.
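
To make the 1024-token constraint concrete, here is a minimal sketch (not part of the article's sample) that loads this checkpoint with `transformers` and truncates inputs to the model's limit; the generation length is an arbitrary choice:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the tokenizer and model separately so truncation can be controlled explicitly.
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-cnn")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-large-cnn")

text = "A long English news article to summarize ..."
# Truncate to the model's 1024-token maximum to avoid errors on long inputs.
inputs = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")
summary_ids = model.generate(inputs["input_ids"], max_new_tokens=130)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```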
@@ -162,15 +162,15 @@ We are going to create a batch endpoint named `text-summarization-batch` where t
ml_client.batch_endpoints.begin_create_or_update(endpoint)
```
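
For context, a minimal sketch of how the `endpoint` object passed to that call might be constructed (the endpoint name comes from the article; the description string is an assumption):

```python
from azure.ai.ml.entities import BatchEndpoint

endpoint = BatchEndpoint(
    name="text-summarization-batch",
    description="Batch endpoint for summarizing English text with BART",  # assumed wording
)
```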

- ### Creating the deployment
+ ## Creating the deployment

Let's create the deployment that will host the model:

1. We need to create a scoring script that can read the CSV files provided by the batch deployment and return the summaries produced by the model. The script does the following (a minimal sketch of such a script follows the tip below):

> [!div class="checklist"]
- > * Indicates an `init` function that load the model using `transformers`. Notice that the tokenizer of the model is loaded separately to account for the limitation in the sequence lenghs of the model we are currently using.
- > Notice that we are doing some model optimizations too to improve the performance using `optimum`.
+ > * Indicates an `init` function that detects the hardware configuration (CPU or GPU) and loads the model accordingly. Both the model and the tokenizer are loaded in global variables. We are not using a `pipeline` object from HuggingFace, to account for the limitation in the sequence lengths of the model we are currently using.
+ > * Notice that we are performing model optimizations to improve performance using the `optimum` and `accelerate` libraries. If the model or hardware doesn't support them, we run the deployment without such optimizations.
> * Indicates a `run` function that is executed for each mini-batch the batch deployment provides.
> * The `run` function reads the entire batch using the `datasets` library. The text we need to summarize is in the column `text`.
> * The `run` method iterates over each row of the text and runs the prediction. Since this is a very expensive model, running the prediction over entire files would result in an out-of-memory exception. Notice that the model is not executed with the `pipeline` object from `transformers`. This is done to account for long sequences of text and the 1024-token limit of the underlying model we are using.
@@ -183,7 +183,6 @@ Let's create the deployment that will host the model:
> [!TIP]
> Although files are provided in mini-batches by the deployment, this scoring script processes one row at a time. This is a common pattern when dealing with expensive models (like transformers), as trying to load the entire batch and send it to the model at once may result in high memory pressure on the batch executor (OOM exceptions).

-
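As referenced above, here is a minimal sketch of what such a scoring script could look like, given the behavior described in the checklist (the model path handling and generation parameters are assumptions, and the `optimum` optimizations with their fallback are omitted for brevity):

```python
import os

import torch
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer


def init():
    global model, tokenizer, device
    # Detect the hardware configuration and place the model accordingly.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model_path = os.environ["AZUREML_MODEL_DIR"]  # assumed model layout
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path).to(device)


def run(mini_batch):
    results = []
    for file_path in mini_batch:
        # Read the whole CSV file; the text to summarize is in the `text` column.
        data = load_dataset("csv", data_files={"score": file_path})["score"]
        for row in data:
            # One row at a time: sending entire files to the model risks OOM.
            inputs = tokenizer(
                row["text"], truncation=True, max_length=1024, return_tensors="pt"
            ).to(device)
            summary_ids = model.generate(inputs["input_ids"], max_new_tokens=130)
            results.append(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
    return results
```
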
1. We need to indicate the environment in which we are going to run the deployment. In our case, our model runs on `Torch` and requires the `transformers`, `accelerate`, and `optimum` libraries from HuggingFace. Azure Machine Learning already has an environment with Torch and GPU support available. We are just going to add a couple of dependencies in a `conda.yml` file.

__environment/conda.yml__
@@ -211,7 +210,6 @@ Let's create the deployment that will host the model:
image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04:latest",
)
```
-
---
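
For reference, the `Environment` object whose tail appears in the snippet above could be defined along these lines (a sketch; the environment name is hypothetical, and the conda file path follows the article):

```python
from azure.ai.ml.entities import Environment

environment = Environment(
    name="torch-transformers-gpu",  # hypothetical name
    conda_file="environment/conda.yml",
    image="mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.8-cudnn8-ubuntu22.04:latest",
)
```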

> [!IMPORTANT]
@@ -236,7 +234,6 @@ Let's create the deployment that will host the model:
)
ml_client.begin_create_or_update(compute_cluster)
```
-
---
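
The head of the cluster-creation snippet is elided by the diff; a hedged sketch of what it might contain (the cluster name and VM size are assumptions), followed by the `ml_client.begin_create_or_update(compute_cluster)` call shown above:

```python
from azure.ai.ml.entities import AmlCompute

compute_cluster = AmlCompute(
    name="gpu-cluster",        # assumed cluster name
    size="STANDARD_NC6s_v3",   # assumed GPU SKU
    min_instances=0,
    max_instances=2,
)
```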

> [!NOTE]
