
Commit 983d112

Update code and text
1 parent ae549df commit 983d112

File tree

1 file changed: +62, -75 lines changed


articles/machine-learning/how-to-use-batch-model-openai-embeddings.md

Lines changed: 62 additions & 75 deletions
@@ -125,35 +125,19 @@ You can configure the identity of the compute instance to have access to the Azu
 az role assignment create --role "Cognitive Services User" --assignee $PRINCIPAL_ID --scope $RESOURCE_ID
 ```
 
+If you get an error message about not finding a user or service principal in the graph database for your principal, check your role assignments. You might need to assign yourself a Global Administrator or Application Administrator role.
+
 # [Access keys](#tab/keys)
 
-You can configure the batch deployment to use the OpenAI resource access key to get predictions. Copy the access key from your account, and keep it for later steps.
+You can configure the batch deployment to use the access key of your OpenAI resource to get predictions. Copy the access key from your account, and keep it for later steps.
 
 ---
 
+## Register the OpenAI model
 
-### Register the OpenAI model
-
-Model deployments in batch endpoints can only deploy registered models. You can use MLflow models with the flavor OpenAI to create a model in your workspace referencing a deployment in Azure OpenAI.
-
-1. Create an MLflow model in the workspace's models registry pointing to your OpenAI deployment with the model you want to use. Use MLflow SDK to create the model:
-
-   > [!TIP]
-   > In the cloned repository in the folder **model** you already have an MLflow model to generate embeddings based on ADA-002 model in case you want to skip this step.
+Model deployments in batch endpoints can deploy only registered models. You can use MLflow models with the flavor OpenAI to create a model in your workspace that references a deployment in Azure OpenAI.
 
-   ```python
-   import mlflow
-   import openai
-
-   engine = openai.Model.retrieve("text-embedding-ada-002")
-
-   model_info = mlflow.openai.save_model(
-       path="model",
-       model="text-embedding-ada-002",
-       engine=engine.id,
-       task=openai.Embedding,
-   )
-   ```
+In the cloned repository, the **model** folder contains an MLflow model that generates embeddings based on the ADA-002 model.
 
 1. Register the model in the workspace:
 
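The deployed model must carry the OpenAI flavor. As a rough illustration of what checking a model's `MLmodel` metadata for that flavor can look like (a sketch only: the helper name is hypothetical, the file layout is simplified, and a real check would parse the YAML properly):

```python
def has_openai_flavor(mlmodel_text):
    """Return True if an MLmodel file's flavors section mentions 'openai'.

    Hypothetical helper for illustration: scans the indented block under
    'flavors:' line by line instead of doing a full YAML parse.
    """
    in_flavors = False
    for line in mlmodel_text.splitlines():
        if line.startswith("flavors:"):
            in_flavors = True
        elif in_flavors and line and not line.startswith(" "):
            in_flavors = False  # left the flavors block
        elif in_flavors and line.strip().rstrip(":") == "openai":
            return True
    return False

# Simplified MLmodel content for an OpenAI-flavor embeddings model.
mlmodel = """\
flavors:
  openai:
    model: text-embedding-ada-002
    task: embeddings
  python_function:
    loader_module: mlflow.openai
"""

print(has_openai_flavor(mlmodel))  # True
```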
@@ -165,10 +149,15 @@ Model deployments in batch endpoints can only deploy registered models. You can
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=register_model)]
 
-
 ## Create a deployment for an OpenAI model
 
-1. First, let's create the endpoint that hosts the model. Decide on the name of the endpoint:
+To deploy the OpenAI model, you need to create an endpoint, an environment, a scoring script, and a batch deployment. The following sections show you how to create these components.
+
+### Create an endpoint
+
+An endpoint is needed to host the model. Take the following steps to create an endpoint:
+
+1. Set up a variable to store your endpoint name. Replace the name in the following code with one that's unique within the region of your resource group.
 
 # [Azure CLI](#tab/cli)
 
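The endpoint name has to be unique within the region, so one common trick is to append a short random suffix. A sketch of that (the `text-embedding` prefix is an arbitrary illustrative choice, and the lowercase-plus-digits alphabet is an assumption based on typical Azure naming rules):

```python
import random
import string


def unique_endpoint_name(prefix="text-embedding"):
    """Append a short random suffix so the name is unlikely to collide.

    Assumes the name may contain lowercase letters, digits, and hyphens,
    which is typical for Azure resource names.
    """
    alphabet = string.ascii_lowercase + string.digits
    suffix = "".join(random.choices(alphabet, k=6))
    return f"{prefix}-{suffix}"


name = unique_endpoint_name()
print(name)  # for example: text-embedding-a3f9k2
```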
@@ -178,14 +167,11 @@ Model deployments in batch endpoints can only deploy registered models. You can
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=name_endpoint)]
 
-
 1. Configure the endpoint:
 
 # [Azure CLI](#tab/cli)
 
-The following YAML file defines a batch endpoint:
-
-__endpoint.yml__
+Create a YAML file called *endpoint.yml* that contains the following lines. Replace the `name` value with your endpoint name.
 
 :::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/endpoint.yml":::
 
@@ -203,79 +189,82 @@ Model deployments in batch endpoints can only deploy registered models. You can
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=create_endpoint)]
 
-1. Our scoring script uses some specific libraries that are not part of the standard OpenAI SDK so we need to create an environment that have them. Here, we configure an environment with a base image a conda YAML.
+### Configure an environment
 
-# [Azure CLI](#tab/cli)
+The scoring script in this example uses some libraries that aren't part of the standard OpenAI SDK. Create an environment that contains a base image and also a conda YAML file to capture those dependencies:
 
-__environment/environment.yml__
+# [Azure CLI](#tab/cli)
 
-:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/environment/environment.yml":::
+The *environment* folder contains a file named *environment.yml* that configures the environment.
 
-# [Python SDK](#tab/python)
+:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/environment/environment.yml":::
 
-[!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=configure_environment)]
-
----
+# [Python SDK](#tab/python)
+
+[!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=configure_environment)]
 
-The conda YAML looks as follows:
+---
 
-__conda.yaml__
+The conda YAML file, *conda.yml*, contains the following lines:
 
-:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/environment/conda.yaml":::
+:::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/environment/conda.yaml":::
 
-1. Let's create a scoring script that performs the execution. In Batch Endpoints, MLflow models don't require a scoring script. However, in this case we want to extend a bit the capabilities of batch endpoints by:
+### Create a scoring script
 
-   > [!div class="checklist"]
-   > * Allow the endpoint to read multiple data types, including `csv`, `tsv`, `parquet`, `json`, `jsonl`, `arrow`, and `txt`.
-   > * Add some validations to ensure the MLflow model used has an OpenAI flavor on it.
-   > * Format the output in `jsonl` format.
-   > * Add an environment variable `AZUREML_BI_TEXT_COLUMN` to control (optionally) which input field you want to generate embeddings for.
+This example uses a scoring script that performs the execution. In batch endpoints, MLflow models don't require a scoring script. But this example extends the capabilities of batch endpoints by:
 
-   > [!TIP]
-   > By default, MLflow will use the first text column available in the input data to generate embeddings from. Use the environment variable `AZUREML_BI_TEXT_COLUMN` with the name of an existing column in the input dataset to change the column if needed. Leave it blank if the default behavior works for you.
-
-   The scoring script looks as follows:
+- Allowing the endpoint to read multiple data types, including `csv`, `tsv`, `parquet`, `json`, `jsonl`, `arrow`, and `txt` formats.
+- Adding some validations to ensure the MLflow model has an OpenAI flavor.
+- Formatting the output in `jsonl` format.
+- Adding an environment variable `AZUREML_BI_TEXT_COLUMN` to optionally control which input field you want to generate embeddings for.
 
-   __code/batch_driver.py__
+> [!TIP]
+> By default, MLflow generates embeddings from the first text column that's available in the input data. If you want to use a different column, set the environment variable `AZUREML_BI_TEXT_COLUMN` to the name of your preferred column. Leave that variable blank if the default behavior works for you.
 
-   :::code language="python" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/code/batch_driver.py" :::
+The scoring script, *code/batch_driver.py*, contains the following lines:
 
-1. One the scoring script is created, it's time to create a batch deployment for it. We use environment variables to configure the OpenAI deployment. Particularly we use the following keys:
+:::code language="python" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/code/batch_driver.py" :::
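The column-selection and `jsonl`-formatting behaviors that the scoring script adds can be sketched in plain Python. This is an illustration only, not the contents of *batch_driver.py*: the helper names are made up, and only the `AZUREML_BI_TEXT_COLUMN` default-to-first-text-column rule is taken from the text.

```python
import json
import os


def pick_text_column(rows):
    """Pick the column to embed: the AZUREML_BI_TEXT_COLUMN environment
    variable if set, otherwise the first column holding string values."""
    override = os.environ.get("AZUREML_BI_TEXT_COLUMN")
    if override:
        return override
    for column, value in rows[0].items():
        if isinstance(value, str):
            return column
    raise ValueError("no text column found in the input data")


def to_jsonl(rows, embeddings):
    """Format one JSON object per line, pairing each row with its embedding."""
    column = pick_text_column(rows)
    lines = [
        json.dumps({"text": row[column], "embedding": emb})
        for row, emb in zip(rows, embeddings)
    ]
    return "\n".join(lines)


rows = [{"bill_id": 1, "text": "An act to amend..."}]
print(pick_text_column(rows))  # text
```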
 
-* `OPENAI_API_BASE` is the URL of the Azure OpenAI resource to use.
-* `OPENAI_API_VERSION` is the version of the API you plan to use.
-* `OPENAI_API_TYPE` is the type of API and authentication you want to use.
+### Create a batch deployment
 
-# [Microsoft Entra authentication](#tab/ad)
+To configure the OpenAI deployment, you use environment variables. Specifically, you use the following keys:
 
-The environment variable `OPENAI_API_TYPE="azure_ad"` instructs OpenAI to use Active Directory authentication and hence no key is required to invoke the OpenAI deployment. The identity of the cluster is used instead.
-
-# [Access keys](#tab/keys)
+- `OPENAI_API_BASE` is the URL of your Azure OpenAI resource.
+- `OPENAI_API_VERSION` is the version of the API that you plan to use.
+- `OPENAI_API_TYPE` is the type of API and authentication that you want to use.
 
-To use access keys instead of Microsoft Entra authentication, we need the following environment variables:
+# [Microsoft Entra authentication](#tab/ad)
+
+If you use the environment variable `OPENAI_API_TYPE` with a value of `azure_ad`, OpenAI uses Microsoft Entra authentication. No key is required to invoke the OpenAI deployment. Instead, the identity of the cluster is used.
+
+# [Access keys](#tab/keys)
 
-* Use `OPENAI_API_TYPE="azure"`
-* Use `OPENAI_API_KEY="<YOUR_AZURE_OPENAI_KEY>"`
+To use an access key instead of Microsoft Entra authentication, you use the following environment variables and values:
 
-1. Once we decided on the authentication and the environment variables, we can use them in the deployment. The following example shows how to use Microsoft Entra authentication particularly:
+* `OPENAI_API_TYPE: "azure"`
+* `OPENAI_API_KEY: "<your-Azure-OpenAI-key>"`
+
+---
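The two authentication tabs differ only in these environment variables: `azure_ad` needs no key (the cluster identity is used), while `azure` requires `OPENAI_API_KEY`. A minimal sketch of validating such a configuration (the `openai_env_config` helper is hypothetical, written for illustration only):

```python
def openai_env_config(env):
    """Validate the OpenAI-related environment variables for a deployment.

    Hypothetical helper: with OPENAI_API_TYPE set to azure_ad, no key is
    needed; the plain azure type requires OPENAI_API_KEY.
    """
    api_type = env.get("OPENAI_API_TYPE", "azure")
    if api_type not in ("azure", "azure_ad"):
        raise ValueError(f"unsupported OPENAI_API_TYPE: {api_type}")
    if api_type == "azure" and not env.get("OPENAI_API_KEY"):
        raise ValueError("OPENAI_API_KEY is required when OPENAI_API_TYPE is 'azure'")
    return {
        "OPENAI_API_BASE": env.get("OPENAI_API_BASE", ""),
        "OPENAI_API_VERSION": env.get("OPENAI_API_VERSION", ""),
        "OPENAI_API_TYPE": api_type,
    }


config = openai_env_config({
    "OPENAI_API_TYPE": "azure_ad",
    "OPENAI_API_BASE": "https://example.openai.azure.com",  # illustrative URL
})
print(config["OPENAI_API_TYPE"])  # azure_ad
```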
+
+1. Update the values of the authentication and environment variables in the deployment configuration. The following example uses Microsoft Entra authentication:
 
 # [Azure CLI](#tab/cli)
 
-__deployment.yml__
+The *deployment.yml* file configures the deployment:
 
 :::code language="yaml" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/deployment.yml" highlight="26-28":::
 
 > [!TIP]
-> Notice the `environment_variables` section where we indicate the configuration for the OpenAI deployment. The value for `OPENAI_API_BASE` will be set later in the creation command so you don't have to edit the YAML configuration file.
+> The `environment_variables` section provides the configuration for the OpenAI deployment. The `OPENAI_API_BASE` value is set when the deployment is created, so you don't have to edit that value in the YAML configuration file.
 
 # [Python SDK](#tab/python)
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=configure_deployment)]
 
 > [!TIP]
-> Notice the `environment_variables` section where we indicate the configuration for the OpenAI deployment.
+> The `environment_variables` section provides the configuration for the OpenAI deployment.
 
-1. Now, let's create the deployment.
+1. Create the deployment.
 
 # [Azure CLI](#tab/cli)
 
@@ -285,17 +274,17 @@ Model deployments in batch endpoints can only deploy registered models. You can
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=create_deployment)]
 
-Finally, set the new deployment as the default one:
+Set the new deployment as the default one:
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=set_default_deployment)]
 
-1. At this point, our batch endpoint is ready to be used.
+The batch endpoint is ready for use.
 
 ## Test the deployment
 
-For testing our endpoint, we are going to use a sample of the dataset [BillSum: A Corpus for Automatic Summarization of US Legislation](https://arxiv.org/abs/1910.00523). This sample is included in the repository in the folder data.
+For testing the endpoint, you use a sample of the dataset [BillSum: A Corpus for Automatic Summarization of US Legislation](https://arxiv.org/abs/1910.00523). This sample is included in the repository, in the *data* folder.
 
-1. Create a data input for this model:
+1. Set up the input data:
 
 # [Azure CLI](#tab/cli)
 
@@ -350,12 +339,10 @@ For testing our endpoint, we are going to use a sample of the dataset [BillSum:
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=get_job)]
 
-1. Once the deployment is finished, we can download the predictions:
+1. After the deployment is finished, download the predictions:
 
 # [Azure CLI](#tab/cli)
 
-To download the predictions, use the following command:
-
 :::code language="azurecli" source="~/azureml-examples-main/cli/endpoints/batch/deploy-models/openai-embeddings/deploy-and-run.sh" ID="download_outputs" :::
 
 # [Python SDK](#tab/python)
@@ -370,7 +357,7 @@ For testing our endpoint, we are going to use a sample of the dataset [BillSum:
 
 [!notebook-python[] (~/azureml-examples-main/sdk/python/endpoints/batch/deploy-models/openai-embeddings/deploy-and-test.ipynb?name=download_outputs)]
 
-1. The output predictions look like the following.
+1. Use the following code to view the output predictions:
 
 ```python
 import pandas as pd
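Because the output predictions are written in `jsonl` format, they can also be inspected with the standard library alone. A sketch with made-up sample lines: the `text` and `embedding` field names and the 3-element vectors are illustrative, so check the downloaded file for the actual schema.

```python
import json

# Two illustrative prediction lines; a real file comes from the downloaded
# job output and its exact fields may differ.
sample_jsonl = "\n".join([
    json.dumps({"text": "An act to amend...", "embedding": [0.01, -0.02, 0.03]}),
    json.dumps({"text": "A bill to provide...", "embedding": [0.00, 0.05, -0.01]}),
])

# Parse one JSON object per non-empty line.
records = [json.loads(line) for line in sample_jsonl.splitlines() if line.strip()]
print(len(records), len(records[0]["embedding"]))  # 2 3
```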

0 commit comments
