# Use Azure OpenAI with large datasets

Azure OpenAI can be used to solve a large number of natural language tasks through prompting the completion API. To make it easier to scale your prompting workflows from a few examples to large datasets of examples, Azure OpenAI Service is integrated with the distributed machine learning library [SynapseML](https://www.microsoft.com/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library/). This integration makes it easy to use the [Apache Spark](https://spark.apache.org/) distributed computing framework to process millions of prompts with Azure OpenAI Service.

This tutorial shows how to apply large language models at a distributed scale by using Azure OpenAI and Azure Synapse Analytics.

## Prerequisites

- An Azure subscription. <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.

- Access granted to Azure OpenAI in your Azure subscription.

- An Azure OpenAI resource. [Create a resource](create-resource.md?pivots=web-portal#create-a-resource).

- An Apache Spark cluster with SynapseML installed.

  - Create a [serverless Apache Spark pool](../../../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool).
  - To install SynapseML for your Apache Spark cluster, see [Install SynapseML](#install-synapseml).

> [!NOTE]
> Currently, you must submit an application to access Azure OpenAI Service. To apply for access, complete <a href="https://aka.ms/oai/access" target="_blank">this form</a>. If you need assistance, open an issue on this repo to contact Microsoft.

We recommend that you [create an Azure Synapse workspace](../../../synapse-analytics/get-started-create-workspace.md). However, you can also use Azure Databricks, Azure HDInsight, Spark on Kubernetes, or the Python environment with the `pyspark` package.

## Use example code as a notebook

To use the example code in this article with your Apache Spark cluster, complete the following steps:

1. Prepare your notebook.
1. Connect your cluster.
1. Install SynapseML for your Apache Spark cluster in your notebook.
1. Configure the notebook to work with your Azure OpenAI service resource.

### Prepare your notebook

You can create a new notebook in your Apache Spark platform, or you can import an existing notebook. After you have a notebook in place, you can add each snippet of example code in this article as a new cell in your notebook.

- To use a notebook in Azure Synapse Analytics, see [Create, develop, and maintain Synapse notebooks in Azure Synapse Analytics](../../../synapse-analytics/spark/apache-spark-development-using-notebooks.md).
- To use a notebook in Azure Databricks, see [Manage notebooks for Azure Databricks](/azure/databricks/notebooks/notebooks-manage).
- (Optional) Download [this demonstration notebook](https://github.com/microsoft/SynapseML/blob/master/docs/Explore%20Algorithms/OpenAI/OpenAI.ipynb) and connect it with your workspace. During the download process, select **Raw**, and then save the file.

### Connect your cluster

When you have a notebook ready, connect or _attach_ your notebook to an Apache Spark cluster.

### Install SynapseML

To run the exercises, you need to install SynapseML on your Apache Spark cluster. For more information, see [Install SynapseML](https://microsoft.github.io/SynapseML/docs/Get%20Started/Install%20SynapseML/) on the [SynapseML website](https://microsoft.github.io/SynapseML/).

To install SynapseML, create a new cell at the top of your notebook and run the following code.
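The install cell itself falls outside this excerpt. The following is a sketch of a typical SynapseML install cell for an Azure Synapse Spark pool; the package version and Maven coordinates are assumptions, so check the SynapseML website for the current values:

```python
%%configure -f
{
  "name": "synapseml",
  "conf": {
    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.1",
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tagging_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
    "spark.yarn.user.classpath.first": "true"
  }
}
```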

The connection process can take several minutes.

### Configure the notebook

Create a new code cell and run the following code to configure the notebook for your service. Set the `resource_name`, `deployment_name`, `location`, and `key` variables to the corresponding values for your Azure OpenAI resource.
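The configuration cell is elided in this excerpt. A minimal sketch, using placeholder values; the environment-variable lookup is an assumption, and any secure way of supplying the key works:

```python
import os

# Placeholder values; replace them with your Azure OpenAI resource's details.
resource_name = "RESOURCE_NAME"        # name of your Azure OpenAI resource
deployment_name = "DEPLOYMENT_NAME"    # name of your model deployment
location = "RESOURCE_LOCATION"         # region of the resource, for example "eastus"

# Read the key from an environment variable rather than hard-coding a secret.
key = os.environ.get("AZURE_OPENAI_KEY", "API_KEY")
```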

## Create the OpenAICompletion Apache Spark client

To apply the Azure OpenAI Completion service to the dataframe, create an `OpenAICompletion` object that serves as a distributed client. Parameters of the service can be set either with a single value, or by a column of the dataframe with the appropriate setters on the `OpenAICompletion` object.

In this example, you set the `maxTokens` parameter to 200. A token is around four characters, and this limit applies to the sum of the prompt and the result. You also set the `promptCol` parameter with the name of the prompt column in the dataframe.

```python
from synapse.ml.cognitive import OpenAICompletion

# The setters below are reconstructed from the surrounding text; adjust the
# endpoint and column names for your resource.
completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(resource_name))
    .setMaxTokens(200)
    .setPromptCol("prompt")
    .setErrorCol("error")
    .setOutputCol("completions")
)
```
## Transform the dataframe with the OpenAICompletion client

After you have the dataframe and completion client, you can transform your input dataset and add a column called `completions` with all of the information the service adds. In this example, select only the text for simplicity.
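The transformation cell is elided here. A sketch consistent with the column names above; it needs a live Azure OpenAI resource to run, and the exact selection of fields is an assumption:

```python
from pyspark.sql.functions import col

# Run the distributed client over the prompt dataframe, then keep only the
# prompt, any per-row error, and the first completion's text.
completed_df = completion.transform(df).cache()
display(
    completed_df.select(
        col("prompt"),
        col("error"),
        col("completions.choices.text").getItem(0).alias("text"),
    )
)
```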

The following image shows example output with completions in Azure Synapse Analytics Studio. Keep in mind that completions text can vary. Your output might look different.

:::image type="content" source="../media/how-to/synapse-studio-transform-dataframe-output.png" alt-text="Screenshot that shows sample completions in Azure Synapse Analytics Studio." border="false":::

## Explore other usage scenarios

Here are some other use cases for working with Azure OpenAI Service and large datasets.

### Improve throughput with request batching

You can use Azure OpenAI Service with large datasets to improve throughput with request batching. In the previous example, you make several requests to the service, one for each prompt. To complete multiple prompts in a single request, you can use batch mode.

In the `OpenAICompletion` object, instead of setting the **Prompt** column to `"prompt"`, you can specify `"batchPrompt"` to create the **batchPrompt** column. To support this method, create a dataframe with a list of prompts per row.

> [!NOTE]
> There's currently a limit of 20 prompts in a single request and a limit of 2048 tokens, or approximately 1500 words.
Next, create the `OpenAICompletion` object. Rather than setting the `"prompt"` column, set the `"batchPrompt"` column if your column is of type `Array[String]`.

```python
from synapse.ml.cognitive import OpenAICompletion

# Reconstructed sketch: the same client settings as before, but reading the
# batch column instead of a single-prompt column.
batch_completion = (
    OpenAICompletion()
    .setSubscriptionKey(key)
    .setDeploymentName(deployment_name)
    .setUrl("https://{}.openai.azure.com/".format(resource_name))
    .setMaxTokens(200)
    .setBatchPromptCol("batchPrompt")
    .setErrorCol("error")
    .setOutputCol("completions")
)
```

The following image shows example output with completions for multiple prompts in a single request in Azure Synapse Analytics Studio.
:::image type="content" source="../media/how-to/synapse-studio-request-batch-output.png" alt-text="Screenshot that shows completions for multiple prompts in a single request in Azure Synapse Analytics Studio." border="false":::

> [!NOTE]
> There's currently a limit of 20 prompts in a single request and a limit of 2048 tokens, or approximately 1500 words.

### Use an automatic mini-batcher

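The mini-batcher code is elided in this excerpt. A sketch assuming SynapseML's `FixedMiniBatchTransformer` and the `df` and `batch_completion` objects from the earlier cells; it needs a live Azure OpenAI resource to run:

```python
from synapse.ml.stages import FixedMiniBatchTransformer
from synapse.ml.core.spark import FluentAPI  # enables .mlTransform on dataframes

# Group single prompts into batches of four, rename the column to match the
# batch client, and send the batched requests.
completed_autobatch_df = (
    df.coalesce(1)  # one partition so batching is deterministic in this small example
    .mlTransform(FixedMiniBatchTransformer(batchSize=4))
    .withColumnRenamed("prompt", "batchPrompt")
    .mlTransform(batch_completion)
)
```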

The following image shows example output for an automatic mini-batcher that transposes the dataframe.

### Prompt engineering for translation

Azure OpenAI can solve many different natural language tasks through _prompt engineering_. For more information, see [Learn how to generate or manipulate text](completions.md). In this example, you can prompt for language translation: