
Commit ad03a73: review edits
1 parent 5427f2a

1 file changed (+28 -23 lines changed)

articles/ai-services/openai/how-to/integrate-synapseml.md

@@ -16,21 +16,26 @@ recommendations: false

 # Use Azure OpenAI with large datasets

-Azure OpenAI can be used to solve a large number of natural language tasks through prompting the completion API. To make it easier to scale your prompting workflows from a few examples to large datasets of examples, Azure OpenAI Service is integrated with the distributed machine learning library [SynapseML](https://www.microsoft.com/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library/). This integration makes it easy to use the [Apache Spark](https://spark.apache.org/) distributed computing framework to process millions of prompts with Azure OpenAI Service. This tutorial shows how to apply large language models at a distributed scale by using Azure OpenAI and Azure Synapse Analytics.
+Azure OpenAI can be used to solve a large number of natural language tasks through prompting the completion API. To make it easier to scale your prompting workflows from a few examples to large datasets of examples, Azure OpenAI Service is integrated with the distributed machine learning library [SynapseML](https://www.microsoft.com/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library/). This integration makes it easy to use the [Apache Spark](https://spark.apache.org/) distributed computing framework to process millions of prompts with Azure OpenAI Service.
+
+This tutorial shows how to apply large language models at a distributed scale by using Azure OpenAI and Azure Synapse Analytics.

 ## Prerequisites

 - An Azure subscription. <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
-- Access granted to Azure OpenAI in the desired Azure subscription.
+
+- Access granted to Azure OpenAI in your Azure subscription.
+
 - An Azure OpenAI resource. [Create a resource](create-resource.md?pivots=web-portal#create-a-resource).
+
 - An Apache Spark cluster with SynapseML installed.
   - Create a [serverless Apache Spark pool](../../../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool).
   - To install SynapseML for your Apache Spark cluster, see [Install SynapseML](#step-3-install-synapseml).

 > [!NOTE]
 > Currently, you must submit an application to access Azure OpenAI Service. To apply for access, complete <a href="https://aka.ms/oai/access" target="_blank">this form</a>. If you need assistance, open an issue on this repo to contact Microsoft.

-Microsoft recommends that you [create an Azure Synapse workspace](../../../synapse-analytics/get-started-create-workspace.md). However, you can also use Azure Databricks, Azure HDInsight, Spark on Kubernetes, or the Python environment with the `pyspark` package.
+We recommend that you [create an Azure Synapse workspace](../../../synapse-analytics/get-started-create-workspace.md). However, you can also use Azure Databricks, Azure HDInsight, Spark on Kubernetes, or the Python environment with the `pyspark` package.

 ## Use example code as a notebook

@@ -41,25 +46,23 @@ To use the example code in this article with your Apache Spark cluster, complete
 1. Install SynapseML for your Apache Spark cluster in your notebook.
 1. Configure the notebook to work with your Azure OpenAI service resource.

-### Step 1: Prepare your notebook
-
-You can create a new notebook in your Apache Spark platform, or you can download an existing notebook and import it into Azure Synapse. You can add each snippet of example code in this article as a new cell in your notebook.
+### Prepare your notebook

-#### (Optional) Download demonstration notebook
+You can create a new notebook in your Apache Spark platform, or you can import an existing notebook. After you have a notebook in place, you can add each snippet of example code in this article as a new cell in your notebook.

-As an option, you can download a demonstration notebook and connect it with your workspace.
+- To use a notebook in Azure Synapse Analytics, see [Create, develop, and maintain Synapse notebooks in Azure Synapse Analytics](../../../synapse-analytics/spark/apache-spark-development-using-notebooks.md).

-1. Download [this demonstration notebook](https://github.com/microsoft/SynapseML/blob/master/docs/Explore%20Algorithms/OpenAI/OpenAI.ipynb). During the download process, select **Raw**, and then save the file.
+- To use a notebook in Azure Databricks, see [Manage notebooks for Azure Databricks](/azure/databricks/notebooks/notebooks-manage.md).

-1. Import the notebook [into the Synapse Workspace](../../../synapse-analytics/spark/apache-spark-development-using-notebooks.md#create-a-notebook), or if you're using Azure Databricks, import the notebook [into the Azure Databricks Workspace](/azure/databricks/notebooks/notebooks-manage#create-a-notebook).
+- (Optional) Download [this demonstration notebook](https://github.com/microsoft/SynapseML/blob/master/docs/Explore%20Algorithms/OpenAI/OpenAI.ipynb) and connect it with your workspace. During the download process, select **Raw**, and then save the file.

-### Step 2: Connect your cluster
+### Connect your cluster

 When you have a notebook ready, connect or _attach_ your notebook to an Apache Spark cluster.

-### Step 3: Install SynapseML
+### Install SynapseML

-To run the exercises, you need to install SynapseML on your Apache Spark cluster. For more information about the installation process, see the link for Azure Synapse at the bottom of [the SynapseML website](https://microsoft.github.io/SynapseML/).
+To run the exercises, you need to install SynapseML on your Apache Spark cluster. For more information, see [Install SynapseML](https://microsoft.github.io/SynapseML/docs/Get%20Started/Install%20SynapseML/) on the [SynapseML website](https://microsoft.github.io/SynapseML/).

 To install SynapseML, create a new cell at the top of your notebook and run the following code.
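The installation cell itself is elided from this hunk. For Azure Synapse, the SynapseML site documents a `%%configure` cell along these lines; the Maven version shown here is an assumption, so check the SynapseML site for the current release coordinates:

```
%%configure -f
{
  "name": "synapseml",
  "conf": {
      "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.4",
      "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
      "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
      "spark.yarn.user.classpath.first": "true"
  }
}
```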

@@ -98,7 +101,7 @@ To install SynapseML, create a new cell at the top of your notebook and run the

 The connection process can take several minutes.

-### Step 4: Configure the notebook
+### Configure the notebook

 Create a new code cell and run the following code to configure the notebook for your service. Set the `resource_name`, `deployment_name`, `location`, and `key` variables to the corresponding values for your Azure OpenAI resource.
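The configuration cell is elided from this hunk. A minimal sketch of the four variables the paragraph names, with placeholder values (all hypothetical), might look like:

```python
# Placeholder values -- all hypothetical; replace with your own resource details.
resource_name = "my-openai-resource"   # Azure OpenAI resource name
deployment_name = "my-gpt-deployment"  # model deployment name
location = "eastus"                    # Azure region of the resource
key = "<your-resource-key>"            # key from the Azure portal

# Azure OpenAI endpoints follow this format, built from the resource name.
service_url = f"https://{resource_name}.openai.azure.com/"
print(service_url)
```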

@@ -138,7 +141,9 @@ df = spark.createDataFrame(

 ## Create the OpenAICompletion Apache Spark client

-To apply the Azure OpenAI Completion service to the dataframe, create an `OpenAICompletion` object that serves as a distributed client. Parameters of the service can be set either with a single value, or by a column of the dataframe with the appropriate setters on the `OpenAICompletion` object. In this example, you set the `maxTokens` parameter to 200. A token is around four characters, and this limit applies to the sum of the prompt and the result. You also set the `promptCol` parameter with the name of the prompt column in the dataframe.
+To apply the Azure OpenAI Completion service to the dataframe, create an `OpenAICompletion` object that serves as a distributed client. Parameters of the service can be set either with a single value, or by a column of the dataframe with the appropriate setters on the `OpenAICompletion` object.
+
+In this example, you set the `maxTokens` parameter to 200. A token is around four characters, and this limit applies to the sum of the prompt and the result. You also set the `promptCol` parameter with the name of the prompt column in the dataframe.
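The four-characters-per-token rule of thumb above can be sketched in plain Python (the helper name is ours, not part of the article):

```python
def estimate_tokens(text: str) -> int:
    """Rough token count using the ~4 characters per token rule of thumb."""
    return max(1, len(text) // 4)

max_tokens = 200  # the maxTokens value used in the example
prompt = "Hello my name is"
# The limit covers prompt plus completion, so this is roughly the budget
# left for the generated text.
budget_for_completion = max_tokens - estimate_tokens(prompt)
print(budget_for_completion)  # 196
```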

 ```python
 from synapse.ml.cognitive import OpenAICompletion
@@ -157,7 +162,7 @@ completion = (

 ## Transform the dataframe with the OpenAICompletion client

-After you have the dataframe and completion client, you can transform your input dataset and add a column called `completions` with all of the information the service adds. In this example, you select only the text for simplicity.
+After you have the dataframe and completion client, you can transform your input dataset and add a column called `completions` with all of the information the service adds. In this example, select only the text for simplicity.
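As a plain-Python picture of what "select only the text" means here (the sample rows are hypothetical, but the nesting mirrors the `completions.choices.text` column the transform adds):

```python
# Hypothetical transformed rows: each `completions` struct holds a list of
# choices, and we keep only the first choice's text -- the plain-Python
# analogue of col("completions.choices.text").getItem(0).
rows = [
    {"prompt": "Hello my name is",
     "completions": {"choices": [{"text": " ...(generated text)..."}]}},
    {"prompt": "The best code is code that's",
     "completions": {"choices": [{"text": " readable"}]}},
]
texts = [r["completions"]["choices"][0]["text"] for r in rows]
print(texts)
```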

 ```python
 from pyspark.sql.functions import col
@@ -167,22 +172,22 @@ display(completed_df.select(
     col("prompt"), col("error"), col("completions.choices.text").getItem(0).alias("text")))
 ```

-The following image shows example output with completions in Azure Synapse Analytics Studio. Keep in mind that completions text can vary so your output might look different.
+The following image shows example output with completions in Azure Synapse Analytics Studio. Keep in mind that completions text can vary. Your output might look different.

 :::image type="content" source="../media/how-to/synapse-studio-transform-dataframe-output.png" alt-text="Screenshot that shows sample completions in Azure Synapse Analytics Studio." border="false":::

 ## Explore other usage scenarios

-Let's review some other use case scenarios for working with Azure OpenAI Service and large datasets.
+Here are some other use cases for working with Azure OpenAI Service and large datasets.

 ### Improve throughput with request batching

 You can use Azure OpenAI Service with large datasets to improve throughput with request batching. In the previous example, you make several requests to the service, one for each prompt. To complete multiple prompts in a single request, you can use batch mode.

-In the `OpenAICompletion` object, instead of setting the **Prompt** column to "prompt," you can specify "batchPrompt" to create the **BatchPrompt** column. To support this method, you create a dataframe with a list of prompts per row.
+In the `OpenAICompletion` object, instead of setting the **Prompt** column to `"prompt"`, you can specify `"batchPrompt"` to create the **batchPrompt** column. To support this method, create a dataframe with a list of prompts per row.

 > [!NOTE]
-> There's currently a limit of 20 prompts in a single request and a limit of 2048 "tokens," or approximately 1500 words.
+> There's currently a limit of 20 prompts in a single request and a limit of 2048 tokens, or approximately 1500 words.

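The 20-prompt cap means a long prompt list has to be chunked into rows of at most 20 before building the batch dataframe. A plain-Python sketch of that chunking (the helper name is ours):

```python
def chunk_prompts(prompts, batch_size=20):
    """Split a flat prompt list into batches of at most batch_size,
    matching the current 20-prompts-per-request limit."""
    return [prompts[i:i + batch_size] for i in range(0, len(prompts), batch_size)]

batches = chunk_prompts([f"prompt {i}" for i in range(45)])
print([len(b) for b in batches])  # [20, 20, 5]
```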
 ```python
 batch_df = spark.createDataFrame(
@@ -193,7 +198,7 @@ batch_df = spark.createDataFrame(
 ).toDF("batchPrompt")
 ```

-Next, you create the `OpenAICompletion` object. Rather than setting the "prompt" column, you set the "batchPrompt" column if your column is of type `Array[String]`.
+Next, create the `OpenAICompletion` object. Rather than setting the `"prompt"` column, set the `"batchPrompt"` column if your column is of type `Array[String]`.

 ```python
 batch_completion = (
@@ -220,7 +225,7 @@ The following image shows example output with completions for multiple prompts i
 :::image type="content" source="../media/how-to/synapse-studio-request-batch-output.png" alt-text="Screenshot that shows completions for multiple prompts in a single request in Azure Synapse Analytics Studio." border="false":::

 > [!NOTE]
-> There's currently a limit of 20 prompts in a single request and a limit of 2048 "tokens," or approximately 1500 words.
+> There's currently a limit of 20 prompts in a single request and a limit of 2048 tokens, or approximately 1500 words.

 ### Use an automatic mini-batcher
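The body of this section is truncated in the hunk. Conceptually, an automatic mini-batcher groups a stream of single-prompt rows into fixed-size batches before they reach the completion client. A plain-Python sketch of that grouping (not SynapseML's actual transformer API):

```python
from typing import Iterable, Iterator, List

def mini_batches(rows: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield fixed-size mini-batches from a row stream, flushing the remainder."""
    batch: List[str] = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch
```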

@@ -246,7 +251,7 @@ The following image shows example output for an automatic mini-batcher that tran

 ### Prompt engineering for translation

-Azure OpenAI can solve many different natural language tasks through [prompt engineering](completions.md). In this example, you can prompt for language translation:
+Azure OpenAI can solve many different natural language tasks through _prompt engineering_. For more information, see [Learn how to generate or manipulate text](completions.md). In this example, you can prompt for language translation:
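The prompts in the translation dataframe are elided here. The pattern is to embed the instruction in the prompt text itself; a sketch of one way to build such prompts (the template wording is an assumption, not from the article):

```python
def translation_prompt(text: str, target_language: str) -> str:
    """Build a completion-style prompt that asks for a translation."""
    return f"Translate the following text to {target_language}: {text}"

prompt = translation_prompt("Hello, how are you?", "French")
print(prompt)
```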

 ```python
 translate_df = spark.createDataFrame(
