Commit 5427f2a

committed
edits
1 parent cda31da commit 5427f2a

1 file changed: +46 -27 lines changed

articles/ai-services/openai/how-to/integrate-synapseml.md

Lines changed: 46 additions & 27 deletions
@@ -25,7 +25,7 @@ Azure OpenAI can be used to solve a large number of natural language tasks throu
 - An Azure OpenAI resource. [Create a resource](create-resource.md?pivots=web-portal#create-a-resource).
 - An Apache Spark cluster with SynapseML installed.
     - Create a [serverless Apache Spark pool](../../../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool).
-    - To install SynapseML for your Apache Spark cluster, see [Install SynapseML](#install-synapseml).
+    - To install SynapseML for your Apache Spark cluster, see [Install SynapseML](#step-3-install-synapseml).
 
 > [!NOTE]
 > Currently, you must submit an application to access Azure OpenAI Service. To apply for access, complete <a href="https://aka.ms/oai/access" target="_blank">this form</a>. If you need assistance, open an issue on this repo to contact Microsoft.
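
Reviewer note on the anchor change in this hunk: the target heading appears in the next hunk as `### Step 3: Install SynapseML`, so the old `#install-synapseml` fragment would presumably no longer resolve. The snippet below is a minimal sketch of how such an anchor can be derived, assuming GitHub-style slug rules (lowercase, punctuation dropped, whitespace turned into hyphens). The function name `heading_to_anchor` is hypothetical, and the actual docs build may apply different rules.

```python
import re


def heading_to_anchor(heading: str) -> str:
    """Approximate a Markdown heading's anchor (assumed GitHub-style slug rules)."""
    # Drop the leading '#' markers and surrounding whitespace, then lowercase.
    text = heading.lstrip("#").strip().lower()
    # Remove punctuation such as ':' while keeping word characters, spaces, hyphens.
    text = re.sub(r"[^\w\s-]", "", text)
    # Collapse each run of whitespace into a single hyphen.
    return re.sub(r"\s+", "-", text)


print(heading_to_anchor("### Step 3: Install SynapseML"))
```

Under these assumed rules, the heading text with the "Step 3:" prefix yields the new fragment used in the added line, while the bare "Install SynapseML" text yields the old one.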
@@ -59,29 +59,48 @@ When you have a notebook ready, connect or _attach_ your notebook to an Apache S
 
 ### Step 3: Install SynapseML
 
-To run the exercises, you need to install SynapseML on your Apache Spark cluster. You complete this task in a code cell at the top of your notebook. For more information about the installation process, see the link for Azure Synapse at the bottom of [the SynapseML website](https://microsoft.github.io/SynapseML/).
-
-To install SynapseML, create a new cell at the top of your notebook and run the following code:
-
-```python
-%%configure -f
-{
-  "name": "synapseml",
-  "conf": {
-    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.2-spark3.3",
-    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
-    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
-    "spark.yarn.user.classpath.first": "true",
-    "spark.sql.parquet.enableVectorizedReader": "false"
-  }
-}
-```
+To run the exercises, you need to install SynapseML on your Apache Spark cluster. For more information about the installation process, see the link for Azure Synapse at the bottom of [the SynapseML website](https://microsoft.github.io/SynapseML/).
+
+To install SynapseML, create a new cell at the top of your notebook and run the following code.
+
+- For a **Spark3.2 pool**, use the following code:
+
+```python
+%%configure -f
+{
+  "name": "synapseml",
+  "conf": {
+    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.2,org.apache.spark:spark-avro_2.12:3.3.1",
+    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
+    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
+    "spark.yarn.user.classpath.first": "true",
+    "spark.sql.parquet.enableVectorizedReader": "false",
+    "spark.sql.legacy.replaceDatabricksSparkAvro.enabled": "true"
+  }
+}
+```
+
+- For a **Spark3.3 pool**, use the following code:
+
+```python
+%%configure -f
+{
+  "name": "synapseml",
+  "conf": {
+    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.2-spark3.3",
+    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
+    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
+    "spark.yarn.user.classpath.first": "true",
+    "spark.sql.parquet.enableVectorizedReader": "false"
+  }
+}
+```
 
 The connection process can take several minutes.
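
Reviewer note: the two `%%configure` cells added in this hunk differ only in the `spark.jars.packages` value and in one extra Avro setting for the Spark 3.2 pool. The following is a minimal sketch that makes the difference explicit by building the JSON body from shared settings. The helper name `synapseml_configure_body` and the `"3.2"`/`"3.3"` version strings are assumptions for illustration and are not part of SynapseML or Azure Synapse; every configuration value is copied from the cells above.

```python
import json

# Settings common to both pools, copied from the %%configure cells in this diff.
_COMMON_CONF = {
    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
    "spark.jars.excludes": (
        "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,"
        "org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,"
        "com.fasterxml.jackson.core:jackson-databind"
    ),
    "spark.yarn.user.classpath.first": "true",
    "spark.sql.parquet.enableVectorizedReader": "false",
}


def synapseml_configure_body(spark_version: str) -> dict:
    """Return the JSON body for %%configure (hypothetical helper, not a library API)."""
    if spark_version == "3.2":
        # The Spark 3.2 cell adds the spark-avro package and a legacy Avro flag.
        conf = {
            "spark.jars.packages": (
                "com.microsoft.azure:synapseml_2.12:0.11.2,"
                "org.apache.spark:spark-avro_2.12:3.3.1"
            ),
            **_COMMON_CONF,
            "spark.sql.legacy.replaceDatabricksSparkAvro.enabled": "true",
        }
    elif spark_version == "3.3":
        conf = {
            "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.2-spark3.3",
            **_COMMON_CONF,
        }
    else:
        raise ValueError(f"No configuration shown in this article for Spark {spark_version}")
    return {"name": "synapseml", "conf": conf}


# Print text that could be pasted into the top notebook cell for a Spark 3.3 pool.
print("%%configure -f")
print(json.dumps(synapseml_configure_body("3.3"), indent=2))
```

Keeping the shared settings in one place is only a reading aid; in the notebook itself, the literal cell for the matching pool version is what gets run.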
 
 ### Step 4: Configure the notebook
 
-After the top cell in your notebook, add a new cell to configure the notebook for your service by running the following code. Set the `resource_name`, `deployment_name`, `location`, and `key` variables to the corresponding values for your Azure OpenAI resource.
+Create a new code cell and run the following code to configure the notebook for your service. Set the `resource_name`, `deployment_name`, `location`, and `key` variables to the corresponding values for your Azure OpenAI resource.
 
 ```python
 import os
@@ -103,7 +122,7 @@ Now you're ready to start running the example code.
 
 ## Create a dataset of prompts
 
-The next step is to create a dataframe consisting of a series of rows, with one prompt per row.
+The first step is to create a dataframe consisting of a series of rows, with one prompt per row.
 
 You can also load data directly from Azure Data Lake Storage or other databases. For more information about loading and preparing Spark dataframes, see the [Apache Spark Data Sources](https://spark.apache.org/docs/latest/sql-data-sources.html).
 
@@ -148,9 +167,9 @@ display(completed_df.select(
   col("prompt"), col("error"), col("completions.choices.text").getItem(0).alias("text")))
 ```
 
-The following image shows example output with completions for the transformed dataframe in Azure Synapse Analytics Studio. Keep in mind that completions text can vary so your output might look different.
+The following image shows example output with completions in Azure Synapse Analytics Studio. Keep in mind that completions text can vary so your output might look different.
 
-:::image type="content" source="../media/how-to/synapse-studio-transform-dataframe-output.png" alt-text="Screenshot that shows sample completions for the transformed dataframe in Azure Synapse Analytics Studio." border="false":::
+:::image type="content" source="../media/how-to/synapse-studio-transform-dataframe-output.png" alt-text="Screenshot that shows sample completions in Azure Synapse Analytics Studio." border="false":::
 
 ## Explore other usage scenarios
 
@@ -196,9 +215,9 @@ completed_batch_df = batch_completion.transform(batch_df).cache()
 display(completed_batch_df)
 ```
 
-The following image shows example output with completions for multiple prompts in a batch prompt request:
+The following image shows example output with completions for multiple prompts in a request:
 
-:::image type="content" source="../media/how-to/synapse-studio-request-batch-output.png" alt-text="Screenshot that shows completions for multiple prompts in a batch prompt request in Azure Synapse Analytics Studio." border="false":::
+:::image type="content" source="../media/how-to/synapse-studio-request-batch-output.png" alt-text="Screenshot that shows completions for multiple prompts in a single request in Azure Synapse Analytics Studio." border="false":::
 
 > [!NOTE]
 > There's currently a limit of 20 prompts in a single request and a limit of 2048 "tokens," or approximately 1500 words.
@@ -223,7 +242,7 @@ display(completed_autobatch_df)
 
 The following image shows example output for an automatic mini-batcher that transposes data to row format:
 
-:::image type="content" source="../media/how-to/synapse-studio-transpose-data-output.png" alt-text="Screenshot that shows completions for an automatic mini-batcher that transposes data to row format in Azure Synapse Analytics Studio." border="false":::
+:::image type="content" source="../media/how-to/synapse-studio-transpose-data-output.png" alt-text="Screenshot that shows completions for an automatic mini-batcher in Azure Synapse Analytics Studio." border="false":::
 
 ### Prompt engineering for translation
 
@@ -260,9 +279,9 @@ qa_df = spark.createDataFrame(
 display(completion.transform(qa_df))
 ```
 
-The following image shows example output when you prompt the GPT-3 model to create general-knowledge question answering:
+The following image shows example output for general-knowledge question answering:
 
-:::image type="content" source="../media/how-to/synapse-studio-question-answer-output.png" alt-text="Screenshot that shows completions for prompting the GPT-3 model to create general-knowledge question answering in Azure Synapse Analytics Studio." border="false":::
+:::image type="content" source="../media/how-to/synapse-studio-question-answer-output.png" alt-text="Screenshot that shows completions for general-knowledge question answering in Azure Synapse Analytics Studio." border="false":::
 
 ## Next steps
 