Commit edcf132

refresh code, steps, output, add images
1 parent 7e25d36 commit edcf132

6 files changed: +77 -30 lines changed

articles/ai-services/openai/how-to/integrate-synapseml.md

Lines changed: 77 additions & 30 deletions
@@ -23,46 +23,84 @@ Azure OpenAI can be used to solve a large number of natural language tasks throu
 - An Azure subscription. <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
 - Access granted to Azure OpenAI in the desired Azure subscription.
 - An Azure OpenAI resource. [Create a resource](create-resource.md?pivots=web-portal#create-a-resource).
-- An Apache Spark cluster with SynapseML installed. Create a [serverless Apache Spark pool](../../../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool)
+- An Apache Spark cluster with SynapseML installed.
+  - Create a [serverless Apache Spark pool](../../../synapse-analytics/get-started-analyze-spark.md#create-a-serverless-apache-spark-pool).
+  - To install SynapseML for your Apache Spark cluster, see [Install SynapseML](#step-3-install-synapseml).
 
 > [!NOTE]
 > Currently, you must submit an application to access Azure OpenAI Service. To apply for access, complete <a href="https://aka.ms/oai/access" target="_blank">this form</a>. If you need assistance, open an issue on this repo to contact Microsoft.
 
 Microsoft recommends that you [create an Azure Synapse workspace](../../../synapse-analytics/get-started-create-workspace.md). However, you can also use Azure Databricks, Azure HDInsight, Spark on Kubernetes, or the Python environment with the `pyspark` package.
 
-## Import example code as a notebook
+## Use example code as a notebook
 
-To use the example code in this article with your Spark cluster, you have two options:
-- Create a notebook in your Spark platform and copy the code into this notebook to run the demo.
-- Download the notebook and import it into Azure Synapse.
+To use the example code in this article with your Apache Spark cluster, complete the following steps:
 
-1. [Download this demo as a notebook](https://github.com/microsoft/SynapseML/blob/master/docs/Explore%20Algorithms/OpenAI/OpenAI.ipynb). During the download process, select **Raw**, and then save the file.
+1. Prepare a new or existing notebook.
+1. Connect your Apache Spark cluster with your notebook.
+1. Install SynapseML for your Apache Spark cluster in your notebook.
+1. Configure the notebook to work with your Azure OpenAI service resource.
+
+### Step 1: Prepare your notebook
+
+You can create a new notebook in your Apache Spark platform, or you can download an existing notebook and import it into Azure Synapse. You can add each snippet of example code in this article as a new cell in your notebook.
+
+#### (Optional) Download demonstration notebook
+
+As an option, you can download a demonstration notebook and connect it with your workspace.
+
+1. Download [this demonstration notebook](https://github.com/microsoft/SynapseML/blob/master/docs/Explore%20Algorithms/OpenAI/OpenAI.ipynb). During the download process, select **Raw**, and then save the file.
 
 1. Import the notebook [into the Synapse Workspace](../../../synapse-analytics/spark/apache-spark-development-using-notebooks.md#create-a-notebook), or if you're using Azure Databricks, import the notebook [into the Azure Databricks Workspace](/azure/databricks/notebooks/notebooks-manage#create-a-notebook).
 
-1. Install SynapseML on your cluster. See the installation instructions for Azure Synapse at the bottom of [the SynapseML website](https://microsoft.github.io/SynapseML/). This task requires pasting another cell at the top of the notebook you imported.
+### Step 2: Connect your cluster
 
-1. Connect your notebook to a cluster and follow along with editing and running the cells later in this article.
+When you have a notebook ready, connect or _attach_ your notebook to an Apache Spark cluster.
 
-## Fill in your service information
+### Step 3: Install SynapseML
 
-When the notebook is ready, you need to edit a few cells in your notebook to point to your service. Set the `resource_name`, `deployment_name`, `location`, and `key` variables to the corresponding values for your Azure OpenAI resource.
+To run the exercises, you need to install SynapseML on your Apache Spark cluster. You complete this task in a code cell at the top of your notebook. For more information about the installation process, see the link for Azure Synapse at the bottom of [the SynapseML website](https://microsoft.github.io/SynapseML/).
 
-> [!IMPORTANT]
-> Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like [Azure Key Vault](../../../key-vault/general/overview.md). For more information, see [Azure AI services security](../../security-features.md).
+To install SynapseML, create a new cell at the top of your notebook and run the following code:
+
+```python
+%%configure -f
+{
+  "name": "synapseml",
+  "conf": {
+    "spark.jars.packages": "com.microsoft.azure:synapseml_2.12:0.11.2-spark3.3",
+    "spark.jars.repositories": "https://mmlspark.azureedge.net/maven",
+    "spark.jars.excludes": "org.scala-lang:scala-reflect,org.apache.spark:spark-tags_2.12,org.scalactic:scalactic_2.12,org.scalatest:scalatest_2.12,com.fasterxml.jackson.core:jackson-databind",
+    "spark.yarn.user.classpath.first": "true",
+    "spark.sql.parquet.enableVectorizedReader": "false"
+  }
+}
+```
+
+The connection process can take several minutes.
+
+### Step 4: Configure the notebook
+
+Below the top cell in your notebook, add a new cell and run the following code to configure the notebook for your service. Set the `resource_name`, `deployment_name`, `location`, and `key` variables to the corresponding values for your Azure OpenAI resource.
 
 ```python
 import os
 
 # Replace the following values with your Azure OpenAI resource information
-resource_name = "RESOURCE_NAME" # The name of your Azure OpenAI resource.
-deployment_name = "DEPLOYMENT_NAME" # The name of your Azure OpenAI deployment.
-location = "RESOURCE_LOCATION" # The location or region ID for your resource.
-key = "RESOURCE_API_KEY" # The key for your resource.
+resource_name = "<RESOURCE_NAME>" # The name of your Azure OpenAI resource.
+deployment_name = "<DEPLOYMENT_NAME>" # The name of your Azure OpenAI deployment.
+location = "<RESOURCE_LOCATION>" # The location or region ID for your resource.
+key = "<RESOURCE_API_KEY>" # The key for your resource.
 
 assert key is not None and resource_name is not None
 ```
 
+Now you're ready to start running the example code.
+
+> [!IMPORTANT]
+> Remember to remove the key from your code when you're done, and never post it publicly. For production, use a secure way of storing and accessing your credentials like [Azure Key Vault](../../../key-vault/general/overview.md). For more information, see [Azure AI services security](../../security-features.md).
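[Editorial aside] As a minimal sketch of the secure approach the IMPORTANT note recommends, the key can be read from the environment instead of being hardcoded in the notebook. The variable name `AZURE_OPENAI_KEY` is a hypothetical choice, not from the article:

```python
import os

# Hypothetical environment variable name; any name your pipeline sets works.
# Fall back to the placeholder so the cell still runs during a dry run.
# For production, prefer a secret store such as Azure Key Vault.
key = os.environ.get("AZURE_OPENAI_KEY", "<RESOURCE_API_KEY>")

assert key is not None
```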
+
+
 ## Create a dataset of prompts
 
 The next step is to create a dataframe consisting of a series of rows, with one prompt per row.
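[Editorial aside] The one-prompt-per-row shape can be sketched in plain Python before handing it to Spark (no live session assumed here; the example prompts are the ones shown in the article's sample output):

```python
# One prompt per row; the "prompt" column is what the completion
# transformer in this article reads from.
prompts = [
    ("Hello my name is",),
    ("The best code is code that's",),
    ("SynapseML is ",),
]

# With a live Spark session you would then build the dataframe:
# df = spark.createDataFrame(prompts, ["prompt"])
```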
@@ -110,21 +148,9 @@ display(completed_df.select(
   col("prompt"), col("error"), col("completions.choices.text").getItem(0).alias("text")))
 ```
 
-Your output should look something like the following example. Keep in mind that the completion text can vary so your output might look different.
-
-```output
-prompt error text
-------------------------------------------------------------------------------------------------------------------------------------------------------
-Hello my name is undefined Makaveli
-I'm eighteen years old and I want to be a rapper when I grow up
-I love writing and making music
-I'm from Los Angeles, CA
-
-The best code is code that's undefined understandable
-This is a subjective statement, and there is no definitive answer.
+The following image shows example output with completions for the transformed dataframe in Azure Synapse Analytics Studio. Keep in mind that completion text can vary, so your output might look different.
 
-SynapseML is undefined A machine learning algorithm that is able to learn how to predict the future outcome of events.
-```
+:::image type="content" source="../media/how-to/synapse-studio-transform-dataframe-output.png" alt-text="Screenshot that shows sample completions for the transformed dataframe in Azure Synapse Analytics Studio." border="false":::
 
 ## Explore other usage scenarios
 
@@ -170,6 +196,10 @@ completed_batch_df = batch_completion.transform(batch_df).cache()
 display(completed_batch_df)
 ```
 
+The following image shows example output with completions for multiple prompts in a batch prompt request:
+
+:::image type="content" source="../media/how-to/synapse-studio-request-batch-output.png" alt-text="Screenshot that shows completions for multiple prompts in a batch prompt request in Azure Synapse Analytics Studio." border="false":::
+
 > [!NOTE]
 > There's currently a limit of 20 prompts in a single request and a limit of 2048 "tokens," or approximately 1500 words.
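[Editorial aside] The 20-prompt limit in the note can be respected client-side by chunking the prompt list before batching. A plain-Python sketch (the helper name is illustrative, not from SynapseML):

```python
def chunk_prompts(prompts, max_per_request=20):
    """Split a prompt list into request-sized chunks of at most 20."""
    return [prompts[i:i + max_per_request]
            for i in range(0, len(prompts), max_per_request)]

batches = chunk_prompts([f"prompt {n}" for n in range(45)])
# 45 prompts split into chunks of 20, 20, and 5.
assert [len(b) for b in batches] == [20, 20, 5]
```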
 
@@ -191,6 +221,10 @@ completed_autobatch_df = (df
 display(completed_autobatch_df)
 ```
 
+The following image shows example output for an automatic mini-batcher that transposes data to row format:
+
+:::image type="content" source="../media/how-to/synapse-studio-transpose-data-output.png" alt-text="Screenshot that shows completions for an automatic mini-batcher that transposes data to row format in Azure Synapse Analytics Studio." border="false":::
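[Editorial aside] To make the mini-batching idea concrete, here's a plain-Python illustration (not the SynapseML implementation) of grouping single-prompt rows and transposing each group so that one output row carries a list of prompts:

```python
def minibatch_transpose(rows, batch_size):
    """Group single-prompt rows into mini-batches, then transpose each
    batch so one row holds a list of prompts ready for a batched request."""
    batches = [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]
    return [{"prompt": [row["prompt"] for row in batch]} for batch in batches]

rows = [{"prompt": f"prompt {n}"} for n in range(5)]
batched = minibatch_transpose(rows, batch_size=2)
assert [len(b["prompt"]) for b in batched] == [2, 2, 1]
```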
+
 ### Prompt engineering for translation
 
 Azure OpenAI can solve many different natural language tasks through [prompt engineering](completions.md). In this example, you can prompt for language translation:
@@ -206,6 +240,10 @@ translate_df = spark.createDataFrame(
 display(completion.transform(translate_df))
 ```
 
+The following image shows example output for language translation prompts:
+
+:::image type="content" source="../media/how-to/synapse-studio-language-translation-output.png" alt-text="Screenshot that shows completions for language translation prompts in Azure Synapse Analytics Studio." border="false":::
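[Editorial aside] Prompt engineering for translation amounts to composing an instruction and the source text into one prompt string for the `prompt` column. A plain-Python sketch; the template wording is illustrative, not quoted from the article:

```python
def translation_prompt(text, target_language):
    # Illustrative template; real prompts can phrase the instruction differently.
    return f"Translate the following into {target_language}: {text}"

prompt = translation_prompt("Hello, how are you?", "French")
assert prompt.startswith("Translate the following into French:")
```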
+
 ### Prompt for question answering
 
 Azure OpenAI also supports prompting the GPT-3 model for general-knowledge question answering:
@@ -221,3 +259,12 @@ qa_df = spark.createDataFrame(
 
 display(completion.transform(qa_df))
 ```
+
+The following image shows example output when you prompt the GPT-3 model for general-knowledge question answering:
+
+:::image type="content" source="../media/how-to/synapse-studio-question-answer-output.png" alt-text="Screenshot that shows completions for prompting the GPT-3 model for general-knowledge question answering in Azure Synapse Analytics Studio." border="false":::
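[Editorial aside] General-knowledge question answering follows the same pattern: format the question so the completion model answers it. A plain-Python sketch with an illustrative Q/A template:

```python
def qa_prompt(question):
    # The Q/A framing nudges a completion model to answer the question.
    return f"Q: {question}\nA:"

prompt = qa_prompt("Which painter created the Mona Lisa?")
assert prompt.endswith("A:")
```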
+
+## Next steps
+
+- Learn how to work with the [GPT-35-Turbo and GPT-4 models](/azure/ai-services/openai/how-to/chatgpt?pivots=programming-language-chat-completions).
+- Learn more about the [Azure OpenAI Service models](../concepts/models.md).