Commit 4972d28

committed: checkpoint
1 parent 7101212 commit 4972d28

6 files changed: 14 additions, 10 deletions


articles/search/search-synapseml-cognitive-services.md

Lines changed: 14 additions & 10 deletions
@@ -10,7 +10,7 @@ ms.service: azure-ai-search
 ms.custom:
   - ignite-2023
 ms.topic: tutorial
-ms.date: 04/22/2024
+ms.date: 01/23/2025
 ---
 
 # Tutorial: Index large data from Apache Spark using SynapseML and Azure AI Search
@@ -37,14 +37,14 @@ You need the `synapseml` library and several Azure resources. If possible, use t
 
 + [SynapseML package](https://microsoft.github.io/SynapseML/docs/Get%20Started/Install%20SynapseML/#python) <sup>1</sup>
 + [Azure AI Search](search-create-service-portal.md) (any tier) <sup>2</sup>
-+ [Azure AI services](/azure/ai-services/multi-service-resource?pivots=azportal) (any tier) <sup>3</sup>
++ [Azure AI multi-service account](/azure/ai-services/multi-service-resource?pivots=azportal) (any tier) <sup>3</sup>
 + [Azure Databricks](/azure/databricks/scenarios/quickstart-create-databricks-workspace-portal?tabs=azure-portal) (any tier) <sup>4</sup>
 
 <sup>1</sup> This link resolves to a tutorial for loading the package.
 
 <sup>2</sup> You can use the free search tier to index the sample data, but [choose a higher tier](search-sku-tier.md) if your data volumes are large. For billable tiers, provide the [search API key](search-security-api-keys.md#find-existing-keys) in the [Set up dependencies](#step-2-set-up-dependencies) step further on.
 
-<sup>3</sup> This tutorial uses Azure AI Document Intelligence and Azure AI Translator. In the instructions that follow, provide a [multi-service](/azure/ai-services/multi-service-resource?pivots=azportal) key and the region. The same key works for both services.
+<sup>3</sup> This tutorial uses Azure AI Document Intelligence and Azure AI Translator. In the instructions that follow, provide a [multi-service account](/azure/ai-services/multi-service-resource?pivots=azportal) key and the region. The same key works for both services.
 
 <sup>4</sup> In this tutorial, Azure Databricks provides the Spark computing platform. We used the [portal instructions](/azure/databricks/scenarios/quickstart-create-databricks-workspace-portal?tabs=azure-portal) to set up the workspace.
 
@@ -63,6 +63,10 @@ In this section, create a cluster, install the `synapseml` library, and create a
 
 1. Accept the default configuration. It takes several minutes to create the cluster.
 
+1. Verify the cluster is operational and running. A green dot by the cluster name confirms its status.
+
+   :::image type="content" source="media/search-synapseml-cognitive-services/cluster-green-dot.png" alt-text="Screenshot of a Data Bricks compute page with a green dot by the cluster name.":::
+
 1. Install the `synapseml` library after the cluster is created:
 
    1. Select **Libraries** from the tabs at the top of the cluster's page.
@@ -73,7 +77,7 @@ In this section, create a cluster, install the `synapseml` library, and create a
 
    1. Select **Maven**.
 
-   1. In Coordinates, enter `com.microsoft.azure:synapseml_2.12:1.0.4`
+   1. In Coordinates, search for or type `com.microsoft.azure:synapseml_2.12:1.0.9`
 
    1. Select **Install**.
 
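An aside on the changed line above: Maven coordinates follow the `groupId:artifactId:version` pattern, and SynapseML's artifact name carries the Scala binary version it was built against (`_2.12`), which must match the cluster's Scala version. A minimal sketch of splitting such a coordinate (the `parse_maven_coordinate` helper is illustrative, not part of the tutorial):

```python
def parse_maven_coordinate(coordinate: str) -> dict:
    """Split a Maven coordinate like 'group:artifact:version' into its parts."""
    group_id, artifact_id, version = coordinate.split(":")
    # SynapseML artifacts encode the Scala binary version after an underscore.
    scala_version = artifact_id.rsplit("_", 1)[1] if "_" in artifact_id else None
    return {
        "group": group_id,
        "artifact": artifact_id,
        "version": version,
        "scala": scala_version,
    }

print(parse_maven_coordinate("com.microsoft.azure:synapseml_2.12:1.0.9"))
```

This also makes the version bump in the diff easy to state precisely: only the `version` component changes, from `1.0.4` to `1.0.9`.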
@@ -85,15 +89,15 @@ In this section, create a cluster, install the `synapseml` library, and create a
 
 1. Give the notebook a name, select **Python** as the default language, and select the cluster that has the `synapseml` library.
 
-1. Create seven consecutive cells. Paste code into each one.
+1. Create seven consecutive cells. You use these to paste in code in the following sections.
 
    :::image type="content" source="media/search-synapseml-cognitive-services/create-seven-cells.png" alt-text="Screenshot of the notebook with placeholder cells." border="true":::
 
 ## Step 2: Set up dependencies
 
 Paste the following code into the first cell of your notebook.
 
-Replace the placeholders with endpoints and access keys for each resource. Provide a name for a new search index. No other modifications are required, so run the code when you're ready.
+Replace the placeholders with endpoints and access keys for each resource. Provide a name for a new search index that's created for you. No other modifications are required, so run the code when you're ready.
 
 This code imports multiple packages and sets up access to the Azure resources used in this workflow.
 

@@ -103,12 +107,12 @@ from pyspark.sql.functions import udf, trim, split, explode, col, monotonically_
 from pyspark.sql.types import StringType
 from synapse.ml.core.spark import FluentAPI
 
-cognitive_services_key = "placeholder-cognitive-services-multi-service-key"
-cognitive_services_region = "placeholder-cognitive-services-region"
+cognitive_services_key = "placeholder-azure-ai-services-multi-service-key"
+cognitive_services_region = "placeholder-azure-ai-services-region"
 
 search_service = "placeholder-search-service-name"
-search_key = "placeholder-search-service-api-key"
-search_index = "placeholder-search-index-name"
+search_key = "placeholder-search-service-admin-api-key"
+search_index = "placeholder-for-new-search-index-name"
 ```
 
 ## Step 3: Load data into Spark
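One note on the renamed placeholders: `search_service` holds the bare service name, which conventionally expands into the standard Azure AI Search endpoint form `https://<service>.search.windows.net`. A minimal sketch of that expansion (the `search_endpoint` helper is hypothetical, for illustration only):

```python
def search_endpoint(search_service: str) -> str:
    """Expand an Azure AI Search service name into its public endpoint URL."""
    return f"https://{search_service}.search.windows.net"

print(search_endpoint("placeholder-search-service-name"))
# prints https://placeholder-search-service-name.search.windows.net
```

The renamed `search_key` placeholder makes explicit that an admin key (not a query key) is required here, since the notebook creates a new index.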
