
Commit d2ac80f

Merge pull request #273013 from HeidiSteen/heidist-vectors
[azure search] Content freshness: synapse, mvc tutorial. Archive synonym example
2 parents 1a4460a + a8fac04 commit d2ac80f

9 files changed (+55, -237 lines)

articles/search/.openpublishing.redirection.search.json

Lines changed: 5 additions & 0 deletions
@@ -1,5 +1,10 @@
 {
   "redirections": [
+    {
+      "source_path_from_root": "/articles/search/search-synonyms-tutorial-sdk.md",
+      "redirect_url": "https://github.com/Azure/azure-sdk-for-net/blob/main/sdk/search/Azure.Search.Documents/samples/Sample02_Service.md#create-a-synonym-map",
+      "redirect_document_id": false
+    },
     {
       "source_path_from_root": "/articles/search/search-case-studies.md",
       "redirect_url": "https://azure.microsoft.com/case-studies",
articles/search/TOC.yml

Lines changed: 6 additions & 10 deletions
@@ -346,13 +346,15 @@
       href: semantic-how-to-query-request.md
     - name: Typeahead query
       href: search-add-autocomplete-suggestions.md
-    - name: Use simple syntax (examples)
+    - name: Query examples (simple syntax)
       href: search-query-simple-examples.md
-    - name: Add spell check to queries
+    - name: Add spell check
       href: speller-how-to-add.md
-    - name: Configure a suggester for typeahead
+    - name: Add synonyms
+      href: search-synonyms.md
+    - name: Add a suggester for typeahead
       href: index-add-suggesters.md
-    - name: Design a multi-language index
+    - name: Design a multilingual index
       href: search-language-support.md
     - name: Model complex data types
       href: search-howto-complex-data-types.md
@@ -366,12 +368,6 @@
       href: index-add-language-analyzers.md
     - name: Add a custom analyzer
       href: index-add-custom-analyzers.md
-    - name: Synonyms
-      items:
-      - name: Add synonyms
-        href: search-synonyms.md
-      - name: Synonyms C# example
-        href: search-synonyms-tutorial-sdk.md
     - name: Filters
       items:
       - name: Filters in text queries
Two media files changed (-12.4 KB, 37.8 KB).

articles/search/samples-dotnet.md

Lines changed: 0 additions & 1 deletion
@@ -58,7 +58,6 @@ Code samples from the Azure AI Search team demonstrate features and workflows. A
 | [multiple-data-sources](https://github.com/Azure-Samples/azure-search-dotnet-scale/tree/main/multiple-data-sources) | [Tutorial: Index from multiple data sources](tutorial-multiple-data-sources.md). | Merges content from two data sources into one search index. |
 | [Optimize-data-indexing](https://github.com/Azure-Samples/azure-search-dotnet-scale/tree/main/optimize-data-indexing) | [Tutorial: Optimize indexing with the push API](tutorial-optimize-indexing-push-api.md). | Demonstrates optimization techniques for pushing data into a search index. |
 | [DotNetHowTo](https://github.com/Azure-Samples/search-dotnet-getting-started/tree/master/DotNetHowTo) | [How to use the .NET client library](search-howto-dotnet-sdk.md) | Steps through the basic workflow, but in more detail and with discussion of API usage. |
-| [DotNetHowToSynonyms](https://github.com/Azure-Samples/search-dotnet-getting-started/tree/master/DotNetHowToSynonyms) | [Example: Add synonyms in C#](search-synonyms-tutorial-sdk.md) | Synonym lists are used for query expansion, providing matchable terms that are external to an index. |
 | [DotNetToIndexers](https://github.com/Azure-Samples/search-dotnet-getting-started/tree/master/DotNetHowToIndexers) | [Tutorial: Index Azure SQL data](search-indexer-tutorial.md) | Shows how to configure an Azure SQL indexer that has a schedule, field mappings, and parameters. |
 | [DotNetHowToEncryptionUsingCMK](https://github.com/Azure-Samples/search-dotnet-getting-started/tree/master/DotNetHowToEncryptionUsingCMK) | [How to configure customer-managed keys for data encryption](search-security-manage-encryption-keys.md) | Shows how to create objects that are encrypted with a Customer Key. |
 | [DotNetVectorDemo](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-dotnet/DotNetVectorDemo) | [readme](https://github.com/Azure/azure-search-vector-samples/tree/main/demo-dotnet/DotNetVectorDemo/readme.md) | Create, load, and query a vector index. |

articles/search/search-synapseml-cognitive-services.md

Lines changed: 29 additions & 27 deletions
@@ -1,7 +1,7 @@
 ---
 title: 'Tutorial: Index at scale (Spark)'
 titleSuffix: Azure AI Search
-description: Search big data from Apache Spark that's been transformed by SynapseML. You'll load invoices into data frames, apply machine learning, and then send output to a generated search index.
+description: Search big data from Apache Spark that's been transformed by SynapseML. Load invoices into data frames, apply machine learning, and then send output to a generated search index.
 
 manager: nitinme
 author: HeidiSteen
@@ -10,12 +10,12 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: tutorial
-ms.date: 02/01/2023
+ms.date: 04/22/2024
 ---
 
 # Tutorial: Index large data from Apache Spark using SynapseML and Azure AI Search
 
-In this Azure AI Search tutorial, learn how to index and query large data loaded from a Spark cluster. You'll set up a Jupyter Notebook that performs the following actions:
+In this Azure AI Search tutorial, learn how to index and query large data loaded from a Spark cluster. Set up a Jupyter Notebook that performs the following actions:
 
 > [!div class="checklist"]
 > + Load various forms (invoices) into a data frame in an Apache Spark session
@@ -24,7 +24,7 @@ In this Azure AI Search tutorial, learn how to index and query large data loaded
 > + Write the output to a search index hosted in Azure AI Search
 > + Explore and query over the content you created
 
-This tutorial takes a dependency on [SynapseML](https://www.microsoft.com/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library/), an open source library that supports massively parallel machine learning over big data. In SynapseML, search indexing and machine learning are exposed through *transformers* that perform specialized tasks. Transformers tap into a wide range of AI capabilities. In this exercise, you'll use the **AzureSearchWriter** APIs for analysis and AI enrichment.
+This tutorial takes a dependency on [SynapseML](https://www.microsoft.com/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library/), an open source library that supports massively parallel machine learning over big data. In SynapseML, search indexing and machine learning are exposed through *transformers* that perform specialized tasks. Transformers tap into a wide range of AI capabilities. In this exercise, use the **AzureSearchWriter** APIs for analysis and AI enrichment.
 
 Although Azure AI Search has native [AI enrichment](cognitive-search-concept-intro.md), this tutorial shows you how to access AI capabilities outside of Azure AI Search. By using SynapseML instead of indexers or skills, you're not subject to data limits or other constraints associated with those objects.
 
@@ -33,7 +33,7 @@ Although Azure AI Search has native [AI enrichment](cognitive-search-concept-int
 
 ## Prerequisites
 
-You'll need the `synapseml` library and several Azure resources. If possible, use the same subscription and region for your Azure resources and put everything into one resource group for simple cleanup later. The following links are for portal installs. The sample data is imported from a public site.
+You need the `synapseml` library and several Azure resources. If possible, use the same subscription and region for your Azure resources and put everything into one resource group for simple cleanup later. The following links are for portal installs. The sample data is imported from a public site.
 
 + [SynapseML package](https://microsoft.github.io/SynapseML/docs/Get%20Started/Install%20SynapseML/#python) <sup>1</sup>
 + [Azure AI Search](search-create-service-portal.md) (any tier) <sup>2</sup>
@@ -42,38 +42,38 @@ You'll need the `synapseml` library and several Azure resources. If possible, us
 
 <sup>1</sup> This link resolves to a tutorial for loading the package.
 
-<sup>2</sup> You can use the free search tier to index the sample data, but [choose a higher tier](search-sku-tier.md) if your data volumes are large. For non-free tiers, you'll need to provide the [search API key](search-security-api-keys.md#find-existing-keys) in the [Set up dependencies](#2---set-up-dependencies) step further on.
+<sup>2</sup> You can use the free search tier to index the sample data, but [choose a higher tier](search-sku-tier.md) if your data volumes are large. For billable tiers, provide the [search API key](search-security-api-keys.md#find-existing-keys) in the [Set up dependencies](#step-2-set-up-dependencies) step further on.
 
-<sup>3</sup> This tutorial uses Azure AI Document Intelligence and Azure AI Translator. In the instructions that follow, you'll provide a [multi-service key](../ai-services/multi-service-resource.md?pivots=azportal#get-the-keys-for-your-resource) and the region, and it will work for both services.
+<sup>3</sup> This tutorial uses Azure AI Document Intelligence and Azure AI Translator. In the instructions that follow, provide a [multi-service key](../ai-services/multi-service-resource.md?pivots=azportal#get-the-keys-for-your-resource) and the region. The same key works for both services.
 
-<sup>4</sup> In this tutorial, Azure Databricks provides the Spark computing platform and the instructions in the link will tell you how to set up the workspace. For this tutorial, we used the portal steps in "Create a workspace".
+<sup>4</sup> In this tutorial, Azure Databricks provides the Spark computing platform. We used the [portal instructions](/azure/databricks/scenarios/quickstart-create-databricks-workspace-portal?tabs=azure-portal) to set up the workspace.
 
 > [!NOTE]
 > All of the above Azure resources support security features in the Microsoft Identity platform. For simplicity, this tutorial assumes key-based authentication, using endpoints and keys copied from the portal pages of each service. If you implement this workflow in a production environment, or share the solution with others, remember to replace hard-coded keys with integrated security or encrypted keys.
 
-## 1 - Create a Spark cluster and notebook
+## Step 1: Create a Spark cluster and notebook
 
-In this section, you'll create a cluster, install the `synapseml` library, and create a notebook to run the code.
+In this section, create a cluster, install the `synapseml` library, and create a notebook to run the code.
 
 1. In Azure portal, find your Azure Databricks workspace and select **Launch workspace**.
 
 1. On the left menu, select **Compute**.
 
-1. Select **Create cluster**.
+1. Select **Create compute**.
 
-1. Give the cluster a name, accept the default configuration, and then create the cluster. It takes several minutes to create the cluster.
+1. Accept the default configuration. It takes several minutes to create the cluster.
 
 1. Install the `synapseml` library after the cluster is created:
 
-   1. Select **Library** from the tabs at the top of the cluster's page.
+   1. Select **Libraries** from the tabs at the top of the cluster's page.
 
    1. Select **Install new**.
 
      :::image type="content" source="media/search-synapseml-cognitive-services/install-library.png" alt-text="Screenshot of the Install New command." border="true":::
 
   1. Select **Maven**.
 
-   1. In Coordinates, enter `com.microsoft.azure:synapseml_2.12:0.10.0`
+   1. In Coordinates, enter `com.microsoft.azure:synapseml_2.12:1.0.4`
 
   1. Select **Install**.
 
@@ -85,13 +85,15 @@ In this section, you'll create a cluster, install the `synapseml` library, and c
 
 1. Give the notebook a name, select **Python** as the default language, and select the cluster that has the `synapseml` library.
 
-1. Create seven consecutive cells. You'll paste code into each one.
+1. Create seven consecutive cells. Paste code into each one.
 
    :::image type="content" source="media/search-synapseml-cognitive-services/create-seven-cells.png" alt-text="Screenshot of the notebook with placeholder cells." border="true":::
 
-## 2 - Set up dependencies
+## Step 2: Set up dependencies
 
-Paste the following code into the first cell of your notebook. Replace the placeholders with endpoints and access keys for each resource. No other modifications are required, so run the code when you're ready.
+Paste the following code into the first cell of your notebook.
+
+Replace the placeholders with endpoints and access keys for each resource. Provide a name for a new search index. No other modifications are required, so run the code when you're ready.
 
 This code imports multiple packages and sets up access to the Azure resources used in this workflow.
 
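As an aside, a common slip in this step is running the first cell with placeholders still in place. A hypothetical guard sketch, using the `search_key` and `search_index` placeholder names shown in the diff below (the other two variable names are assumptions for illustration):

```python
# Hypothetical guard: fail fast if any tutorial placeholder is still present.
# search_key and search_index mirror the placeholders shown in this diff;
# the other two names are assumptions about the rest of the first cell.
cognitive_services_key = "placeholder-azure-ai-services-multi-service-key"
search_service = "placeholder-search-service-name"
search_key = "placeholder-search-service-api-key"
search_index = "placeholder-search-index-name"

settings = {
    "cognitive_services_key": cognitive_services_key,
    "search_service": search_service,
    "search_key": search_key,
    "search_index": search_index,
}

# Any value still starting with "placeholder" was not replaced.
unreplaced = [name for name, value in settings.items() if value.startswith("placeholder")]
if unreplaced:
    print(f"Replace these before running: {unreplaced}")
```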
@@ -109,11 +111,11 @@ search_key = "placeholder-search-service-api-key"
 search_index = "placeholder-search-index-name"
 ```
 
-## 3 - Load data into Spark
+## Step 3: Load data into Spark
 
 Paste the following code into the second cell. No modifications are required, so run the code when you're ready.
 
-This code loads a few external files from an Azure storage account that's used for demo purposes. The files are various invoices, and they're read into a data frame.
+This code loads a few external files from an Azure storage account. The files are various invoices, and they're read into a data frame.
 
 ```python
 def blob_to_url(blob):
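The hunk above ends at the start of the tutorial's `blob_to_url` helper, which converts a `wasbs://container@account/path` blob reference into an HTTPS URL. A plausible implementation for context (only the `def` line appears in this diff; the body here is a sketch):

```python
def blob_to_url(blob):
    # Sketch: "wasbs://container@account/path" -> "https://account/container/path".
    # Only the signature appears in the diff; this body is an assumption.
    [prefix, postfix] = blob.split("@")          # split scheme+container from account+path
    container = prefix.split("/")[-1]            # last segment before "@" is the container
    split_postfix = postfix.split("/")
    account = split_postfix[0]                   # storage account host
    filepath = "/".join(split_postfix[1:])       # remainder is the blob path
    return "https://{}/{}/{}".format(account, container, filepath)

print(blob_to_url("wasbs://publicwasb@mmlspark.blob.core.windows.net/form_subset/Form1.pdf"))
```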
@@ -135,11 +137,11 @@ df2 = (spark.read.format("binaryFile")
 display(df2)
 ```
 
-## 4 - Add document intelligence
+## Step 4: Add document intelligence
 
 Paste the following code into the third cell. No modifications are required, so run the code when you're ready.
 
-This code loads the [AnalyzeInvoices transformer](https://mmlspark.blob.core.windows.net/docs/0.11.2/pyspark/synapse.ml.cognitive.form.html#module-synapse.ml.cognitive.form.AnalyzeInvoices) and passes a reference to the data frame containing the invoices. It calls the pre-built [invoice model](../ai-services/document-intelligence/concept-invoice.md) of Azure AI Document Intelligence to extract information from the invoices.
+This code loads the [AnalyzeInvoices transformer](https://mmlspark.blob.core.windows.net/docs/0.11.2/pyspark/synapse.ml.cognitive.form.html#module-synapse.ml.cognitive.form.AnalyzeInvoices) and passes a reference to the data frame containing the invoices. It calls the prebuilt [invoice model](../ai-services/document-intelligence/concept-invoice.md) of Azure AI Document Intelligence to extract information from the invoices.
 
 ```python
 from synapse.ml.cognitive import AnalyzeInvoices
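The `AnalyzeInvoices` transformer returns nested analysis results per document. As a rough, hypothetical illustration of consuming that kind of output in plain Python (the nested shape below is an assumption for this sketch, not the transformer's documented schema):

```python
# Hypothetical sketch: pulling a few fields out of an invoice-analysis result.
# The nested shape is assumed for illustration; consult the AnalyzeInvoices
# documentation for the actual output schema.
analyze_result = {
    "analyzeResult": {
        "documentResults": [
            {
                "fields": {
                    "VendorName": {"text": "Contoso Ltd."},
                    "InvoiceTotal": {"text": "$110.00"},
                }
            }
        ]
    }
}

def invoice_fields(result: dict) -> dict:
    """Flatten the first document's fields into a name -> text dict."""
    docs = result["analyzeResult"]["documentResults"]
    return {name: field["text"] for name, field in docs[0]["fields"].items()}

print(invoice_fields(analyze_result))
```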
@@ -161,7 +163,7 @@ The output from this step should look similar to the next screenshot. Notice how
 
 :::image type="content" source="media/search-synapseml-cognitive-services/analyze-forms-output.png" alt-text="Screenshot of the AnalyzeInvoices output." border="true":::
 
-## 5 - Restructure document intelligence output
+## Step 5: Restructure document intelligence output
 
 Paste the following code into the fourth cell and run it. No modifications are required.
 
@@ -183,11 +185,11 @@ itemized_df = (FormOntologyLearner()
 display(itemized_df)
 ```
 
-Notice how this transformation recasts the nested fields into a table, which enables the next two transformations. This screenshot is trimmed for brevity. If you're following along in your own notebook, you'll have 19 columns and 26 rows.
+Notice how this transformation recasts the nested fields into a table, which enables the next two transformations. This screenshot is trimmed for brevity. If you're following along in your own notebook, you have 19 columns and 26 rows.
 
 :::image type="content" source="media/search-synapseml-cognitive-services/form-ontology-learner-output.png" alt-text="Screenshot of the FormOntologyLearner output." border="true":::
 
-## 6 - Add translations
+## Step 6: Add translations
 
 Paste the following code into the fifth cell. No modifications are required, so run the code when you're ready.
 
@@ -217,11 +219,11 @@ display(translated_df)
 >
 > :::image type="content" source="media/search-synapseml-cognitive-services/translated-strings.png" alt-text="Screenshot of table output, showing the Translations column." border="true":::
 
-## 7 - Add a search index with AzureSearchWriter
+## Step 7: Add a search index with AzureSearchWriter
 
 Paste the following code in the sixth cell and then run it. No modifications are required.
 
-This code loads [AzureSearchWriter](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/AI%20Services/Overview/#azure-cognitive-search-sample). It consumes a tabular dataset and infers a search index schema that defines one field for each column. The translations structure is an array, so it's articulated in the index as a complex collection with subfields for each language translation. The generated index will have a document key and use the default values for fields created using the [Create Index REST API](/rest/api/searchservice/create-index).
+This code loads [AzureSearchWriter](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/AI%20Services/Overview/#azure-cognitive-search-sample). It consumes a tabular dataset and infers a search index schema that defines one field for each column. Because the translations structure is an array, it's articulated in the index as a complex collection with subfields for each language translation. The generated index has a document key and uses the default values for fields created using the [Create Index REST API](/rest/api/searchservice/create-index).
 
 ```python
 from synapse.ml.cognitive import *
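AzureSearchWriter infers one index field per column, as described above. A rough illustration of that idea: a hypothetical mapping from Python values to Azure AI Search field types (the EDM type names are real; the simplified mapping rules are this sketch's assumption, not the library's):

```python
# Hypothetical sketch of per-column schema inference, in the spirit of
# AzureSearchWriter. The EDM type names are Azure AI Search types; the
# mapping rules here are simplified assumptions.
def infer_field(name: str, value) -> dict:
    if isinstance(value, bool):          # check bool before int: bool subclasses int
        edm = "Edm.Boolean"
    elif isinstance(value, int):
        edm = "Edm.Int64"
    elif isinstance(value, float):
        edm = "Edm.Double"
    elif isinstance(value, list):
        edm = "Collection(Edm.String)"
    else:
        edm = "Edm.String"
    return {"name": name, "type": edm}

# One made-up row; keys stand in for data frame columns.
row = {"id": "1", "VendorName": "Contoso Ltd.", "InvoiceTotal": 110.0, "Translations": ["hello", "hola"]}
fields = [infer_field(name, value) for name, value in row.items()]
print(fields)
```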
@@ -242,7 +244,7 @@ You can check the search service pages in Azure portal to explore the index defi
 > [!NOTE]
 > If you can't use the default search index, you can provide an external custom definition in JSON, passing its URI as a string in the "indexJson" property. Generate the default index first so that you know which fields to specify, and then follow with customized properties if you need specific analyzers, for example.
 
-## 8 - Query the index
+## Step 8: Query the index
 
 Paste the following code into the seventh cell and then run it. No modifications are required, except that you might want to vary the syntax or try more examples to further explore your content:
 
