Commit de51ed2

Restored cog-search-demo content
1 parent 099589c commit de51ed2

1 file changed: 59 additions & 46 deletions

@@ -1,20 +1,20 @@
11
---
2-
title: "Quickstart: Text translation and entity recognition"
2+
title: "Quickstart: Create a skillset in the Azure portal"
33
titleSuffix: Azure Cognitive Search
4-
description: Use the Import Data wizard and AI cognitive skills to detect language, translate text, and recognize entities. The new fields created through AI become searchable text in an Azure Cognitive Search index.
4+
description: In this portal quickstart, use the Import Data wizard to add cognitive skills to an indexing pipeline to generate searchable text from images and unstructured documents. Skills in this quickstart include OCR, image analysis, and natural language processing.
5+
56
manager: nitinme
67
author: HeidiSteen
78
ms.author: heidist
89
ms.service: cognitive-search
910
ms.topic: quickstart
1011
ms.date: 05/31/2022
11-
ms.custom: mode-ui
1212
---
13-
# Quickstart: Translate text and recognize entities using the Import data wizard
13+
# Quickstart: Create an Azure Cognitive Search skillset in the Azure portal
1414

15-
Learn how AI enrichment in Azure Cognitive Search adds language detection, text translation, and entity recognition to create searchable content in a search index.
15+
Learn how AI enrichment in Azure Cognitive Search adds Optical Character Recognition (OCR), image analysis, language detection, text translation, and entity recognition to create searchable content in a search index.
1616

17-
In this quickstart, you'll run the **Import data** wizard to analyze French and Spanish descriptions of several national museums located in Spain. Output is a searchable index containing translated text and entities, queryable in the portal using [Search explorer](search-explorer.md).
17+
In this quickstart, you'll run the **Import data** wizard to apply skills that transform and enrich content during indexing. Output is a searchable index containing image text, translated text, and entities. Enriched content is queryable in the portal using [Search explorer](search-explorer.md).
1818

1919
To prepare, you'll create a few resources and upload sample files before running the wizard.
2020

@@ -28,11 +28,7 @@ Before you begin, have the following prerequisites in place:
2828

2929
+ Azure Cognitive Search. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices). You can use a free service for this quickstart.
3030

31-
+ Azure Storage account with Blob Storage. [Create a storage account](../storage/common/storage-account-create.md?tabs=azure-portal) or [find an existing account](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2storageAccounts/).
32-
33-
+ Choose the same subscription if you want the wizard to find your storage account and set up the connection.
34-
+ Choose the same region as Azure Cognitive Search to avoid bandwidth charges.
35-
+ Choose the StorageV2 (general purpose V2).
31+
+ Azure Storage account with Blob Storage.
3632

3733
> [!NOTE]
3834
> This quickstart uses [Cognitive Services](https://azure.microsoft.com/services/cognitive-services/) for the AI. Because the workload is so small, Cognitive Services is tapped behind the scenes for free processing for up to 20 transactions. You can complete this exercise without having to create a Cognitive Services resource.
@@ -41,56 +37,63 @@ Before you begin, have the following prerequisites in place:
4137

4238
In the following steps, set up a blob container in Azure Storage to store heterogeneous content files.
4339

44-
1. [Download sample data](https://github.com/Azure-Samples/azure-search-sample-data) from GitHub. There are multiple data sets. Use the files in the **spanish-museums** folder for this quickstart.
40+
1. [Download sample data](https://1drv.ms/f/s!As7Oy81M_gVPa-LCb5lC_3hbS-4) consisting of a small file set of different types. Unzip the files.
41+
42+
1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account.
43+
44+
1. [Create an Azure Storage account](../storage/common/storage-account-create.md?tabs=azure-portal) or [find an existing account](https://ms.portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2FstorageAccounts/).
4545

46-
1. Upload the sample data to a blob container.
46+
+ Choose the same region as Azure Cognitive Search to avoid bandwidth charges.
4747

48-
1. Sign in to the [Azure portal](https://portal.azure.com/) and find your storage account.
49-
1. In the left navigation pane, select **Containers**.
50-
1. [Create a container](../storage/blobs/storage-quickstart-blobs-portal.md#create-a-container) named "spanish-museums". Use the default public access level.
51-
1. In the "spanish-museums" container, select **Upload** to upload the files from your local **spanish-museums** folder.
48+
+ Choose the StorageV2 (general purpose V2).
5249

53-
You should have 10 files containing French and Spanish descriptions of national museums located in Spain.
50+
1. In the Azure portal, open your Azure Storage page and create a container. You can use the default public access level.
5451

55-
:::image type="content" source="media/cognitive-search-quickstart-blob/museums-container.png" alt-text="List of docx files in a blob container" border="true":::
52+
1. In Container, click **Upload** to upload the sample files you downloaded in the first step. Notice that you have a wide range of content types, including images and application files that are not full text searchable in their native formats.
53+
54+
:::image type="content" source="media/cognitive-search-quickstart-blob/sample-data.png" alt-text="Source files in Azure Blob Storage" border="false":::
5655

5756
You are now ready to move on to the **Import data** wizard.
5857

5958
## Run the Import data wizard
6059

6160
1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account.
6261

63-
1. [Find your search service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2storageAccounts/) and on the Overview page, click **Import data** on the command bar to set up cognitive enrichment in four steps.
62+
1. [Find your search service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) and on the Overview page, click **Import data** on the command bar to set up cognitive enrichment in four steps.
6463

6564
:::image type="content" source="media/search-import-data-portal/import-data-cmd.png" alt-text="Screenshot of the Import data command" border="true":::
6665

6766
### Step 1 - Create a data source
6867

6968
1. In **Connect to your data**, choose **Azure Blob Storage**. Choose an existing connection to the storage account and container you created. Give the data source a name, and use default values for the rest.
7069

71-
:::image type="content" source="media/cognitive-search-quickstart-blob/connect-to-spanish-museums.png" alt-text="Azure blob configuration" border="true":::
70+
:::image type="content" source="media/cognitive-search-quickstart-blob/blob-datasource.png" alt-text="Azure blob configuration" border="false":::
71+
72+
Continue to the next page.
7273

7374
### Step 2 - Add cognitive skills
7475

75-
Next, configure AI enrichment to invoke language detection, text translation, and entity recognition.
76+
Next, configure AI enrichment to invoke OCR, image analysis, and natural language processing.
7677

77-
1. For this quickstart, you can use the **Free** Cognitive Services resource. The sample data consists of 10 files, so the daily, per-indexer allotment of 20 free transactions on Cognitive Services is sufficient for this quickstart.
78+
1. For this quickstart, we are using the **Free** Cognitive Services resource. The sample data consists of 14 files, so the free allotment of 20 transactions on Cognitive Services is sufficient for this quickstart.
7879

79-
:::image type="content" source="media/cognitive-search-quickstart-blob/free-enrichments.png" alt-text="Attach free Cognitive Services processing" border="true":::
80+
:::image type="content" source="media/cognitive-search-quickstart-blob/cog-search-attach.png" alt-text="Attach Cognitive Services attach base service" border="false":::
8081

81-
1. In the same page, expand **Add enrichments** and make five selections:
82+
1. Expand **Add enrichments** and make four selections.
8283

83-
Choose entity recognition (people, organizations, locations)
84+
Enable OCR to add image analysis skills to the wizard page.
8485

85-
Choose language detection and text translation
86+
Set granularity to Pages to break up text into smaller chunks. Several text skills are limited to 5-KB inputs.
8687

87-
:::image type="content" source="media/cognitive-search-quickstart-blob/select-entity-lang-enrichments.png" alt-text="Attach Cognitive Services select services for skillset" border="true":::
88+
Choose entity recognition (people, organizations, locations) and image analysis skills.
8889

89-
In blobs, the "Content" field contains the content of the file. In the sample data, the content is multiple paragraphs about a given museum, in either French or Spanish. The "Granularity" is the field itself. Some skills work better on smaller chunks of text, but for the skills in this quickstart, field granularity is sufficient.
90+
:::image type="content" source="media/cognitive-search-quickstart-blob/skillset.png" alt-text="Attach Cognitive Services select services for skillset" border="false":::
91+
92+
Continue to the next page.
9093

9194
### Step 3 - Configure the index
9295

93-
An index contains your searchable content and the **Import data** wizard can usually infer the schema for you by sampling the data. In this step, review the generated schema and potentially revise any settings. Below is the default schema created for the demo data set.
96+
An index contains your searchable content and the **Import data** wizard can usually create the schema for you by sampling the data source. In this step, review the generated schema and potentially revise any settings. Below is the default schema created for the demo Blob data set.
9497

9598
For this quickstart, the wizard does a good job setting reasonable defaults:
9699

@@ -100,27 +103,27 @@ For this quickstart, the wizard does a good job setting reasonable defaults:
100103

101104
+ Default attributes are **Retrievable** and **Searchable**. **Searchable** enables full text search on a field. **Retrievable** means field values can be returned in results. The wizard assumes you want these fields to be retrievable and searchable because you created them via a skillset.
102105

103-
+ Select the filterable checkbox for "Language". The wizard won't set the folder for you, but the ability to filter by language is useful in this demo given that there are multiple languages.
104-
105-
:::image type="content" source="media/cognitive-search-quickstart-blob/index-fields-lang-entities.png" alt-text="Index fields" border="true":::
106+
:::image type="content" source="media/cognitive-search-quickstart-blob/index-fields.png" alt-text="Index fields" border="false":::
106107

107-
Marking a field as **Retrievable** doesn't mean that the field *must* be present in the search results. You can precisely control search results composition by using the **$select** query parameter to specify which fields to include. For text-heavy fields like `content`, the **$select** parameter is your solution for shaping manageable search results to the human users of your application, while ensuring client code has access to all the information it needs via the **Retrievable** attribute.
108+
Marking a field as **Retrievable** does not mean that the field *must* be present in the search results. You can control search results composition by using the **$select** query parameter to specify which fields to include.
109+
110+
Continue to the next page.
108111

109112
### Step 4 - Configure the indexer
110113

111-
The indexer is a high-level resource that drives the indexing process. It specifies the data source name, a target index, and frequency of execution. The **Import data** wizard creates several objects, and one of them is always an indexer that you can run repeatedly.
114+
The indexer drives the indexing process. It specifies the data source name, a target index, and frequency of execution. The **Import data** wizard creates several objects, including an indexer that you can reset and run repeatedly.
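For readers who later move from the wizard to code, the indexer described above can be sketched as the JSON body that a REST call would carry. This is a minimal illustration under stated assumptions, not the definition the wizard actually generates: the helper `build_indexer_definition` and the names `demo-indexer`, `demo-blob-datasource`, `demo-index`, and `demo-skillset` are hypothetical, and leaving out `schedule` is assumed to correspond to the **Once** option.

```python
import json


def build_indexer_definition(name, data_source, index, skillset, interval=None):
    """Return an indexer definition as a plain dict (hypothetical helper).

    The keys mirror the three things the paragraph above says an indexer
    specifies: a data source name, a target index, and a frequency of execution.
    """
    definition = {
        "name": name,
        "dataSourceName": data_source,   # where the content comes from
        "targetIndexName": index,        # where enriched output lands
        "skillsetName": skillset,        # the skillset bound to this indexer
    }
    if interval is not None:
        # ISO 8601 duration, for example "PT2H" for every two hours.
        definition["schedule"] = {"interval": interval}
    return definition


# All names below are placeholders for illustration only.
indexer = build_indexer_definition(
    "demo-indexer", "demo-blob-datasource", "demo-index", "demo-skillset"
)
print(json.dumps(indexer, indent=2))
```

Passing `interval="PT2H"` would add a `schedule` section, which is roughly what choosing a recurring schedule in the wizard expresses.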
112115

113116
1. In the **Indexer** page, you can accept the default name and click the **Once** schedule option to run it immediately.
114117

115-
:::image type="content" source="media/cognitive-search-quickstart-blob/indexer-spanish-museum.png" alt-text="Indexer definition" border="true":::
118+
:::image type="content" source="media/cognitive-search-quickstart-blob/indexer-def.png" alt-text="Indexer definition" border="false":::
116119

117120
1. Click **Submit** to create and simultaneously run the indexer.
118121

119122
## Monitor status
120123

121-
Cognitive skills indexing takes longer to complete than typical text-based indexing. To monitor progress, go to the Overview page and select the **Indexers** tab in the middle of the page.
124+
Cognitive skills indexing takes longer to complete than typical text-based indexing, especially OCR and image analysis. To monitor progress, go to the Overview page and click **Indexers** in the middle of the page.
122125

123-
:::image type="content" source="media/cognitive-search-quickstart-blob/indexer-status-spanish-museums.png" alt-text="Indexer status" border="true":::
126+
:::image type="content" source="media/cognitive-search-quickstart-blob/indexer-notification.png" alt-text="Azure Cognitive Search notification" border="false":::
124127

125128
To check details about execution status, select an indexer from the list.
126129

@@ -132,30 +135,40 @@ After an index is created, you can run queries to return results. In the portal,
132135

133136
1. Select **Change Index** at the top to select the index you created.
134137

135-
1. In Query string, enter a search string to query the index, such as `search="picasso museum" &$select=people,organizations,locations,language,translated_text &$count=true &$filter=language eq 'fr'`, and then select **Search**.
136-
137-
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer-query-string-spanish-museums.png" alt-text="Query string in search explorer" border="true":::
138+
1. Enter a search string to query the index, such as `search=Microsoft&$select=people,organizations,locations,imageTags`.
138139

139140
Results are returned as JSON, which can be verbose and hard to read, especially in large documents originating from Azure blobs. Some tips for searching in this tool include the following techniques:
140141

141142
+ Append `$select` to specify which fields to include in results.
142143
+ Use CTRL-F to search within the JSON for specific properties or terms.
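The same kind of query string entered in Search explorer can also be assembled in code. The following is a rough sketch that only builds the request URL and sends nothing: the helper `build_query_url`, the service name `my-service`, and the index name `my-index` are hypothetical, and the `api-version` value is an assumption that should be checked against the service you target.

```python
from urllib.parse import quote, urlencode


def build_query_url(service, index, search, select=None, api_version="2020-06-30"):
    """Build a GET URL for a document search (hypothetical helper)."""
    params = {"api-version": api_version, "search": search}
    if select:
        # $select limits the fields returned, which keeps the JSON readable.
        params["$select"] = ",".join(select)
    # Keep "$" and "," unescaped so the result looks like the portal query string.
    query = urlencode(params, quote_via=quote, safe="$,")
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


# Placeholder service and index names; field names are taken from the example query above.
url = build_query_url(
    "my-service",
    "my-index",
    "Microsoft",
    select=["people", "organizations", "locations", "imageTags"],
)
print(url)
```

Actually sending the request would also require an `api-key` header for the search service. Field names in `select` must match the index definition exactly, including case.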
143144

144-
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer-results-spanish-museums.png" alt-text="Search explorer example" border="true":::
145-
146145
Query strings are case-sensitive, so if you get an "unknown field" message, check **Fields** or **Index Definition (JSON)** to verify the name and case.
147146

147+
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer.png" alt-text="Search explorer example" border="false":::
148+
149+
## Takeaways
150+
151+
You've now created your first skillset and learned important concepts useful for prototyping an enriched search solution using your own data.
152+
153+
One key concept is the dependency on Azure data sources. A skillset is bound to an indexer, and indexers are specific to Azure and to the data source. Although this quickstart uses Azure Blob Storage, other Azure data sources are possible. For more information, see [Indexers in Azure Cognitive Search](search-indexer-overview.md).
154+
155+
Another important concept is that skills operate over content types, and when working with heterogeneous content, some inputs will be skipped. Also, large files or fields might exceed the indexer limits of your service tier. It's normal to see warnings when these events occur.
156+
157+
Output is directed to a search index, and there is a mapping between name-value pairs created during indexing and individual fields in your index. Internally, the portal sets up [annotations](cognitive-search-concept-annotations-syntax.md) and defines a [skillset](cognitive-search-defining-skillset.md), establishing the order of operations and general flow. These steps are hidden in the portal, but when you start writing code, these concepts become important.
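To make the idea of annotations and order of operations more concrete, here is a simplified sketch of what a skillset definition might look like when written out by hand. It is an assumption-laden illustration, not the skillset the portal generates: the skillset name `demo-skillset`, the helper `output_paths`, the chosen `targetName` values, and the 5000-character page length are all illustrative, and a real wizard-built skillset typically contains additional skills.

```python
# A trimmed-down skillset: OCR on images, text split into pages,
# then entity recognition on each page. Values are illustrative.
skillset = {
    "name": "demo-skillset",  # hypothetical name
    "skills": [
        {
            "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
            "context": "/document/normalized_images/*",
            "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
            "outputs": [{"name": "text", "targetName": "text"}],
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
            "context": "/document",
            "textSplitMode": "pages",
            "maximumPageLength": 5000,  # assumed value for small text chunks
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "textItems", "targetName": "pages"}],
        },
        {
            "@odata.type": "#Microsoft.Skills.Text.V3.EntityRecognitionSkill",
            "context": "/document/pages/*",
            "categories": ["Person", "Organization", "Location"],
            "inputs": [{"name": "text", "source": "/document/pages/*"}],
            "outputs": [
                {"name": "persons", "targetName": "people"},
                {"name": "organizations", "targetName": "organizations"},
                {"name": "locations", "targetName": "locations"},
            ],
        },
    ],
}


def output_paths(skill):
    """List the annotation paths a skill writes: its context plus each target name."""
    return [f"{skill['context']}/{out['targetName']}" for out in skill["outputs"]]


for skill in skillset["skills"]:
    print(skill["@odata.type"], "->", output_paths(skill))
```

Each printed path is a node that a later skill, or an index field mapping, can reference as a `source`. That chaining is what establishes the flow from raw blob content to searchable fields.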
158+
159+
Finally, you learned that you can verify content by querying the index. In the end, what Azure Cognitive Search provides is a searchable index, which you can query using either the [simple](/rest/api/searchservice/simple-query-syntax-in-azure-search) or [fully extended query syntax](/rest/api/searchservice/lucene-query-syntax-in-azure-search). An index containing enriched fields is like any other. If you want to incorporate standard or [custom analyzers](search-analyzers.md), [scoring profiles](/rest/api/searchservice/add-scoring-profiles-to-a-search-index), [synonyms](search-synonyms.md), [faceted navigation](search-faceted-navigation.md), geo-search, or any other Azure Cognitive Search feature, you can certainly do so.
160+
148161
## Clean up resources
149162

150163
When you're working in your own subscription, it's a good idea at the end of a project to identify whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.
151164

152165
You can find and manage resources in the portal, using the **All resources** or **Resource groups** link in the left-navigation pane.
153166

154-
If you're using a free service, remember that you're limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
167+
If you are using a free service, remember that you are limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
155168

156169
## Next steps
157170

158-
Cognitive Search has other built-in skills that can be exercised in the Import data wizard. As a next step, try the next quickstart that creates a knowledge store.
171+
You can create skillsets using the portal, .NET SDK, or REST API. To further your knowledge, try the REST API using Postman and more sample data.
159172

160173
> [!div class="nextstepaction"]
161-
> [Quickstart: Create a knowledge store](knowledge-store-create-portal.md)
174+
> [Tutorial: Extract text and structure from JSON blobs using REST APIs](cognitive-search-tutorial-blob.md)
