title: "Quickstart: Text translation and entity recognition"
2
+
title: "Quickstart: Create a skillset in the Azure portal"
3
3
titleSuffix: Azure Cognitive Search
4
-
description: Use the Import Data wizard and AI cognitive skills to detect language, translate text, and recognize entities. The new fields created through AI become searchable text in an Azure Cognitive Search index.
4
+
description: In this portal quickstart, use the Import Data wizard to add cognitive skills to an indexing pipeline to generate searchable text from images and unstructured documents. Skills in this quickstart include OCR, image analysis, and natural language processing.
5
+
5
6
manager: nitinme
6
7
author: HeidiSteen
7
8
ms.author: heidist
8
9
ms.service: cognitive-search
9
10
ms.topic: quickstart
10
11
ms.date: 05/31/2022
11
-
ms.custom: mode-ui
12
12
---
13
-
# Quickstart: Translate text and recognize entities using the Import data wizard
13
+
# Quickstart: Create an Azure Cognitive Search skillset in the Azure portal
14
14
15
-
Learn how AI enrichment in Azure Cognitive Search adds language detection, text translation, and entity recognition to create searchable content in a search index.
15
+
Learn how AI enrichment in Azure Cognitive Search adds Optical Character Recognition (OCR), image analysis, language detection, text translation, and entity recognition to create searchable content in a search index.
16
16
17
-
In this quickstart, you'll run the **Import data** wizard to analyze French and Spanish descriptions of several national museums located in Spain. Output is a searchable index containing translated text and entities, queryable in the portal using [Search explorer](search-explorer.md).
17
+
In this quickstart, you'll run the **Import data** wizard to apply skills that transform and enrich content during indexing. Output is a searchable index containing image text, translated text, and entities. Enriched content is queryable in the portal using [Search explorer](search-explorer.md).
18
18
19
19
To prepare, you'll create a few resources and upload sample files before running the wizard.
20
20
@@ -28,11 +28,7 @@ Before you begin, have the following prerequisites in place:
28
28
29
29
+ Azure Cognitive Search. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices). You can use a free service for this quickstart.
30
30
31
-
+ Azure Storage account with Blob Storage. [Create a storage account](../storage/common/storage-account-create.md?tabs=azure-portal) or [find an existing account](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2storageAccounts/).
32
-
33
-
+ Choose the same subscription if you want the wizard to find your storage account and set up the connection.
34
-
+ Choose the same region as Azure Cognitive Search to avoid bandwidth charges.
35
-
+ Choose the StorageV2 (general purpose V2).
31
+
+ Azure Storage account with Blob Storage.
36
32
37
33
> [!NOTE]
38
34
> This quickstart uses [Cognitive Services](https://azure.microsoft.com/services/cognitive-services/) for the AI. Because the workload is so small, Cognitive Services is tapped behind the scenes for free processing for up to 20 transactions. You can complete this exercise without having to create a Cognitive Services resource.
@@ -41,56 +37,63 @@ Before you begin, have the following prerequisites in place:
41
37
42
38
In the following steps, set up a blob container in Azure Storage to store heterogeneous content files.
43
39
44
-
1. [Download sample data](https://github.com/Azure-Samples/azure-search-sample-data) from GitHub. There are multiple data sets. Use the files in the **spanish-museums** folder for this quickstart.
40
+
1. [Download sample data](https://1drv.ms/f/s!As7Oy81M_gVPa-LCb5lC_3hbS-4) consisting of a small file set of different types. Unzip the files.
41
+
42
+
1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account.
43
+
44
+
1. [Create an Azure Storage account](../storage/common/storage-account-create.md?tabs=azure-portal) or [find an existing account](https://ms.portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2storageAccounts/).
45
45
46
-
1. Upload the sample data to a blob container.
46
+
+ Choose the same region as Azure Cognitive Search to avoid bandwidth charges.
47
47
48
-
1. Sign in to the [Azure portal](https://portal.azure.com/) and find your storage account.
49
-
1. In the left navigation pane, select **Containers**.
50
-
1. [Create a container](../storage/blobs/storage-quickstart-blobs-portal.md#create-a-container) named "spanish-museums". Use the default public access level.
51
-
1. In the "spanish-museums" container, select **Upload** to upload the files from your local **spanish-museums** folder.
48
+
+ Choose the StorageV2 (general purpose V2).
52
49
53
-
You should have 10 files containing French and Spanish descriptions of national museums located in Spain.
50
+
1. In the Azure portal, open your Azure Storage page and create a container. You can use the default public access level.
54
51
55
-
:::image type="content" source="media/cognitive-search-quickstart-blob/museums-container.png" alt-text="List of docx files in a blob container" border="true":::
52
+
1. In Container, click **Upload** to upload the sample files you downloaded in the first step. Notice that you have a wide range of content types, including images and application files that are not full text searchable in their native formats.
53
+
54
+
:::image type="content" source="media/cognitive-search-quickstart-blob/sample-data.png" alt-text="Source files in Azure Blob Storage" border="false":::
56
55
57
56
You are now ready to move on to the Import data wizard.
58
57
59
58
## Run the Import data wizard
60
59
61
60
1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account.
62
61
63
-
1. [Find your search service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2storageAccounts/) and on the Overview page, click **Import data** on the command bar to set up cognitive enrichment in four steps.
62
+
1. [Find your search service](https://ms.portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2storageAccounts/) and on the Overview page, click **Import data** on the command bar to set up cognitive enrichment in four steps.
64
63
65
64
:::image type="content" source="media/search-import-data-portal/import-data-cmd.png" alt-text="Screenshot of the Import data command" border="true":::
66
65
67
66
### Step 1 - Create a data source
68
67
69
68
1. In **Connect to your data**, choose **Azure Blob Storage**. Choose an existing connection to the storage account and container you created. Give the data source a name, and use default values for the rest.
Next, configure AI enrichment to invoke language detection, text translation, and entity recognition.
76
+
Next, configure AI enrichment to invoke OCR, image analysis, and natural language processing.
76
77
77
-
1. For this quickstart, you can use the **Free** Cognitive Services resource. The sample data consists of 10 files, so the daily, per-indexer allotment of 20 free transactions on Cognitive Services is sufficient for this quickstart.
78
+
1. For this quickstart, we are using the **Free** Cognitive Services resource. The sample data consists of 14 files, so the free allotment of 20 transactions on Cognitive Services is sufficient for this quickstart.
Choose entity recognition (people, organizations, locations) and image analysis skills.
88
89
89
-
In blobs, the "Content" field contains the content of the file. In the sample data, the content is multiple paragraphs about a given museum, in either French or Spanish. The "Granularity" is the field itself. Some skills work better on smaller chunks of text, but for the skills in this quickstart, field granularity is sufficient.
An index contains your searchable content and the **Import data** wizard can usually infer the schema for you by sampling the data. In this step, review the generated schema and potentially revise any settings. Below is the default schema created for the demo data set.
96
+
An index contains your searchable content and the **Import data** wizard can usually create the schema for you by sampling the data source. In this step, review the generated schema and potentially revise any settings. Below is the default schema created for the demo Blob data set.
94
97
95
98
For this quickstart, the wizard does a good job setting reasonable defaults:
96
99
@@ -100,27 +103,27 @@ For this quickstart, the wizard does a good job setting reasonable defaults:
100
103
101
104
+ Default attributes are **Retrievable** and **Searchable**. **Searchable** allows full text search on a field. **Retrievable** means field values can be returned in results. The wizard assumes you want these fields to be retrievable and searchable because you created them via a skillset.
102
105
103
-
+ Select the filterable checkbox for "Language". The wizard won't set the folder for you, but the ability to filter by language is useful in this demo given that there are multiple languages.
Marking a field as **Retrievable** doesn't mean that the field *must* be present in the search results. You can precisely control search results composition by using the **$select** query parameter to specify which fields to include. For text-heavy fields like `content`, the **$select** parameter is your solution for shaping manageable search results to the human users of your application, while ensuring client code has access to all the information it needs via the **Retrievable** attribute.
108
+
Marking a field as **Retrievable** does not mean that the field *must* be present in the search results. You can control search results composition by using the **$select** query parameter to specify which fields to include.
109
+
110
+
Continue to the next page.
108
111
109
112
### Step 4 - Configure the indexer
110
113
111
-
The indexer is a high-level resource that drives the indexing process. It specifies the data source name, a target index, and frequency of execution. The **Import data** wizard creates several objects, and of them is always an indexer that you can run repeatedly.
114
+
The indexer drives the indexing process. It specifies the data source name, a target index, and frequency of execution. The **Import data** wizard creates several objects, including an indexer that you can reset and run repeatedly.
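For readers who later move from the portal to code, the pieces an indexer specifies (a data source name, a target index, and a frequency of execution) can be pictured as a small definition object. The following is a minimal sketch, not the definition the wizard actually generates: the helper `build_indexer_definition` and all resource names are hypothetical, and the property names (`dataSourceName`, `targetIndexName`, `skillsetName`, `schedule`) are assumed to follow the REST API's indexer definition. Omitting the schedule is used here to model the run-once case.

```python
import json
from typing import Optional


def build_indexer_definition(
    name: str,
    data_source: str,
    target_index: str,
    skillset: Optional[str] = None,
    interval: Optional[str] = None,
) -> dict:
    """Assemble an indexer definition as a plain dictionary.

    Property names are assumed to match the REST indexer definition;
    this helper itself is hypothetical and only illustrates the shape.
    """
    definition = {
        "name": name,
        "dataSourceName": data_source,    # where the content comes from
        "targetIndexName": target_index,  # where enriched content lands
    }
    if skillset is not None:
        definition["skillsetName"] = skillset
    if interval is not None:
        # ISO 8601 duration, for example "PT2H" for every two hours.
        definition["schedule"] = {"interval": interval}
    return definition


# Hypothetical names; no schedule means the indexer is not run on a timer.
run_once = build_indexer_definition(
    "demo-indexer", "demo-datasource", "demo-index", skillset="demo-skillset"
)
print(json.dumps(run_once, indent=2))
```

Keeping the schedule optional mirrors the wizard's choice between a single run and a recurring one.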
112
115
113
116
1. In the **Indexer** page, you can accept the default name and click the **Once** schedule option to run it immediately.
1. Click **Submit** to create and simultaneously run the indexer.
118
121
119
122
## Monitor status
120
123
121
-
Cognitive skills indexing takes longer to complete than typical text-based indexing. To monitor progress, go to the Overview page and select the **Indexers** tab in the middle of page.
124
+
Cognitive skills indexing takes longer to complete than typical text-based indexing, especially OCR and image analysis. To monitor progress, go to the Overview page and click **Indexers** in the middle of the page.
To check details about execution status, select an indexer from the list.
126
129
@@ -132,30 +135,40 @@ After an index is created, you can run queries to return results. In the portal,
132
135
133
136
1. Select **Change Index** at the top to select the index you created.
134
137
135
-
1. In Query string, enter a search string to query the index, such as `search="picasso museum" &$select=people,organizations,locations,language,translated_text &$count=true &$filter=language eq 'fr'`, and then select **Search**.
136
-
137
-
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer-query-string-spanish-museums.png" alt-text="Query string in search explorer" border="true":::
138
+
1. Enter a search string to query the index, such as `search=Microsoft&$select=people,organizations,locations,imageTags`.
138
139
139
140
Results are returned as JSON, which can be verbose and hard to read, especially in large documents originating from Azure blobs. Some tips for searching in this tool include the following techniques:
140
141
141
142
+ Append `$select` to specify which fields to include in results.
142
143
+ Use CTRL-F to search within the JSON for specific properties or terms.
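The query string used in this quickstart can also be assembled in client code. The sketch below is illustrative only: `build_query_string` is a hypothetical helper, not part of any Azure SDK, and it simply joins the `search`, `$select`, `$count`, and `$filter` parameters with the standard library. The `$` and `,` characters are left unescaped so the output matches the form typed into Search explorer.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_query_string(
    search: str,
    select: Optional[Sequence[str]] = None,
    count: bool = False,
    filter_expr: Optional[str] = None,
) -> str:
    """Join query parameters into a single query string (hypothetical helper)."""
    params = {"search": search}
    if select:
        # $select takes a comma-separated list of field names.
        params["$select"] = ",".join(select)
    if count:
        params["$count"] = "true"
    if filter_expr:
        params["$filter"] = filter_expr
    # Keep "$" and "," readable; other reserved characters are percent-encoded.
    return urlencode(params, safe="$,")


query = build_query_string(
    "Microsoft",
    select=["people", "organizations", "locations", "imageTags"],
)
print(query)
```

Because field names are case-sensitive, passing them as a list makes it easier to compare each one against the index definition before sending the query.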
143
144
144
-
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer-results-spanish-museums.png" alt-text="Search explorer example" border="true":::
145
-
146
145
Query strings are case-sensitive so if you get an "unknown field" message, check **Fields** or **Index Definition (JSON)** to verify name and case.
147
146
147
+
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer.png" alt-text="Search explorer example" border="false":::
148
+
149
+
## Takeaways
150
+
151
+
You've now created your first skillset and learned important concepts useful for prototyping an enriched search solution using your own data.
152
+
153
+
Some key concepts that we hope you picked up include the dependency on Azure data sources. A skillset is bound to an indexer, and indexers are Azure and source-specific. Although this quickstart uses Azure Blob Storage, other Azure data sources are possible. For more information, see [Indexers in Azure Cognitive Search](search-indexer-overview.md).
154
+
155
+
Another important concept is that skills operate over content types, and when working with heterogeneous content, some inputs will be skipped. Also, large files or fields might exceed the indexer limits of your service tier. It's normal to see warnings when these events occur.
156
+
157
+
Output is directed to a search index, and there is a mapping between name-value pairs created during indexing and individual fields in your index. Internally, the portal sets up [annotations](cognitive-search-concept-annotations-syntax.md) and defines a [skillset](cognitive-search-defining-skillset.md), establishing the order of operations and general flow. These steps are hidden in the portal, but when you start writing code, these concepts become important.
158
+
159
+
Finally, you learned that you can verify content by querying the index. In the end, what Azure Cognitive Search provides is a searchable index, which you can query using either the [simple](/rest/api/searchservice/simple-query-syntax-in-azure-search) or [fully extended query syntax](/rest/api/searchservice/lucene-query-syntax-in-azure-search). An index containing enriched fields is like any other. If you want to incorporate standard or [custom analyzers](search-analyzers.md), [scoring profiles](/rest/api/searchservice/add-scoring-profiles-to-a-search-index), [synonyms](search-synonyms.md), [faceted navigation](search-faceted-navigation.md), geo-search, or any other Azure Cognitive Search feature, you can certainly do so.
160
+
148
161
## Clean up resources
149
162
150
163
When you're working in your own subscription, it's a good idea at the end of a project to identify whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.
151
164
152
165
You can find and manage resources in the portal, using the **All resources** or **Resource groups** link in the left-navigation pane.
153
166
154
-
If you're using a free service, remember that you're limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
167
+
If you are using a free service, remember that you are limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
155
168
156
169
## Next steps
157
170
158
-
Cognitive Search has other built-in skills that can be exercised in the Import data wizard. As a next step, try the next quickstart that creates a knowledge store.
171
+
You can create skillsets using the portal, .NET SDK, or REST API. To further your knowledge, try the REST API using Postman and more sample data.
159
172
160
173
> [!div class="nextstepaction"]
161
-
> [Quickstart: Create a knowledge store](knowledge-store-create-portal.md)
174
+
> [Tutorial: Extract text and structure from JSON blobs using REST APIs](cognitive-search-tutorial-blob.md)