articles/search/cognitive-search-tutorial-debug-sessions.md
Lines changed: 40 additions & 55 deletions
@@ -10,35 +10,35 @@ ms.service: cognitive-search
 ms.custom:
 - ignite-2023
 ms.topic: tutorial
-ms.date: 10/09/2023
+ms.date: 03/06/2024
 ---

 # Tutorial: Debug a skillset using Debug Sessions

-Skillsets coordinate a series of actions that analyze or transform content, where the output of one skill becomes the input of another. When inputs depend on outputs, mistakes in skillset definitions and field associations can result in missed operations and data.
+A skillset coordinates the actions of skills that analyze, transform, or create searchable content. Frequently, the output of one skill becomes the input of another. When inputs depend on outputs, mistakes in skillset definitions and field associations can result in missed operations and data.

-**Debug sessions** is a tool in the Azure portal provides a holistic visualization of a skillset. Using this tool, you can drill down to specific steps to easily see where an action might be falling down.
+**Debug sessions** is an Azure portal tool that provides a holistic visualization of a skillset. Using this tool, you can drill down to specific steps to easily see where an action might be falling down.

-In this article, use **Debug sessions** to find and fix missing inputs and outputs. The tutorial is all-inclusive. It provides sample data, a Postman collection that creates objects, and instructions for debugging problems in the skillset.
+In this article, use **Debug sessions** to find and fix missing inputs and outputs. The tutorial is all-inclusive. It provides sample data, a REST file that creates objects, and instructions for debugging problems in the skillset.

-## Prerequisites
-
-Before you begin, have the following prerequisites in place:
+If you don't have an Azure subscription, create a [free account](https://azure.microsoft.com/free/?WT.mc_id=A261C142F) before you begin.

-+ An active subscription. [Create an account for free](https://azure.microsoft.com/free/).
+## Prerequisites

 + Azure AI Search. [Create a service](search-create-service-portal.md) or [find an existing service](https://portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) under your current subscription. You can use a free service for this tutorial.

 + Azure Storage account with [Blob storage](../storage/blobs/index.yml), used for hosting sample data, and for persisting cached data created during a debug session.

-+ [Postman app](https://www.postman.com/downloads/) and a [Postman collection](https://github.com/Azure-Samples/azure-search-postman-samples/tree/main/Debug-sessions) to create objects using the REST APIs.
++ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
++ [Sample debug-sessions.rest file](https://github.com/Azure-Samples/azure-search-postman-samples/blob/main/Debug-sessions/debug-sessions.rest) used to create the enrichment pipeline.
+
 > [!NOTE]
 > This tutorial also uses [Azure AI services](https://azure.microsoft.com/services/cognitive-services/) for language detection, entity recognition, and key phrase extraction. Because the workload is so small, Azure AI services is tapped behind the scenes for free processing for up to 20 transactions. This means that you can complete this exercise without having to create a billable Azure AI services resource.

-## Set up your data
+## Set up the sample data

 This section creates the sample data set in Azure Blob Storage so that the indexer and skillset have content to work with.
@@ -52,53 +52,33 @@ This section creates the sample data set in Azure Blob Storage so that the index

 1. Navigate to the Azure Storage services pages in the portal and create a Blob container. Best practice is to specify the access level "private". Name your container `clinicaltrialdataset`.

-1. In container, click **Upload** to upload the sample files you downloaded and unzipped in the first step.
+1. In the container, select **Upload** to upload the sample files you downloaded and unzipped in the first step.

-1. While in the portal, get and save off the connection string for Azure Storage. You'll need it for the REST API calls that index data. You can get the connection string from **Settings** > **Access Keys** in the portal.
+1. While in the portal, copy the connection string for Azure Storage. You can get the connection string from **Settings** > **Access Keys** in the portal.

-## Get a key and URL
+## Copy a key and URL

-REST calls require the service URL and an access key on every request. A search service is created with both, so if you added Azure AI Search to your subscription, follow these steps to get the necessary information:
+REST calls require the search service endpoint and an API key on every request. You can get these values from the Azure portal.

-1. Sign in to the [Azure portal](https://portal.azure.com), and in your search service **Overview** page, get the URL. An example endpoint might look like `https://mydemo.search.windows.net`.
+1. Sign in to the [Azure portal](https://portal.azure.com), navigate to your search service's **Overview** page, and copy the URL. An example endpoint might look like `https://mydemo.search.windows.net`.

-1. In **Settings** > **Keys**, get an admin key for full rights on the service. There are two interchangeable admin keys, provided for business continuity in case you need to roll one over. You can use either the primary or secondary key on requests for adding, modifying, and deleting objects.
+1. Under **Settings** > **Keys**, copy an admin key. Admin keys are used to add, modify, and delete objects. There are two interchangeable admin keys. Copy either one.

-:::image type="content" source="media/search-get-started-rest/get-url-key.png" alt-text="Get an HTTP endpoint and access key" border="false":::
+:::image type="content" source="media/search-get-started-rest/get-url-key.png" alt-text="Screenshot of the URL and API keys in the Azure portal.":::

-All requests require an api-key on every request sent to your service. Having a valid key establishes trust, on a per request basis, between the application sending the request and the service that handles it.
+A valid API key establishes trust, on a per request basis, between the application sending the request and the search service handling it.

 ## Create data source, skillset, index, and indexer

-In this section, you will import a Postman collection containing a "buggy" workflow that you will fix in this tutorial.
-
-1. Start Postman and import the [DebugSessions.postman_collection.json](https://github.com/Azure-Samples/azure-search-postman-samples/tree/main/Debug-sessions) collection. If you're unfamiliar with this tool, see [Quickstart: Text search using REST](search-get-started-rest.md).
-
-1. Under **Files** > **New**, select the collection.
+In this section, create a "buggy" workflow that you can fix in this tutorial.

-1. After the collection is imported, expand the actions list (...).
+1. Start Visual Studio Code and open the `debug-sessions.rest` file.

-1. Select **Edit** to set variables used in each request.
+1. Provide the following variables: search service URL, search service admin API key, storage connection string, and the name of the blob container storing the PDFs.

-| Current value | Description |
-|---------------|-------------|
-| searchService | The name of your search service (for example, if the endpoint is `https://mydemo.search.windows.net`, then the service name is `mydemo`). |
-| apiKey | The primary or secondary key obtained from the **Keys** page of your search service. |
-| storageConnectionString | The connection string obtained from the **Access Keys** page of your Azure Storage account. |
-| containerName | The name of the container you created for the sample data. |
+1. Send each request in turn. Creating the indexer takes several minutes to complete.

-1. **Save** your changes. The requests fail unless you save the variables.
-
-1. You should see four REST calls in the collection.
-
-+ CreateDataSource adds `clinical-trials-ds`
-+ CreateSkillset adds `clinical-trials-ss`
-+ CreateIndex adds `clinical-trials`
-+ CreateIndexer adds `clinical-trials-idxr`
-
-1. Open each request in turn, and select **Send** to send each request to the search service. The last one will take several minutes to complete.
-
-1. Close Postman and return to the Azure portal.
+1. Close the file.

 ## Check results in the portal
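The new steps rely on two things: every request carries the search endpoint and an admin `api-key` header, and the four objects are created in order. The Python below is a minimal sketch of that shape, not the contents of `debug-sessions.rest`. It only builds request descriptions (method, URL, headers, body) and sends nothing. The API version `2023-11-01` is an assumption, and only the data source body is filled in; the skillset, index, and indexer definitions come from the sample file and are left as placeholders. The object names are the ones listed in the removed Postman steps.

```python
import json
from typing import Any, Dict, List, Optional

# Assumption: REST API version; use whatever version your .rest file sends.
API_VERSION = "2023-11-01"


def build_request(endpoint: str, api_key: str, method: str, path: str,
                  body: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Describe one REST call; every request carries the admin api-key header."""
    return {
        "method": method,
        "url": f"{endpoint.rstrip('/')}/{path}?api-version={API_VERSION}",
        "headers": {"Content-Type": "application/json", "api-key": api_key},
        "body": json.dumps(body) if body is not None else None,
    }


def build_pipeline_requests(endpoint: str, api_key: str,
                            storage_connection_string: str,
                            container_name: str) -> List[Dict[str, Any]]:
    """Data source, skillset, index, then indexer, which references the other three."""
    data_source = {
        "name": "clinical-trials-ds",
        "type": "azureblob",
        "credentials": {"connectionString": storage_connection_string},
        "container": {"name": container_name},
    }
    return [
        build_request(endpoint, api_key, "PUT",
                      "datasources/clinical-trials-ds", data_source),
        # Bodies omitted (None): the real definitions live in debug-sessions.rest,
        # so these three entries are placeholders, not sendable requests.
        build_request(endpoint, api_key, "PUT", "skillsets/clinical-trials-ss"),
        build_request(endpoint, api_key, "PUT", "indexes/clinical-trials"),
        build_request(endpoint, api_key, "PUT", "indexers/clinical-trials-idxr"),
    ]


if __name__ == "__main__":
    # Placeholder values; substitute your own endpoint, key, and connection string.
    for req in build_pipeline_requests("https://mydemo.search.windows.net",
                                       "<admin-api-key>",
                                       "<storage-connection-string>",
                                       "clinicaltrialdataset"):
        print(req["method"], req["url"])
```

Keeping each request as plain data makes it easy to check URLs and headers before sending them with any HTTP client. In the tutorial itself, the REST client extension does the sending.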
@@ -108,11 +88,16 @@ The sample code intentionally creates a buggy index as a consequence of problems

 1. Select *clinical-trials*.

-1. Enter this query string: `$select=metadata_storage_path, organizations, locations&$count=true` to return fields for specific documents (identified by the unique `metadata_storage_path` field).
+1. Enter this JSON query string in Search explorer's JSON view. It returns fields for specific documents (identified by the unique `metadata_storage_path` field).
-1. Select **Search** to run the query. You should see empty values for "organizations" and "locations".
+1. Run the query. You should see empty values for `organizations` and `locations`.

-These fields should have been populated through the skillset's [Entity Recognition skill](cognitive-search-skill-entity-recognition-v3.md), used to detect organizations and locations anywhere within the blob's content. In the next exercise, you'll debug the skillset to determine what went wrong.
+These fields should have been populated through the skillset's [Entity Recognition skill](cognitive-search-skill-entity-recognition-v3.md), used to detect organizations and locations anywhere within the blob's content. In the next exercise, you'll debug the skillset to determine what went wrong.

 Another way to investigate errors and warnings is through the Azure portal.
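The removed step used a URL query string, while the new one uses Search explorer's JSON view. The JSON itself isn't reproduced in this diff. As a hedged sketch of how the two forms correspond, the Python function below converts the old `$select`/`$count` parameters into a JSON request body with `select` and `count` properties. Defaulting `search` to `"*"` (match everything) is an assumption, and the converter only handles the parameters used here.

```python
import json
from typing import Any, Dict
from urllib.parse import parse_qs


def query_string_to_json(query_string: str) -> Dict[str, Any]:
    """Map $select/$count URL parameters onto JSON body properties."""
    # parse_qs returns a list of values per key; keep the first one.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    # Assumption: with no search text, match all documents.
    body: Dict[str, Any] = {"search": params.get("search", "*")}
    if "$select" in params:
        body["select"] = params["$select"]
    if "$count" in params:
        body["count"] = params["$count"].lower() == "true"
    return body


if __name__ == "__main__":
    old = "$select=metadata_storage_path, organizations, locations&$count=true"
    print(json.dumps(query_string_to_json(old), indent=2))
```

The conversion keeps the field list as a single comma-separated string, which is the form the `select` property takes in a JSON search request.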
@@ -154,7 +139,7 @@ Any issues reported by the indexer can be found in the adjacent **Errors/Warning

 :::image type="content" source="media/cognitive-search-debug/debug-session-errors-warnings.png" alt-text="Screenshot of the errors and warnings tab." border="true":::

-Notice that the **Errors/Warnings** tab will provide a much smaller list than the one displayed earlier because this list is only detailing the errors for a single document. Like the list displayed by the indexer, you can click on a warning message and see the details of this warning.
+Notice that the **Errors/Warnings** tab provides a much smaller list than the one displayed earlier because it only details the errors for a single document. Like the list displayed by the indexer, you can select a warning message and see the details of this warning.

 Select **Errors/Warnings** to review the notifications. You should see four:
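The **Errors/Warnings** tab in a debug session is shorter because it covers only the one document the session is working on. A minimal Python sketch of that narrowing follows. It assumes each warning is a dict with a `key` field naming its document, a simplification loosely modeled on indexer status output rather than the exact response shape; the keys and messages in the test data are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, List


def warnings_for_document(warnings: List[Dict[str, Any]],
                          doc_key: str) -> List[Dict[str, Any]]:
    """Keep only the warnings raised for one document."""
    return [w for w in warnings if w.get("key") == doc_key]


def count_by_document(warnings: List[Dict[str, Any]]) -> Counter:
    """How many warnings each document contributed to the indexer-wide list."""
    return Counter(w.get("key") for w in warnings)


if __name__ == "__main__":
    # Hypothetical data for illustration only.
    all_warnings = [
        {"key": "doc-a", "message": "hypothetical warning 1"},
        {"key": "doc-a", "message": "hypothetical warning 2"},
        {"key": "doc-b", "message": "hypothetical warning 3"},
    ]
    print(len(all_warnings), len(warnings_for_document(all_warnings, "doc-a")))
```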
@@ -181,11 +166,11 @@ In the **Errors/Warnings** tab, there are two missing inputs for an operation la

 1. Select the **Executions** tab and locate the input for "text".

-1. Select the **</>** symbol to pop open the Expression Evaluator. The displayed result for this input doesn’t look like a text input. It looks like a series of new line characters `\n \n\n\n\n` instead of text. The lack of text means that no entities can be identified, so either this document fails to meet the prerequisites of the skill, or there is another input that should be used instead.
+1. Select the **</>** symbol to pop open the Expression Evaluator. The displayed result for this input doesn’t look like a text input. It looks like a series of new line characters `\n \n\n\n\n` instead of text. The lack of text means that no entities can be identified, so either this document fails to meet the prerequisites of the skill, or there's another input that should be used instead.

 :::image type="content" source="media/cognitive-search-debug/expression-evaluator-text.png" alt-text="Screenshot of Expression Evaluator for the text input." border="true":::

-1. Switch the left pane to **Enriched Data Structure** and scroll down the list of enrichment nodes for this document. Notice the `\n \n\n\n\n` for "content" has no originating source, but another value for "merged_content" has OCR output. Although there is no indication, the content of this PDF appears to be a JPEG file, as evidenced by the extracted and processed text in "merged_content".
+1. Switch the left pane to **Enriched Data Structure** and scroll down the list of enrichment nodes for this document. Notice the `\n \n\n\n\n` for "content" has no originating source, but another value for "merged_content" has OCR output. Although there's no indication, the content of this PDF appears to be a JPEG file, as evidenced by the extracted and processed text in "merged_content".

 :::image type="content" source="media/cognitive-search-debug/enriched-data-structure-content.png" alt-text="Screenshot of Enriched Data Structure." border="true":::
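The Expression Evaluator resolves a path such as `/document/content` against the document's enrichment tree. A simplified Python model of that lookup makes the diagnosis above concrete: `content` holds only whitespace, while `merged_content` holds text. This is a sketch assuming the tree is a plain nested dict; the real enrichment tree and evaluator are richer, and the tree values in the example are placeholders, not the sample document's content.

```python
from typing import Any, Dict, Iterable, Optional


def resolve(tree: Dict[str, Any], path: str) -> Any:
    """Follow a path like /document/merged_content through nested dicts; None if absent."""
    node: Any = tree
    for part in path.strip("/").split("/"):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def has_usable_text(value: Any) -> bool:
    """True for a string with at least one non-whitespace character."""
    return isinstance(value, str) and value.strip() != ""


def first_usable_path(tree: Dict[str, Any], candidates: Iterable[str]) -> Optional[str]:
    """Return the first candidate path whose value contains real text."""
    for path in candidates:
        if has_usable_text(resolve(tree, path)):
            return path
    return None


if __name__ == "__main__":
    # Placeholder tree: whitespace-only content, OCR text in merged_content.
    tree = {"document": {"content": "\n \n\n\n\n", "merged_content": "<OCR text>"}}
    print(first_usable_path(tree, ["/document/content", "/document/merged_content"]))
```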
@@ -222,19 +207,19 @@ In the **Errors/Warnings** tab, there are two missing inputs for an operation la

 :::image type="content" source="media/cognitive-search-debug/expression-evaluator-language.png" alt-text="Screenshot of Expression Evaluator for the language input." border="true":::

-There are two ways to research this error. The first is to look at where the input is coming from - what skill in the hierarchy is supposed to produce this result? The Executions tab in the skill details pane should display the source of the input. If there is no source, this indicates a field mapping error.
+There are two ways to research this error. The first is to look at where the input is coming from - what skill in the hierarchy is supposed to produce this result? The Executions tab in the skill details pane should display the source of the input. If there's no source, this indicates a field mapping error.

-1. In the **Executions** tab, check the INPUTS and find "languageCode". There is no source for this input listed.
+1. In the **Executions** tab, check the INPUTS and find "languageCode". There's no source for this input listed.

-1. Switch the left pane to **Enriched Data Structure**. Scroll down the list of enrichment nodes for this document. Notice that there is no "languageCode" node, but there is one for "language". So, there is a typo in the skill settings.
+1. Switch the left pane to **Enriched Data Structure**. Scroll down the list of enrichment nodes for this document. Notice that there's no "languageCode" node, but there's one for "language". So, there's a typo in the skill settings.

 :::image type="content" source="media/cognitive-search-debug/enriched-data-structure-language.png" alt-text="Screenshot of Enriched Data Structure, with language highlighted." border="true":::

 1. Still in the **Enriched Data Structure**, open the Expression Evaluator **</>** for the "language" node and copy the expression `/document/language`.

 1. In the right pane, select **Skill Settings** for the #1 skill and open the Expression Evaluator **</>** for the input "languageCode".

-1. Paste the new value, `/document/language` into the Expression box and click **Evaluate**. It should display the correct input "en".
+1. Paste the new value, `/document/language` into the Expression box and select **Evaluate**. It should display the correct input "en".

 1. Select **Save**.
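Editing the input in **Skill Settings** amounts to changing the `source` of one entry in the skill's `inputs` array. A minimal Python sketch, assuming the skillset is loaded as a dict and the skill is found by the name `#1` that the debug session shows. The trimmed-down skillset and its broken source value are hypothetical; the diff doesn't say what the original value was.

```python
import copy
from typing import Any, Dict


def set_input_source(skillset: Dict[str, Any], skill_name: str,
                     input_name: str, new_source: str) -> Dict[str, Any]:
    """Return a copy of the skillset with one skill input repointed."""
    updated = copy.deepcopy(skillset)  # leave the caller's definition untouched
    for skill in updated["skills"]:
        if skill.get("name") != skill_name:
            continue
        for skill_input in skill["inputs"]:
            if skill_input["name"] == input_name:
                skill_input["source"] = new_source
                return updated
    raise KeyError(f"input {input_name!r} not found on skill {skill_name!r}")


if __name__ == "__main__":
    # Hypothetical, trimmed-down skillset: only the fields this sketch touches.
    buggy = {"skills": [{"name": "#1", "inputs": [
        {"name": "text", "source": "/document/merged_content"},
        {"name": "languageCode", "source": "/document/languageCode"},
    ]}]}
    fixed = set_input_source(buggy, "#1", "languageCode", "/document/language")
    print(fixed["skills"][0]["inputs"][1])
```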
@@ -250,7 +235,7 @@ The messages say to check the 'outputFieldMappings' property of your indexer, so

 :::image type="content" source="media/cognitive-search-debug/output-field-mappings-locations-organizations.png" alt-text="Screenshot of the output field mappings." border="true":::

-1. If there is no problem with the index, the next step is to check skill outputs. As before, select the **Enriched Data Structure**, and scroll the nodes to find "locations" and "organizations". Notice that the parent is "content" instead of "merged_content". The context is wrong.
+1. If there's no problem with the index, the next step is to check skill outputs. As before, select the **Enriched Data Structure**, and scroll the nodes to find "locations" and "organizations". Notice that the parent is "content" instead of "merged_content". The context is wrong.

 :::image type="content" source="media/cognitive-search-debug/enriched-data-structure-wrong-parent.png" alt-text="Screenshot of Enriched Data Structure with wrong context." border="true":::
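Skill outputs are written into the enrichment tree under the skill's `context` node. With a context of `/document/content`, `organizations` and `locations` land under `content`, so output field mappings that read from `merged_content` find nothing. A minimal Python sketch of that check follows. The mapping source paths are hypothetical, since the diff only shows them in a screenshot, and the simple `context/targetName` join ignores contexts that use `*` to iterate over collections.

```python
from typing import Any, Dict, List


def output_paths(skill: Dict[str, Any]) -> List[str]:
    """Where each output lands: <context>/<targetName> (no '*' handling)."""
    context = skill["context"].rstrip("/")
    return [f"{context}/{out['targetName']}" for out in skill["outputs"]]


def unmatched_mapping_sources(skill: Dict[str, Any],
                              mapping_sources: List[str]) -> List[str]:
    """Mapping sources that no output of this skill produces."""
    produced = set(output_paths(skill))
    return [src for src in mapping_sources if src not in produced]


if __name__ == "__main__":
    skill = {
        "context": "/document/content",  # the wrong parent seen in the debug session
        "outputs": [
            {"name": "organizations", "targetName": "organizations"},
            {"name": "locations", "targetName": "locations"},
        ],
    }
    # Hypothetical outputFieldMappings sources that expect merged_content.
    sources = ["/document/merged_content/organizations",
               "/document/merged_content/locations"]
    print(unmatched_mapping_sources(skill, sources))
    print(unmatched_mapping_sources(dict(skill, context="/document/merged_content"), sources))
```

Changing only the context, as in the second call, is enough for both hypothetical mapping sources to resolve.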
@@ -304,7 +289,7 @@ When you're working in your own subscription, it's a good idea at the end of a p

 You can find and manage resources in the portal, using the **All resources** or **Resource groups** link in the left-navigation pane.

-If you're using a free service, remember that you're limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
+The free service is limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
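To stay under the free service's limit of three of each object type, you can work out how many existing objects to delete before rerunning the tutorial. A small Python sketch, assuming the tutorial adds one data source, one index, and one indexer (the objects named in its requests) and that you already know your current counts; the counts in the example are hypothetical.

```python
from typing import Dict

FREE_TIER_LIMIT = 3  # per object type, as stated for the free service


def deletions_needed(existing: Dict[str, int], to_create: Dict[str, int],
                     limit: int = FREE_TIER_LIMIT) -> Dict[str, int]:
    """Objects of each type to delete so existing + new stays within the limit."""
    return {kind: max(0, existing.get(kind, 0) + count - limit)
            for kind, count in to_create.items()}


if __name__ == "__main__":
    tutorial = {"datasources": 1, "indexes": 1, "indexers": 1}
    # Hypothetical current counts on a free service.
    print(deletions_needed({"datasources": 2, "indexes": 3, "indexers": 1}, tutorial))
```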