articles/search/cognitive-search-concept-troubleshooting.md (+22 −63)
@@ -8,39 +8,32 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: conceptual
-ms.date: 09/16/2022
+ms.date: 02/16/2024
 ---
-# Tips for AI enrichment in Azure AI Search
 
-This article contains a list of tips and tricks to keep you moving as you get started with AI enrichment capabilities in Azure AI Search.
+# Tips for AI enrichment in Azure AI Search
 
-If you haven't already, step through [Quickstart: Create a skillset for AI enrichment](cognitive-search-quickstart-blob.md) for a light-weight introduction to enrichment of blob data.
+This article contains tips to help you get started with AI enrichment and skillsets used during indexing.
 
-## Tip 1: Start with a small dataset
+## Tip 1: Start simple and start small
 
-The best way to find issues quickly is to increase the speed at which you can fix issues, which means working with smaller or simpler documents.
+Both the [Import data wizard](cognitive-search-quickstart-blob.md) and [Import and vectorize data wizard](search-get-started-portal-import-vectors.md) in the Azure portal support AI enrichment. Without writing any code, you can create all of the objects used in an enrichment pipeline: an index, indexer, data source, and skillset.
 
-Start by creating a data source with just a handful of documents or rows in a table that are representative of the documents that will be indexed.
-
-Run your sample through the end-to-end pipeline and check that the results meet your needs. Once you're satisfied with the results, you're ready to add more files to your data source.
+Another way to start simply is by creating a data source with just a handful of documents or rows in a table that are representative of the documents that will be indexed. A small data set is the best way to increase the speed of finding and fixing issues. Run your sample through the end-to-end pipeline and check that the results meet your needs. Once you're satisfied with the results, you're ready to add more files to your data source.
 
 ## Tip 2: Make sure your data source credentials are correct
 
-The data source connection isn't validated until you define an indexer that uses it. If you get connection errors, make sure that:
-
-+ Your connection string is correct. Specially when you're creating SAS tokens, make sure to use the format expected by Azure AI Search. See [How to specify credentials section](search-howto-indexing-azure-blob-storage.md#credentials) to learn about the different formats supported.
-
-+ Your container name in the indexer is correct.
+The data source connection isn't validated until indexer execution. If you get connection errors, check the connection string, permissions, and the folder or container name.
 
 ## Tip 3: See what works even if there are some failures
 
 Sometimes a small failure stops an indexer in its tracks. That is fine if you plan to fix issues one by one. However, you might want to ignore a particular type of error, allowing the indexer to continue so that you can see what flows are actually working.
 
-In that case, you may want to tell the indexer to ignore errors. Do that by setting *maxFailedItems* and *maxFailedItemsPerBatch* as -1 as part of the indexer definition.
+To ignore errors during development, set `maxFailedItems` and `maxFailedItemsPerBatch` as -1 as part of the indexer definition.
 
-```
+```json
 {
-"// rest of your indexer definition
+// rest of your indexer definition
 "parameters":
 {
 "maxFailedItems":-1,
@@ -50,68 +43,34 @@ In that case, you may want to tell the indexer to ignore errors. Do that by sett
 ```
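The hunk above shows only a fragment of the `parameters` object. As a minimal sketch of the full development-time setting (the indexer, data source, index, and skillset names are placeholders, not taken from this change), both thresholds are relaxed in the indexer definition:

```json
{
  "name": "my-indexer",
  "dataSourceName": "my-datasource",
  "targetIndexName": "my-index",
  "skillsetName": "my-skillset",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": -1
  }
}
```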
 
 > [!NOTE]
-> As a best practice, set the maxFailedItems, maxFailedItemsPerBatch to 0 for production workloads
-
-## Tip 4: Use Debug sessions to identify and resolve issues with your skillset
-
-**Debug sessions** is a visual editor that works with an existing skillset in the Azure portal. Within a debug session you can identify and resolve errors, validate changes, and commit changes to a production skillset in the AI enrichment pipeline. This is a preview feature [read the documentation](./cognitive-search-debug-session.md). For more information about concepts and getting started, see [Debug sessions](./cognitive-search-tutorial-debug-sessions.md).
-
-Debug sessions work on a single document are a great way for you to iteratively build more complex enrichment pipelines.
-
-## Tip 5: Looking at enriched documents under the hood
-
-Enriched documents are temporary structures created during enrichment, and then deleted when processing is complete.
+> As a best practice, set the `maxFailedItems` and `maxFailedItemsPerBatch` to 0 for production workloads
 
-To capture a snapshot of the enriched document created during indexing, add a field called ```enriched``` to your index. The indexer automatically dumps into the field a string representation of all the enrichments for that document.
+## Tip 4: Use Debug session to identify and resolve issues with your skillset
 
-The ```enriched``` field will contain a string that is a logical representation of the in-memory enriched document in JSON. The field value is a valid JSON document, however. Quotes are escaped so you'll need to replace `\"` with `"` in order to view the document as formatted JSON.
+[**Debug session**](./cognitive-search-debug-session.md) is a visual editor that shows a skillset's dependency graph, inputs and outputs, and definitions. It works by loading a single document from your search index, with the current indexer and skillset configuration. You can then run the entire skillset, scoped to a single document. Within a debug session, you can identify and resolve errors, validate changes, and commit changes to a parent skillset. For a walkthrough, see [Tutorial: debug sessions](./cognitive-search-tutorial-debug-sessions.md).
 
-The enriched field is intended for debugging purposes only, to help you understand the logical shape of the content that expressions are being evaluated against. You shouldn't depend on this field for indexing purposes.
+## Tip 5: Expected content fails to appear
 
-Add an ```enriched``` field as part of your index definition for debugging purposes:
-
-#### Request Body Syntax
-
-```json
-{
-  "fields": [
-    // other fields go here.
-    {
-      "name": "enriched",
-      "type": "Edm.String",
-      "searchable": false,
-      "sortable": false,
-      "filterable": false,
-      "facetable": false
-    }
-  ]
-}
-```
-
-## Tip 6: Expected content fails to appear
-
-Missing content could be the result of documents getting dropped during indexing. Free and Basic tiers have low limits on document size. Any file exceeding the limit is dropped during indexing. You can check for dropped documents in the Azure portal. In the search service dashboard, double-click the Indexers tile. Review the ratio of successful documents indexed. If it isn't 100%, you can select the ratio to get more detail.
+If you're missing content, check for dropped documents in the Azure portal. In the search service page, open **Indexers** and look at the **Docs succeeded** column. Click through to indexer execution history to review specific errors.
 
 If the problem is related to file size, you might see an error like this: "The blob \<file-name>" has the size of \<file-size> bytes, which exceed the maximum size for document extraction for your current service tier." For more information on indexer limits, see [Service limits](search-limits-quotas-capacity.md).
 
 A second reason for content failing to appear might be related to input/output mapping errors. For example, an output target name is "People" but the index field name is lower-case "people". The system could return 201 success messages for the entire pipeline so you think indexing succeeded, when in fact a field is empty.
 
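For the case-mismatch scenario described above ("People" versus "people"), the usual fix is in the indexer's output field mappings. A minimal sketch, assuming a hypothetical enrichment path and field name that are not part of this change:

```json
{
  "name": "my-indexer",
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/content/people",
      "targetFieldName": "people"
    }
  ]
}
```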
-## Tip 7: Extend processing beyond maximum run time (24-hour window)
+## Tip 6: Extend processing beyond maximum run time (24-hour window)
 
-Image analysis is computationally intensive for even simple cases, so when images are especially large or complex, processing times can exceed the maximum time allowed.
+Image analysis is computationally intensive for even simple cases, so when images are especially large or complex, processing times can exceed the maximum time allowed.
 
-Maximum run time varies by tier: several minutes on the Free tier, 24-hour indexing on billable tiers. If processing fails to complete within a 24-hour period for on-demand processing, switch to a schedule to have the indexer pick up processing where it left off.
+For indexers that have skillsets, skillset execution is [capped at 2 hours for most tiers](search-limits-quotas-capacity.md#indexer-limits). If skillset processing fails to complete within that period, you can put your indexer on a 2-hour recurring schedule to have the indexer pick up processing where it left off.
 
-For scheduled indexers, indexing resumes on schedule at the last known good document. By using a recurring schedule, the indexer can work its way through the image backlog over a series of hours or days, until all unprocessed images are processed. For more information on schedule syntax, see [Schedule an indexer](search-howto-schedule-indexers.md).
+Scheduled indexing resumes at the last known good document. On a recurring schedule, the indexer can work its way through the image backlog over a series of hours or days, until all unprocessed images are processed. For more information on schedule syntax, see [Schedule an indexer](search-howto-schedule-indexers.md).
 
 > [!NOTE]
-> If an indexer is set to a certain schedule but repeatedly fails on the same document over and over again each time it runs, the indexer will begin running on a less frequent interval (up to the maximum of at least once every 24 hours) until it successfully makes progress again. If you believe you have fixed whatever the issue that was causing the indexer to be stuck at a certain point, you can perform an on-demand run of the indexer, and if that successfully makes progress, the indexer will return to its set schedule interval again.
-
-For portal-based indexing (as described in the quickstart), choosing the "run once" indexer option limits processing to 1 hour (`"maxRunTime": "PT1H"`). You might want to extend the processing window to something longer.
+> If an indexer is set to a certain schedule but repeatedly fails on the same document over and over again each time it runs, the indexer will begin running on a less frequent interval (up to the maximum of at least once every 24 hours) until it successfully makes progress again. If you believe you have fixed whatever the issue that was causing the indexer to be stuck at a certain point, you can perform an on-demand run of the indexer, and if that successfully makes progress, the indexer will return to its set schedule interval again.
 
-## Tip 8: Increase indexing throughput
+## Tip 7: Increase indexing throughput
 
-For [parallel indexing](search-howto-large-index.md), place your data into multiple containers or multiple virtual folders inside the same container. Then create multiple data source and indexer pairs. All indexers can use the same skillset and write into the same target search index, so your search app doesn’t need to be aware of this partitioning.
+For [parallel indexing](search-howto-large-index.md), distribute your data into multiple containers or multiple virtual folders inside the same container. Then create multiple data source and indexer pairs. All indexers can use the same skillset and write into the same target search index, so your search app doesn’t need to be aware of this partitioning.
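To illustrate that partitioning, a rough sketch of two indexers that read from different data sources but share one skillset and one target index (every name here is hypothetical; each object is created with its own request):

```json
// Indexer 1 reads the first container or virtual folder.
{
  "name": "blob-indexer-1",
  "dataSourceName": "blob-datasource-1",
  "skillsetName": "shared-skillset",
  "targetIndexName": "shared-index"
}

// Indexer 2 reads the second partition; same skillset, same target index.
{
  "name": "blob-indexer-2",
  "dataSourceName": "blob-datasource-2",
  "skillsetName": "shared-skillset",
  "targetIndexName": "shared-index"
}
```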
articles/search/cognitive-search-incremental-indexing-conceptual.md (+1 −1)
@@ -16,7 +16,7 @@ ms.date: 02/16/2024
 > [!IMPORTANT]
 > This feature is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [preview REST API](/rest/api/searchservice/index-preview) supports this feature.
 
-*Incremental enrichment* refers to the use of cached enrichments during [skillset execution](cognitive-search-working-with-skillsets.md) so that only new and changed skills and documents incur AI processing charges. The cache contains the output from [document cracking](search-indexer-overview.md#document-cracking), plus the outputs of each skill for every document. Although caching is billable (it uses Azure Storage), the overall cost of enrichment is reduced because the costs of storage are less than image extraction and AI processing.
+*Incremental enrichment* refers to the use of cached enrichments during [skillset execution](cognitive-search-working-with-skillsets.md) so that only new and changed skills and documents incur pay-as-you-go processing charges for API calls to Azure AI services. The cache contains the output from [document cracking](search-indexer-overview.md#document-cracking), plus the outputs of each skill for every document. Although caching is billable (it uses Azure Storage), the overall cost of enrichment is reduced because the costs of storage are less than image extraction and AI processing.
 
 When you enable caching, the indexer evaluates your updates to determine whether existing enrichments can be pulled from the cache. Image and text content from the document cracking phase, plus skill outputs that are upstream or orthogonal to your edits, are likely to be reusable.
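As a rough sketch of how the cache is enabled (preview syntax; the names and connection string are placeholders, not taken from this change), the indexer definition gains a `cache` section:

```json
{
  "name": "my-indexer",
  "dataSourceName": "my-datasource",
  "skillsetName": "my-skillset",
  "targetIndexName": "my-index",
  "cache": {
    "storageConnectionString": "<your-storage-account-connection-string>",
    "enableReprocessing": true
  }
}
```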
articles/search/cognitive-search-skill-textsplit.md (+5 −3)
@@ -8,7 +8,7 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: reference
-ms.date: 10/25/2023
+ms.date: 02/18/2024
 ---
 
 # Text split cognitive skill
@@ -44,7 +44,7 @@ Parameters are case-sensitive.
 
 | Parameter name | Description |
 |--------------------|-------------|
-|`textItems`| An array of substrings that were extracted. |
+|`textItems`| An array of substrings that were extracted. `textItems` is the default name of the output. `targetName` is optional, but if you have multiple text split skills, make sure to set `targetName` so that you don't overwrite the data from the first skill with the second one. If `targetName` is set, use it in output field mappings or in downstream skills that use the skill output.|
 
 ## Sample definition
 
@@ -135,7 +135,9 @@ This example is for integrated vectorization, currently in preview. It adds prev
 
 This definition adds `pageOverlapLength` of 100 characters and `maximumPagesToTake` of one.
 
-Assuming the `maximumPageLength` is 5000 characters (the default), then `"maximumPagesToTake": 1` processes the first 5000 characters of each source document.
+Assuming the `maximumPageLength` is 5,000 characters (the default), then `"maximumPagesToTake": 1` processes the first 5,000 characters of each source document.
+
+This example sets `textItems` to `myPages` through `targetName`. Because `targetName` is set, `myPages` is the value you should use to select the output from the Text Split skill. Use `/document/mypages/*` in downstream skills, indexer output field mappings, knowledge store projection, or index projections.
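To make the `targetName` relationship concrete, a minimal sketch of a Text Split skill whose `textItems` output is renamed (the input source and page settings here are illustrative, not the article's full sample):

```json
{
  "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
  "textSplitMode": "pages",
  "maximumPageLength": 5000,
  "pageOverlapLength": 100,
  "maximumPagesToTake": 1,
  "inputs": [
    { "name": "text", "source": "/document/content" }
  ],
  "outputs": [
    { "name": "textItems", "targetName": "myPages" }
  ]
}
```

Downstream references must match the `targetName` exactly, including case.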