
Commit f0477ab

Merge pull request #211610 from HeidiSteen/heidist-fresh
Freshness pass over moreLikeThis
2 parents f672baf + b7dd57e commit f0477ab

File tree: 2 files changed, +27 -19 lines changed

articles/search/cognitive-search-concept-troubleshooting.md

Lines changed: 24 additions & 16 deletions
@@ -3,31 +3,36 @@ title: Tips for AI enrichment design
 titleSuffix: Azure Cognitive Search
 description: Tips and troubleshooting for setting up AI enrichment pipelines in Azure Cognitive Search.
-author: LiamCavanagh
-ms.author: liamca
+author: HeidiSteen
+ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 09/16/2021
+ms.date: 09/16/2022
 ---
 # Tips for AI enrichment in Azure Cognitive Search

 This article contains a list of tips and tricks to keep you moving as you get started with AI enrichment capabilities in Azure Cognitive Search.

-If you haven't already, step through [Quickstart: Create a skillset for AI enrichment](cognitive-search-quickstart-blob.md) for an introduction to enrichment of blob data.
+If you haven't already, step through [Quickstart: Create a skillset for AI enrichment](cognitive-search-quickstart-blob.md) for a lightweight introduction to enrichment of blob data.

 ## Tip 1: Start with a small dataset
-The best way to find issues quickly is to increase the speed at which you can fix issues. The best way to reduce the indexing time is by reducing the number of documents to be indexed.
-Start by creating a data source with just a handful of documents/records. Your document sample should be a good representation of the variety of documents that will be indexed.
-Run your document sample through the end-to-end pipeline and check that the results meet your needs. Once you are satisfied with the results, you can add more files to your data source.
+The best way to find issues quickly is to increase the speed at which you can fix them, which means working with smaller or simpler documents.
+Start by creating a data source with just a handful of documents or rows in a table that are representative of the documents that will be indexed.
+
+Run your sample through the end-to-end pipeline and check that the results meet your needs. Once you're satisfied with the results, you're ready to add more files to your data source.
 ## Tip 2: Make sure your data source credentials are correct
-The data source connection is not validated until you define an indexer that uses it. If you see any errors mentioning that the indexer cannot get to the data, make sure that:
-- Your connection string is correct. Specially when you are creating SAS tokens, make sure to use the format expected by Azure Cognitive Search. See [How to specify credentials section](search-howto-indexing-azure-blob-storage.md#credentials) to learn about the different formats supported.
-- Your container name in the indexer is correct.
+The data source connection isn't validated until you define an indexer that uses it. If you get connection errors, make sure that:
++ Your connection string is correct. Especially when you're creating SAS tokens, make sure to use the format expected by Azure Cognitive Search. See the [How to specify credentials](search-howto-indexing-azure-blob-storage.md#credentials) section to learn about the supported formats.
++ Your container name in the indexer is correct.
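As a reference point for the SAS format mentioned above, a blob data source definition (Create Data Source REST API) might look like the following sketch; the data source and container names are hypothetical placeholders, and the SAS token itself is elided:

```json
{
  "name": "blob-datasource",
  "type": "azureblob",
  "credentials": {
    "connectionString": "BlobEndpoint=https://myaccount.blob.core.windows.net/;SharedAccessSignature=<your-sas-token>"
  },
  "container": { "name": "my-container" }
}
```

If the connection string or container name is wrong, the failure surfaces only when the indexer that references this data source runs.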
 ## Tip 3: See what works even if there are some failures

 Sometimes a small failure stops an indexer in its tracks. That's fine if you plan to fix issues one by one. However, you might want to ignore a particular type of error, allowing the indexer to continue so that you can see which flows are actually working.

 In that case, you may want to tell the indexer to ignore errors. Do that by setting *maxFailedItems* and *maxFailedItemsPerBatch* to -1 as part of the indexer definition.
@@ -42,27 +47,30 @@ In that case, you may want to tell the indexer to ignore errors. Do that by sett
   }
 }
 ```
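The snippet above is only the tail of an indexer definition. A fuller sketch with both error limits set to -1 (the indexer, data source, and index names are hypothetical) might look like:

```json
{
  "name": "my-indexer",
  "dataSourceName": "my-datasource",
  "targetIndexName": "my-index",
  "parameters": {
    "maxFailedItems": -1,
    "maxFailedItemsPerBatch": -1
  }
}
```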
+
 > [!NOTE]
 > As a best practice, set maxFailedItems and maxFailedItemsPerBatch to 0 for production workloads.
 ## Tip 4: Use Debug sessions to identify and resolve issues with your skillset

-Debug sessions is a visual editor that works with an existing skillset in the Azure portal. Within a debug session you can identify and resolve errors, validate changes, and commit changes to a production skillset in the AI enrichment pipeline. This is a preview feature [read the documentation](./cognitive-search-debug-session.md). For more information about concepts and getting started, see [Debug sessions](./cognitive-search-tutorial-debug-sessions.md).
+**Debug sessions** is a visual editor that works with an existing skillset in the Azure portal. Within a debug session you can identify and resolve errors, validate changes, and commit changes to a production skillset in the AI enrichment pipeline. This is a preview feature; [read the documentation](./cognitive-search-debug-session.md). For more information about concepts and getting started, see [Debug sessions](./cognitive-search-tutorial-debug-sessions.md).

 Debug sessions work on a single document and are a great way for you to iteratively build more complex enrichment pipelines.
 ## Tip 5: Looking at enriched documents under the hood

 Enriched documents are temporary structures created during enrichment, and then deleted when processing is complete.

 To capture a snapshot of the enriched document created during indexing, add a field called ```enriched``` to your index. The indexer automatically dumps into the field a string representation of all the enrichments for that document.

 The ```enriched``` field will contain a string that is a logical representation of the in-memory enriched document in JSON. The field value is a valid JSON document, but quotes are escaped, so you'll need to replace `\"` with `"` to view the document as formatted JSON.

-The enriched field is intended for debugging purposes only, to help you understand the logical shape of the content that expressions are being evaluated against. You should not depend on this field for indexing purposes.
+The enriched field is intended for debugging purposes only, to help you understand the logical shape of the content that expressions are being evaluated against. You shouldn't depend on this field for indexing purposes.

 Add an ```enriched``` field as part of your index definition for debugging purposes:

 #### Request Body Syntax

 ```json
 {
   "fields": [
@@ -81,19 +89,19 @@ Add an ```enriched``` field as part of your index definition for debugging purpo

 ## Tip 6: Expected content fails to appear

-Missing content could be the result of documents getting dropped during indexing. Free and Basic tiers have low limits on document size. Any file exceeding the limit is dropped during indexing. You can check for dropped documents in the Azure portal. In the search service dashboard, double-click the Indexers tile. Review the ratio of successful documents indexed. If it is not 100%, you can click the ratio to get more detail.
+Missing content could be the result of documents getting dropped during indexing. Free and Basic tiers have low limits on document size. Any file exceeding the limit is dropped during indexing. You can check for dropped documents in the Azure portal. In the search service dashboard, double-click the Indexers tile. Review the ratio of successful documents indexed. If it isn't 100%, you can select the ratio to get more detail.

-If the problem is related to file size, you might see an error like this: "The blob \<file-name>" has the size of \<file-size> bytes, which exceeds the maximum size for document extraction for your current service tier." For more information on indexer limits, see [Service limits](search-limits-quotas-capacity.md).
+If the problem is related to file size, you might see an error like this: "The blob \<file-name> has the size of \<file-size> bytes, which exceeds the maximum size for document extraction for your current service tier." For more information on indexer limits, see [Service limits](search-limits-quotas-capacity.md).

 A second reason for content failing to appear might be related to input/output mapping errors. For example, an output target name is "People" but the index field name is lower-case "people". The system could return 201 success messages for the entire pipeline, so you think indexing succeeded, when in fact a field is empty.
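A case mismatch like the "People"/"people" example above is corrected in the indexer's output field mappings, where the target name must exactly match the index field name. A sketch (the source path under `/document` is hypothetical):

```json
{
  "outputFieldMappings": [
    {
      "sourceFieldName": "/document/content/people",
      "targetFieldName": "people"
    }
  ]
}
```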
 ## Tip 7: Extend processing beyond maximum run time (24-hour window)

-Image analysis is computationally-intensive for even simple cases, so when images are especially large or complex, processing times can exceed the maximum time allowed.
+Image analysis is computationally intensive for even simple cases, so when images are especially large or complex, processing times can exceed the maximum time allowed.

 Maximum run time varies by tier: several minutes on the Free tier, 24-hour indexing on billable tiers. If processing fails to complete within a 24-hour period for on-demand processing, switch to a schedule to have the indexer pick up processing where it left off.

-For scheduled indexers, indexing resumes on schedule at the last known good document. By using a recurring schedule, the indexer can work its way through the image backlog over a series of hours or days, until all un-processed images are processed. For more information on schedule syntax, see [Schedule an indexer](search-howto-schedule-indexers.md).
+For scheduled indexers, indexing resumes on schedule at the last known good document. By using a recurring schedule, the indexer can work its way through the image backlog over a series of hours or days, until all unprocessed images are processed. For more information on schedule syntax, see [Schedule an indexer](search-howto-schedule-indexers.md).

 > [!NOTE]
 > If an indexer is set to a certain schedule but repeatedly fails on the same document each time it runs, the indexer begins running on a less frequent interval (up to a maximum of once every 24 hours) until it successfully makes progress again. If you believe you've fixed the issue that was causing the indexer to be stuck at a certain point, you can perform an on-demand run of the indexer, and if that run successfully makes progress, the indexer returns to its set schedule interval.
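The recurring schedule mentioned above is expressed in the indexer definition as an ISO 8601 duration. A sketch (the two-hour interval and start time are hypothetical choices):

```json
{
  "schedule": {
    "interval": "PT2H",
    "startTime": "2022-01-01T00:00:00Z"
  }
}
```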

articles/search/search-more-like-this.md

Lines changed: 3 additions & 3 deletions
@@ -7,7 +7,7 @@ author: bevloh
 ms.author: beloh
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 10/06/2021
+ms.date: 10/06/2022
 ---
@@ -16,11 +16,11 @@ ms.date: 10/06/2021
 > [!IMPORTANT]
 > This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [preview REST API](/rest/api/searchservice/index-preview) supports this feature.
-`moreLikeThis=[key]` is a query parameter in the [Search Documents API](/rest/api/searchservice/search-documents) that finds documents similar to the document specified by the document key. When a search request is made with `moreLikeThis`, a query is generated with search terms extracted from the given document that describe that document best. The generated query is then used to make the search request. The `moreLikeThis` parameter cannot be used with the search parameter, `search=[string]`.
+`moreLikeThis=[key]` is a query parameter in the [Search Documents API](/rest/api/searchservice/search-documents) that finds documents similar to the document specified by the document key. When a search request is made with `moreLikeThis`, a query is generated with search terms extracted from the given document that best describe it. The generated query is then used to make the search request. The `moreLikeThis` parameter can't be used with the `search=[string]` parameter.
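A request of the kind described above might look like the following sketch; the index name and document key are hypothetical, and the api-version must be a preview version that supports the feature:

```http
GET /indexes/hotels-sample-index/docs?moreLikeThis=29&searchFields=Description&api-version=2020-06-30-Preview
```

Because `moreLikeThis` replaces the query terms, the request omits `search=[string]` entirely.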
 By default, the contents of all top-level searchable fields are considered. If you want to specify particular fields instead, you can use the `searchFields` parameter.

-`MoreLikeThis` on searchable sub-fields in a [complex type](search-howto-complex-data-types.md) is not supported. For indexes that have these types of fields, `searchFields` parameter must be used so that the top-level searchable fields are specified. For example, if the index has a searchable `field1` which is Edm.String and `field2` which is complex type with searchable sub-fields, the value of `searchFields` must be set to `field1` to exclude `field2`.
+The `moreLikeThis` parameter isn't supported for [complex types](search-howto-complex-data-types.md), and the presence of complex types will affect your query logic. If your index has complex fields, you must set `searchFields` to the top-level searchable fields over which `moreLikeThis` iterates. For example, if the index has a searchable `field1` of type `Edm.String`, and `field2` that's a complex type with searchable subfields, the value of `searchFields` must be set to `field1` to exclude `field2`.

 ## Examples
0 commit comments