Commit 96c8d8d

Merge pull request #278522 from gmndrg/main
Supportability requested updates in documentation
2 parents 9c63703 + 525f4cd commit 96c8d8d

5 files changed, with 75 additions and 5 deletions

articles/search/cognitive-search-incremental-indexing-conceptual.md

Lines changed: 3 additions & 1 deletion
@@ -8,7 +8,7 @@ ms.service: cognitive-search
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 02/16/2024
+ms.date: 06/18/2024
 ---

 # Incremental enrichment and caching in Azure AI Search
@@ -18,6 +18,8 @@ ms.date: 02/16/2024

 *Incremental enrichment* refers to the use of cached enrichments during [skillset execution](cognitive-search-working-with-skillsets.md) so that only new and changed skills and documents incur pay-as-you-go processing charges for API calls to Azure AI services. The cache contains the output from [document cracking](search-indexer-overview.md#document-cracking), plus the outputs of each skill for every document. Although caching is billable (it uses Azure Storage), the overall cost of enrichment is reduced because the costs of storage are less than image extraction and AI processing.

+To ensure synchronization between your data source data and your index, it's important to understand your unique [data source](search-data-sources-gallery.md) change and deletion tracking prerequisites. This guide specifically addresses how to manage incremental modifications in terms of your skills processing and how to utilize cache for this purpose.
+
 When you enable caching, the indexer evaluates your updates to determine whether existing enrichments can be pulled from the cache. Image and text content from the document cracking phase, plus skill outputs that are upstream or orthogonal to your edits, are likely to be reusable.

 After skillset processing is finished, the refreshed results are written back to the cache, and also to the search index or knowledge store.
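
Reviewer note: the caching behavior described in this file can be pictured with a toy model. The sketch below is not how Azure AI Search implements its cache; it only assumes that a cached skill output is reusable while both the document content and the skill definition stay unchanged. The names `content_key`, `enrich`, and the `word_count` skill are hypothetical.

```python
import hashlib


def content_key(document_text, skill_name, skill_version):
    """Cache key that changes when either the document or the skill definition changes."""
    digest = hashlib.sha256(document_text.encode("utf-8")).hexdigest()
    return (digest, skill_name, skill_version)


def enrich(documents, skills, cache):
    """Run every skill on every document, reusing cached outputs.

    Returns the number of skill calls that actually ran.
    """
    calls = 0
    for text in documents:
        for name, (version, func) in skills.items():
            key = content_key(text, name, version)
            if key not in cache:
                cache[key] = func(text)  # stands in for a billable AI call
                calls += 1
    return calls


cache = {}
docs = ["hello world", "incremental enrichment"]
skills = {"word_count": (1, lambda t: len(t.split()))}

first_run = enrich(docs, skills, cache)                        # nothing cached yet
second_run = enrich(docs + ["a new document"], skills, cache)  # only the new document runs
skills["word_count"] = (2, lambda t: len(t.split()))           # an edited skill gets a new version
third_run = enrich(docs, skills, cache)                        # edited skill reruns on both documents
print(first_run, second_run, third_run)
```

In this model, unchanged documents cost nothing on the second run, and editing a skill forces only that skill to rerun. The real indexer also considers which skills are upstream or downstream of an edit, which this sketch leaves out.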

articles/search/search-data-sources-gallery.md

Lines changed: 6 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.custom:
 - ignite-2023
 ms.topic: conceptual
 layout: LandingPage
-ms.date: 05/22/2024
+ms.date: 06/18/2024
 ---

 # Data sources gallery
@@ -21,6 +21,11 @@ Find a data connector from Microsoft or a partner that works with [an indexer](s
 + [Preview data sources by Azure AI Search](#preview)
 + [Data sources from our Partners](#partners)

+
+> [!NOTE]
+> The connectors mentioned in this article don't represent the only methods for indexing data from data sources to AI Search, but low/no-code options to accomplish this task. You have the option to develop your own connector utilizing the [Push REST API/SDK](search-what-is-data-import.md#pushing-data-to-an-index). This implies that provided you can programmatically extract data from a source, you can also employ the corresponding programmatic Push method to index your data.
+
+
 <a name="ga"></a>

 ## Generally available data sources by Azure AI Search
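
Reviewer note: the new NOTE in this file says that any source you can read programmatically can be indexed through the push method. A minimal sketch of what that might look like follows. It assumes the REST "Documents - Index" endpoint and an `api-key` header, and it only builds the request rather than sending it. The service name, key placeholder, sample rows, and the helper `build_push_request` are hypothetical; `my-search-index` is borrowed from the blob indexer example in this commit.

```python
import json
import urllib.request

API_VERSION = "2023-11-01"  # same version the indexer examples in this commit use


def build_push_request(service_name, index_name, api_key, documents):
    """Build (but do not send) a request that pushes `documents` into an index."""
    url = (
        f"https://{service_name}.search.windows.net"
        f"/indexes/{index_name}/docs/index?api-version={API_VERSION}"
    )
    # Every document carries an @search.action; "mergeOrUpload" creates or updates by key.
    body = {"value": [{"@search.action": "mergeOrUpload", **doc} for doc in documents]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Hypothetical rows pulled from a source that has no built-in connector.
rows = [{"id": "1", "title": "First record"}, {"id": "2", "title": "Second record"}]
request = build_push_request("my-service", "my-search-index", "<admin-api-key>", rows)
print(request.get_method(), request.full_url)
# To send it for real: urllib.request.urlopen(request)
```

The document fields (`id`, `title`) would have to match the target index schema, which this commit does not show.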

articles/search/search-howto-index-cosmosdb.md

Lines changed: 4 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.custom:
 - devx-track-dotnet
 - ignite-2023
 ms.topic: how-to
-ms.date: 01/18/2024
+ms.date: 06/18/2024
 ---

 # Index data from Azure Cosmos DB for NoSQL for queries in Azure AI Search
@@ -303,6 +303,9 @@ The following example shows a [data source definition](#define-the-data-source)
 },
 ```

+> [!NOTE]
+> When you assign a `null` value to a field in your Azure Cosmos DB, the AI Search indexer is unable to distinguish between `null` and a missing field value. Therefore, if a field in the index is empty, it will not be substituted with a `null` value, even if that modification was specifically made in your database.
+
 <a name="IncrementalProgress"></a>

 ### Incremental indexing and custom queries

articles/search/search-howto-indexing-azure-blob-storage.md

Lines changed: 56 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.service: cognitive-search
 ms.custom:
 - ignite-2023
 ms.topic: how-to
-ms.date: 05/04/2024
+ms.date: 06/17/2024
 ---

 # Index data from Azure Blob Storage
@@ -243,6 +243,61 @@ Once the index and data source have been created, you're ready to create the ind

 An indexer runs automatically when it's created. You can prevent this by setting "disabled" to true. To control indexer execution, [run an indexer on demand](search-howto-run-reset-indexers.md) or [put it on a schedule](search-howto-schedule-indexers.md).

+## Indexing data from multiple Azure Blob containers to a single index
+
+Keep in mind that an indexer can only index data from a single container. If your requirement is to index data from multiple containers and consolidate it into a single AI Search index, this can be achieved by configuring multiple indexers, all directed to the same index. Please be aware of the [maximum number of indexers available per SKU](search-limits-quotas-capacity.md#indexer-limits).
+
+To illustrate, let's consider an example of two indexers, pulling data from two distinct data sources, named `my-blob-datasource1` and `my-blob-datasource2`. Each data source points to a separate Azure Blob container, but both direct to the same index named `my-search-index`.
+
+First indexer definition example:
+
+```http
+POST https://[service name].search.windows.net/indexers?api-version=2023-11-01
+{
+  "name" : "my-blob-indexer1",
+  "dataSourceName" : "my-blob-datasource1",
+  "targetIndexName" : "my-search-index",
+  "parameters": {
+    "batchSize": null,
+    "maxFailedItems": null,
+    "maxFailedItemsPerBatch": null,
+    "base64EncodeKeys": null,
+    "configuration": {
+      "indexedFileNameExtensions" : ".pdf,.docx",
+      "excludedFileNameExtensions" : ".png,.jpeg",
+      "dataToExtract": "contentAndMetadata",
+      "parsingMode": "default"
+    }
+  },
+  "schedule" : { },
+  "fieldMappings" : [ ]
+}
+```
+Second indexer definition that runs in parallel example:
+
+```http
+POST https://[service name].search.windows.net/indexers?api-version=2023-11-01
+{
+  "name" : "my-blob-indexer2",
+  "dataSourceName" : "my-blob-datasource2",
+  "targetIndexName" : "my-search-index",
+  "parameters": {
+    "batchSize": null,
+    "maxFailedItems": null,
+    "maxFailedItemsPerBatch": null,
+    "base64EncodeKeys": null,
+    "configuration": {
+      "indexedFileNameExtensions" : ".pdf,.docx",
+      "excludedFileNameExtensions" : ".png,.jpeg",
+      "dataToExtract": "contentAndMetadata",
+      "parsingMode": "default"
+    }
+  },
+  "schedule" : { },
+  "fieldMappings" : [ ]
+}
+```
+
 ## Check indexer status

 To monitor the indexer status and execution history, send a [Get Indexer Status](/rest/api/searchservice/get-indexer-status) request:
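
Reviewer note: the two indexer bodies added in this file differ only in `name` and `dataSourceName`, so they could be generated rather than hand-written. The sketch below reuses the names and configuration values from the diff. The helper `build_blob_indexers` is hypothetical, and it leaves out the parameters that the diff sets to `null`, on the assumption that omitting them has the same effect as passing `null`.

```python
import json


def build_blob_indexers(index_name, data_source_names):
    """Return one indexer definition per data source, all targeting `index_name`."""
    indexers = []
    for position, data_source in enumerate(data_source_names, start=1):
        indexers.append({
            "name": f"my-blob-indexer{position}",
            "dataSourceName": data_source,
            "targetIndexName": index_name,
            "parameters": {
                "configuration": {
                    "indexedFileNameExtensions": ".pdf,.docx",
                    "excludedFileNameExtensions": ".png,.jpeg",
                    "dataToExtract": "contentAndMetadata",
                    "parsingMode": "default",
                }
            },
        })
    return indexers


definitions = build_blob_indexers(
    "my-search-index", ["my-blob-datasource1", "my-blob-datasource2"]
)
for definition in definitions:
    # Each body would be POSTed to /indexers?api-version=2023-11-01, as in the diff.
    print(json.dumps(definition, indent=2))
```

Each data source in the list counts against the per-SKU indexer limit that the new section links to.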

articles/search/search-indexer-troubleshooting.md

Lines changed: 6 additions & 1 deletion
@@ -9,7 +9,7 @@ ms.service: cognitive-search
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 01/11/2024
+ms.date: 06/17/2024
 ---

 # Indexer troubleshooting guidance for Azure AI Search
@@ -266,6 +266,11 @@ Conditions under which a document is processed twice is explained in the followi

 In practice, this scenario only happens when on-demand indexers are manually invoked within minutes of each other, for certain data sources. It can result in mismatched numbers (like the indexer processed 345 documents total according to the indexer execution stats, but there are 340 documents in the data source and index) or potentially increased billing if you're running the same skills for the same document multiple times. Running an indexer using a schedule is the preferred recommendation.

+## Parallel indexing
+
+When multiple indexers are operating simultaneously, it's typical for some to enter a queue, waiting for available resources to begin execution. The number of indexers that can run concurrently depends on several factors. If the indexers are not linked with [skillsets](cognitive-search-working-with-skillsets.md), the capacity to run in parallel relies on the number of [replicas and partitions](search-capacity-planning.md#concepts-search-units-replicas-partitions) set up in the AI Search service.
+
+On the other hand, if an indexer is associated with a skillset, it operates within the AI Search's internal clusters. The ability to run concurrently in this case is determined by the complexity of the skillset and whether other skillsets are running simultaneously. Built-in indexers are designed to reliably extract data from the source, so no data is missed if running on a schedule. However, it is expected that the indexer processes of parallelization and scaling out may require some time.

 ## Indexing documents with sensitivity labels
