
Commit 9b3c38e

Merge pull request #7488 from haileytap/regions
[Azure Search] Document AI enrichment caveats in 21Vianet regions
2 parents 2dfab49 + d0abc91 commit 9b3c38e

6 files changed, with 65 additions and 61 deletions.

articles/search/cognitive-search-concept-intro.md

Lines changed: 30 additions & 28 deletions
@@ -9,29 +9,31 @@ ms.service: azure-ai-search
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 09/23/2025
+ms.date: 10/06/2025
 ms.update-cycle: 180-days
 ---

 # AI enrichment in Azure AI Search

-In Azure AI Search, *AI enrichment* refers to integration with [Azure AI services](/azure/ai-services/what-are-ai-services) to process content that isn't searchable in its raw form. Through enrichment, analysis and inference are used to create searchable content and structure where none previously existed.
+In Azure AI Search, *AI enrichment* refers to integration with [Azure AI services](/azure/ai-services/what-are-ai-services) to process content that isn't searchable in its raw form. Through enrichment, analysis and inference are used to create searchable content and structure where none previously existed.

-Because Azure AI Search is used for text and vector queries, the purpose of AI enrichment is to improve the utility of your content in search-related scenarios. Raw content must be text or images (you can't enrich vectors), but the output of an enrichment pipeline can be vectorized and indexed in a search index using skills like [Text Split skill](cognitive-search-skill-textsplit.md) for chunking and [AzureOpenAIEmbedding skill](cognitive-search-skill-azure-openai-embedding.md) for vector encoding. For more information about using skills in vector scenarios, see [Integrated data chunking and embedding](vector-search-integrated-vectorization.md).
+Because Azure AI Search is used for text and vector queries, the purpose of AI enrichment is to improve the utility of your content in search-related scenarios. Raw content must be text or images (you can't enrich vectors), but the output of an enrichment pipeline can be vectorized and indexed in a search index using skills like [Text Split skill](cognitive-search-skill-textsplit.md) for chunking and [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) for vector encoding. For more information about using skills in vector scenarios, see [Integrated data chunking and embedding](vector-search-integrated-vectorization.md).

 AI enrichment is based on [*skills*](cognitive-search-working-with-skillsets.md).

-Built-in skills tap Azure AI services. They apply the following transformations and processing to raw content:
+[Built-in skills](cognitive-search-predefined-skills.md) tap Azure AI services. They apply the following transformations and processing to raw content:

-+ Translation and language detection for multi-lingual search
-+ Entity recognition to extract people names, places, and other entities from large chunks of text
-+ Key phrase extraction to identify and output important terms
-+ Optical Character Recognition (OCR) to recognize printed and handwritten text in binary files
-+ Image analysis to describe image content, and output the descriptions as searchable text fields
++ Translation and language detection for multilingual search.
++ Entity recognition to extract people names, places, and other entities from large chunks of text.
++ Key phrase extraction to identify and output important terms.
++ Optical character recognition (OCR) to recognize printed and handwritten text in binary files.
++ Image analysis to describe image content and output the descriptions as searchable text fields.
++ Text embeddings via Azure OpenAI for integrated vectorization.
++ Multimodal embeddings via Azure AI Vision for text and image vectorization.

-Custom skills run your external code. Custom skills can be used for any custom processing that you want to include in the pipeline.
+Custom skills run your external code. You can use custom skills for any custom processing you want to include in the pipeline.

-AI enrichment is an extension of an [**indexer pipeline**](search-indexer-overview.md) that connects to Azure data sources. An enrichment pipeline has all of the components of an indexer pipeline (indexer, data source, index), plus a [**skillset**](cognitive-search-working-with-skillsets.md) that specifies atomic enrichment steps.
+AI enrichment is an extension of an [indexer pipeline](search-indexer-overview.md) that connects to Azure data sources. An enrichment pipeline has all of the components of an indexer pipeline (indexer, data source, index) and a [skillset](cognitive-search-working-with-skillsets.md) that specifies atomic enrichment steps.

 The following diagram shows the progression of AI enrichment:

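The Text Split and Azure OpenAI Embedding skills named in this hunk are typically chained in one skillset. The split skill writes chunks into the enrichment tree, and the embedding skill then runs once per chunk. The following Python sketch builds that kind of skillset body as a plain dictionary. It assumes the REST property names `textSplitMode`, `maximumPageLength`, `pageOverlapLength`, `resourceUri`, `deploymentId`, and `modelName`. The skillset name, skill names, model name, and the `text_vector` output name are hypothetical placeholders. Check the skill reference pages for the API version you target before relying on any of them.

```python
import json


def build_skillset(resource_uri: str, deployment_id: str) -> dict:
    """Return a skillset body that chunks /document/content and embeds each chunk.

    Names and values are illustrative assumptions, not taken from the article.
    """
    split_skill = {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "name": "chunk-text",
        "context": "/document",
        "textSplitMode": "pages",      # split into page-sized chunks
        "maximumPageLength": 2000,     # example chunk size, in characters
        "pageOverlapLength": 500,      # example overlap between adjacent chunks
        "inputs": [{"name": "text", "source": "/document/content"}],
        # Chunks land in the enrichment tree at /document/pages
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    }
    embedding_skill = {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "name": "embed-chunks",
        "context": "/document/pages/*",          # run once per chunk
        "resourceUri": resource_uri,
        "deploymentId": deployment_id,
        "modelName": "text-embedding-3-small",   # hypothetical model choice
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        # Each chunk gets a vector at /document/pages/*/text_vector
        "outputs": [{"name": "embedding", "targetName": "text_vector"}],
    }
    return {
        "name": "hypothetical-chunk-embed-skillset",
        "skills": [split_skill, embedding_skill],
    }


if __name__ == "__main__":
    body = build_skillset("https://<your-resource>.openai.azure.com", "<your-deployment>")
    print(json.dumps(body, indent=2))
```

The embedding skill's `context` and input `source` both point at `/document/pages/*`. This placement is what makes the skill execute per chunk rather than once per document.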
@@ -41,9 +43,9 @@ The following diagram shows the progression of AI enrichment:

 **Enrich & Index** covers most of the AI enrichment pipeline:

-+ Enrichment starts when the indexer ["cracks documents"](search-indexer-overview.md#document-cracking) and extracts images and text. The kind of processing that occurs next depends on your data and which skills you've added to a skillset. If you have images, they can be forwarded to skills that perform image processing. Text content is queued for text and natural language processing. Internally, skills create an ["enriched document"](cognitive-search-working-with-skillsets.md#enrichment-tree) that collects the transformations as they occur.
++ Enrichment starts when the indexer *[cracks documents](search-indexer-overview.md#document-cracking)* and extracts images and text. The type of processing that occurs next depends on your data and the skills you've added to a skillset. Images can be forwarded to [skills that perform image processing](cognitive-search-concept-image-scenarios.md). Text content is queued for text and natural language processing. Internally, skills create an *[enriched document](cognitive-search-working-with-skillsets.md#enrichment-tree)* that collects transformations as they occur.

-+ Enriched content is generated during skillset execution, and is temporary unless you save it. You can enable an [enrichment cache](enrichment-cache-how-to-configure.md) to persist cracked documents and skill outputs for subsequent reuse during future skillset executions.
++ Enriched content is generated during skillset execution and is temporary unless you save it. You can enable an [enrichment cache](enrichment-cache-how-to-configure.md) to persist skill outputs for reuse in future skillset executions.

 + To get content into a search index, the indexer must have mapping information for sending enriched content to target field. [Field mappings](search-indexer-field-mappings.md) (explicit or implicit) set the data path from source data to a search index. [Output field mappings](cognitive-search-output-field-mapping.md) set the data path from enriched documents to an index.

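The enrichment cache and both kinds of mappings described in this hunk are set on the indexer definition. The following minimal sketch assumes the REST property names `fieldMappings`, `outputFieldMappings`, and `cache`, with `cache` holding `storageConnectionString` and `enableReprocessing`. The object names are hypothetical. The `metadata_storage_path` source field assumes a blob data source, and the `/document/keyphrases` enrichment path is also hypothetical. All of them must match your own data source, skillset, and index.

```python
import json


def build_indexer(cache_connection_string: str) -> dict:
    """Return an indexer body with field mappings, output field mappings, and a cache.

    Object names, field names, and enrichment paths are hypothetical placeholders.
    """
    return {
        "name": "hypothetical-indexer",
        "dataSourceName": "hypothetical-datasource",
        "skillsetName": "hypothetical-skillset",
        "targetIndexName": "hypothetical-index",
        # Field mappings: source data -> index. Here the blob path becomes a
        # URL-safe document key (assumes a blob data source and an "id" key field).
        "fieldMappings": [
            {
                "sourceFieldName": "metadata_storage_path",
                "targetFieldName": "id",
                "mappingFunction": {"name": "base64Encode"},
            }
        ],
        # Output field mappings: enriched document -> index.
        "outputFieldMappings": [
            {"sourceFieldName": "/document/keyphrases", "targetFieldName": "keyphrases"}
        ],
        # Enrichment cache in Azure Storage, reused by later runs.
        "cache": {
            "storageConnectionString": cache_connection_string,
            "enableReprocessing": True,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_indexer("<storage-connection-string>"), indent=2))
```

Output field mapping sources are paths in the enrichment tree, so they start with `/document/`. Field mapping sources are plain source-field names.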
@@ -53,9 +55,9 @@ The following diagram shows the progression of AI enrichment:

 ## When to use AI enrichment

-Enrichment is useful if raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the [**built-in skills**](cognitive-search-predefined-skills.md) can unlock this content for full text search and data science applications.
+Enrichment is useful if raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the [built-in skills](cognitive-search-predefined-skills.md) can unlock this content for full-text search and data science applications.

-You can also create [**custom skills**](cognitive-search-create-custom-skill-example.md) to provide external processing.
+You can also create [custom skills](cognitive-search-create-custom-skill-example.md) to provide external processing.
 Open-source, third-party, or first-party code can be integrated into the pipeline as a custom skill. Classification models that identify salient characteristics of various document types fall into this category, but any external package that adds value to your content could be used.

 ### Use-cases for built-in skills
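A custom skill is an HTTP endpoint that accepts and returns the batch shape defined by the custom skill web interface. That shape is a `values` array of records, each carrying a `recordId` and a `data` object. The following framework-free Python sketch shows only the request-to-response logic for a simple pattern-matching skill. Hosting it as a web endpoint (for example, in Azure Functions) is out of scope. The regular expression, the input name `text`, and the output name `invoiceIds` are hypothetical.

```python
import json
import re

# Hypothetical pattern: IDs such as "INV-12345". Swap in your own package or model.
INVOICE_ID = re.compile(r"\bINV-\d{5}\b")


def handle_custom_skill_request(body: dict) -> dict:
    """Map a custom skill request body to a response body, record by record.

    The "text" input and "invoiceIds" output names are hypothetical and must
    match the skill's inputs and outputs in your skillset.
    """
    results = []
    for record in body.get("values", []):
        record_id = record.get("recordId")
        text = record.get("data", {}).get("text")
        if not isinstance(text, str):
            # Report a per-record error instead of failing the whole batch.
            results.append({
                "recordId": record_id,
                "data": {},
                "errors": [{"message": "Input 'text' is missing or not a string."}],
                "warnings": [],
            })
            continue
        results.append({
            "recordId": record_id,
            "data": {"invoiceIds": INVOICE_ID.findall(text)},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}


if __name__ == "__main__":
    request = {"values": [{"recordId": "1", "data": {"text": "Paid INV-00042 and INV-12345."}}]}
    print(json.dumps(handle_custom_skill_request(request)))
```

Each response record echoes its `recordId`, which is how the indexer matches outputs back to documents. Returning an error for a single record, rather than raising, should leave the rest of the batch usable.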
@@ -64,33 +66,33 @@ Built-in skills are based on the Azure AI services APIs: [Azure AI Computer Visi

 A [skillset](cognitive-search-defining-skillset.md) that's assembled using built-in skills is well suited for the following application scenarios:

-+ **Image processing** skills include [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) and identification of [visual features](cognitive-search-skill-image-analysis.md), such as facial detection, image interpretation, image recognition (famous people and landmarks), or attributes like image orientation. These skills create text representations of image content for full text search in Azure AI Search.
++ **Image processing** skills include [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) and identification of [visual features](cognitive-search-skill-image-analysis.md), such as facial detection, image interpretation, image recognition (famous people and landmarks), or attributes like image orientation. These skills create text representations of image content for full-text search in Azure AI Search.

 + **Machine translation** is provided by the [Text Translation](cognitive-search-skill-text-translation.md) skill, often paired with [language detection](cognitive-search-skill-language-detection.md) for multi-language solutions.

-+ **Natural language processing** analyzes chunks of text. Skills in this category include [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md), [Sentiment Detection (including opinion mining)](cognitive-search-skill-sentiment-v3.md), and [Personal Identifiable Information Detection](cognitive-search-skill-pii-detection.md). With these skills, unstructured text is mapped as searchable and filterable fields in an index.
++ **Natural language processing** analyzes chunks of text. Skills in this category include [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md), [Sentiment Detection (including opinion mining)](cognitive-search-skill-sentiment-v3.md), and [Personal Identifiable Information Detection](cognitive-search-skill-pii-detection.md). With these skills, unstructured text is mapped as searchable and filterable fields in an index.

 ### Use-cases for custom skills

-[**Custom skills**](cognitive-search-create-custom-skill-example.md) execute external code that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills can be found in the [azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills/blob/main/README.md) GitHub repository.
+[Custom skills](cognitive-search-create-custom-skill-example.md) execute external code that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills can be found in the [azure-search-power-skills](https://github.com/Azure-Samples/azure-search-power-skills/blob/main/README.md) GitHub repository.

-Custom skills aren’t always complex. For example, if you have an existing package that provides pattern matching or a document classification model, you can wrap it in a custom skill.
+Custom skills aren’t always complex. For example, if you have an existing package that provides pattern matching or a document classification model, you can wrap it in a custom skill.

 ## Storing output

 In Azure AI Search, an indexer saves the output it creates. A single indexer run can create up to three data structures that contain enriched and indexed output.

 | Data store | Required | Location | Description |
 |------------|----------|----------|-------------|
-| [**searchable index**](search-what-is-an-index.md) | Required | Search service | Used for full text search and other query forms. Specifying an index is an indexer requirement. Index content is populated from skill outputs, plus any source fields that are mapped directly to fields in the index. |
-| [**knowledge store**](knowledge-store-concept-intro.md) | Optional | Azure Storage | Used for downstream apps like knowledge mining or data science. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs) in Azure Storage. |
-| [**enrichment cache**](enrichment-cache-how-to-configure.md) | Optional | Azure Storage | Used for caching enrichments for reuse in subsequent skillset executions. The cache stores imported, unprocessed content (cracked documents). It also stores the enriched documents created during skillset execution. Caching is helpful if you're using image analysis or OCR, and you want to avoid the time and expense of reprocessing image files. |
+| [searchable index](search-what-is-an-index.md) | Required | Search service | Used for full-text search and other query forms. Specifying an index is an indexer requirement. Index content is populated from skill outputs, plus any source fields that are mapped directly to fields in the index. |
+| [knowledge store](knowledge-store-concept-intro.md) | Optional | Azure Storage | Used for downstream apps like knowledge mining, data science, and multimodal search. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs) in Azure Storage. For [multimodal search scenarios](multimodal-search-overview.md#how-multimodal-search-works-in-azure-ai-search), you can save extracted images to the knowledge store and reference them at query time, allowing the images to be returned directly to client apps. |
+| [enrichment cache](enrichment-cache-how-to-configure.md) | Optional | Azure Storage | Used for caching enrichments for reuse in subsequent skillset executions. The cache stores imported, unprocessed content (cracked documents). It also stores the enriched documents created during skillset execution. Caching is helpful if you're using image analysis or OCR, and you want to avoid the time and expense of reprocessing image files. |

 Indexes and knowledge stores are fully independent of each other. While you must attach an index to satisfy indexer requirements, if your sole objective is a knowledge store, you can ignore the index after it's populated.

 ## Exploring content

-After you've defined and loaded a [search index](search-what-is-an-index.md) or a [knowledge store](knowledge-store-concept-intro.md), you can explore its data.
+After you define and load a [search index](search-what-is-an-index.md) or [knowledge store](knowledge-store-concept-intro.md), you can explore its data.

 ### Query a search index

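A knowledge store is declared in the skillset's `knowledgeStore` property. Its projections decide whether enriched content lands in tables or in blob objects. The following sketch returns one possible value for that property. It assumes that a Shaper skill (not shown) has produced a `/document/shapedDoc` node with a `keyPhrases` collection. All table, key, and container names are hypothetical. Tables and objects go in separate projection groups. As we understand the projection rules, projections in the same group share generated keys and projections in different groups don't.

```python
import json


def build_knowledge_store(connection_string: str) -> dict:
    """Return a value for a skillset's "knowledgeStore" property.

    Assumes a Shaper skill (not shown) produced /document/shapedDoc with a
    keyPhrases collection. All table, key, and container names are hypothetical.
    """
    tables_group = {
        # Tables: slices of the enriched document (one row per document and
        # one row per key phrase), the form suggested for Power BI analysis.
        "tables": [
            {"tableName": "hypotheticalDocuments", "generatedKeyName": "DocumentId",
             "source": "/document/shapedDoc"},
            {"tableName": "hypotheticalKeyPhrases", "generatedKeyName": "KeyPhraseId",
             "source": "/document/shapedDoc/keyPhrases/*"},
        ],
        "objects": [],
        "files": [],
    }
    objects_group = {
        "tables": [],
        # Objects: the whole shaped document as one JSON blob per source document.
        "objects": [{"storageContainer": "hypothetical-enriched-docs",
                     "source": "/document/shapedDoc"}],
        "files": [],
    }
    return {
        "storageConnectionString": connection_string,
        # Two groups, so the table and object projections are independent.
        "projections": [tables_group, objects_group],
    }


if __name__ == "__main__":
    print(json.dumps(build_knowledge_store("<storage-connection-string>"), indent=2))
```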
@@ -100,15 +102,15 @@ After you've defined and loaded a [search index](search-what-is-an-index.md) or

 In Azure Storage, a [knowledge store](knowledge-store-concept-intro.md) can assume the following forms: a blob container of JSON documents, a blob container of image objects, or tables in Table Storage. You can use [Storage Explorer](/azure/vs-azure-tools-storage-manage-with-storage-explorer), [Power BI](knowledge-store-connect-power-bi.md), or any app that connects to Azure Storage to access your content.

-+ A blob container captures enriched documents in their entirety, which is useful if you're creating a feed into other processes.
++ A blob container captures enriched documents in their entirety, which is useful if you're creating a feed into other processes.

 + A table is useful if you need slices of enriched documents, or if you want to include or exclude specific parts of the output. For analysis in Power BI, tables are the recommended data source for data exploration and visualization in Power BI.

 ## Availability and pricing

-Enrichment is available in regions that have Azure AI services. You can check the availability of enrichment on the [regions list](search-region-support.md) page.
+AI enrichment is available in regions that offer Azure AI services. To check the availability of AI enrichment, see the [regions list](search-region-support.md).

-Billing follows a Standard pricing model. The costs of using built-in skills are passed on when a multi-region Azure AI services key is specified in the skillset. There are also costs associated with image extraction, as metered by Azure AI Search. Text extraction and utility skills, however, aren't billable. For more information, see [How you're charged for Azure AI Search](search-sku-manage-costs.md#how-youre-charged-for-the-base-service).
+Billing follows a Standard pricing model. Costs associated with built-in skills are incurred when you specify an Azure OpenAI in Azure AI Foundry Models resource or Azure AI services multi-service resource key in the skillset. There are also costs associated with image extraction, as metered by Azure AI Search. However, text extraction and utility skills aren't billable. For more information, see [How you're charged for Azure AI Search](search-sku-manage-costs.md#how-youre-charged-for-the-base-service).

 ## Checklist: A typical workflow

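The billing paragraph refers to a resource key specified in the skillset. For built-in skills backed by Azure AI services, that key is commonly attached through the skillset's `cognitiveServices` property. The following minimal sketch assumes the `#Microsoft.Azure.Search.CognitiveServicesByKey` type. The environment variable name `AI_SERVICES_KEY` and the description text are hypothetical. The Azure OpenAI Embedding skill works differently: it points at its Azure OpenAI resource through properties on the skill itself, such as `resourceUri`.

```python
import json
import os


def attach_ai_services_key(skillset: dict, key: str) -> dict:
    """Return a copy of a skillset body with an Azure AI services key attached.

    The input dict itself is not modified.
    """
    updated = dict(skillset)  # shallow copy; nested values such as "skills" are shared
    updated["cognitiveServices"] = {
        "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
        "description": "Azure AI services multi-service resource for built-in skills",
        "key": key,
    }
    return updated


if __name__ == "__main__":
    # AI_SERVICES_KEY is a hypothetical environment variable name.
    key = os.environ.get("AI_SERVICES_KEY", "<your-key>")
    body = attach_ai_services_key({"name": "hypothetical-skillset", "skills": []}, key)
    # Print only the type, not the key itself.
    print(json.dumps(body["cognitiveServices"]["@odata.type"]))
```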
@@ -122,15 +124,15 @@ Start with a subset of data in a [supported data source](search-indexer-overview

 1. [Create an index schema](search-how-to-create-search-index.md) that defines a search index.

-1. [Create and run the indexer](search-howto-create-indexers.md) to bring all of the above components together. This step retrieves the data, runs the skillset, and loads the index.
+1. [Create and run the indexer](search-howto-create-indexers.md) to bring all of the previous components together. This step retrieves the data, runs the skillset, and loads the index.

 An indexer is also where you specify field mappings and output field mappings that set up the data path to a search index.

 Optionally, [enable enrichment caching](enrichment-cache-how-to-configure.md) in the indexer configuration. This step allows you to reuse existing enrichments later on.

 1. [Run queries](search-query-create.md) to evaluate results or [start a debug session](cognitive-search-how-to-debug-skillset.md) to work through any skillset issues.

-To repeat any of the above steps, [reset the indexer](search-howto-reindex.md) before you run it. Or, delete and recreate the objects on each run (recommended if youre using the free tier). If you enabled caching the indexer pulls from the cache if data is unchanged at the source, and if your edits to the pipeline don't invalidate the cache.
+To repeat any of the previous steps, [reset the indexer](search-howto-reindex.md) before you run it. Alternatively, you can delete and recreate the objects on each run (recommended if you're using the free tier). If you enabled caching, the indexer pulls from the cache if the source data is unchanged and if your edits to the pipeline don't invalidate the cache.

 ## Next steps

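The checklist says to reset the indexer before running it again. The following minimal Python sketch builds the two REST calls with the standard library's `urllib.request` but doesn't send them. It assumes the `POST /indexers/{name}/reset` and `POST /indexers/{name}/run` endpoints with an `api-key` header. The API version string is an assumption, so replace it with the one you use. The service endpoint, indexer name, and key in the usage block are hypothetical placeholders.

```python
import urllib.request

API_VERSION = "2024-07-01"  # assumption: replace with the API version you target


def indexer_request(endpoint: str, indexer_name: str, action: str,
                    api_key: str) -> urllib.request.Request:
    """Build (but don't send) a POST request that resets or runs an indexer."""
    if action not in ("reset", "run"):
        raise ValueError("action must be 'reset' or 'run'")
    url = f"{endpoint.rstrip('/')}/indexers/{indexer_name}/{action}?api-version={API_VERSION}"
    # Empty body; authentication uses an admin key in the api-key header.
    return urllib.request.Request(url, data=b"", headers={"api-key": api_key}, method="POST")


if __name__ == "__main__":
    endpoint = "https://<your-service>.search.windows.net"  # hypothetical placeholder
    # Reset clears change-tracking state so the next run starts over; then run.
    for action in ("reset", "run"):
        req = indexer_request(endpoint, "hypothetical-indexer", action, "<admin-key>")
        print(req.get_method(), req.full_url)
        # To actually send the request: urllib.request.urlopen(req)
```

If caching is enabled, the run that follows a reset can still reuse cached enrichments when the source data and pipeline edits allow it, as the checklist notes.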