articles/search/cognitive-search-concept-intro.md
Lines changed: 29 additions & 29 deletions
Original file line number
Diff line number
Diff line change
@@ -13,13 +13,13 @@ ms.custom: references_regions
13
13
---
14
14
# AI enrichment in Azure Cognitive Search
15
15
16
-
In Azure Cognitive Search, AI enrichment refers to a pipeline process that adds machine learning to [indexer-based indexing](search-indexer-overview.md). Steps in the pipeline create new information where none previously existed: extracting information from images, detecting sentiment or key phrases from chunks of text, and recognizing entities, to name a few. All of these processes result in making previously unsearchable content available to full text search and knowledge mining scenarios.
16
+
In Azure Cognitive Search, AI enrichment refers to a pipeline process that adds machine learning to [indexer-based indexing](search-indexer-overview.md). Steps in the pipeline create information where none previously existed. For example, steps in the pipeline can extract information from images, detect sentiment or key phrases from chunks of text, and recognize entities. These processes transform unsearchable content into searchable text, for full text search and knowledge mining scenarios.
17
17
18
-
[**Azure Blob Storage**](../storage/blobs/storage-blobs-overview.md)) is the most commonly used input, but any indexer-supported data source can provide the initial content. A [**skillset**](cognitive-search-working-with-skillsets.md), attached to an indexer, adds the AI processing. The indexer extracts content and sets up the pipeline, while AI processing identifies, analyzes, and creates new information out of blobs, images, and raw text. Output is a [**search index**](search-what-is-an-index.md) or optional[**knowledge store**](knowledge-store-concept-intro.md).
18
+
[**Azure Blob Storage**](../storage/blobs/storage-blobs-overview.md) is a frequently used input, but any supported data source can provide the initial content. A [**skillset**](cognitive-search-working-with-skillsets.md), attached to an indexer, adds the AI processing. The indexer extracts content and sets up the pipeline. The AI processing identifies, analyzes, and creates information out of blob, image, and raw text inputs. Output is always a [**search index**](search-what-is-an-index.md), and optionally a [**knowledge store**](knowledge-store-concept-intro.md).
Skillsets are composed of built-in skills from Cognitive Search or [*custom skills*](cognitive-search-create-custom-skill-example.md) for external processing that you provide. Custom skills might sound complex but can be simple and straightforward in terms of implementation. If you have existing packages that provide pattern matching or document classification models, the content you extract during indexing could be passed to these models for processing.
22
+
Skillsets are composed of [*built-in skills*](cognitive-search-predefined-skills.md) from Cognitive Search or [*custom skills*](cognitive-search-create-custom-skill-example.md) for external processing that you provide. Custom skills are not always complex. For example, if you have existing packages that provide pattern matching or document classification models, you can wrap them in a custom skill.
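To illustrate how small such a wrapper can be, here is a minimal sketch of the processing core of a custom skill. It assumes the request and response shapes described in the custom skill web interface article (a `values` array of records, each with a `recordId` and a `data` object). The regular expression, the `text` input name, and the `invoiceIds` output name are hypothetical and stand in for whatever package you already have.

```python
import re
from typing import Any

# Hypothetical pattern: matches identifiers such as "INV-12345".
INVOICE_PATTERN = re.compile(r"\bINV-\d{5}\b")


def run_custom_skill(request_body: dict[str, Any]) -> dict[str, Any]:
    """Apply pattern matching to each input record and return one result per record."""
    results = []
    for record in request_body.get("values", []):
        # "text" is an assumed input name; it must match the skill definition.
        text = record.get("data", {}).get("text", "")
        results.append(
            {
                # Each result echoes the recordId of the record it came from.
                "recordId": record["recordId"],
                # "invoiceIds" is a hypothetical output name.
                "data": {"invoiceIds": INVOICE_PATTERN.findall(text)},
                "errors": [],
                "warnings": [],
            }
        )
    return {"values": results}
```

In practice this function would sit behind an HTTP endpoint (for example, an Azure Function) that the skillset calls. The hosting code is omitted here.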
23
23
24
24
Built-in skills fall into these categories:
25
25
@@ -29,45 +29,47 @@ Built-in skills fall into these categories:
29
29
30
30
+ **Natural language processing** skills include [entity recognition](cognitive-search-skill-entity-recognition-v3.md), [language detection](cognitive-search-skill-language-detection.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), text manipulation, [sentiment detection (including opinion mining)](cognitive-search-skill-sentiment-v3.md), and [personally identifiable information detection](cognitive-search-skill-pii-detection.md). With these skills, unstructured text is mapped as searchable and filterable fields in an index.
31
31
32
-
Built-in skills are based on pre-trained machine learning models in Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Language Service](../cognitive-services/language-service/overview.md). You should [attach a billable Cognitive Services resource](cognitive-search-attach-cognitive-services.md) if you want these resources for larger workloads.
32
+
Built-in skills are based on the Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Language Service](../cognitive-services/language-service/overview.md). Unless your content input is small, expect to [attach a billable Cognitive Services resource](cognitive-search-attach-cognitive-services.md) to run larger workloads.
33
33
34
-
Natural language and image processing is applied during the data ingestion phase, with results becoming part of a document's composition in a searchable index in Azure Cognitive Search. Data is sourced as an Azure data set and then pushed through an indexing pipeline using whichever [built-in skills](cognitive-search-predefined-skills.md) you need.
34
+
## Availability and pricing
35
35
36
-
## Feature availability
37
-
38
-
AI enrichment is available in regions where Azure Cognitive Services is also available. You can check the current availability of AI enrichment on the [Azure products available by region](https://azure.microsoft.com/global-infrastructure/services/?products=search) page. AI enrichment is available in all supported regions except:
36
+
AI enrichment is available in regions that have Azure Cognitive Services. You can check the availability of AI enrichment on the [Azure products available by region](https://azure.microsoft.com/global-infrastructure/services/?products=search) page. AI enrichment is available in all regions except:
39
37
40
38
+ Australia Southeast
41
39
+ China North 2
42
40
+ Germany West Central
43
41
44
-
If your search service is located in one of these regions, you will not be able to create and use skillsets, but all other search service functionality is available and fully supported.
42
+
Billing follows a pay-as-you-go pricing model. The costs of using built-in skills are passed on when a multi-region Cognitive Services key is specified in the skillset. There are also costs associated with image extraction, as metered by Cognitive Search. Text extraction and utility skills, however, aren't billable. For more information, see [How you're charged for Azure Cognitive Search](search-sku-manage-costs.md#how-youre-charged-for-azure-cognitive-search).
45
43
46
44
## When to use AI enrichment
47
45
48
-
You should consider enrichment if your raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the built-in cognitive skills can unlock this content for full text search and data science applications.
46
+
Enrichment is useful if raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the built-in cognitive skills can unlock this content for full text search and data science applications.
49
47
50
-
Additionally, you might consider adding a custom skill if you have open-source, third-party, or first-party code that you'd like to integrate into the pipeline. Classification models that identify salient characteristics of various document types fall into this category, but any package that adds value to your content could be used.
48
+
Enrichment also unlocks external processing. Open-source, third-party, or first-party code can be integrated into the pipeline as a custom skill. Classification models that identify salient characteristics of various document types fall into this category, but any external package that adds value to your content could be used.
51
49
52
50
### Use-cases for built-in skills
53
51
54
52
A [skillset](cognitive-search-defining-skillset.md) that's assembled using built-in skills is well suited for the following application scenarios:
55
53
56
-
+ [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) that recognizes typeface and handwritten text in scanned documents (JPEG) is perhaps the most commonly used skill. Attaching the OCR skill will identify, extract, and ingest text from JPEG files.
54
+
+ [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) that recognizes typeface and handwritten text in scanned documents (JPEG) is perhaps the most commonly used skill.
57
55
58
-
+ [Text translation](cognitive-search-skill-text-translation.md) of multilingual content is another commonly used skill. Language detection is built into Text Translation, but you can also run [Language Detection](cognitive-search-skill-language-detection.md) independently if you just want the language codes of the content in your corpus.
56
+
+ [Text translation](cognitive-search-skill-text-translation.md) of multilingual content is another commonly used skill. Language detection is built into Text Translation, but you can also run [Language Detection](cognitive-search-skill-language-detection.md) as a separate skill to output a language code for each chunk of content.
59
57
60
-
+ PDFs with combined image and text. Text in PDFs can be extracted during indexing without the use of enrichment steps, but the addition of image and natural language processing can often produce a better outcome than a standard indexing provides.
58
+
+ PDFs with combined image and text. Embedded text can be extracted without AI enrichment, but adding image and language skills can unlock more information than what could be obtained through standard text-based indexing.
61
59
62
60
+ Unstructured or semi-structured documents containing content that has inherent meaning or context that is hidden in the larger document.
63
61
64
-
Blobs in particular often contain a large body of content that is packed into a single "field". By attaching image and natural language processing skills to an indexer, you can create new information that is extant in the raw content, but not otherwise surfaced as distinct fields. Some ready-to-use built-in cognitive skills that can help: [Key Phrase Extraction](cognitive-search-skill-keyphrases.md) and [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) (people, organizations, and locations to name a few).
62
+
Blobs in particular often contain a large body of content that is packed into a single "field". By attaching image and natural language processing skills to an indexer, you can create information that is extant in the raw content, but not otherwise surfaced as distinct fields. Some ready-to-use built-in cognitive skills that can help: [Key Phrase Extraction](cognitive-search-skill-keyphrases.md) and [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) (people, organizations, and locations to name a few).
65
63
66
64
Additionally, built-in skills can also be used to restructure content through text split, merge, and shape operations.
67
65
68
66
### Use-cases for custom skills
69
67
70
-
Custom skills can support more complex scenarios, such as recognizing forms, or custom entity detection using a model that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills include [Forms Recognizer](../applied-ai-services/form-recognizer/overview.md), integration of the [Bing Entity Search API](./cognitive-search-create-custom-skill-example.md), and [custom entity recognition](https://github.com/Microsoft/SkillsExtractorCognitiveSearch).
68
+
Custom skills can support more complex scenarios, such as recognizing forms, or custom entity detection using a model that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills include:
@@ -97,55 +99,53 @@ Internally, the pipeline generates a collection of enriched documents. You can d
97
99
98
100
### Step 3: Indexing
99
101
100
-
Indexing is the process wherein raw and enriched content is ingested as fields in a search index, and as [projections](knowledge-store-projection-overview.md) if you are also creating a knowledge store. The same enriched content can appear in both, using implicit or explicit field mappings to send the content to the correct fields.
102
+
Indexing is the process wherein raw and enriched content is ingested as fields in a search index, and as [projections](knowledge-store-projection-overview.md) if you're also creating a knowledge store. The same enriched content can appear in both, using implicit or explicit field mappings to send the content to the correct fields.
101
103
102
104
Enriched content is generated during skillset execution, and is temporary unless you save it. In order for enriched content to appear in a search index, the indexer must have mapping information so that it can send enriched content to a field in a search index. [Output field mappings](cognitive-search-output-field-mapping.md) set up these associations.
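As a rough illustration of these associations, the sketch below builds the `outputFieldMappings` fragment of an indexer definition. The `sourceFieldName` and `targetFieldName` property names follow the indexer REST API. The enrichment paths and index field names are hypothetical and depend on your own skillset and index schema.

```python
def output_field_mapping(enrichment_path: str, index_field: str) -> dict[str, str]:
    """Associate a node in the enriched document with a field in the search index."""
    return {"sourceFieldName": enrichment_path, "targetFieldName": index_field}


# Hypothetical paths and field names, for illustration only.
indexer_fragment = {
    "outputFieldMappings": [
        output_field_mapping("/document/keyPhrases", "keyPhrases"),
        output_field_mapping("/document/languageCode", "language"),
    ]
}
```

Enrichments without a mapping are not sent to the index and are discarded when the run ends, unless they are saved to a knowledge store or cache.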
103
105
104
-
## Saving enriched output
106
+
## Storing enriched output
105
107
106
108
In Azure Cognitive Search, an indexer saves the output it creates.
107
109
108
110
A [**searchable index**](search-what-is-an-index.md) is one of the outputs that is always created by an indexer. Specification of an index is an indexer requirement, and when you attach a skillset, the output of the skillset, plus any fields that are mapped directly from the source, are used to populate the index. Usually, the outputs of specific skills, such as key phrases or sentiment scores, are ingested into the index in fields created for that purpose.
109
111
110
-
A [**knowledge store**](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are well suited for interactive analysis in tools like Power BI, whereas files and blobs are typically used in data science or similar processes.
112
+
A [**knowledge store**](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are recommended for interactive analysis in tools like Power BI. Files and blobs are typically used in data science or similar workloads.
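The following sketch shows roughly what a knowledge store definition inside a skillset might look like, with one table projection and one object projection. The property names reflect the knowledge store REST definition as best understood here. The table name, container name, key name, and source paths are hypothetical, and table projections usually need a source shaped by a Shaper skill, which is omitted.

```python
from typing import Any


def knowledge_store_definition(connection_string: str) -> dict[str, Any]:
    """Build a hypothetical knowledgeStore section for a skillset definition."""
    return {
        "storageConnectionString": connection_string,
        "projections": [
            {
                # Tabular projection, suited to analysis tools such as Power BI.
                "tables": [
                    {
                        "tableName": "KeyPhrases",           # hypothetical
                        "generatedKeyName": "KeyPhraseId",   # hypothetical
                        "source": "/document/tableShape",    # hypothetical path
                    }
                ],
                "objects": [],
                "files": [],
            },
            {
                # Object projection: JSON blobs for data science workloads.
                "tables": [],
                "objects": [
                    {
                        "storageContainer": "enriched-docs",  # hypothetical
                        "source": "/document/objectShape",    # hypothetical path
                    }
                ],
                "files": [],
            },
        ],
    }
```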
111
113
112
114
Finally, an indexer can [**cache enriched documents**](cognitive-search-incremental-indexing-conceptual.md) in Azure Blob Storage for potential reuse in subsequent skillset executions. The cache is for internal use. Cached enrichments are consumable by the same skillset that you rerun at a later date. Caching is helpful if your skillset includes image analysis or OCR, and you want to avoid the time and expense of reprocessing image files.
113
115
114
116
Indexes and knowledge stores are fully independent of each other. While you must attach an index to satisfy indexer requirements, if your sole objective is a knowledge store, you can ignore the index after it's populated. Avoid deleting it though. If you want to rerun the indexer and skillset, you'll need the index in order for the indexer to run.
115
117
116
-
## Consume enriched content
118
+
## Consuming enriched content
117
119
118
120
The output of AI enrichment is a [fully text-searchable index](search-what-is-an-index.md) on Azure Cognitive Search, and optionally a [knowledge store](knowledge-store-concept-intro.md) in Azure Storage.
119
121
120
122
### Accessing content in a search index
121
123
122
-
[**Querying the index**](search-query-overview.md) is how developers and users access the enriched content generated by the pipeline. The index is like any other you might create for Azure Cognitive Search: you can supplement text analysis with custom analyzers, invoke fuzzy search queries, add filters, or experiment with scoring profiles to tune search relevance.
124
+
[**Run queries**](search-query-overview.md) to access the enriched content generated by the pipeline. The index is like any other you might create for Azure Cognitive Search: you can supplement text analysis with custom analyzers, invoke fuzzy search queries, add filters, or experiment with scoring profiles to tune search relevance.
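As a sketch of what such a query might look like, the helper below builds a request body for the Search Documents REST API. The `search`, `queryType`, `filter`, `top`, and `count` parameters are part of that API, and the `~` suffix requests fuzzy matching under the full Lucene syntax. The `language` field is hypothetical and assumes the index has a filterable field populated by a language detection skill.

```python
from typing import Any, Optional


def build_query(term: str, language: Optional[str] = None,
                fuzzy: bool = False, top: int = 10) -> dict[str, Any]:
    """Build a Search Documents request body over enriched fields."""
    body: dict[str, Any] = {
        # Fuzzy search appends "~" and requires the full Lucene query syntax.
        "search": f"{term}~" if fuzzy else term,
        "queryType": "full" if fuzzy else "simple",
        "top": top,
        "count": True,
    }
    if language is not None:
        # "language" is a hypothetical filterable field in the index.
        body["filter"] = f"language eq '{language}'"
    return body
```

For example, `build_query("contract", language="en", fuzzy=True)` produces a fuzzy query restricted to documents whose detected language is English.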
123
125
124
126
### Accessing content in a knowledge store
125
127
126
-
In Azure Storage, a [knowledge store](knowledge-store-concept-intro.md) has two manifestations: a blob container of JSON document, a blob container of image objects, or tables in Table storage. You can use [Storage Browser](knowledge-store-view-storage-explorer.md), [Power BI](knowledge-store-connect-power-bi.md), or any app that connects to Azure Storage.
127
-
128
-
+ A blob container captures enriched documents in their entirety, which is useful if you want to feed into other processes.
128
+
In Azure Storage, a [knowledge store](knowledge-store-concept-intro.md) can assume the following forms: a blob container of JSON documents, a blob container of image objects, or tables in Table Storage. You can use [Storage Browser](knowledge-store-view-storage-explorer.md), [Power BI](knowledge-store-connect-power-bi.md), or any app that connects to Azure Storage to access your content.
129
129
130
-
+ In contrast, Table storage can accommodate physical projections of enriched documents. You can create slices or layers of enriched documents that include or exclude specific parts. For analysis in Power BI, the tables in Azure Table Storage become the data source for further visualization and exploration.
130
+
+ A blob container captures enriched documents in their entirety, which is useful if you're creating a feed into other processes.
131
131
132
-
An enriched document at the output of the pipeline differs from its original source input by the presence of additional fields containing new information that was extracted or generated during enrichment. As such, you can work with a combination of original and created content, regardless of which output structure you use.
132
+
+ A table is useful if you need slices of enriched documents, or if you want to include or exclude specific parts of the output. For analysis in Power BI, tables are the recommended data source for data exploration and visualization.
133
133
134
134
## Checklist: A typical workflow
135
135
136
-
1. When beginning a project, it's helpful to work with a subset of data. Indexer and skillset design is an iterative process, and you'll iterate more quickly if you're working with a small representative data set.
136
+
1. When beginning a project, it's helpful to work with a subset of data. Indexer and skillset design is an iterative process, and the work goes faster with a small representative data set.
137
137
138
138
1. Create a [data source](/rest/api/searchservice/create-data-source) that specifies a connection to your data.
139
139
140
140
1. Create a [skillset](/rest/api/searchservice/create-skillset) to add enrichment.
141
141
142
142
1. Create an [index schema](/rest/api/searchservice/create-index) that defines a search index.
143
143
144
-
1. Create an [indexer](/rest/api/searchservice/create-indexer) to bring all of the above components together. Creating or running indexer retrieves the data, runs the skillset, and loads the index.
144
+
1. Create an [indexer](/rest/api/searchservice/create-indexer) to bring all of the above components together. This step retrieves the data, runs the skillset, and loads the index.
145
145
146
146
1. Run queries to evaluate results and modify code to update skillsets, schema, or indexer configuration.
147
147
148
-
To iterate over the above steps, [reset the indexer](search-howto-reindex.md) before rebuilding the pipeline, or delete and recreate the objects on each run (recommended if you are using the free tier). You should also [enable enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments wherever possible.
148
+
To repeat any of the above steps, [reset the indexer](search-howto-reindex.md) before you run it. Or, delete and recreate the objects on each run (recommended if you are using the free tier). You should also [enable enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments wherever possible.
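The checklist above can be sketched as a sequence of REST request bodies. This is a minimal illustration, not a complete client: it only builds the payloads and paths in creation order, and leaves out the HTTP calls and the `api-key` header. The object names, the API version, and the single key phrase skill are assumptions chosen for the example.

```python
from typing import Any

API_VERSION = "2020-06-30"  # assumption: use the version your service supports


def build_pipeline_requests(prefix: str, connection_string: str,
                            container: str) -> list[tuple[str, str, dict[str, Any]]]:
    """Return (method, path, body) tuples in the order the checklist creates them."""
    data_source = {
        "name": f"{prefix}-ds",
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    skillset = {
        "name": f"{prefix}-ss",
        "skills": [
            {
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
            }
        ],
    }
    index = {
        "name": f"{prefix}-idx",
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
            {"name": "keyPhrases", "type": "Collection(Edm.String)",
             "searchable": True, "filterable": True},
        ],
    }
    indexer = {
        "name": f"{prefix}-ixr",
        "dataSourceName": data_source["name"],
        "skillsetName": skillset["name"],
        "targetIndexName": index["name"],
        # Blob paths contain characters that are invalid in keys, so encode them.
        "fieldMappings": [
            {"sourceFieldName": "metadata_storage_path", "targetFieldName": "id",
             "mappingFunction": {"name": "base64Encode"}}
        ],
        "outputFieldMappings": [
            {"sourceFieldName": "/document/keyPhrases", "targetFieldName": "keyPhrases"}
        ],
    }
    return [
        ("PUT", f"/datasources/{data_source['name']}?api-version={API_VERSION}", data_source),
        ("PUT", f"/skillsets/{skillset['name']}?api-version={API_VERSION}", skillset),
        ("PUT", f"/indexes/{index['name']}?api-version={API_VERSION}", index),
        # Creating the indexer also runs it: data retrieval, skillset, index load.
        ("PUT", f"/indexers/{indexer['name']}?api-version={API_VERSION}", indexer),
    ]
```

Each tuple can be sent to `https://<service-name>.search.windows.net` with any HTTP client. Keeping the payloads in one function makes it easier to delete and recreate all four objects on each iteration.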