articles/search/cognitive-search-concept-intro.md
Lines changed: 22 additions & 22 deletions
@@ -13,13 +13,13 @@ ms.custom: references_regions
---
# AI enrichment in Azure Cognitive Search
-
In Azure Cognitive Search, AI enrichment refers to a pipeline process that adds machine learning to [indexer-based indexing](search-indexer-overview.md). Steps in the pipeline create information where none previously existed. For example, steps in the pipeline can extract information from images, detect sentiment or key phrases from chunks of text, and recognize entities, to name a few. These processes transform unsearchable content into searchable text, for full text search and knowledge mining scenarios.
+
In Azure Cognitive Search, AI enrichment refers to a pipeline process that adds machine learning to [indexer-based indexing](search-indexer-overview.md). Steps in the pipeline create information where none previously existed. For example, steps in the pipeline can extract information from images, detect sentiment or key phrases from chunks of text, and recognize entities. These processes transform unsearchable content into searchable text, for full text search and knowledge mining scenarios.
-
[**Azure Blob Storage**](../storage/blobs/storage-blobs-overview.md) is a frequently used input, but any supported data source can provide the initial content. A [**skillset**](cognitive-search-working-with-skillsets.md), attached to an indexer, adds the AI processing. The indexer extracts content and sets up the pipeline. The AI processing identifies, analyzes, and creates new information out of blobs, images, and raw text. Output is always a [**search index**](search-what-is-an-index.md), and optionally a [**knowledge store**](knowledge-store-concept-intro.md).
+
[**Azure Blob Storage**](../storage/blobs/storage-blobs-overview.md) is a frequently used input, but any supported data source can provide the initial content. A [**skillset**](cognitive-search-working-with-skillsets.md), attached to an indexer, adds the AI processing. The indexer extracts content and sets up the pipeline. The AI processing identifies, analyzes, and creates information out of blob, image, and raw text inputs. Output is always a [**search index**](search-what-is-an-index.md), and optionally a [**knowledge store**](knowledge-store-concept-intro.md).
Skillsets are composed of built-in skills from Cognitive Search or [*custom skills*](cognitive-search-create-custom-skill-example.md) for external processing that you provide. Custom skills might sound complex but can be simple and straightforward in terms of implementation. If you have existing packages that provide pattern matching or document classification models, the content you extract during indexing could be passed to these models for processing.
+
Skillsets are composed of [*built-in skills*](cognitive-search-predefined-skills.md) from Cognitive Search or [*custom skills*](cognitive-search-create-custom-skill-example.md) for external processing that you provide. Custom skills are not always complex. For example, if you have existing packages that provide pattern matching or document classification models, you can wrap them in a custom skill.
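To illustrate how small a custom skill can be, the following Python sketch wraps a regular-expression pattern matcher in the request and response shape described by the custom skill web interface (a `values` array of records, each with `recordId`, `data`, `errors`, and `warnings`). The `ORDER_ID` pattern, the `text` input name, and the `orderIds` output name are hypothetical, chosen only for this example; hosting the function behind an HTTP endpoint is left out.

```python
import re

# Hypothetical pattern for this sketch: order numbers such as "ORD-12345".
ORDER_ID = re.compile(r"\bORD-\d{5}\b")


def run_custom_skill(request_body: dict) -> dict:
    """Process each record in a custom skill request and return one result per record."""
    results = []
    for record in request_body.get("values", []):
        text = record.get("data", {}).get("text")
        if not isinstance(text, str):
            # Report a per-record error instead of failing the whole batch.
            results.append({
                "recordId": record.get("recordId"),
                "data": {},
                "errors": [{"message": "Input 'text' is missing or not a string."}],
                "warnings": [],
            })
            continue
        results.append({
            "recordId": record.get("recordId"),
            "data": {"orderIds": ORDER_ID.findall(text)},
            "errors": [],
            "warnings": [],
        })
    return {"values": results}
```

Each response record carries the same `recordId` as its request record, which is how the enrichment pipeline matches outputs back to documents.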
Built-in skills fall into these categories:
@@ -29,47 +29,47 @@ Built-in skills fall into these categories:
+**Natural language processing** skills include [entity recognition](cognitive-search-skill-entity-recognition-v3.md), [language detection](cognitive-search-skill-language-detection.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), text manipulation, [sentiment detection (including opinion mining)](cognitive-search-skill-sentiment-v3.md), and [personally identifiable information detection](cognitive-search-skill-pii-detection.md). With these skills, unstructured text is mapped as searchable and filterable fields in an index.
-
Built-in skills are based on pre-trained machine learning models in Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Language Service](../cognitive-services/language-service/overview.md). You should [attach a billable Cognitive Services resource](cognitive-search-attach-cognitive-services.md) if you want these resources for larger workloads.
-
-
Natural language and image processing is applied during the data ingestion phase, with results becoming part of a document's composition in a searchable index in Azure Cognitive Search. Data is sourced as an Azure data set and then pushed through an indexing pipeline using whichever [built-in skills](cognitive-search-predefined-skills.md) you need.
+
Built-in skills are based on the Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Language Service](../cognitive-services/language-service/overview.md). Unless your content input is small, expect to [attach a billable Cognitive Services resource](cognitive-search-attach-cognitive-services.md) to run larger workloads.
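As a rough illustration of how built-in skills and a billable Cognitive Services resource come together, the sketch below assembles a skillset definition as a Python dictionary. The `@odata.type` values and property names follow the Create Skillset REST API as best recalled and should be checked against the reference; the function name, the skillset name, the `ocrText` target name, and the key placeholder are assumptions made for this example.

```python
import json


def build_skillset(name: str, cognitive_services_key: str) -> dict:
    """Return a skillset body with one image skill and one language skill."""
    return {
        "name": name,
        "description": "OCR on extracted images plus key phrase extraction on text",
        "skills": [
            {
                # Image processing skill backed by Computer Vision.
                "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
                "context": "/document/normalized_images/*",
                "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
                "outputs": [{"name": "text", "targetName": "ocrText"}],
            },
            {
                # Natural language processing skill backed by Language Service.
                "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
                "context": "/document",
                "inputs": [{"name": "text", "source": "/document/content"}],
                "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
            },
        ],
        # Attaches the billable resource used for larger workloads.
        "cognitiveServices": {
            "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
            "key": cognitive_services_key,
        },
    }


if __name__ == "__main__":
    # Placeholder key; never hard-code a real key in source.
    print(json.dumps(build_skillset("demo-skillset", "<your-key>"), indent=2))
```

Omitting the `cognitiveServices` section would leave the skillset on the limited free allocation, which is the small-input case the paragraph above refers to.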
## Availability and pricing
-
AI enrichment is available in regions where Azure Cognitive Services is also available. You can check the current availability of AI enrichment on the [Azure products available by region](https://azure.microsoft.com/global-infrastructure/services/?products=search) page. AI enrichment is available in all supported regions except:
+
AI enrichment is available in regions that have Azure Cognitive Services. You can check the availability of AI enrichment on the [Azure products available by region](https://azure.microsoft.com/global-infrastructure/services/?products=search) page. AI enrichment is available in all regions except:
+ Australia Southeast
+ China North 2
+ Germany West Central
-
If your search service is located in one of these regions, you won't be able to create and use skillsets, but all other search service functionality is available and fully supported.
-
-
Billing is under a pay-as-you-go pricing model. The costs of using built-in skills are passed on to the customer when you provide a multi-region Cognitive Services key. There are also costs associated with image extraction, as metered by Cognitive Search. Text extraction and utility skills aren't billable. For more information, see [How you're charged for Azure Cognitive Search](search-sku-manage-costs.md#how-youre-charged-for-azure-cognitive-search).
+
Billing follows a pay-as-you-go pricing model. The costs of using built-in skills are passed on when a multi-region Cognitive Services key is specified in the skillset. There are also costs associated with image extraction, as metered by Cognitive Search. Text extraction and utility skills, however, aren't billable. For more information, see [How you're charged for Azure Cognitive Search](search-sku-manage-costs.md#how-youre-charged-for-azure-cognitive-search).
## When to use AI enrichment
-
Enrichment is useful if your raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the built-in cognitive skills can unlock this content for full text search and data science applications.
+
Enrichment is useful if raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the built-in cognitive skills can unlock this content for full text search and data science applications.
-
Enrichment also helps if you want to integrate external processing. Open-source, third-party, or first-party code can be integrated into the pipeline as a custom skill. Classification models that identify salient characteristics of various document types fall into this category, but any external package that adds value to your content could be used.
+
Enrichment also unlocks external processing. Open-source, third-party, or first-party code can be integrated into the pipeline as a custom skill. Classification models that identify salient characteristics of various document types fall into this category, but any external package that adds value to your content could be used.
### Use-cases for built-in skills
A [skillset](cognitive-search-defining-skillset.md) that's assembled using built-in skills is well suited for the following application scenarios:
-
+[Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) that recognizes typeface and handwritten text in scanned documents (JPEG) is perhaps the most commonly used skill. Attaching the OCR skill will identify, extract, and ingest text from JPEG files.
+
+[Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) that recognizes typeface and handwritten text in scanned documents (JPEG) is perhaps the most commonly used skill.
-
+[Text translation](cognitive-search-skill-text-translation.md) of multilingual content is another commonly used skill. Language detection is built into Text Translation, but you can also run [Language Detection](cognitive-search-skill-language-detection.md)independently if you just want the language codes of the content in your corpus.
+
+[Text translation](cognitive-search-skill-text-translation.md) of multilingual content is another commonly used skill. Language detection is built into Text Translation, but you can also run [Language Detection](cognitive-search-skill-language-detection.md) as a separate skill to output a language code for each chunk of content.
-
+ PDFs with combined image and text. Text in PDFs can be extracted during indexing without the use of enrichment steps, but the addition of image and natural language processing can often produce a better outcome than a standard indexing provides.
+
+ PDFs with combined image and text. Embedded text can be extracted without AI enrichment, but adding image and language skills can unlock more information than what could be obtained through standard text-based indexing.
+ Unstructured or semi-structured documents containing content that has inherent meaning or context that is hidden in the larger document.
-
Blobs in particular often contain a large body of content that is packed into a single "field". By attaching image and natural language processing skills to an indexer, you can create new information that is extant in the raw content, but not otherwise surfaced as distinct fields. Some ready-to-use built-in cognitive skills that can help: [Key Phrase Extraction](cognitive-search-skill-keyphrases.md) and [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) (people, organizations, and locations to name a few).
+
Blobs in particular often contain a large body of content that is packed into a single "field". By attaching image and natural language processing skills to an indexer, you can create information that is extant in the raw content, but not otherwise surfaced as distinct fields. Some ready-to-use built-in cognitive skills that can help: [Key Phrase Extraction](cognitive-search-skill-keyphrases.md) and [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) (people, organizations, and locations to name a few).
Additionally, built-in skills can be used to restructure content through text split, merge, and shape operations.
### Use-cases for custom skills
-
Custom skills can support more complex scenarios, such as recognizing forms, or custom entity detection using a model that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills include [Forms Recognizer](../applied-ai-services/form-recognizer/overview.md), integration of the [Bing Entity Search API](./cognitive-search-create-custom-skill-example.md), and [custom entity recognition](https://github.com/Microsoft/SkillsExtractorCognitiveSearch).
+
Custom skills can support more complex scenarios, such as recognizing forms, or custom entity detection using a model that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills include:
@@ -109,7 +109,7 @@ In Azure Cognitive Search, an indexer saves the output it creates.
A [**searchable index**](search-what-is-an-index.md) is one of the outputs that is always created by an indexer. Specification of an index is an indexer requirement, and when you attach a skillset, the output of the skillset, plus any fields that are mapped directly from the source, are used to populate the index. Usually, the outputs of specific skills, such as key phrases or sentiment scores, are ingested into the index in fields created for that purpose.
-
A [**knowledge store**](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are well suited for interactive analysis in tools like Power BI, whereas files and blobs are typically used in data science or similar processes.
+
A [**knowledge store**](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are recommended for interactive analysis in tools like Power BI. Files and blobs are typically used in data science or similar workloads.
Finally, an indexer can [**cache enriched documents**](cognitive-search-incremental-indexing-conceptual.md) in Azure Blob Storage for potential reuse in subsequent skillset executions. The cache is for internal use. Cached enrichments are consumable by the same skillset that you rerun at a later date. Caching is helpful if your skillset includes image analysis or OCR, and you want to avoid the time and expense of reprocessing image files.
@@ -129,23 +129,23 @@ In Azure Storage, a [knowledge store](knowledge-store-concept-intro.md) can assu
+ A blob container captures enriched documents in their entirety, which is useful if you're creating a feed into other processes.
-
+ A table is useful if you need slices of enriched documents, or if you want to include or exclude specific parts of the output. For analysis in Power BI, tables are the recommended data source for exploring and visualizing content in Power BI.
+
+ A table is useful if you need slices of enriched documents, or if you want to include or exclude specific parts of the output. Tables are the recommended data source for data exploration and visualization in Power BI.
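The difference between the two physical expressions can be sketched as a `knowledgeStore` definition inside a skillset. The Python snippet below builds one projection group that writes a slice of each enriched document to a table and a second group that writes whole documents to a blob container. The property names reflect the knowledge store definition as best recalled; the table name, container name, key name, and source paths are hypothetical, and a real table projection often needs a shaped source produced by a Shaper skill.

```python
def build_knowledge_store(connection_string: str) -> dict:
    """Return a knowledge store definition with a table group and an object group."""
    return {
        "storageConnectionString": connection_string,
        "projections": [
            {
                # Table projection: a slice of each enriched document,
                # suited to exploration and visualization in Power BI.
                "tables": [
                    {
                        "tableName": "keyPhrasesTable",
                        "generatedKeyName": "keyPhraseId",
                        "source": "/document/keyPhrases/*",
                    }
                ],
                "objects": [],
                "files": [],
            },
            {
                # Object projection: the enriched document in its entirety,
                # written to a blob container as a feed for other processes.
                "tables": [],
                "objects": [
                    {"storageContainer": "enriched-docs", "source": "/document"}
                ],
                "files": [],
            },
        ],
    }
```

Keeping tables and objects in separate projection groups means each group is populated independently, so the same enrichment can feed both analysis and downstream processing.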
## Checklist: A typical workflow
-
1. When beginning a project, it's helpful to work with a subset of data. Indexer and skillset design is an iterative process, and you'll iterate more quickly if you're working with a small representative data set.
+
1. When beginning a project, it's helpful to work with a subset of data. Indexer and skillset design is an iterative process, and the work goes faster with a small representative data set.
1. Create a [data source](/rest/api/searchservice/create-data-source) that specifies a connection to your data.
1. Create a [skillset](/rest/api/searchservice/create-skillset) to add enrichment.
1. Create an [index schema](/rest/api/searchservice/create-index) that defines a search index.
-
1. Create an [indexer](/rest/api/searchservice/create-indexer) to bring all of the above components together. Creating or running indexer retrieves the data, runs the skillset, and loads the index.
+
1. Create an [indexer](/rest/api/searchservice/create-indexer) to bring all of the above components together. This step retrieves the data, runs the skillset, and loads the index.
1. Run queries to evaluate results and modify code to update skillsets, schema, or indexer configuration.
-
To iterate over the above steps, [reset the indexer](search-howto-reindex.md) before rebuilding the pipeline, or delete and recreate the objects on each run (recommended if you are using the free tier). You should also [enable enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments wherever possible.
+
To repeat any of the above steps, [reset the indexer](search-howto-reindex.md) before you run it. Or, delete and recreate the objects on each run (recommended if you are using the free tier). You should also [enable enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments wherever possible.
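The order of operations in the checklist can be sketched as a plan of REST calls. The Python snippet below only builds the requests and sends nothing. The API version, the `-ds`/`-ss`/`-idx`/`-ixr` naming scheme, and the stub bodies are assumptions for this example; real bodies need the full definitions from the linked REST references, and each call needs an `api-key` header.

```python
from typing import List, NamedTuple, Optional

API_VERSION = "2020-06-30"  # assumed; use the version your service supports


class PlannedRequest(NamedTuple):
    method: str
    url: str
    body: Optional[dict]


def _url(service: str, collection: str, name: str, action: str = "") -> str:
    """Build the endpoint for a named object on a search service."""
    return (
        f"https://{service}.search.windows.net/"
        f"{collection}/{name}{action}?api-version={API_VERSION}"
    )


def plan_pipeline(service: str, prefix: str) -> List[PlannedRequest]:
    """Checklist order: data source, skillset, index, then the indexer."""
    ds, ss, ix, ixr = (f"{prefix}-{s}" for s in ("ds", "ss", "idx", "ixr"))
    indexer_body = {
        "name": ixr,
        "dataSourceName": ds,
        "skillsetName": ss,
        "targetIndexName": ix,
    }
    return [
        PlannedRequest("PUT", _url(service, "datasources", ds), {"name": ds}),
        PlannedRequest("PUT", _url(service, "skillsets", ss), {"name": ss}),
        PlannedRequest("PUT", _url(service, "indexes", ix), {"name": ix}),
        # Creating the indexer also runs it: data is retrieved,
        # the skillset executes, and the index is loaded.
        PlannedRequest("PUT", _url(service, "indexers", ixr), indexer_body),
    ]


def plan_rerun(service: str, indexer_name: str) -> List[PlannedRequest]:
    """Reset the indexer, then run it again, when iterating on the design."""
    return [
        PlannedRequest("POST", _url(service, "indexers", indexer_name, "/reset"), None),
        PlannedRequest("POST", _url(service, "indexers", indexer_name, "/run"), None),
    ]
```

The indexer comes last because its definition references the other three objects by name, so they must already exist when it is created.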