Commit ccdf491

Boosting the acrolinx score
1 parent e85dc8f commit ccdf491

2 files changed

+35
-31
lines changed

articles/search/cognitive-search-concept-intro.md

Lines changed: 22 additions & 22 deletions
@@ -13,13 +13,13 @@ ms.custom: references_regions
1313
---
1414
# AI enrichment in Azure Cognitive Search
1515

16-
In Azure Cognitive Search, AI enrichment refers to a pipeline process that adds machine learning to [indexer-based indexing](search-indexer-overview.md). Steps in the pipeline create information where none previously existed. For example, steps in the pipeline can extract information from images, detect sentiment or key phrases from chunks of text, and recognize entities, to name a few. These processes transform unsearchable content into searchable text, for full text search and knowledge mining scenarios.
16+
In Azure Cognitive Search, AI enrichment refers to a pipeline process that adds machine learning to [indexer-based indexing](search-indexer-overview.md). Steps in the pipeline create information where none previously existed. For example, steps in the pipeline can extract information from images, detect sentiment or key phrases from chunks of text, and recognize entities. These processes transform unsearchable content into searchable text, for full text search and knowledge mining scenarios.
1717

18-
[**Azure Blob Storage**](../storage/blobs/storage-blobs-overview.md) is a frequently used input, but any supported data source can provide the initial content. A [**skillset**](cognitive-search-working-with-skillsets.md), attached to an indexer, adds the AI processing. The indexer extracts content and sets up the pipeline. The AI processing identifies, analyzes, and creates new information out of blobs, images, and raw text. Output is always a [**search index**](search-what-is-an-index.md), and optionally a [**knowledge store**](knowledge-store-concept-intro.md).
18+
[**Azure Blob Storage**](../storage/blobs/storage-blobs-overview.md) is a frequently used input, but any supported data source can provide the initial content. A [**skillset**](cognitive-search-working-with-skillsets.md), attached to an indexer, adds the AI processing. The indexer extracts content and sets up the pipeline. The AI processing identifies, analyzes, and creates information out of blob, image, and raw text inputs. Output is always a [**search index**](search-what-is-an-index.md), and optionally a [**knowledge store**](knowledge-store-concept-intro.md).
1919

2020
![Enrichment pipeline diagram](./media/cognitive-search-intro/cogsearch-architecture.png "enrichment pipeline overview")
2121

22-
Skillsets are composed of built-in skills from Cognitive Search or [*custom skills*](cognitive-search-create-custom-skill-example.md) for external processing that you provide. Custom skills might sound complex but can be simple and straightforward in terms of implementation. If you have existing packages that provide pattern matching or document classification models, the content you extract during indexing could be passed to these models for processing.
22+
Skillsets are composed of [*built-in skills*](cognitive-search-predefined-skills.md) from Cognitive Search or [*custom skills*](cognitive-search-create-custom-skill-example.md) for external processing that you provide. Custom skills are not always complex. For example, if you have existing packages that provide pattern matching or document classification models, you can wrap them in a custom skill.
2323

2424
Built-in skills fall into these categories:
2525

@@ -29,47 +29,47 @@ Built-in skills fall into these categories:
2929

3030
+ **Natural language processing** skills include [entity recognition](cognitive-search-skill-entity-recognition-v3.md), [language detection](cognitive-search-skill-language-detection.md), [key phrase extraction](cognitive-search-skill-keyphrases.md), text manipulation, [sentiment detection (including opinion mining)](cognitive-search-skill-sentiment-v3.md), and [personal identifiable information detection](cognitive-search-skill-pii-detection.md). With these skills, unstructured text is mapped as searchable and filterable fields in an index.
3131

32-
Built-in skills are based on pre-trained machine learning models in Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Language Service](../cognitive-services/language-service/overview.md). You should [attach a billable Cognitive Services resource](cognitive-search-attach-cognitive-services.md) if you want these resources for larger workloads.
33-
34-
Natural language and image processing is applied during the data ingestion phase, with results becoming part of a document's composition in a searchable index in Azure Cognitive Search. Data is sourced as an Azure data set and then pushed through an indexing pipeline using whichever [built-in skills](cognitive-search-predefined-skills.md) you need.
32+
Built-in skills are based on the Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Language Service](../cognitive-services/language-service/overview.md). Unless your content input is small, expect to [attach a billable Cognitive Services resource](cognitive-search-attach-cognitive-services.md) to run larger workloads.
3533

3634
## Availability and pricing
3735

38-
AI enrichment is available in regions where Azure Cognitive Services is also available. You can check the current availability of AI enrichment on the [Azure products available by region](https://azure.microsoft.com/global-infrastructure/services/?products=search) page. AI enrichment is available in all supported regions except:
36+
AI enrichment is available in regions that have Azure Cognitive Services. You can check the availability of AI enrichment on the [Azure products available by region](https://azure.microsoft.com/global-infrastructure/services/?products=search) page. AI enrichment is available in all regions except:
3937

4038
+ Australia Southeast
4139
+ China North 2
4240
+ Germany West Central
4341

44-
If your search service is located in one of these regions, you won't be able to create and use skillsets, but all other search service functionality is available and fully supported.
45-
46-
Billing is under a pay-as-you-go pricing model. The costs of using built-in skills are passed on to the customer when you provide a multi-region Cognitive Services key. There are also costs associated with image extraction, as metered by Cognitive Search. Text extraction and utility skills aren't billable. For more information, see [How you're charged for Azure Cognitive Search](search-sku-manage-costs.md#how-youre-charged-for-azure-cognitive-search).
42+
Billing follows a pay-as-you-go pricing model. The costs of using built-in skills are passed on when a multi-region Cognitive Services key is specified in the skillset. There are also costs associated with image extraction, as metered by Cognitive Search. Text extraction and utility skills, however, aren't billable. For more information, see [How you're charged for Azure Cognitive Search](search-sku-manage-costs.md#how-youre-charged-for-azure-cognitive-search).
4743

4844
## When to use AI enrichment
4945

50-
Enrichment is useful if your raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the built-in cognitive skills can unlock this content for full text search and data science applications.
46+
Enrichment is useful if raw content is unstructured text, image content, or content that needs language detection and translation. Applying AI through the built-in cognitive skills can unlock this content for full text search and data science applications.
5147

52-
Enrichment also helps if you want to integrate external processing. Open-source, third-party, or first-party code can be integrated into the pipeline as a custom skill. Classification models that identify salient characteristics of various document types fall into this category, but any external package that adds value to your content could be used.
48+
Enrichment also unlocks external processing. Open-source, third-party, or first-party code can be integrated into the pipeline as a custom skill. Classification models that identify salient characteristics of various document types fall into this category, but any external package that adds value to your content could be used.
5349

5450
### Use-cases for built-in skills
5551

5652
A [skillset](cognitive-search-defining-skillset.md) that's assembled using built-in skills is well suited for the following application scenarios:
5753

58-
+ [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) that recognizes typeface and handwritten text in scanned documents (JPEG) is perhaps the most commonly used skill. Attaching the OCR skill will identify, extract, and ingest text from JPEG files.
54+
+ [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) that recognizes typeface and handwritten text in scanned documents (JPEG) is perhaps the most commonly used skill.
5955

60-
+ [Text translation](cognitive-search-skill-text-translation.md) of multilingual content is another commonly used skill. Language detection is built into Text Translation, but you can also run [Language Detection](cognitive-search-skill-language-detection.md) independently if you just want the language codes of the content in your corpus.
56+
+ [Text translation](cognitive-search-skill-text-translation.md) of multilingual content is another commonly used skill. Language detection is built into Text Translation, but you can also run [Language Detection](cognitive-search-skill-language-detection.md) as a separate skill to output a language code for each chunk of content.
6157

62-
+ PDFs with combined image and text. Text in PDFs can be extracted during indexing without the use of enrichment steps, but the addition of image and natural language processing can often produce a better outcome than a standard indexing provides.
58+
+ PDFs with combined image and text. Embedded text can be extracted without AI enrichment, but adding image and language skills can unlock more information than what could be obtained through standard text-based indexing.
6359

6460
+ Unstructured or semi-structured documents containing content that has inherent meaning or context that is hidden in the larger document.
6561

66-
Blobs in particular often contain a large body of content that is packed into a single "field". By attaching image and natural language processing skills to an indexer, you can create new information that is extant in the raw content, but not otherwise surfaced as distinct fields. Some ready-to-use built-in cognitive skills that can help: [Key Phrase Extraction](cognitive-search-skill-keyphrases.md) and [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) (people, organizations, and locations to name a few).
62+
Blobs in particular often contain a large body of content that is packed into a single "field". By attaching image and natural language processing skills to an indexer, you can create information that is extant in the raw content, but not otherwise surfaced as distinct fields. Some ready-to-use built-in cognitive skills that can help: [Key Phrase Extraction](cognitive-search-skill-keyphrases.md) and [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) (people, organizations, and locations to name a few).
6763

6864
Additionally, built-in skills can be used to restructure content through text split, merge, and shape operations.
6965

7066
### Use-cases for custom skills
7167

72-
Custom skills can support more complex scenarios, such as recognizing forms, or custom entity detection using a model that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills include [Forms Recognizer](../applied-ai-services/form-recognizer/overview.md), integration of the [Bing Entity Search API](./cognitive-search-create-custom-skill-example.md), and [custom entity recognition](https://github.com/Microsoft/SkillsExtractorCognitiveSearch).
68+
Custom skills can support more complex scenarios, such as recognizing forms, or custom entity detection using a model that you provide and wrap in the [custom skill web interface](cognitive-search-custom-skill-interface.md). Several examples of custom skills include:
69+
70+
+ [Forms Recognizer](../applied-ai-services/form-recognizer/overview.md)
71+
+ Integration of the [Bing Entity Search API](./cognitive-search-create-custom-skill-example.md)
72+
+ [Custom entity recognition](https://github.com/Microsoft/SkillsExtractorCognitiveSearch)
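
As a rough illustration of what wrapping a model in the custom skill web interface involves, the sketch below handles the request and response envelope: a `values` array of records, each matched by `recordId`. The `classify` function and the `text` and `label` field names are hypothetical stand-ins, not part of this commit or the linked samples.

```python
from typing import Any, Callable


def classify(text: str) -> str:
    """Hypothetical stand-in for a model or package you already own."""
    return "invoice" if "invoice" in text.lower() else "other"


def run_custom_skill(body: dict, model: Callable[[str], str] = classify) -> dict:
    """Return one output record per input record, keyed by the same recordId."""
    results = []
    for record in body.get("values", []):
        out: dict[str, Any] = {
            "recordId": record["recordId"],
            "data": {},
            "errors": [],
            "warnings": [],
        }
        text = record.get("data", {}).get("text")  # "text" is an assumed input name
        if isinstance(text, str):
            out["data"]["label"] = model(text)  # "label" is an assumed output name
        else:
            out["errors"].append({"message": "Missing or non-string 'text' input."})
        results.append(out)
    return {"values": results}
```

In practice this function would sit behind an HTTPS endpoint that the skillset calls; the hosting layer is omitted here.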
7373

7474
## Enrichment steps <a name="enrichment-steps"></a>
7575

@@ -109,7 +109,7 @@ In Azure Cognitive Search, an indexer saves the output it creates.
109109

110110
A [**searchable index**](search-what-is-an-index.md) is one of the outputs that is always created by an indexer. Specification of an index is an indexer requirement, and when you attach a skillset, the output of the skillset, plus any fields that are mapped directly from the source, are used to populate the index. Usually, the outputs of specific skills, such as key phrases or sentiment scores, are ingested into the index in fields created for that purpose.
111111

112-
A [**knowledge store**](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are well suited for interactive analysis in tools like Power BI, whereas files and blobs are typically used in data science or similar processes.
112+
A [**knowledge store**](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are recommended for interactive analysis in tools like Power BI. Files and blobs are typically used in data science or similar workloads.
113113

114114
Finally, an indexer can [**cache enriched documents**](cognitive-search-incremental-indexing-conceptual.md) in Azure Blob Storage for potential reuse in subsequent skillset executions. The cache is for internal use. Cached enrichments are consumable by the same skillset that you rerun at a later date. Caching is helpful if your skillset includes image analysis or OCR, and you want to avoid the time and expense of reprocessing image files.
115115

@@ -129,23 +129,23 @@ In Azure Storage, a [knowledge store](knowledge-store-concept-intro.md) can assu
129129

130130
+ A blob container captures enriched documents in their entirety, which is useful if you're creating a feed into other processes.
131131

132-
+ A table is useful if you need slices of enriched documents, or if you want to include or exclude specific parts of the output. For analysis in Power BI, tables are the recommended data source for exploring and visualizing content in Power BI.
132+
+ A table is useful if you need slices of enriched documents, or if you want to include or exclude specific parts of the output. Tables are the recommended data source for data exploration and visualization in Power BI.
133133

134134
## Checklist: A typical workflow
135135

136-
1. When beginning a project, it's helpful to work with a subset of data. Indexer and skillset design is an iterative process, and you'll iterate more quickly if you're working with a small representative data set.
136+
1. When beginning a project, it's helpful to work with a subset of data. Indexer and skillset design is an iterative process, and the work goes faster with a small representative data set.
137137

138138
1. Create a [data source](/rest/api/searchservice/create-data-source) that specifies a connection to your data.
139139

140140
1. Create a [skillset](/rest/api/searchservice/create-skillset) to add enrichment.
141141

142142
1. Create an [index schema](/rest/api/searchservice/create-index) that defines a search index.
143143

144-
1. Create an [indexer](/rest/api/searchservice/create-indexer) to bring all of the above components together. Creating or running indexer retrieves the data, runs the skillset, and loads the index.
144+
1. Create an [indexer](/rest/api/searchservice/create-indexer) to bring all of the above components together. This step retrieves the data, runs the skillset, and loads the index.
145145

146146
1. Run queries to evaluate results and modify code to update skillsets, schema, or indexer configuration.
147147

148-
To iterate over the above steps, [reset the indexer](search-howto-reindex.md) before rebuilding the pipeline, or delete and recreate the objects on each run (recommended if you are using the free tier). You should also [enable enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments wherever possible.
148+
To repeat any of the above steps, [reset the indexer](search-howto-reindex.md) before you run it. Or, delete and recreate the objects on each run (recommended if you are using the free tier). You should also [enable enrichment caching](cognitive-search-incremental-indexing-conceptual.md) to reuse existing enrichments wherever possible.
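
The checklist can be sketched as four request bodies that reference one another by name. This is a minimal sketch, assuming a blob data source and a single key phrase extraction skill; the object names, the container name, and the connection string passed in are hypothetical, and the bodies would still need to be sent to the Create Data Source, Create Skillset, Create Index, and Create Indexer REST APIs linked in the steps.

```python
def build_pipeline(prefix: str, connection_string: str, container: str) -> dict:
    """Build the four JSON bodies for a small enrichment pipeline (no network calls)."""
    ds, ss, ix, ixr = (f"{prefix}-{s}" for s in ("ds", "ss", "idx", "idxr"))

    data_source = {
        "name": ds,
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    skillset = {
        "name": ss,
        "skills": [{
            "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
            "context": "/document",
            "inputs": [{"name": "text", "source": "/document/content"}],
            "outputs": [{"name": "keyPhrases", "targetName": "keyPhrases"}],
        }],
    }
    index = {
        "name": ix,
        "fields": [
            {"name": "id", "type": "Edm.String", "key": True},
            {"name": "content", "type": "Edm.String", "searchable": True},
            {"name": "keyPhrases", "type": "Collection(Edm.String)",
             "searchable": True, "filterable": True},
        ],
    }
    indexer = {
        "name": ixr,
        "dataSourceName": ds,
        "targetIndexName": ix,
        "skillsetName": ss,
        # Blob paths contain characters that are invalid in keys, so encode them.
        "fieldMappings": [{
            "sourceFieldName": "metadata_storage_path",
            "targetFieldName": "id",
            "mappingFunction": {"name": "base64Encode"},
        }],
        # Route the skill output into the index field created for it.
        "outputFieldMappings": [{
            "sourceFieldName": "/document/keyPhrases",
            "targetFieldName": "keyPhrases",
        }],
    }
    return {"dataSource": data_source, "skillset": skillset,
            "index": index, "indexer": indexer}
```

Keeping the bodies in one function makes the delete-and-recreate loop cheap: change the skillset or schema, rebuild, and resubmit all four objects with the same names.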
149149

150150
## Next steps
151151
