Skip to content

Commit bda3fb9

Browse files
authored
Merge pull request #175153 from HeidiSteen/heidist-fresh
[azure search] kstore edits
2 parents 41fd0da + c9b6cf0 commit bda3fb9

File tree

3 files changed

+20
-18
lines changed

3 files changed

+20
-18
lines changed

articles/search/cognitive-search-concept-intro.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -17,6 +17,8 @@ In Azure Cognitive Search, AI enrichment refers to built-in cognitive skills and
1717

1818
Enrichment is defined by a [skillset](cognitive-search-working-with-skillsets.md) that's attached to an [indexer](search-indexer-overview.md). The indexer will extract and set up the content, while the skillset identifies, analyzes, and creates new information and structures from images, blobs, and other unstructured data sources. The output of an enrichment pipeline is either a [search index](search-what-is-an-index.md) or a [knowledge store](knowledge-store-concept-intro.md).
1919

20+
![Enrichment pipeline diagram](./media/cognitive-search-intro/cogsearch-architecture.png "enrichment pipeline overview")
21+
2022
A skillset can contain built-in skills from Cognitive Search or embed external processing that you provide in a [*custom skill*](cognitive-search-create-custom-skill-example.md). Examples of a custom skill might be a custom entity module or document classifier targeting a specific domain such as finance, scientific publications, or medicine.
2123

2224
Built-in skills fall into these categories:
@@ -25,8 +27,6 @@ Built-in skills fall into these categories:
2527

2628
+ **Image processing** skills include [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) and identification of [visual features](cognitive-search-skill-image-analysis.md), such as facial detection, image interpretation, image recognition (famous people and landmarks) or attributes like image orientation. These skills create text representations of image content, making it searchable using the query capabilities of Azure Cognitive Search.
2729

28-
![Enrichment pipeline diagram](./media/cognitive-search-intro/cogsearch-architecture.png "enrichment pipeline overview")
29-
3030
Built-in skills in Azure Cognitive Search are based on pre-trained machine learning models in Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Text Analytics](../cognitive-services/text-analytics/overview.md). You can attach a Cognitive Services resource if you want to leverage these resources during content processing.
3131

3232
Natural language and image processing is applied during the data ingestion phase, with results becoming part of a document's composition in a searchable index in Azure Cognitive Search. Data is sourced as an Azure data set and then pushed through an indexing pipeline using whichever [built-in skills](cognitive-search-predefined-skills.md) you need.
@@ -104,11 +104,11 @@ Enriched content is generated during skillset execution, and is temporary unless
104104

105105
In Azure Cognitive Search, an indexer saves the output it creates.
106106

107-
A [searchable index](search-what-is-an-index.md) is one of the outputs that is always created by an indexer. Specification of an index is an indexer requirement, and when you attach a skillset, the output of the skillset, plus any fields that are imported directly from the source, are used to populate the index. Usually, the outputs of specific skills, such as key phrases or sentiment scores, are ingested into the index in fields created for that purpose.
107+
A [searchable index](search-what-is-an-index.md) is one of the outputs that is always created by an indexer. Specification of an index is an indexer requirement, and when you attach a skillset, the output of the skillset, plus any fields that are mapped directly from the source, are used to populate the index. Usually, the outputs of specific skills, such as key phrases or sentiment scores, are ingested into the index in fields created for that purpose.
108108

109109
A [knowledge store](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are well suited for interactive analysis in tools like Power BI, whereas files and blobs are typically used in data science or similar processes.
110110

111-
Finally, an indexer can [cache enriched documents](cognitive-search-incremental-indexing-conceptual.md) in Azure Blob Storage for potential reuse in subsequent skillset executions. Cached enrichments are consumable by the same skillset that you rerun at a later date. Caching is helpful if your skillset include image analysis or OCR, and you want to avoid the time and expense of reprocessing image files.
111+
Finally, an indexer can [cache enriched documents](cognitive-search-incremental-indexing-conceptual.md) in Azure Blob Storage for potential reuse in subsequent skillset executions. The cache is for internal use. Cached enrichments are consumable by the same skillset that you rerun at a later date. Caching is helpful if your skillset include image analysis or OCR, and you want to avoid the time and expense of reprocessing image files.
112112

113113
Indexes and knowledge stores are fully independent of each other. While you must attach an index to satisfy indexer requirements, if your sole objective is a knowledge store, you can ignore the index after it's populated. Avoid deleting it though. If you want to rerun the indexer and skillset, you'll need the index in order for the indexer to run.
114114

articles/search/knowledge-store-concept-intro.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ ms.date: 09/02/2021
1212

1313
# Knowledge store in Azure Cognitive Search
1414

15-
Knowledge store is a feature of Azure Cognitive Search that sends output from an [AI enrichment pipeline](cognitive-search-concept-intro.md) to tables and blobs in Azure Storage for independent analysis or downstream processing.
15+
Knowledge store is a data sink created by a Cognitive Search [AI enrichment pipeline](cognitive-search-concept-intro.md) that stores enriched content in tables and blob containers in Azure Storage for independent analysis or downstream processing in non-search scenarios, like knowledge mining.
1616

1717
If you have used cognitive skills in the past, you already know that *skillsets* move a document through a sequence of enrichments that invoke atomic transformations, such as recognizing entities or translating text. The outcome can be a search index, or projections in a knowledge store. The two outputs, search index and knowledge store, are mutually exclusive products of the same pipeline; derived from the same inputs, but resulting in output that is structured, stored, and used in different applications.
1818

articles/search/knowledge-store-projection-overview.md

Lines changed: 15 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -8,30 +8,32 @@ author: HeidiSteen
88
ms.author: heidist
99
ms.service: cognitive-search
1010
ms.topic: conceptual
11-
ms.date: 08/10/2021
11+
ms.date: 10/08/2021
1212
---
1313

1414
# Knowledge store "projections" in Azure Cognitive Search
1515

16-
Projections, an element of [knowledge store](knowledge-store-concept-intro.md), are views of enriched documents that can be saved to physical storage for knowledge mining purposes. A projection lets you "project" your data into a shape that aligns with your needs, preserving relationships so that tools like Power BI can read the data with no additional effort.
16+
Projections are the element of a [knowledge store](knowledge-store-concept-intro.md) definition that specifies the physical expression of your data in Azure Storage. A projection definition determines the number and type of data structures in Azure Storage.
1717

18-
Projections can be tabular, where data articulation is in rows and columns in Azure Table Storage, or JSON objects stored in Azure Blob Storage, or binary images also stored in Blob Storage. You can define multiple projections of your data as it is being enriched. Multiple projections are useful when you want the same data shaped differently for individual use cases.
18+
## Types of data structures
1919

20-
The knowledge store supports three types of projections:
20+
A knowledge store is a logical construction that's physically expressed in Azure Storage as tables, JSON objects, or binary image files.
2121

22-
+ **Tables**: For data that's best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage. Only valid JSON objects can be projected as tables. Since an enriched document can contain nodes that are not named JSON objects, you'll add a [Shaper skill or use inline shaping](knowledge-store-projection-shape.md) within a skill to create valid JSON.
22+
| Projection | Storage | Usage |
23+
|------------|---------|-------|
24+
| Tables | Azure Table Storage | Used for data that's best represented as rows and columns. Table projections allow you to define a schematized shape or projection. Only valid JSON objects can be projected as tables. Since an enriched document can contain nodes that are not named JSON objects, you'll add a [Shaper skill or use inline shaping](knowledge-store-projection-shape.md) within a skill to create valid JSON. |
25+
| Objects | Azure Blob Storage | Used when you need a JSON representation of your data and enrichments. As with table projections, only valid JSON objects can be projected as objects, and shaping can help you do that. |
26+
| Files | Azure Blob Storage | Used when you need to save normalized, binary image files. |
2327

24-
+ **Objects**: When you need a JSON representation of your data and enrichments, use object projections to save the output as blobs. As with table projections, only valid JSON objects can be projected as objects, and shaping can help you do that.
28+
You can define multiple projections of your data as it is being enriched. Multiple projections are useful when you want the same data shaped differently for individual use cases.
2529

26-
+ **Files**: When you need to save the images extracted from the documents, file projections allow you to save the normalized images to blob storage.
30+
## Basic definition
2731

28-
To see projections defined in context, step through [Create a knowledge store in REST](knowledge-store-create-rest.md).
32+
Projections are an array of complex collections under a `knowledgeStore` definition in a [skillset object](/rest/api/searchservice/create-skillset).
2933

30-
## Basic pattern
34+
Each set of tables, objects, and files is a *project group*, and you can have multiple groups if storage requirements include supporting different tools and scenarios. Within a single group, you can have multiple tables, objects, and files.
3135

32-
Projections are an array of complex collections under a `knowledgeStore` definition in a skillset object. Each set of tables, objects, and files is a *project group*, and you can have multiple groups if storage requirements include supporting different tools and scenarios. Within a single group, you can have multiple tables, objects, and files.
33-
34-
Typically only one group is used, but the following example shows two to illustrate the pattern when multiple groups exist.
36+
Typically only one group is used, but the following example shows two to reinforce the idea of multiple groups.
3537

3638
```json
3739
"knowledgeStore" : {
@@ -51,7 +53,7 @@ Typically only one group is used, but the following example shows two to illustr
5153
}
5254
```
5355

54-
### Projection groups
56+
## Data isolation and relatedness
5557

5658
Having multiple sets of table-object-file combinations is useful for supporting different scenarios. You might use one set for design and debug of a skillset, capturing output for further examination, while a second set collects output used for an online app, with a third for data science workloads.
5759

0 commit comments

Comments
 (0)