articles/search/cognitive-search-concept-intro.md
Lines changed: 4 additions & 4 deletions
@@ -17,6 +17,8 @@ In Azure Cognitive Search, AI enrichment refers to built-in cognitive skills and
Enrichment is defined by a [skillset](cognitive-search-working-with-skillsets.md) that's attached to an [indexer](search-indexer-overview.md). The indexer will extract and set up the content, while the skillset identifies, analyzes, and creates new information and structures from images, blobs, and other unstructured data sources. The output of an enrichment pipeline is either a [search index](search-what-is-an-index.md) or a [knowledge store](knowledge-store-concept-intro.md).
A skillset can contain built-in skills from Cognitive Search or embed external processing that you provide in a [*custom skill*](cognitive-search-create-custom-skill-example.md). Examples of a custom skill might be a custom entity module or document classifier targeting a specific domain such as finance, scientific publications, or medicine.
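As a rough illustration of the paragraph above, a skillset that mixes one built-in skill with one custom skill might look like the following. This is a sketch, not a definitive definition: the skillset name, the `uri`, and the `category` output are hypothetical placeholders, and the input paths assume a data source that exposes a `content` field. An indexer would then reference the skillset by name through its `skillsetName` property.

```json
{
  "name": "example-skillset",
  "description": "Hypothetical skillset: one built-in skill plus one custom skill",
  "skills": [
    {
      "@odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
      "context": "/document",
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "keyPhrases", "targetName": "keyPhrases" }
      ]
    },
    {
      "@odata.type": "#Microsoft.Skills.Custom.WebApiSkill",
      "description": "Hypothetical domain-specific document classifier",
      "uri": "https://example.com/api/classify",
      "httpMethod": "POST",
      "context": "/document",
      "inputs": [
        { "name": "text", "source": "/document/content" }
      ],
      "outputs": [
        { "name": "category", "targetName": "category" }
      ]
    }
  ]
}
```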
Built-in skills fall into these categories:
@@ -25,8 +27,6 @@ Built-in skills fall into these categories:
+ **Image processing** skills include [Optical Character Recognition (OCR)](cognitive-search-skill-ocr.md) and identification of [visual features](cognitive-search-skill-image-analysis.md), such as facial detection, image interpretation, image recognition (famous people and landmarks), or attributes like image orientation. These skills create text representations of image content, making it searchable using the query capabilities of Azure Cognitive Search.
Built-in skills in Azure Cognitive Search are based on pre-trained machine learning models in Cognitive Services APIs: [Computer Vision](../cognitive-services/computer-vision/index.yml) and [Text Analytics](../cognitive-services/text-analytics/overview.md). You can attach a Cognitive Services resource if you want to leverage these resources during content processing.
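Attaching a Cognitive Services resource is done inside the skillset definition. The fragment below is a minimal sketch of that section, assuming key-based attachment; the description text and the key value are placeholders.

```json
"cognitiveServices": {
  "@odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
  "description": "Placeholder: the Cognitive Services resource billed for built-in skills",
  "key": "<your-cognitive-services-key>"
}
```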
Natural language and image processing are applied during the data ingestion phase, with results becoming part of a document's composition in a searchable index in Azure Cognitive Search. Data is sourced as an Azure data set and then pushed through an indexing pipeline using whichever [built-in skills](cognitive-search-predefined-skills.md) you need.
@@ -104,11 +104,11 @@ Enriched content is generated during skillset execution, and is temporary unless
In Azure Cognitive Search, an indexer saves the output it creates.
-
A [searchable index](search-what-is-an-index.md) is one of the outputs that is always created by an indexer. Specification of an index is an indexer requirement, and when you attach a skillset, the output of the skillset, plus any fields that are imported directly from the source, are used to populate the index. Usually, the outputs of specific skills, such as key phrases or sentiment scores, are ingested into the index in fields created for that purpose.
+
A [searchable index](search-what-is-an-index.md) is one of the outputs that is always created by an indexer. Specification of an index is an indexer requirement, and when you attach a skillset, the output of the skillset, plus any fields that are mapped directly from the source, are used to populate the index. Usually, the outputs of specific skills, such as key phrases or sentiment scores, are ingested into the index in fields created for that purpose.
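To make the mapping concrete, here is a sketch of an indexer definition that routes both source fields and skill outputs into index fields. All names are hypothetical; the `outputFieldMappings` paths assume a skillset that writes `keyPhrases` and `sentimentScore` nodes under `/document`, and the target fields are assumed to exist in the index.

```json
{
  "name": "example-indexer",
  "dataSourceName": "example-datasource",
  "targetIndexName": "example-index",
  "skillsetName": "example-skillset",
  "fieldMappings": [
    { "sourceFieldName": "metadata_storage_name", "targetFieldName": "fileName" }
  ],
  "outputFieldMappings": [
    { "sourceFieldName": "/document/keyPhrases", "targetFieldName": "keyPhrases" },
    { "sourceFieldName": "/document/sentimentScore", "targetFieldName": "sentimentScore" }
  ]
}
```

Here `fieldMappings` covers fields mapped directly from the source, while `outputFieldMappings` covers content created by the skillset.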
A [knowledge store](knowledge-store-concept-intro.md) is an optional output, used for downstream apps like knowledge mining. A knowledge store is defined within a skillset. Its definition determines whether your enriched documents are projected as tables or objects (files or blobs). Tabular projections are well suited for interactive analysis in tools like Power BI, whereas files and blobs are typically used in data science or similar processes.
-
Finally, an indexer can [cache enriched documents](cognitive-search-incremental-indexing-conceptual.md) in Azure Blob Storage for potential reuse in subsequent skillset executions. Cached enrichments are consumable by the same skillset that you rerun at a later date. Caching is helpful if your skillset includes image analysis or OCR, and you want to avoid the time and expense of reprocessing image files.
+
Finally, an indexer can [cache enriched documents](cognitive-search-incremental-indexing-conceptual.md) in Azure Blob Storage for potential reuse in subsequent skillset executions. The cache is for internal use. Cached enrichments are consumable by the same skillset that you rerun at a later date. Caching is helpful if your skillset includes image analysis or OCR, and you want to avoid the time and expense of reprocessing image files.
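A sketch of how caching might be enabled on an indexer follows. It assumes the `cache` property as exposed in preview versions of the REST API, so verify it against the API version you target. The indexer, data source, index, and skillset names are hypothetical, and the connection string is a placeholder.

```json
{
  "name": "example-indexer",
  "dataSourceName": "example-datasource",
  "targetIndexName": "example-index",
  "skillsetName": "example-skillset",
  "cache": {
    "storageConnectionString": "<your-storage-connection-string>",
    "enableReprocessing": true
  }
}
```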
Indexes and knowledge stores are fully independent of each other. While you must attach an index to satisfy indexer requirements, if your sole objective is a knowledge store, you can ignore the index after it's populated. Avoid deleting it though. If you want to rerun the indexer and skillset, you'll need the index in order for the indexer to run.
articles/search/knowledge-store-concept-intro.md
Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ ms.date: 09/02/2021
# Knowledge store in Azure Cognitive Search
-
Knowledge store is a feature of Azure Cognitive Search that sends output from an [AI enrichment pipeline](cognitive-search-concept-intro.md) to tables and blobs in Azure Storage for independent analysis or downstream processing.
+
Knowledge store is a data sink created by a Cognitive Search [AI enrichment pipeline](cognitive-search-concept-intro.md) that stores enriched content in tables and blob containers in Azure Storage for independent analysis or downstream processing in non-search scenarios, like knowledge mining.
If you have used cognitive skills in the past, you already know that *skillsets* move a document through a sequence of enrichments that invoke atomic transformations, such as recognizing entities or translating text. The outcome can be a search index, or projections in a knowledge store. The two outputs, search index and knowledge store, are independent products of the same pipeline: they are derived from the same inputs, but the resulting output is structured, stored, and used in different applications.
articles/search/knowledge-store-projection-overview.md
Lines changed: 15 additions & 13 deletions
@@ -8,30 +8,32 @@ author: HeidiSteen
ms.author: heidist
ms.service: cognitive-search
ms.topic: conceptual
-
ms.date: 08/10/2021
+
ms.date: 10/08/2021
---
# Knowledge store "projections" in Azure Cognitive Search
-
Projections, an element of [knowledge store](knowledge-store-concept-intro.md), are views of enriched documents that can be saved to physical storage for knowledge mining purposes. A projection lets you "project" your data into a shape that aligns with your needs, preserving relationships so that tools like Power BI can read the data with no additional effort.
+
Projections are the element of a [knowledge store](knowledge-store-concept-intro.md) definition that specifies the physical expression of your data in Azure Storage. A projection definition determines the number and type of data structures in Azure Storage.
-
Projections can be tabular, where data articulation is in rows and columns in Azure Table Storage, or JSON objects stored in Azure Blob Storage, or binary images also stored in Blob Storage. You can define multiple projections of your data as it is being enriched. Multiple projections are useful when you want the same data shaped differently for individual use cases.
+
## Types of data structures
-
The knowledge store supports three types of projections:
+
A knowledge store is a logical construction that's physically expressed in Azure Storage as tables, JSON objects, or binary image files.
-
+ **Tables**: For data that's best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage. Only valid JSON objects can be projected as tables. Since an enriched document can contain nodes that are not named JSON objects, you'll add a [Shaper skill or use inline shaping](knowledge-store-projection-shape.md) within a skill to create valid JSON.
+
| Projection | Storage | Usage |
+
|------------|---------|-------|
+
| Tables | Azure Table Storage | Used for data that's best represented as rows and columns. Table projections allow you to define a schematized shape or projection. Only valid JSON objects can be projected as tables. Since an enriched document can contain nodes that are not named JSON objects, you'll add a [Shaper skill or use inline shaping](knowledge-store-projection-shape.md) within a skill to create valid JSON. |
+
| Objects | Azure Blob Storage | Used when you need a JSON representation of your data and enrichments. As with table projections, only valid JSON objects can be projected as objects, and shaping can help you do that. |
+
| Files | Azure Blob Storage | Used when you need to save normalized, binary image files. |
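The three projection types in the table can be sketched as one projection group with a single entry of each type. This is an illustrative fragment under stated assumptions: the table and container names are hypothetical, `/document/tableShape` and `/document/objectShape` stand in for shapes produced by a Shaper skill or inline shaping, and the file source assumes the indexer is configured to generate normalized images.

```json
"knowledgeStore": {
  "storageConnectionString": "<your-storage-connection-string>",
  "projections": [
    {
      "tables": [
        {
          "tableName": "exampleDocuments",
          "generatedKeyName": "DocumentId",
          "source": "/document/tableShape"
        }
      ],
      "objects": [
        {
          "storageContainer": "example-objects",
          "source": "/document/objectShape"
        }
      ],
      "files": [
        {
          "storageContainer": "example-images",
          "source": "/document/normalized_images/*"
        }
      ]
    }
  ]
}
```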
-
+ **Objects**: When you need a JSON representation of your data and enrichments, use object projections to save the output as blobs. As with table projections, only valid JSON objects can be projected as objects, and shaping can help you do that.
+
You can define multiple projections of your data as it is being enriched. Multiple projections are useful when you want the same data shaped differently for individual use cases.
-
+ **Files**: When you need to save the images extracted from the documents, file projections allow you to save the normalized images to blob storage.
+
## Basic definition
-
To see projections defined in context, step through [Create a knowledge store in REST](knowledge-store-create-rest.md).
+
Projections are an array of complex collections under a `knowledgeStore` definition in a [skillset object](/rest/api/searchservice/create-skillset).
-
## Basic pattern
+
Each set of tables, objects, and files is a *projection group*, and you can have multiple groups if storage requirements include supporting different tools and scenarios. Within a single group, you can have multiple tables, objects, and files.
-
Projections are an array of complex collections under a `knowledgeStore` definition in a skillset object. Each set of tables, objects, and files is a *projection group*, and you can have multiple groups if storage requirements include supporting different tools and scenarios. Within a single group, you can have multiple tables, objects, and files.
-
-
Typically only one group is used, but the following example shows two to illustrate the pattern when multiple groups exist.
+
Typically only one group is used, but the following example shows two to reinforce the idea of multiple groups.
```json
"knowledgeStore" : {
@@ -51,7 +53,7 @@ Typically only one group is used, but the following example shows two to illustr
}
```
### Projection groups
+
## Data isolation and relatedness
Having multiple sets of table-object-file combinations is useful for supporting different scenarios. You might use one set for design and debug of a skillset, capturing output for further examination, while a second set collects output used for an online app, with a third for data science workloads.
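The three scenarios in the preceding paragraph could be laid out as three separate groups, roughly as follows. This is a hypothetical sketch: container and table names are invented for illustration, the `source` paths assume shapes defined elsewhere in the skillset, and unused collections are left empty.

```json
"knowledgeStore": {
  "storageConnectionString": "<your-storage-connection-string>",
  "projections": [
    {
      "tables": [],
      "objects": [
        { "storageContainer": "example-debug", "source": "/document/debugShape" }
      ],
      "files": []
    },
    {
      "tables": [
        { "tableName": "exampleAppDocuments", "generatedKeyName": "DocumentId", "source": "/document/appShape" }
      ],
      "objects": [],
      "files": []
    },
    {
      "tables": [],
      "objects": [
        { "storageContainer": "example-datascience", "source": "/document/analysisShape" }
      ],
      "files": []
    }
  ]
}
```

Keeping each scenario in its own group means the debug output, the app tables, and the data science blobs can be shaped and consumed separately.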