
Commit fa08b17

Merge pull request #97663 from vkurpad/master
updated docs for file projections and source context based on open hack feedback
2 parents ad2ea68 + 47884fa commit fa08b17

6 files changed (+44, −11 lines changed)

articles/search/cognitive-search-concept-intro.md

Lines changed: 2 additions & 0 deletions
@@ -111,6 +111,8 @@ Indexes are generated from an index schema that defines the fields, attributes,
 | Indexer | A crawler that extracts searchable data and metadata from an external data source and populates an index based on field-to-field mappings between the index and your data source for document cracking. For AI enrichments, the indexer invokes a skillset, and contains the field mappings associating enrichment output to target fields in the index. The indexer definition contains all of the instructions and references for pipeline operations, and the pipeline is invoked when you run the indexer. With additional configuration, you can re-use existing processing and execute only those steps and skills that are changed. | See [Indexers](search-indexer-overview.md) and [Incremental indexing (preview)](cognitive-search-incremental-indexing-conceptual.md). |
 | Data Source | An object used by an indexer to connect to an external data source of supported types on Azure. | See [Indexers overview](search-indexer-overview.md) |
 | Index | A persisted search index in Azure Cognitive Search, built from an index schema that defines field structure and usage. | See [Create a basic index](search-what-is-an-index.md) |
+| Knowledge store | A storage account where the enriched documents can be shaped and projected, in addition to the search index. | See [Introduction to knowledge store](knowledge-store-concept-intro.md) |
+| Indexer cache | A storage account where skill outputs are cached by the indexer. The cache enables the indexer to minimize the cost of reprocessing a large number of documents when a skillset is edited. | See [Incremental indexing](cognitive-search-incremental-indexing-conceptual.md) |
 
 <a name="where-do-i-start"></a>
 
articles/search/cognitive-search-incremental-indexing-conceptual.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ You'll need to set the `cache` property on the indexer to start benefitting from
 },
 "fieldMappings" : [],
 "outputFieldMappings": [],
-"parameters":{}
+"parameters": {}
 }
 ```
 
articles/search/cognitive-search-working-with-skillsets.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ Each skill requires a context. A context determines:
 
 ### SourceContext
 
-The `sourceContext` is only used in [shaper skills](cognitive-search-skill-shaper.md) and [projections](knowledge-store-projection-overview.md). It is used to construct multi-level, nested objects. The `sourceContext` enables you to construct a hierarchical, anonymous type object, which would require multiple skills if you were only using the context. Using `sourceContext` is shown in the next section.
+The `sourceContext` is only used in skill inputs and [projections](knowledge-store-projection-overview.md). It is used to construct multi-level, nested objects. You may need to create a new object either to pass as an input to a skill or to project into the knowledge store. Because an enrichment node may not be a valid JSON object in the enrichment tree, and referencing a node in the tree only returns the state of the node at the time it was created, using enrichments as skill inputs or projections requires you to create a well-formed JSON object. The `sourceContext` enables you to construct a hierarchical, anonymous type object, which would require multiple skills if you were only using the context. Using `sourceContext` is shown in the next section. Examine the skill output that generated an enrichment to determine whether it is a valid JSON object rather than a primitive type.
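As a sketch of that pattern (the field names and paths here are illustrative, not taken from this article), a Shaper skill can use `sourceContext` on an input to build a nested, well-formed shape from enrichment nodes:

```json
{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "context": "/document",
  "inputs": [
    {
      "name": "pages",
      "sourceContext": "/document/content/pages/*",
      "inputs": [
        { "name": "text", "source": "/document/content/pages/*" },
        { "name": "keyPhrases", "source": "/document/content/pages/*/keyPhrases/*" }
      ]
    }
  ],
  "outputs": [
    { "name": "output", "targetName": "pageShape" }
  ]
}
```

Each element matched by `sourceContext` becomes one member of the generated `pages` collection, producing a hierarchical JSON object without chaining multiple skills.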
 
 ### Projections
 
articles/search/knowledge-store-concept-intro.md

Lines changed: 12 additions & 3 deletions
@@ -58,7 +58,9 @@ A `knowledgeStore` consists of a connection and projections.
 
 + Connection is to a storage account in the same region as Azure Cognitive Search.
 
-+ Projections are tables-objects pairs. `Tables` define the physical expression of enriched documents in Azure Table storage. `Objects` define the physical objects in Azure Blob storage.
++ Projections can be tabular, JSON objects, or files. `Tables` define the physical expression of enriched documents in Azure Table storage. `Objects` define the physical JSON objects in Azure Blob storage. `Files` are binaries, such as images extracted from the document, that will be persisted.
+
++ Projections are a collection of projection objects; each projection object can contain `tables`, `objects`, and `files`. Enrichments projected within a single projection object are related, even when projected across types (tables, objects, or files). Projections in different projection objects are independent and unrelated. The same shape can be projected across multiple projection objects.
 
 ```json
 {
@@ -106,7 +108,10 @@ A `knowledgeStore` consists of a connection and projections.
 ],
 "objects": [
 
-]
+],
+"files": [
+
+]
 },
 {
 "tables": [
@@ -118,13 +123,17 @@ A `knowledgeStore` consists of a connection and projections.
 "source": "/document/Review",
 "key": "/document/Review/Id"
 }
-]
+],
+"files": [
+
+]
 }
 ]
 }
 }
 ```
 
+This sample does not contain any images. For an example of how to use file projections, see [Working with projections](knowledge-store-projection-overview.md).
 ### Sources of data for a knowledge store
 
If a knowledge store is output from an AI enrichment pipeline, what are the inputs? The original data that you want to extract, enrich, and ultimately save to a knowledge store can originate from any Azure data source supported by search indexers:

articles/search/knowledge-store-projection-overview.md

Lines changed: 4 additions & 4 deletions
@@ -25,11 +25,11 @@ Projections can be tabular, with data stored in rows and columns in Azure Table
 
 The knowledge store supports three types of projections:
 
-+ **Tables**: For data that's best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage.
++ **Tables**: For data that's best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage. Only valid JSON objects can be projected as tables. Because the enriched document can contain nodes that are not named JSON objects, create a valid JSON object with a Shaper skill or inline shaping when projecting those nodes.
 
-+ **Objects**: When you need a JSON representation of your data and enrichments, object projections are saved as blobs.
++ **Objects**: When you need a JSON representation of your data and enrichments, object projections are saved as blobs. Only valid JSON objects can be projected as objects. Because the enriched document can contain nodes that are not named JSON objects, create a valid JSON object with a Shaper skill or inline shaping when projecting those nodes.
 
-+ **Files**: When you need to save the images extracted from the documents, file projections allow you to save the normalized images.
++ **Files**: When you need to save the images extracted from the documents, file projections allow you to save the normalized images to Blob storage.
 
 To see projections defined in context, step through [How to get started with knowledge store](knowledge-store-howto.md).
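As a sketch of inline shaping (table and node names are illustrative assumptions, not taken from this diff), a table projection can shape a node directly with `sourceContext` and `inputs` rather than referencing the output of a separate Shaper skill:

```json
"projections": [
  {
    "tables": [
      {
        "tableName": "Reviews",
        "generatedKeyName": "ReviewId",
        "sourceContext": "/document/Review",
        "inputs": [
          { "name": "Text", "source": "/document/Review/Text" },
          { "name": "Sentiment", "source": "/document/Review/Sentiment" }
        ]
      }
    ],
    "objects": [],
    "files": []
  }
]
```

The `inputs` array plays the same role as a Shaper skill's inputs, producing a valid JSON object at projection time.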
 
@@ -44,7 +44,7 @@ This independence implies that you can have the same data shaped differently, ye
 
 ### Relatedness
 
-Projection groups now allow you to project your documents across projection types while preserving the relationships across projection types. All content projected within a single projection group preserves relationships within the data across projection types. Within tables, relationships are based on a generated key and each child node retains a reference to the parent node. Across types (tables, objects, and files), relationships are preserved when a single node is projected across different types. For example, consider a scenario where you have a document containing images and text. You could project the text to tables or objects and the images to files where the tables or objects have a property containing the file URL.
+Projection groups now allow you to project your documents across projection types while preserving the relationships across projection types. All content projected within a single projection group preserves relationships within the data across projection types. Within tables, relationships are based on a generated key, and each child node retains a reference to the parent node. Across types (tables, objects, and files), relationships are preserved when a single node is projected across different types. For example, consider a scenario where you have a document containing images and text. You could project the text to tables or objects and the images to files, where the tables or objects have a column/property containing the file URL.
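A sketch of that scenario (container names and paths are illustrative assumptions): projecting the text as objects and the extracted images as files within one projection group, so the generated references remain related:

```json
"projections": [
  {
    "objects": [
      { "storageContainer": "reviews", "source": "/document/Review" }
    ],
    "files": [
      { "storageContainer": "reviewimages", "source": "/document/normalized_images/*" }
    ]
  }
]
```

Because both projections sit in the same projection group, the projected object can carry the URL of the corresponding file in Blob storage.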
 
 ## Input shaping
 
articles/search/search-howto-incremental-index.md

Lines changed: 24 additions & 2 deletions
@@ -39,13 +39,35 @@ api-key: [admin key]
 
 ### Step 2: Add the cache property
 
-Edit the response from the GET request to add the `cache` property to the indexer. The cache object requires only a single property, and that is the connection string to an Azure Storage account.
+Edit the response from the GET request to add the `cache` property to the indexer. The cache object requires only a single property, `storageConnectionString`, which is the connection string to the storage account.
 
 ```json
-"cache": {
-"storageConnectionString": "[your storage connection string]"
+{
+"name": "myIndexerName",
+"targetIndexName": "myIndex",
+"dataSourceName": "myDatasource",
+"skillsetName": "mySkillset",
+"cache" : {
+"storageConnectionString" : "Your storage account connection string",
+"enableReprocessing": true,
+"id" : "Auto-generated ID; you do not need to set this"
+},
+"fieldMappings" : [],
+"outputFieldMappings": [],
+"parameters": {
+"configuration": {
+"enableAnnotationCache": true
+}
 }
+}
 ```
+#### Enable reprocessing
+
+You can optionally set the `enableReprocessing` boolean property within the cache; it is set to true by default. The `enableReprocessing` flag allows you to control the behavior of your indexer. In scenarios where you want the indexer to prioritize adding new documents to the index, set the flag to false. Once your indexer has caught up with the new documents, setting the flag back to true allows the indexer to start driving existing documents to eventual consistency. While the `enableReprocessing` flag is set to false, the indexer only writes to the cache and does not reprocess existing documents based on identified changes to the enrichment pipeline.
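As a sketch of prioritizing new documents (the service name, indexer name, and preview api-version are placeholders/assumptions, not taken from this article), you would update the indexer with reprocessing disabled:

```http
PUT https://[service name].search.windows.net/indexers/myIndexerName?api-version=2019-05-06-Preview
Content-Type: application/json
api-key: [admin key]

{
  "name": "myIndexerName",
  "targetIndexName": "myIndex",
  "dataSourceName": "myDatasource",
  "skillsetName": "mySkillset",
  "cache": {
    "storageConnectionString": "Your storage account connection string",
    "enableReprocessing": false
  }
}
```

When the backlog of new documents has drained, send the same request with `enableReprocessing` set back to true.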
 
 ### Step 3: Reset the indexer
 