Commit 2af6901

file, cross projections and incremental indexing
1 parent 1505d8c commit 2af6901

3 files changed (+76, -18 lines changed)

articles/search/cognitive-search-concept-image-scenarios.md

Lines changed: 7 additions & 1 deletion
@@ -22,7 +22,7 @@ This article covers image processing in more detail and provides guidance for wo
 
 ## Get normalized images
 
-As part of document cracking, there are a new set of indexer configuration parameters for handling image files or images embedded in files. These parameters are used to normalize images for further downstream processing. Normalizing images makes them more uniform. Large images are resized to a maximum height and width to make them consumable. For images providing metadata on orientation, image rotation is adjusted for vertical loading. Metadata adjustments are captured in a complex type created for each image.
+As part of document cracking, there is a new set of indexer configuration parameters for handling image files or images embedded in files. These parameters normalize images for further downstream processing, making them more uniform. Large images are resized to a maximum height and width to make them consumable. For images that provide orientation metadata, image rotation is adjusted for vertical loading. Metadata adjustments are captured in a complex type created for each image.
 
 You cannot turn off image normalization. Skills that iterate over images expect normalized images. Enabling image normalization on an indexer requires that a skillset be attached to that indexer.
 
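The resizing step described in this hunk can be pictured with a small Python sketch that fits an image inside a maximum width and height while preserving its aspect ratio. The function name and the "fit inside the box, never upscale" rule are assumptions made for illustration; the article does not document the service's exact resizing or rounding behavior.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_width: int, max_height: int) -> Tuple[int, int]:
    """Scale an image size down so it fits inside max_width x max_height.

    Illustrative assumption: a simple aspect-ratio-preserving "fit inside the
    box" rule. The service's actual resizing and rounding may differ.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    # Never upscale: images that already fit keep their original size.
    scale = min(1.0, max_width / width, max_height / height)
    # Round to whole pixels, keeping at least 1 pixel in each dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(4000, 3000, 2000, 2000))  # (2000, 1500): scaled by 0.5
    print(fit_within(800, 600, 2000, 2000))    # (800, 600): already fits
```

Scaling both dimensions by the same factor keeps the image undistorted, which matters for downstream skills such as OCR.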
@@ -211,10 +211,16 @@ As a helper, if you need to transform normalized coordinates to the original coo
 }
 ```
 
+## Saving images
+
+Knowledge store projections let you save the normalized images extracted during document cracking as blobs in Azure Storage. To save your images, add a knowledge store to your skillset and define a `files` projection in one of its projection groups.
+
 ## See also
+
 + [Create indexer (REST)](https://docs.microsoft.com/rest/api/searchservice/create-indexer)
 + [Analyze image skill](cognitive-search-skill-image-analysis.md)
 + [OCR skill](cognitive-search-skill-ocr.md)
 + [Text merge skill](cognitive-search-skill-textmerger.md)
 + [How to define a skillset](cognitive-search-defining-skillset.md)
 + [How to map enriched fields](cognitive-search-output-field-mapping.md)
++ [How to save extracted images](knowledge-store-projection-overview.md)
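As a sketch of the "Saving images" step, the following Python helper adds a knowledge store with a file projection to a skillset definition before it is sent to the service. The helper name, the container name `reviewimages`, and the placeholder connection string are assumptions; the projection layout mirrors the file projection example in knowledge-store-projection-overview.md.

```python
import copy
import json
from typing import Any, Dict


def with_image_projection(skillset: Dict[str, Any], connection_string: str,
                          container: str) -> Dict[str, Any]:
    """Return a copy of `skillset` whose knowledge store saves normalized images.

    Hypothetical helper: it only builds the JSON body; sending it to the
    service is not shown.
    """
    updated = copy.deepcopy(skillset)  # leave the caller's definition untouched
    updated["knowledgeStore"] = {
        "storageConnectionString": connection_string,
        "projections": [
            {"tables": []},
            {"objects": []},
            {
                "files": [
                    {
                        "storageContainer": container,
                        # File projections act on the normalized_images collection.
                        "source": "/document/normalized_images/*",
                    }
                ]
            },
        ],
    }
    return updated


if __name__ == "__main__":
    skillset = {"name": "your-skillset", "skills": []}
    body = with_image_projection(skillset, "<your storage connection string>", "reviewimages")
    print(json.dumps(body, indent=2))
```

Copying the input keeps the helper free of side effects, so the same base skillset can be reused with different containers.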

articles/search/cognitive-search-incremental-indexing-conceptual.md

Lines changed: 13 additions & 3 deletions
@@ -125,11 +125,21 @@ Indexers will now expose a new property:
 2. CacheId: The cacheId is the identifier of the container within the annotationCache storage account that will be used as the cache for this indexer. This cache is unique to this indexer; if the indexer is deleted and recreated with the same name, the cacheId is regenerated. The cacheId cannot be set; it is always generated by the service.
 3. EnableReprocessing: Set to true by default. When set to false, documents continue to be written to the cache, but no existing documents are reprocessed based on the cache data.
 
-### Skillset
+Indexers will also support a new querystring parameter:
 
-Skillset will support a new operation:
+1. `ignoreResetRequirement` set to true when your update action version of the skill
 
-1. ResetSkills: The invalidate skills API will accept POST request with a payload containing the list of skill names that need to be invalidated.
+### Skillsets
+
+Skillsets will not support any new operations, but will support a new querystring parameter:
+
+1. `disableCacheReprocessingChangeDetection` set to true when you don't want existing documents to be updated based on the current action.
+
+### Datasources
+
+Datasources will not support any new operations, but will support a new querystring parameter:
+
+1. `ignoreResetRequirement` set to true when your update action version of the skill
 
 ## Best practices
 
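To illustrate how these querystring parameters might be attached to an update request, here is a minimal Python sketch that builds the request URL. It assumes the parameters are sent on the PUT request that updates the indexer, skillset, or data source at `https://<service>.search.windows.net/<collection>/<name>`. The helper name, the service name, and the `api-version` value are placeholders; the parameter names are taken as written in the diff.

```python
from urllib.parse import quote, urlencode


def update_url(service: str, collection: str, name: str, api_version: str,
               **flags: bool) -> str:
    """Build the URL for updating an indexer, skillset, or data source.

    `collection` is "indexers", "skillsets", or "datasources". Each keyword
    flag becomes a querystring parameter with a lowercase "true"/"false" value.
    Hypothetical helper; it does not send the request.
    """
    params = {"api-version": api_version}
    for key, value in flags.items():
        params[key] = "true" if value else "false"
    return (f"https://{service}.search.windows.net/{collection}/"
            f"{quote(name)}?{urlencode(params)}")


if __name__ == "__main__":
    # Update a skillset without reprocessing existing documents.
    print(update_url("my-service", "skillsets", "your-skillset", "YYYY-MM-DD-Preview",
                     disableCacheReprocessingChangeDetection=True))
```

Passing the flags as keyword arguments keeps the parameter names exactly as the service expects them, without a separate mapping table.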

articles/search/knowledge-store-projection-overview.md

Lines changed: 56 additions & 14 deletions
@@ -28,7 +28,7 @@ The knowledge store supports three types of projections:
 
 + **Objects**: When you need a JSON representation of your data and enrichments, object projections are saved as blobs.
 
-+ **Files**: In scenarios where you need to save the images extracted from the documents, file projections allow you to save the normalized images.
++ **Files**: When you need to save the images extracted from the documents, file projections allow you to save the normalized images.
 
 To see projections defined in context, step through [How to get started with knowledge store](knowledge-store-howto.md).
 
@@ -43,7 +43,7 @@ This implies that you can have the same data shaped differently, yet repeated in
 
 ### Relatedness
 
-All content projected within a single projection group preserves relationships within the data across projection types. Within tables, relationships are based on a generated key and each child node retains a reference to the parent node. Across types (tables, objects and files), relationships are preserved when a single node is projected across different types. For example, consider a scenario where you have a document containing images and text. You could project the text to tables or objects and the images to files where the tables or objects have a property containing the file URL.
+Projection groups let you project your documents across projection types while preserving the relationships between them: all content projected within a single projection group keeps its relationships, regardless of projection type. Within tables, relationships are based on a generated key, and each child node retains a reference to the parent node. Across types (tables, objects, and files), relationships are preserved when a single node is projected to different types. For example, consider a document that contains images and text. You could project the text to tables or objects and the images to files, where the tables or objects have a property containing the file URL.
 
 ## Input shaping
 Getting your data in the right shape or structure is key to effective use, be it tables or objects. The ability to shape or structure your data based on how you plan to access and use it is a key capability exposed as the **Shaper** skill within the skillset.
@@ -52,6 +52,10 @@ Projections are easier to define when you have an object in the enrichment tree
 
 When you have a new shape defined that contains all the elements you need to project out, you can now use this shape as the source for your projections or as an input to another skill.
 
+## Projection slicing
+
+When defining a projection group, a single node in the enrichment tree can be sliced into multiple related tables or objects. Adding a projection whose source path is a child of an existing projection's source slices the child node out of the parent node and projects it into a new, related table or object. This lets you define a single node in a **Shaper** skill that serves as the source for all of your projections.
+
 ## Table projections
 
 Because it makes importing easier, we recommend table projections for data exploration with Power BI. Additionally, table projections let you change the cardinality of the relationships between tables.
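To make projection slicing concrete, the following Python sketch models conceptually what happens when a child path (here a `keyPhrases` list) is projected into its own table: the parent row loses that field, and each child row gets its own key plus a reference back to the parent. The function, the field names, and the key format are hypothetical; the service generates its own keys and table layout.

```python
from typing import Any, Dict, List, Tuple


def slice_node(node: Dict[str, Any], parent_key: str,
               child_field: str) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Split `child_field` out of `node` into related child rows.

    Illustrative only: the key format below is invented for the example and is
    not the format the service generates.
    """
    # The parent row keeps every field except the sliced-out child.
    parent_row = {k: v for k, v in node.items() if k != child_field}
    parent_row["key"] = parent_key
    child_rows = []
    for index, value in enumerate(node.get(child_field, [])):
        child_rows.append({
            "key": f"{parent_key}_{child_field}_{index}",  # hypothetical generated key
            "parentKey": parent_key,                       # reference back to the parent row
            "value": value,
        })
    return parent_row, child_rows


if __name__ == "__main__":
    shaped = {"content": "text", "keyPhrases": ["alpha", "beta"]}
    parent, children = slice_node(shaped, "doc1", "keyPhrases")
    print(parent)
    for row in children:
        print(row)
```

Because every child row carries the parent's key, the original one-to-many relationship can be rebuilt with a join, for example in Power BI.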
@@ -60,10 +64,7 @@ You can project a single document in your index into multiple tables, preserving
 
 ### Defining a table projection
 
-When defining a table projection within the `knowledgeStore` element of your skillset, start by mapping a node on the enrichment tree to the table source. Typically this node is the output of a **Shaper** skill that you added to the list of skills to produce a specific shape that you need to project into tables. The node you choose to project can be sliced to project into multiple tables. The tables definition is a list of tables that you want to project.
-
-#### Projection slicing
-When defining a table projection group, a single node in the enrichment tree can be sliced into multiple related tables. Adding a table with a source path that is a child of an existing table projection will result in the child node being sliced out of the parent node and projected into the new yet related table. This allows you to define a single node in a shaper skill that can be the source for all of your table projections.
+When defining a table projection within the `knowledgeStore` element of your skillset, start by mapping a node on the enrichment tree to the table source. Typically this node is the output of a **Shaper** skill that you added to the list of skills to produce a specific shape that you need to project into tables. The node you choose to project can be sliced to project into multiple tables. The tables definition is a list of tables that you want to project.
 
 Each table requires three properties:
 
@@ -97,15 +98,16 @@ Here is an example of table projections.
 ]
 },
 {
-"objects": [
-
-]
+"objects": [ ]
+},
+{
+"files": [ ]
 }
 ]
 }
 }
 ```
-As demonstrated in this example, the key phrases and entities are modeled into different tables and will contain a reference back to the parent (MainTable) for each row.
+As demonstrated in this example, the key phrases and entities are modeled into different tables, and each row contains a reference back to the parent (MainTable).
 
 The following illustration is a reference to the Caselaw exercise in [How to get started with knowledge store](knowledge-store-howto.md). In a scenario where a case has multiple opinions, and each opinion is enriched by identifying entities contained within it, you could model the projections as shown here.
 
@@ -121,7 +123,6 @@ Object projections are JSON representations of the enrichment tree that can be s
 "name": "your-skillset",
 "skills": [
 …your skills
-
 ],
 "cognitiveServices": {
 … your cognitive services key info
@@ -142,6 +143,9 @@ Object projections are JSON representations of the enrichment tree that can be s
 "key": "/document/Review/Id"
 }
 ]
+},
+{
+"files": [ ]
 }
 ]
 }
@@ -154,13 +158,51 @@ Generating an object projection requires a few object-specific attributes:
 + source: The path to the node of the enrichment tree that is the root of the projection
 + key: A path that represents a unique key for the object to be stored. It will be used to create the name of the blob in the container.
 
-## Projection Lifecycle
+## File projection
+
+File projections are similar to object projections but act only on the `normalized_images` collection. As with object projections, file projections are saved in a blob container, with a folder prefix that is the Base64-encoded value of the document ID. File projections cannot share a container with object projections; they must be projected into a different container.
+
+```json
+{
+  "name": "your-skillset",
+  "skills": [
+    …your skills
+  ],
+  "cognitiveServices": {
+    … your cognitive services key info
+  },
+
+  "knowledgeStore": {
+    "storageConnectionString": "an Azure storage connection string",
+    "projections": [
+      {
+        "tables": [ ]
+      },
+      {
+        "objects": [ ]
+      },
+      {
+        "files": [
+          {
+            "storageContainer": "reviewimages",
+            "source": "/document/normalized_images/*"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+
+
+## Projection lifecycle
 
-Your projections have a lifecycle that is tied to the source data in your data source. As your data is updated and re-indexed, your projections are updated with the results of the enrichments ensuring your projections are eventually consistent with the data in your data source. The projections inherit the delete policy you have configured for your index.
+Your projections have a lifecycle that is tied to the source data in your data source. As your data is updated and reindexed, your projections are updated with the results of the enrichments, ensuring that your projections are eventually consistent with the data in your data source. The projections inherit the delete policy you have configured for your index. Projections are not deleted when the indexer or the search service itself is deleted.
 
 ## Using projections
 
-After the indexer is run, you can read the projected data in the containers or tables you specified through projections.
+After the indexer runs, you can read the projected data in the containers or tables you specified through projections.
 
 For analytics, exploration in Power BI is as simple as setting Azure Table storage as the data source. You can easily create a set of visualizations on your data that use the relationships within it.
 
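The file projection rules in this diff (act only on `normalized_images`, never share a container with an object projection) can be checked before a skillset is submitted. Below is a minimal Python sketch of such a check, plus a helper that computes the per-document folder prefix. Both helpers are hypothetical. The lowercase check reflects the Azure Blob Storage rule that container names must be lowercase. The prefix helper assumes standard Base64 over the UTF-8 bytes of the document ID; the service may use a different variant, such as URL-safe Base64.

```python
import base64
from typing import Any, Dict, List


def file_projection_issues(knowledge_store: Dict[str, Any]) -> List[str]:
    """Return problems found in the file projections of a knowledgeStore definition."""
    groups = knowledge_store.get("projections", [])
    # Collect every container used by an object projection, across all groups.
    object_containers = {o["storageContainer"] for g in groups for o in g.get("objects", [])}
    issues = []
    for group in groups:
        for f in group.get("files", []):
            container = f["storageContainer"]
            if container in object_containers:
                issues.append(f"container '{container}' is also used by an object projection")
            if container != container.lower():
                issues.append(f"container '{container}' must be lowercase")
            source = f.get("source", "")
            if not source.startswith("/document/normalized_images"):
                issues.append(f"source '{source}' is not under /document/normalized_images")
    return issues


def image_folder_prefix(document_id: str) -> str:
    """Folder prefix for a document's images (assumed: standard Base64 of the UTF-8 ID)."""
    return base64.b64encode(document_id.encode("utf-8")).decode("ascii")


if __name__ == "__main__":
    store = {"projections": [{"files": [{"storageContainer": "reviewimages",
                                         "source": "/document/normalized_images/*"}]}]}
    print(file_projection_issues(store))  # []
    print(image_folder_prefix("doc1"))    # ZG9jMQ==
```

Returning a list of messages, rather than raising on the first problem, reports every misconfigured projection in one pass.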