
Commit fa08b17

Merge pull request #97663 from vkurpad/master
updated docs for file projections and source context based on open hack feedback
2 parents ad2ea68 + 47884fa commit fa08b17

6 files changed (+44, −11 lines changed)

articles/search/cognitive-search-concept-intro.md

Lines changed: 2 additions & 0 deletions
@@ -111,6 +111,8 @@ Indexes are generated from an index schema that defines the fields, attributes,
 | Indexer | A crawler that extracts searchable data and metadata from an external data source and populates an index based on field-to-field mappings between the index and your data source for document cracking. For AI enrichments, the indexer invokes a skillset, and contains the field mappings associating enrichment output to target fields in the index. The indexer definition contains all of the instructions and references for pipeline operations, and the pipeline is invoked when you run the indexer. With additional configuration, you can re-use existing processing and execute only those steps and skills that are changed. | See [Indexers](search-indexer-overview.md) and [Incremental indexing (preview)](cognitive-search-incremental-indexing-conceptual.md). |
 | Data Source | An object used by an indexer to connect to an external data source of supported types on Azure. | See [Indexers overview](search-indexer-overview.md) |
 | Index | A persisted search index in Azure Cognitive Search, built from an index schema that defines field structure and usage. | See [Create a basic index](search-what-is-an-index.md) |
+| Knowledge store | A storage account where the enriched documents can be shaped and projected, in addition to the search index. | See [Introduction to knowledge store](knowledge-store-concept-intro.md) |
+| Indexer cache | A storage account where skill outputs are cached by the indexer. The cache enables the indexer to minimize the cost of reprocessing a large number of documents when a skillset is edited. | See [Incremental indexing](cognitive-search-incremental-indexing-conceptual.md) |
 
 <a name="where-do-i-start"></a>
 
articles/search/cognitive-search-incremental-indexing-conceptual.md

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ You'll need to set the `cache` property on the indexer to start benefitting from
 },
 "fieldMappings" : [],
 "outputFieldMappings": [],
-"parameters":{}
+"parameters": {}
 }
 ```
 
articles/search/cognitive-search-working-with-skillsets.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ Each skill requires a context. A context determines:
 
 ### SourceContext
 
-The `sourceContext` is only used in [shaper skills](cognitive-search-skill-shaper.md) and [projections](knowledge-store-projection-overview.md). It is used to construct multi-level, nested objects. The `sourceContext` enables you to construct a hierarchical, anonymous type object, which would require multiple skills if you were only using the context. Using `sourceContext` is shown in the next section.
+The `sourceContext` is only used in skill inputs and [projections](knowledge-store-projection-overview.md). It is used to construct multi-level, nested objects. You may need to create a new object either to pass as an input to a skill or to project into the knowledge store. Because an enrichment node may not be a valid JSON object in the enrichment tree, and referencing a node in the tree only returns the state of the node at the time it was created, using enrichments as skill inputs or projections requires you to create a well-formed JSON object. The `sourceContext` enables you to construct a hierarchical, anonymous type object, which would require multiple skills if you were only using the context. Using `sourceContext` is shown in the next section. Examine the skill output that generated an enrichment to determine whether it is a valid JSON object rather than a primitive type.
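As a sketch of that pattern (the field names and paths here are illustrative, not taken from this article), a Shaper skill can use `sourceContext` on an input to build a nested, well-formed shape from enrichment nodes:

```json
{
  "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
  "context": "/document",
  "inputs": [
    {
      "name": "pages",
      "sourceContext": "/document/content/pages/*",
      "inputs": [
        { "name": "text", "source": "/document/content/pages/*" },
        { "name": "keyPhrases", "source": "/document/content/pages/*/keyPhrases/*" }
      ]
    }
  ],
  "outputs": [
    { "name": "output", "targetName": "pageShape" }
  ]
}
```

Each element matched by `sourceContext` becomes one member of the generated `pages` collection, producing a hierarchical JSON object without chaining multiple skills.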
 
 ### Projections
 
articles/search/knowledge-store-concept-intro.md

Lines changed: 12 additions & 3 deletions
@@ -58,7 +58,9 @@ A `knowledgeStore` consists of a connection and projections.
 
 + Connection is to a storage account in the same region as Azure Cognitive Search.
 
-+ Projections are tables-objects pairs. `Tables` define the physical expression of enriched documents in Azure Table storage. `Objects` define the physical objects in Azure Blob storage.
++ Projections can be tabular, JSON objects, or files. `Tables` define the physical expression of enriched documents in Azure Table storage. `Objects` define the physical JSON objects in Azure Blob storage. `Files` are binaries, such as images extracted from the document, that will be persisted.
+
++ Projections are a collection of projection objects; each projection object can contain `tables`, `objects`, and `files`. Enrichments projected within a single projection object are related, even when projected across types (tables, objects, or files). Projections in different projection objects are independent and unrelated. The same shape can be projected across multiple projection objects.
 
 ```json
 {
@@ -106,7 +108,10 @@ A `knowledgeStore` consists of a connection and projections.
 ],
 "objects": [
 
-]
+],
+"files": [
+
+]
 },
 {
 "tables": [
@@ -118,13 +123,17 @@ A `knowledgeStore` consists of a connection and projections.
 "source": "/document/Review",
 "key": "/document/Review/Id"
 }
-]
+],
+"files": [
+
+]
 }
 ]
 }
 }
 ```
 
+This sample does not contain any images. For an example of how to use file projections, see [Working with projections](knowledge-store-projection-overview.md).
 ### Sources of data for a knowledge store
 
If a knowledge store is output from an AI enrichment pipeline, what are the inputs? The original data that you want to extract, enrich, and ultimately save to a knowledge store can originate from any Azure data source supported by search indexers:

articles/search/knowledge-store-projection-overview.md

Lines changed: 4 additions & 4 deletions
@@ -25,11 +25,11 @@ Projections can be tabular, with data stored in rows and columns in Azure Table
 
 The knowledge store supports three types of projections:
 
-+ **Tables**: For data that's best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage.
++ **Tables**: For data that's best represented as rows and columns, table projections allow you to define a schematized shape or projection in Table storage. Only valid JSON objects can be projected as tables. Because the enriched document can contain nodes that are not named JSON objects, create a valid JSON object with a Shaper skill or inline shaping when projecting those nodes.
 
-+ **Objects**: When you need a JSON representation of your data and enrichments, object projections are saved as blobs.
++ **Objects**: When you need a JSON representation of your data and enrichments, object projections are saved as blobs. Only valid JSON objects can be projected as objects. Because the enriched document can contain nodes that are not named JSON objects, create a valid JSON object with a Shaper skill or inline shaping when projecting those nodes.
 
-+ **Files**: When you need to save the images extracted from the documents, file projections allow you to save the normalized images.
++ **Files**: When you need to save the images extracted from the documents, file projections allow you to save the normalized images to Blob storage.
 
 To see projections defined in context, step through [How to get started with knowledge store](knowledge-store-howto.md).
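As a sketch of inline shaping (table and node names are illustrative assumptions, not taken from this diff), a table projection can shape a node directly with `sourceContext` and `inputs` rather than referencing the output of a separate Shaper skill:

```json
"projections": [
  {
    "tables": [
      {
        "tableName": "Reviews",
        "generatedKeyName": "ReviewId",
        "sourceContext": "/document/Review",
        "inputs": [
          { "name": "Text", "source": "/document/Review/Text" },
          { "name": "Sentiment", "source": "/document/Review/Sentiment" }
        ]
      }
    ],
    "objects": [],
    "files": []
  }
]
```

The `inputs` array plays the same role as a Shaper skill's inputs, producing a valid JSON object at projection time.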
 
@@ -44,7 +44,7 @@ This independence implies that you can have the same data shaped differently, ye
 
 ### Relatedness
 
-Projection groups now allow you to project your documents across projection types while preserving the relationships across projection types. All content projected within a single projection group preserves relationships within the data across projection types. Within tables, relationships are based on a generated key and each child node retains a reference to the parent node. Across types (tables, objects, and files), relationships are preserved when a single node is projected across different types. For example, consider a scenario where you have a document containing images and text. You could project the text to tables or objects and the images to files where the tables or objects have a property containing the file URL.
+Projection groups now allow you to project your documents across projection types while preserving the relationships across projection types. All content projected within a single projection group preserves relationships within the data across projection types. Within tables, relationships are based on a generated key, and each child node retains a reference to the parent node. Across types (tables, objects, and files), relationships are preserved when a single node is projected across different types. For example, consider a scenario where you have a document containing images and text. You could project the text to tables or objects and the images to files, where the tables or objects have a column/property containing the file URL.
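A sketch of that scenario (container names and paths are illustrative assumptions): projecting the text as objects and the extracted images as files within one projection group, so the generated references remain related:

```json
"projections": [
  {
    "objects": [
      { "storageContainer": "reviews", "source": "/document/Review" }
    ],
    "files": [
      { "storageContainer": "reviewimages", "source": "/document/normalized_images/*" }
    ]
  }
]
```

Because both projections sit in the same projection group, the projected object can carry the URL of the corresponding file in Blob storage.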
 
 ## Input shaping
 
articles/search/search-howto-incremental-index.md

Lines changed: 24 additions & 2 deletions
@@ -39,13 +39,35 @@ api-key: [admin key]
 
 ### Step 2: Add the cache property
 
-Edit the response from the GET request to add the `cache` property to the indexer. The cache object requires only a single property, and that is the connection string to an Azure Storage account.
+Edit the response from the GET request to add the `cache` property to the indexer. The cache object requires only a single property, `storageConnectionString`, which is the connection string to the storage account.
 
 ```json
-"cache": {
-"storageConnectionString": "[your storage connection string]"
+{
+"name": "myIndexerName",
+"targetIndexName": "myIndex",
+"dataSourceName": "myDatasource",
+"skillsetName": "mySkillset",
+"cache" : {
+"storageConnectionString" : "Your storage account connection string",
+"enableReprocessing": true,
+"id" : "Auto-generated ID; you do not need to set this"
+},
+"fieldMappings" : [],
+"outputFieldMappings": [],
+"parameters": {
+"configuration": {
+"enableAnnotationCache": true
+}
 }
+}
 ```
+#### Enable reprocessing
+
+You can optionally set the `enableReprocessing` boolean property within the cache; it is set to true by default. The `enableReprocessing` flag allows you to control the behavior of your indexer. In scenarios where you want the indexer to prioritize adding new documents to the index, set the flag to false. Once your indexer has caught up with the new documents, setting the flag back to true allows the indexer to start driving existing documents to eventual consistency. While the `enableReprocessing` flag is set to false, the indexer only writes to the cache and does not reprocess existing documents based on identified changes to the enrichment pipeline.
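As a sketch of prioritizing new documents (the service name, indexer name, and preview api-version are placeholders/assumptions, not taken from this article), you would update the indexer with reprocessing disabled:

```http
PUT https://[service name].search.windows.net/indexers/myIndexerName?api-version=2019-05-06-Preview
Content-Type: application/json
api-key: [admin key]

{
  "name": "myIndexerName",
  "targetIndexName": "myIndex",
  "dataSourceName": "myDatasource",
  "skillsetName": "mySkillset",
  "cache": {
    "storageConnectionString": "Your storage account connection string",
    "enableReprocessing": false
  }
}
```

When the backlog of new documents has drained, send the same request with `enableReprocessing` set back to true.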
 
 ### Step 3: Reset the indexer
 