
Commit f128073
Merge pull request #225816 from HeidiSteen/heidist-fresh
[azure search] January freshness, part 2
2 parents: 437e32b + 473302f

9 files changed (+88, -84 lines)

articles/search/cognitive-search-incremental-indexing-conceptual.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 10/17/2021
+ms.date: 01/31/2023
 ---
 
 # Incremental enrichment and caching in Azure Cognitive Search

articles/search/cognitive-search-tutorial-blob.md

Lines changed: 14 additions & 14 deletions
@@ -7,14 +7,14 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: tutorial
-ms.date: 12/10/2021
+ms.date: 01/31/2023
 ---
 
 # Tutorial: Use REST and AI to generate searchable content from Azure blobs
 
 If you have unstructured text or images in Azure Blob Storage, an [AI enrichment pipeline](cognitive-search-concept-intro.md) can extract information and create new content for full-text search or knowledge mining scenarios.
 
-In this REST tutorial, you will learn how to:
+In this REST tutorial, you'll learn how to:
 
 > [!div class="checklist"]
 > * Set up a development environment.
@@ -39,12 +39,12 @@ The skillset is attached to an [indexer](search-indexer-overview.md). It uses bu
 * [Azure Cognitive Search](https://azure.microsoft.com/services/search/)
 * [Sample data](https://github.com/Azure-Samples/azure-search-sample-data/tree/master/ai-enrichment-mixed-media)
 
-> [!Note]
+> [!NOTE]
 > You can use the free service for this tutorial. A free search service limits you to three indexes, three indexers, and three data sources. This tutorial creates one of each. Before starting, make sure you have room on your service to accept the new resources.
 
 ## Download files
 
-The sample data consists of 14 files of mixed content type that you will upload to Azure Blob Storage in a later step.
+The sample data consists of 14 files of mixed content type that you'll upload to Azure Blob Storage in a later step.
 
 1. Get the files from [azure-search-sample-data/ai-enrichment-mixed-media/](https://github.com/Azure-Samples/azure-search-sample-data/tree/master/ai-enrichment-mixed-media) and copy them to your local computer.
@@ -58,7 +58,7 @@ If possible, create both in the same region and resource group for proximity and
 
 ### Start with Azure Storage
 
-1. [Sign in to the Azure portal](https://portal.azure.com/) and click **+ Create Resource**.
+1. [Sign in to the Azure portal](https://portal.azure.com/) and select **+ Create Resource**.
 
 1. Search for *storage account* and select Microsoft's Storage Account offering.

@@ -114,7 +114,7 @@ You can use the Free tier to complete this walkthrough.
 
 ### Copy an admin api-key and URL for Azure Cognitive Search
 
-To interact with your Azure Cognitive Search service you will need the service URL and an access key.
+To interact with your Azure Cognitive Search service you'll need the service URL and an access key.
 
 1. [Sign in to the Azure portal](https://portal.azure.com/), and in your search service **Overview** page, get the name of your search service. You can confirm your service name by reviewing the endpoint URL. If your endpoint URL were `https://mydemo.search.windows.net`, your service name would be `mydemo`.
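The endpoint-to-service-name relationship described in the changed step above can be sketched in a few lines. This is a hypothetical helper for illustration, not part of the tutorial:

```python
from urllib.parse import urlparse

def service_name_from_endpoint(endpoint: str) -> str:
    """Return the search service name encoded in a *.search.windows.net endpoint URL."""
    host = urlparse(endpoint).hostname  # e.g. "mydemo.search.windows.net"
    return host.split(".")[0]

print(service_name_from_endpoint("https://mydemo.search.windows.net"))  # mydemo
```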
@@ -126,9 +126,9 @@ All HTTP requests to a search service require an API key. A valid key establishe
 
 ## 2 - Set up Postman
 
-1. Start Postman, import the collection, and set up the environment variables. If you are unfamiliar with this tool, see [Explore Azure Cognitive Search REST APIs](search-get-started-rest.md).
+1. Start Postman, import the collection, and set up the environment variables. If you're unfamiliar with this tool, see [Explore Azure Cognitive Search REST APIs](search-get-started-rest.md).
 
-1. You will need to provide a search service name, an admin API key, an index name, a connection string to your Azure Storage account, and the container name.
+1. You'll need to provide a search service name, an admin API key, an index name, a connection string to your Azure Storage account, and the container name.
 
 :::image type="content" source="media/cognitive-search-tutorial-blob/postman-setup.png" alt-text="Screenshot of the Variables page in Postman." border="true":::
@@ -151,10 +151,10 @@ Call [Create Data Source](/rest/api/searchservice/create-data-source) to set the
     "description" : "Demo files to demonstrate cognitive search capabilities.",
     "type" : "azureblob",
     "credentials" : {
-        "connectionString": "{{azure-storage-connection-string}}"
+        "connectionString": "{{azure_storage_connection_string}}"
     },
     "container" : {
-        "name" : "{{blob-container}}"
+        "name" : "{{container_name}}"
     }
 }
 ```
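As a sketch of what Postman sends for this Create Data Source call, the same request can be assembled programmatically. The service name, key, connection string, and data source name below are placeholders (the real name is in the full tutorial file, which this diff truncates), and `2020-06-30` follows the API version used elsewhere in the tutorial:

```python
import json

def build_create_datasource_request(service_name: str, api_key: str,
                                    connection_string: str, container: str):
    """Assemble the URL, headers, and JSON body for a Create Data Source call."""
    url = (f"https://{service_name}.search.windows.net/datasources"
           "?api-version=2020-06-30")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    body = {
        "name": "demo-datasource",  # illustrative; not the tutorial's actual value
        "description": "Demo files to demonstrate cognitive search capabilities.",
        "type": "azureblob",
        "credentials": {"connectionString": connection_string},
        "container": {"name": container},
    }
    return url, headers, json.dumps(body)

url, headers, payload = build_create_datasource_request(
    "mydemo", "<admin-api-key>", "<storage-connection-string>", "<container-name>")
```

Sending the request (for example with `requests.post(url, headers=headers, data=payload)`) requires a real service, so this sketch only builds the pieces.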
@@ -174,7 +174,7 @@ Call [Create Skillset](/rest/api/searchservice/create-skillset) to specify which
 | Skill | Description |
 |-----------------------|----------------|
 | [Optical Character Recognition](cognitive-search-skill-ocr.md) | Recognizes text and numbers in image files. |
-| [Text Merge](cognitive-search-skill-textmerger.md) | Creates "merged content" that recombines previously separated content, useful for documents with embedded images (PDF, DOCX, and so forth). Images and text are separated during the document cracking phase. The merge skill recombines them by inserting any recognized text, image captions, or tags created during enrichment into the same location where the image was extracted from in the document. </p>When working with merged content in a skillset, this node will be inclusive of all text in the document, including text-only documents that never undergo OCR or image analysis. |
+| [Text Merge](cognitive-search-skill-textmerger.md) | Creates "merged content" that recombines previously separated content, useful for documents with embedded images (PDF, DOCX, and so forth). Images and text are separated during the document cracking phase. The merge skill recombines them by inserting any recognized text, image captions, or tags created during enrichment into the same location where the image was extracted from in the document. </p>When you're working with merged content in a skillset, this node will be inclusive of all text in the document, including text-only documents that never undergo OCR or image analysis. |
 | [Language Detection](cognitive-search-skill-language-detection.md) | Detects the language and outputs either a language name or code. In multilingual data sets, a language field can be useful for filters. |
 | [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) | Extracts the names of people, organizations, and locations from merged content. |
 | [Text Split](cognitive-search-skill-textsplit.md) | Breaks large merged content into smaller chunks before calling the key phrase extraction skill. Key phrase extraction accepts inputs of 50,000 characters or less. A few of the sample files need splitting up to fit within this limit. |
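As one concrete example of a skill entry from this table, an OCR skill in the skillset's `skills` array generally looks like the following. This is a sketch based on the skill's public schema, not the tutorial's exact skillset file:

```python
# A minimal OCR skill definition, as it would appear in a skillset's "skills"
# array. Context and source paths follow the documented pattern of running OCR
# over each normalized image extracted from a document.
ocr_skill = {
    "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
    "context": "/document/normalized_images/*",
    "defaultLanguageCode": "en",
    "detectOrientation": True,
    "inputs": [
        {"name": "image", "source": "/document/normalized_images/*"}
    ],
    "outputs": [
        {"name": "text", "targetName": "text"}
    ],
}
```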
@@ -526,9 +526,9 @@ Call [Create Indexer](/rest/api/searchservice/create-indexer) to drive the pipel
 
 The script sets ```"maxFailedItems"``` to -1, which instructs the indexing engine to ignore errors during data import. This is acceptable because there are so few documents in the demo data source. For a larger data source, you would set the value to greater than 0.
 
-The ```"dataToExtract":"contentAndMetadata"``` statement tells the indexer to automatically extract the content from different file formats as well as metadata related to each file.
+The ```"dataToExtract":"contentAndMetadata"``` statement tells the indexer to automatically extract the values from the blob's content property and the metadata of each object.
 
-When content is extracted, you can set ```imageAction``` to extract text from images found in the data source. The ```"imageAction":"generateNormalizedImages"``` configuration, combined with the OCR Skill and Text Merge Skill, tells the indexer to extract text from the images (for example, the word "stop" from a traffic Stop sign), and embed it as part of the content field. This behavior applies to both the images embedded in the documents (think of an image inside a PDF), as well as images found in the data source, for instance a JPG file.
+When content is extracted, you can set ```imageAction``` to extract text from images found in the data source. The ```"imageAction":"generateNormalizedImages"``` configuration, combined with the OCR Skill and Text Merge Skill, tells the indexer to extract text from the images (for example, the word "stop" from a traffic Stop sign), and embed it as part of the content field. This behavior applies to both embedded images (think of an image inside a PDF) and standalone image files, for instance a JPG file.
 
 ## 4 - Monitor indexing
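The settings discussed in the changed paragraphs above sit together in the indexer's `parameters` section. A sketch of that fragment, with values matching the behavior described here (the tutorial's full indexer definition contains more):

```python
# Indexer parameters corresponding to the behaviors described in this section.
indexer_parameters = {
    "maxFailedItems": -1,  # ignore errors during data import
    "configuration": {
        "dataToExtract": "contentAndMetadata",      # blob content plus metadata
        "imageAction": "generateNormalizedImages",  # extract images for OCR/merge
    },
}
```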
@@ -565,7 +565,7 @@ Recall that we started with blob content, where the entire document is packaged
 1. For the next query, apply a filter. Recall that the language field and all entity fields are filterable.
 
 ```http
-GET /indexes/{{index_name}}/docs?search=*&$filter=organizations/any(organizations: organizations eq 'NASDAQ')&$select=metadata_storage_name,organizations&$count=true&api-version=2020-06-30
+GET /indexes/{{index_name}}/docs?search=*&$filter=organizations/any(organizations: organizations eq 'Microsoft')&$select=metadata_storage_name,organizations&$count=true&api-version=2020-06-30
 ```
 
 These queries illustrate a few of the ways you can work with query syntax and filters on new fields created by cognitive search. For more query examples, see [Examples in Search Documents REST API](/rest/api/searchservice/search-documents#bkmk_examples), [Simple syntax query examples](search-query-simple-examples.md), and [Full Lucene query examples](search-query-lucene-examples.md).
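A query like the one in this hunk can also be assembled and URL-encoded programmatically. A sketch, where the index name is a placeholder and only the query-string construction is shown:

```python
from urllib.parse import urlencode

def build_filter_query(index_name: str, organization: str) -> str:
    """Build the path and query string for an OData any() filter over a collection field."""
    params = {
        "search": "*",
        "$filter": f"organizations/any(organizations: organizations eq '{organization}')",
        "$select": "metadata_storage_name,organizations",
        "$count": "true",
        "api-version": "2020-06-30",
    }
    # urlencode percent-encodes the $ prefix, spaces, parentheses, and quotes.
    return f"/indexes/{index_name}/docs?{urlencode(params)}"

print(build_filter_query("my-index", "Microsoft"))
```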

articles/search/knowledge-store-projection-example-long.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 10/20/2021
+ms.date: 01/31/2023
 ---
 
 # Detailed example of shapes and projections in a knowledge store

articles/search/knowledge-store-projection-shape.md

Lines changed: 6 additions & 6 deletions
@@ -7,16 +7,16 @@ author: HeidiSteen
 ms.author: heidist
 ms.service: cognitive-search
 ms.topic: conceptual
-ms.date: 10/15/2021
+ms.date: 01/31/2023
 ---
 
 # Shaping data for projection into a knowledge store
 
 In Azure Cognitive Search, "shaping data" describes a step in the [knowledge store workflow](knowledge-store-concept-intro.md) that creates a data representation of the content that you want to project into tables, objects, and files in Azure Storage.
 
-As skills execute, the outputs are written to an enrichment tree in a hierarchy of nodes, and while you might want to view and consume the enrichment tree in its entirety, it's more likely that you will want a finer grain, creating subsets of nodes for different scenarios, such as placing the nodes related to translated text or extracted entities in specific tables.
+As skills execute, the outputs are written to an enrichment tree in a hierarchy of nodes, and while you might want to view and consume the enrichment tree in its entirety, it's more likely that you'll want a finer grain, creating subsets of nodes for different scenarios, such as placing the nodes related to translated text or extracted entities in specific tables.
 
-By itself, the enrichment tree does not include logic that would inform how its content is represented in a knowledge store. Data shapes fill this gap by providing the schema of what goes into each table, object, and file projection. You can think of a data shape as a custom definition or view of the enriched data. You can create as many shapes as you need, and then assign them to [projections](knowledge-store-projection-overview.md) in a knowledge store definition.
+By itself, the enrichment tree doesn't include logic that would inform how its content is represented in a knowledge store. Data shapes fill this gap by providing the schema of what goes into each table, object, and file projection. You can think of a data shape as a custom definition or view of the enriched data. You can create as many shapes as you need, and then assign them to [projections](knowledge-store-projection-overview.md) in a knowledge store definition.
 
 ## Approaches for creating shapes

@@ -26,7 +26,7 @@ There are two ways to shape enriched content to that it can be projected into a
 
 + Use an inline shape within the projection definition itself.
 
-Using the Shaper skill externalizes the shape so that it can be used by multiple projections or even other skills. It also ensures that all the mutations of the enrichment tree are contained within the skill, and that the output is an object that can be reused. In contrast, inline shaping allows you to create the shape you need, but is an anonymous object and is only available to the projection for which it is defined.
+Using the Shaper skill externalizes the shape so that it can be used by multiple projections or even other skills. It also ensures that all the mutations of the enrichment tree are contained within the skill, and that the output is an object that can be reused. In contrast, inline shaping allows you to create the shape you need, but is an anonymous object and is only available to the projection for which it's defined.
 
 The approaches can be used together or separately. This article shows both: a Shaper skill for the table projections, and inline shaping with the key phrases table projection.
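A Shaper skill that externalizes a shape, as described in this hunk, generally has the following form. This is a schematic sketch based on the skill's public schema; the skill name and input fields are illustrative, and the `projectionShape` output name matches the one this article refers to:

```python
# A schematic Shaper skill definition. Input names and sources are
# illustrative, not the article's exact skillset.
shaper_skill = {
    "@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
    "name": "#shaper-demo",  # illustrative name
    "context": "/document",
    "inputs": [
        {"name": "metadata_storage_name", "source": "/document/metadata_storage_name"},
        {"name": "keyPhrases", "source": "/document/content/pages/*/keyPhrases/*"},
    ],
    "outputs": [
        # The named output becomes a reusable node in the enrichment tree,
        # addressable by projections as /document/projectionShape.
        {"name": "output", "targetName": "projectionShape"}
    ],
}
```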

@@ -110,7 +110,7 @@ Within a Shaper skill, an input can have a `sourceContext` element. This same pr
 
 `sourceContext` is used to construct multi-level, nested objects in an enrichment pipeline. If the input is at a *different* context than the skill context, use the *sourceContext*. The *sourceContext* requires you to define a nested input with the specific element being addressed as the source.
 
-In the example above, sentiment analysis and key phrases extraction was performed on text that was split into pages for more efficient analysis. Assuming you want the scores and phrases projected into a table, you will now need to set the context to nested input that provides the score and phrase.
+In the example above, sentiment analysis and key phrases extraction was performed on text that was split into pages for more efficient analysis. Assuming you want the scores and phrases projected into a table, you'll now need to set the context to nested input that provides the score and phrase.
 
 ### Projecting a shape into multiple tables

@@ -222,7 +222,7 @@ One observation from both the approaches is how values of "Keyphrases" are proje
 
 You can generate a new shape using the Shaper skill or use inline shaping of the object projection. While the tables example demonstrated the approach of creating a shape and slicing, this example demonstrates the use of inline shaping.
 
-Inline shaping is the ability to create a new shape in the definition of the inputs to a projection. Inline shaping creates an anonymous object that is identical to what a Shaper skill would produce (in this case, `projectionShape`). Inline shaping is useful if you are defining a shape that you do not plan to reuse.
+Inline shaping is the ability to create a new shape in the definition of the inputs to a projection. Inline shaping creates an anonymous object that is identical to what a Shaper skill would produce (in this case, `projectionShape`). Inline shaping is useful if you're defining a shape that you don't plan to reuse.
 
 The projections property is an array. This example adds a new projection instance to the array, where the knowledgeStore definition contains inline projections. When using inline projections, you can omit the Shaper skill.
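An inline shape, by contrast, is defined directly in the projection's `inputs`, so no Shaper skill is needed. A schematic sketch of a knowledgeStore definition using one, where the connection string, table name, and paths are placeholders rather than the article's exact values:

```python
# A schematic knowledgeStore definition with an inline-shaped table projection.
# With no Shaper output to point at, "source" is null (None in Python) and the
# shape comes from "sourceContext" plus "inputs" directly on the projection.
knowledge_store = {
    "storageConnectionString": "<storage-connection-string>",
    "projections": [
        {
            "tables": [
                {
                    "tableName": "KeyPhrases",  # illustrative table name
                    "generatedKeyName": "Keyphraseid",
                    "source": None,
                    "sourceContext": "/document/content/pages/*/keyPhrases/*",
                    "inputs": [
                        {"name": "KeyPhrases",
                         "source": "/document/content/pages/*/keyPhrases/*"}
                    ],
                }
            ],
            "objects": [],
            "files": [],
        }
    ],
}
```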
