articles/search/cognitive-search-tutorial-blob.md (+14 −14 lines)
@@ -7,14 +7,14 @@ author: HeidiSteen
ms.author: heidist
ms.service: cognitive-search
ms.topic: tutorial
-ms.date: 12/10/2021
+ms.date: 01/31/2023
---

# Tutorial: Use REST and AI to generate searchable content from Azure blobs

If you have unstructured text or images in Azure Blob Storage, an [AI enrichment pipeline](cognitive-search-concept-intro.md) can extract information and create new content for full-text search or knowledge mining scenarios.

-In this REST tutorial, you will learn how to:
+In this REST tutorial, you'll learn how to:

> [!div class="checklist"]
> * Set up a development environment.
@@ -39,12 +39,12 @@ The skillset is attached to an [indexer](search-indexer-overview.md). It uses bu
> You can use the free service for this tutorial. A free search service limits you to three indexes, three indexers, and three data sources. This tutorial creates one of each. Before starting, make sure you have room on your service to accept the new resources.

## Download files

-The sample data consists of 14 files of mixed content type that you will upload to Azure Blob Storage in a later step.
+The sample data consists of 14 files of mixed content type that you'll upload to Azure Blob Storage in a later step.

1. Get the files from [azure-search-sample-data/ai-enrichment-mixed-media/](https://github.com/Azure-Samples/azure-search-sample-data/tree/master/ai-enrichment-mixed-media) and copy them to your local computer.
@@ -58,7 +58,7 @@ If possible, create both in the same region and resource group for proximity and
### Start with Azure Storage

-1. [Sign in to the Azure portal](https://portal.azure.com/) and click **+ Create Resource**.
+1. [Sign in to the Azure portal](https://portal.azure.com/) and select **+ Create Resource**.

1. Search for *storage account* and select Microsoft's Storage Account offering.
@@ -114,7 +114,7 @@ You can use the Free tier to complete this walkthrough.
### Copy an admin api-key and URL for Azure Cognitive Search

-To interact with your Azure Cognitive Search service you will need the service URL and an access key.
+To interact with your Azure Cognitive Search service, you'll need the service URL and an access key.

1. [Sign in to the Azure portal](https://portal.azure.com/), and in your search service **Overview** page, get the name of your search service. You can confirm your service name by reviewing the endpoint URL. If your endpoint URL were `https://mydemo.search.windows.net`, your service name would be `mydemo`.
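Every request in the rest of this tutorial authenticates with these two values. As a quick sanity check, a minimal sketch of a request (the `mydemo` service name is the hypothetical one from the step above, and the key is a placeholder):

```http
GET https://mydemo.search.windows.net/indexes?api-version=2020-06-30
Content-Type: application/json
api-key: <your-admin-api-key>
```

A `200 OK` response listing indexes confirms that the endpoint and key are valid.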
@@ -126,9 +126,9 @@ All HTTP requests to a search service require an API key. A valid key establishe
## 2 - Set up Postman

-1. Start Postman, import the collection, and set up the environment variables. If you are unfamiliar with this tool, see [Explore Azure Cognitive Search REST APIs](search-get-started-rest.md).
+1. Start Postman, import the collection, and set up the environment variables. If you're unfamiliar with this tool, see [Explore Azure Cognitive Search REST APIs](search-get-started-rest.md).

-1. You will need to provide a search service name, an admin API key, an index name, a connection string to your Azure Storage account, and the container name.
+1. You'll need to provide a search service name, an admin API key, an index name, a connection string to your Azure Storage account, and the container name.

:::image type="content" source="media/cognitive-search-tutorial-blob/postman-setup.png" alt-text="Screenshot of the Variables page in Postman." border="true":::
@@ -151,10 +151,10 @@ Call [Create Data Source](/rest/api/searchservice/create-data-source) to set the
    "description" : "Demo files to demonstrate cognitive search capabilities.",
@@ -174,7 +174,7 @@ Call [Create Skillset](/rest/api/searchservice/create-skillset) to specify which
| Skill | Description |
|-----------------------|----------------|
| [Optical Character Recognition](cognitive-search-skill-ocr.md) | Recognizes text and numbers in image files. |
-| [Text Merge](cognitive-search-skill-textmerger.md) | Creates "merged content" that recombines previously separated content, useful for documents with embedded images (PDF, DOCX, and so forth). Images and text are separated during the document cracking phase. The merge skill recombines them by inserting any recognized text, image captions, or tags created during enrichment into the same location where the image was extracted from in the document. </p>When working with merged content in a skillset, this node will be inclusive of all text in the document, including text-only documents that never undergo OCR or image analysis. |
+| [Text Merge](cognitive-search-skill-textmerger.md) | Creates "merged content" that recombines previously separated content, useful for documents with embedded images (PDF, DOCX, and so forth). Images and text are separated during the document cracking phase. The merge skill recombines them by inserting any recognized text, image captions, or tags created during enrichment into the same location where the image was extracted from in the document. </p>When you're working with merged content in a skillset, this node will be inclusive of all text in the document, including text-only documents that never undergo OCR or image analysis. |
| [Language Detection](cognitive-search-skill-language-detection.md) | Detects the language and outputs either a language name or code. In multilingual data sets, a language field can be useful for filters. |
| [Entity Recognition](cognitive-search-skill-entity-recognition-v3.md) | Extracts the names of people, organizations, and locations from merged content. |
| [Text Split](cognitive-search-skill-textsplit.md) | Breaks large merged content into smaller chunks before calling the key phrase extraction skill. Key phrase extraction accepts inputs of 50,000 characters or less. A few of the sample files need splitting up to fit within this limit. |
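To make the table concrete, here's a sketch of how the first two skills might be declared in the Create Skillset body. The contexts, inputs, and outputs follow the documented pattern for each skill reference; the remaining three skills and the skillset's Cognitive Services key are omitted for brevity:

```json
{
    "description" : "Extract text from images and merge it with document content",
    "skills" : [
        {
            "@odata.type" : "#Microsoft.Skills.Vision.OcrSkill",
            "context" : "/document/normalized_images/*",
            "defaultLanguageCode" : "en",
            "detectOrientation" : true,
            "inputs" : [
                { "name" : "image", "source" : "/document/normalized_images/*" }
            ],
            "outputs" : [
                { "name" : "text" }
            ]
        },
        {
            "@odata.type" : "#Microsoft.Skills.Text.MergeSkill",
            "description" : "Reinsert recognized text at each image's original location in the content",
            "context" : "/document",
            "insertPreTag" : " ",
            "insertPostTag" : " ",
            "inputs" : [
                { "name" : "text", "source" : "/document/content" },
                { "name" : "itemsToInsert", "source" : "/document/normalized_images/*/text" },
                { "name" : "offsets", "source" : "/document/normalized_images/*/contentOffset" }
            ],
            "outputs" : [
                { "name" : "mergedText", "targetName" : "merged_text" }
            ]
        }
    ]
}
```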
@@ -526,9 +526,9 @@ Call [Create Indexer](/rest/api/searchservice/create-indexer) to drive the pipel
The script sets ```"maxFailedItems"``` to -1, which instructs the indexing engine to ignore errors during data import. This is acceptable because there are so few documents in the demo data source. For a larger data source, you would set the value to greater than 0.

-The ```"dataToExtract":"contentAndMetadata"``` statement tells the indexer to automatically extract the content from different file formats as well as metadata related to each file.
+The ```"dataToExtract":"contentAndMetadata"``` statement tells the indexer to automatically extract the values from the blob's content property and the metadata of each object.

-When content is extracted, you can set ```imageAction``` to extract text from images found in the data source. The ```"imageAction":"generateNormalizedImages"``` configuration, combined with the OCR Skill and Text Merge Skill, tells the indexer to extract text from the images (for example, the word "stop" from a traffic Stop sign), and embed it as part of the content field. This behavior applies to both the images embedded in the documents (think of an image inside a PDF), as well as images found in the data source, for instance a JPG file.
+When content is extracted, you can set ```imageAction``` to extract text from images found in the data source. The ```"imageAction":"generateNormalizedImages"``` configuration, combined with the OCR Skill and Text Merge Skill, tells the indexer to extract text from the images (for example, the word "stop" from a traffic stop sign) and embed it as part of the content field. This behavior applies to both embedded images (think of an image inside a PDF) and standalone image files, for instance a JPG file.
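Taken together, these settings sit in the indexer definition roughly as follows. This is a sketch: the field mappings and output field mappings that wire skill outputs to index fields are omitted, and the `{{...}}` names are collection variables:

```json
{
    "name" : "{{indexer_name}}",
    "dataSourceName" : "{{datasource_name}}",
    "targetIndexName" : "{{index_name}}",
    "skillsetName" : "{{skillset_name}}",
    "parameters" : {
        "maxFailedItems" : -1,
        "configuration" : {
            "dataToExtract" : "contentAndMetadata",
            "imageAction" : "generateNormalizedImages"
        }
    }
}
```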

## 4 - Monitor indexing
@@ -565,7 +565,7 @@ Recall that we started with blob content, where the entire document is packaged
1. For the next query, apply a filter. Recall that the language field and all entity fields are filterable.

   ```http
-   GET /indexes/{{index_name}}/docs?search=*&$filter=organizations/any(organizations: organizations eq 'NASDAQ')&$select=metadata_storage_name,organizations&$count=true&api-version=2020-06-30
+   GET /indexes/{{index_name}}/docs?search=*&$filter=organizations/any(organizations: organizations eq 'Microsoft')&$select=metadata_storage_name,organizations&$count=true&api-version=2020-06-30
   ```

These queries illustrate a few of the ways you can work with query syntax and filters on new fields created by cognitive search. For more query examples, see [Examples in Search Documents REST API](/rest/api/searchservice/search-documents#bkmk_examples), [Simple syntax query examples](search-query-simple-examples.md), and [Full Lucene query examples](search-query-lucene-examples.md).
articles/search/knowledge-store-projection-shape.md (+6 −6 lines)
@@ -7,16 +7,16 @@ author: HeidiSteen
ms.author: heidist
ms.service: cognitive-search
ms.topic: conceptual
-ms.date: 10/15/2021
+ms.date: 01/31/2023
---

# Shaping data for projection into a knowledge store

In Azure Cognitive Search, "shaping data" describes a step in the [knowledge store workflow](knowledge-store-concept-intro.md) that creates a data representation of the content that you want to project into tables, objects, and files in Azure Storage.

-As skills execute, the outputs are written to an enrichment tree in a hierarchy of nodes, and while you might want to view and consume the enrichment tree in its entirety, it's more likely that you will want a finer grain, creating subsets of nodes for different scenarios, such as placing the nodes related to translated text or extracted entities in specific tables.
+As skills execute, the outputs are written to an enrichment tree in a hierarchy of nodes, and while you might want to view and consume the enrichment tree in its entirety, it's more likely that you'll want a finer grain, creating subsets of nodes for different scenarios, such as placing the nodes related to translated text or extracted entities in specific tables.

-By itself, the enrichment tree does not include logic that would inform how its content is represented in a knowledge store. Data shapes fill this gap by providing the schema of what goes into each table, object, and file projection. You can think of a data shape as a custom definition or view of the enriched data. You can create as many shapes as you need, and then assign them to [projections](knowledge-store-projection-overview.md) in a knowledge store definition.
+By itself, the enrichment tree doesn't include logic that would inform how its content is represented in a knowledge store. Data shapes fill this gap by providing the schema of what goes into each table, object, and file projection. You can think of a data shape as a custom definition or view of the enriched data. You can create as many shapes as you need, and then assign them to [projections](knowledge-store-projection-overview.md) in a knowledge store definition.

## Approaches for creating shapes
@@ -26,7 +26,7 @@ There are two ways to shape enriched content to that it can be projected into a

+ Use an inline shape within the projection definition itself.

-Using the Shaper skill externalizes the shape so that it can be used by multiple projections or even other skills. It also ensures that all the mutations of the enrichment tree are contained within the skill, and that the output is an object that can be reused. In contrast, inline shaping allows you to create the shape you need, but is an anonymous object and is only available to the projection for which it is defined.
+Using the Shaper skill externalizes the shape so that it can be used by multiple projections or even other skills. It also ensures that all the mutations of the enrichment tree are contained within the skill, and that the output is an object that can be reused. In contrast, inline shaping allows you to create the shape you need, but is an anonymous object and is only available to the projection for which it's defined.

The approaches can be used together or separately. This article shows both: a Shaper skill for the table projections, and inline shaping with the key phrases table projection.
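As a sketch of the first approach, a Shaper skill that externalizes a shape could look like the following; the input names and sources here are illustrative stand-ins, and the article's own skillset definition is the authoritative version:

```json
{
    "@odata.type" : "#Microsoft.Skills.Util.ShaperSkill",
    "description" : "Create projectionShape, a custom shape for the table projections",
    "context" : "/document",
    "inputs" : [
        { "name" : "metadata_storage_name", "source" : "/document/metadata_storage_name" },
        { "name" : "language", "source" : "/document/language" },
        { "name" : "keyPhrases", "source" : "/document/pages/*/keyPhrases/*" }
    ],
    "outputs" : [
        { "name" : "output", "targetName" : "projectionShape" }
    ]
}
```

Because the output is a named node (`projectionShape`) in the enrichment tree, any number of projections can reference it.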
@@ -110,7 +110,7 @@ Within a Shaper skill, an input can have a `sourceContext` element. This same pr

`sourceContext` is used to construct multi-level, nested objects in an enrichment pipeline. If the input is at a *different* context than the skill context, use the *sourceContext*. The *sourceContext* requires you to define a nested input with the specific element being addressed as the source.

-In the example above, sentiment analysis and key phrases extraction was performed on text that was split into pages for more efficient analysis. Assuming you want the scores and phrases projected into a table, you will now need to set the context to nested input that provides the score and phrase.
+In the example above, sentiment analysis and key phrase extraction were performed on text that was split into pages for more efficient analysis. Assuming you want the scores and phrases projected into a table, you'll now need to set the context to the nested input that provides the score and phrase.
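A sketch of that nested input, using assumed node names from a split-then-analyze pipeline:

```json
{
    "name" : "pages",
    "sourceContext" : "/document/pages/*",
    "inputs" : [
        { "name" : "SentimentScore", "source" : "/document/pages/*/sentimentScore" },
        { "name" : "KeyPhrases", "source" : "/document/pages/*/keyPhrases/*" }
    ]
}
```

With `sourceContext` set to `/document/pages/*`, the shape emits one object per page, each carrying that page's score and phrases, which is the granularity a table row needs.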

### Projecting a shape into multiple tables
@@ -222,7 +222,7 @@ One observation from both the approaches is how values of "Keyphrases" are proje

You can generate a new shape using the Shaper skill or use inline shaping of the object projection. While the tables example demonstrated the approach of creating a shape and slicing, this example demonstrates the use of inline shaping.

-Inline shaping is the ability to create a new shape in the definition of the inputs to a projection. Inline shaping creates an anonymous object that is identical to what a Shaper skill would produce (in this case, `projectionShape`). Inline shaping is useful if you are defining a shape that you do not plan to reuse.
+Inline shaping is the ability to create a new shape in the definition of the inputs to a projection. Inline shaping creates an anonymous object that is identical to what a Shaper skill would produce (in this case, `projectionShape`). Inline shaping is useful if you're defining a shape that you don't plan to reuse.

The projections property is an array. This example adds a new projection instance to the array, where the knowledgeStore definition contains inline projections. When using inline projections, you can omit the Shaper skill.
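For instance, an object projection with an inline shape added to that array might look like this sketch; the container name and inputs are placeholders rather than the article's exact values:

```json
"projections" : [
    {
        "tables" : [ ],
        "objects" : [
            {
                "storageContainer" : "sampleobject",
                "source" : null,
                "sourceContext" : "/document",
                "inputs" : [
                    { "name" : "metadata_storage_name", "source" : "/document/metadata_storage_name" },
                    { "name" : "keyPhrases", "source" : "/document/pages/*/keyPhrases/*" }
                ]
            }
        ],
        "files" : [ ]
    }
]
```

Here `source` is null, and the `sourceContext` plus `inputs` pair defines the anonymous shape in place, which is why the Shaper skill can be omitted.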