Commit 903b121

Merge pull request #174942 from HeidiSteen/heidist-fresh
[azure search] AI quickstarts and landing page edits
2 parents 4112ea6 + d4e36f0 commit 903b121

20 files changed

+246
-129
lines changed

articles/search/TOC.yml

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -17,11 +17,13 @@
1717
href: search-get-started-portal.md
1818
- name: Create a demo app
1919
href: search-create-app-portal.md
20-
- name: Create a skillset
21-
href: cognitive-search-quickstart-blob.md
2220
- name: Create a knowledge store
2321
href: knowledge-store-create-portal.md
24-
- name: Query with Search explorer
22+
- name: Translate text and recognize entities
23+
href: cognitive-search-quickstart-blob.md
24+
- name: Apply OCR and image analysis
25+
href: cognitive-search-quickstart-ocr.md
26+
- name: Query with Search Explorer
2527
href: search-explorer.md
2628
- name: Visual Studio Code
2729
href: search-get-started-vs-code.md

articles/search/cognitive-search-quickstart-blob.md

Lines changed: 47 additions & 74 deletions
Large diffs are not rendered by default.
Lines changed: 159 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,159 @@
1+
---
2+
title: "Quickstart: OCR and image analysis"
3+
titleSuffix: Azure Cognitive Search
4+
description: Use the Import Data wizard and AI cognitive skills to apply OCR and image analysis to identify and extract searchable text from image files.
5+
6+
manager: nitinme
7+
author: HeidiSteen
8+
ms.author: heidist
9+
ms.service: cognitive-search
10+
ms.topic: quickstart
11+
ms.date: 10/07/2021
12+
---
13+
14+
# Quickstart: Apply OCR and image analysis using the Import data wizard
15+
16+
Learn how AI enrichment in Azure Cognitive Search adds Optical Character Recognition (OCR) and image analysis to create searchable content from image files.
17+
18+
In this quickstart, you'll run the **Import data** wizard to analyze visual content in JPG files. The content consists of photographs of signs. Output is a searchable index containing captions, tags, and text identified through OCR, all of which is queryable in the portal using [Search explorer](search-explorer.md).
19+
20+
To prepare, you'll create a few resources and upload sample files before running the wizard.
21+
22+
Prefer to start with code? Try the [.NET tutorial](cognitive-search-tutorial-blob-dotnet.md), [Python tutorial](cognitive-search-tutorial-blob-python.md), or [REST tutorial](cognitive-search-tutorial-blob.md) instead.
23+
24+
## Prerequisites
25+
26+
Before you begin, have the following prerequisites in place:
27+
28+
+ An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/).
29+
30+
+ Azure Cognitive Search service. [Create a service](search-create-service-portal.md) or [find an existing service](https://ms.portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) under your current subscription. You can use a free service for this quickstart.
31+
32+
+ Azure Storage account with Blob Storage. [Create a storage account](../storage/common/storage-account-create.md?tabs=azure-portal) or [find an existing account](https://ms.portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Storage%2FstorageAccounts/).
33+
34+
+ Choose the same subscription if you want the wizard to find your storage account and set up the connection.
35+
+ Choose the same region as Azure Cognitive Search to avoid bandwidth charges.
36+
+ Choose the StorageV2 (general purpose V2) account type.
37+
38+
> [!NOTE]
39+
> This quickstart also uses [Cognitive Services](https://azure.microsoft.com/services/cognitive-services/) for the AI. Because the workload is so small, Cognitive Services is tapped behind the scenes for free processing for up to 20 transactions. This means that you can complete this exercise without having to create an additional Cognitive Services resource.
40+
41+
## Set up your data
42+
43+
In the following steps, set up a blob container in Azure Storage to store the sample image files.
44+
45+
1. [Download sample data](https://github.com/Azure-Samples/azure-search-sample-data) from GitHub. There are multiple data sets. Use the files in the **unsplash-images\jpg-signs** folder for this quickstart.
46+
47+
1. Upload the sample data to a blob container.
48+
49+
1. Sign in to the [Azure portal](https://portal.azure.com/) and find your storage account.
50+
1. In the left nav pane, select **Containers**.
51+
1. [Create a container](../storage/blobs/storage-quickstart-blobs-portal.md#create-a-container) named "signs". Use the default public access level.
52+
1. In the "signs" container, select **Upload** to upload the files from your local **unsplash-images\jpg-signs** folder.
53+
54+
You should have 10 files containing photographs of signs. There is a second subfolder that includes landmark buildings. If your search service is Basic or above, include the second set of files if you want to evaluate image analysis on files that do not include pictures of text.
55+
56+
You are now ready to move on to the Import data wizard.
57+
58+
## Run the Import data wizard
59+
60+
1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account.
61+
62+
1. [Find your search service](https://ms.portal.azure.com/#blade/HubsExtension/BrowseResourceBlade/resourceType/Microsoft.Search%2FsearchServices) and on the Overview page, click **Import data** on the command bar to set up cognitive enrichment in four steps.
63+
64+
:::image type="content" source="media/search-import-data-portal/import-data-cmd.png" alt-text="Screenshot of the Import data command" border="true":::
65+
66+
### Step 1 - Create a data source
67+
68+
1. In **Connect to your data**, choose **Azure Blob Storage**. Choose an existing connection to the storage account and container you created. Give the data source a name, and use default values for the rest.
69+
70+
:::image type="content" source="media/cognitive-search-quickstart-blob/connect-to-signs.png" alt-text="Azure blob configuration" border="true":::
71+
72+
### Step 2 - Add cognitive skills
73+
74+
Next, configure AI enrichment to invoke OCR and image analysis.
75+
76+
1. For this quickstart, we are using the **Free** Cognitive Services resource. The sample data consists of 19 files, so the daily, per-indexer allotment of 20 free transactions on Cognitive Services is sufficient for this quickstart.
77+
78+
:::image type="content" source="media/cognitive-search-quickstart-blob/free-enrichments.png" alt-text="Attach free Cognitive Services processing" border="true":::
79+
80+
1. On the same page, expand **Add enrichments** and make three selections:
81+
82+
Enable OCR and merge all text into the "merged_content" field.
83+
84+
Choose "Generate tags from images" and "Generate captions from images".
85+
86+
:::image type="content" source="media/cognitive-search-quickstart-blob/select-ocr-image-enrichments.png" alt-text="Attach Cognitive Services select services for skillset" border="true":::
87+
88+
For image analysis, images are split from text during document cracking. The "merged_content" field re-associates text and images in the AI enrichment pipeline.
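
The wizard generates the skillset for you, so there is nothing to write by hand in this step. As a rough sketch of what the three selections correspond to, the following Python builds a skillset body in the general shape used by the Create Skillset REST API. The skillset name, the node paths, and the exact inputs and outputs are assumptions for illustration; the definition the wizard actually creates may differ.

```python
import json

# Hypothetical skillset name; the wizard generates its own.
SKILLSET_NAME = "signs-skillset"

# Assumed path to the images extracted during document cracking.
IMAGES = "/document/normalized_images/*"


def build_skillset(name: str) -> dict:
    """Return a skillset body with OCR, image analysis, and text merge skills."""
    ocr = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": IMAGES,
        "inputs": [{"name": "image", "source": IMAGES}],
        "outputs": [
            {"name": "text", "targetName": "text"},
            {"name": "layoutText", "targetName": "layoutText"},
        ],
    }
    image_analysis = {
        "@odata.type": "#Microsoft.Skills.Vision.ImageAnalysisSkill",
        "context": IMAGES,
        # "tags" and "description" back the tag and caption selections.
        "visualFeatures": ["tags", "description"],
        "inputs": [{"name": "image", "source": IMAGES}],
        "outputs": [
            {"name": "tags", "targetName": "imageTags"},
            {"name": "description", "targetName": "imageCaption"},
        ],
    }
    merge = {
        "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
        "context": "/document",
        # Re-inserts OCR text at the position of each image in the content.
        "inputs": [
            {"name": "text", "source": "/document/content"},
            {"name": "itemsToInsert", "source": IMAGES + "/text"},
            {"name": "offsets", "source": IMAGES + "/contentOffset"},
        ],
        "outputs": [{"name": "mergedText", "targetName": "merged_content"}],
    }
    return {"name": name, "skills": [ocr, image_analysis, merge]}


skillset = build_skillset(SKILLSET_NAME)
print(json.dumps(skillset, indent=2))
```

To compare this sketch with the real definition, open the skillset that the wizard created and view its JSON in the portal.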
89+
90+
### Step 3 - Configure the index
91+
92+
An index contains your searchable content and the **Import data** wizard can usually infer the schema for you by sampling the data. In this step, review the generated schema and potentially revise any settings. Below is the default schema created for the demo data set.
93+
94+
For this quickstart, the wizard does a good job setting reasonable defaults:
95+
96+
+ Default fields are based on properties for existing blobs plus new fields to contain enrichment output (for example, `text`, `layoutText`, `imageCaption`). Data types are inferred from metadata and by data sampling.
97+
98+
+ Default document key is *metadata_storage_path* (selected because the field contains unique values).
99+
100+
+ Default attributes are **Retrievable** and **Searchable**. **Searchable** allows full text search on a field. **Retrievable** means field values can be returned in results. The wizard assumes you want these fields to be retrievable and searchable because you created them via a skillset.
101+
102+
:::image type="content" source="media/cognitive-search-quickstart-blob/index-fields-ocr-images.png" alt-text="Index fields" border="true":::
103+
104+
Marking a field as **Retrievable** does not mean that the field *must* be present in the search results. You can precisely control search results composition by using the **$select** query parameter to specify which fields to include. For text-heavy fields like `content`, the **$select** parameter keeps search results manageable for the human users of your application, while the **Retrievable** attribute ensures client code has access to all the information it needs.
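
If you want to reason about the schema outside the portal, the defaults described above can be sketched as plain Python data. This is an illustration only: the index name is hypothetical, and the field types are assumptions rather than a copy of what the wizard generates. The field names come from this quickstart.

```python
import json


def field(name, type_="Edm.String", key=False, searchable=True, retrievable=True):
    """Describe one index field using the attribute names from the index schema."""
    return {
        "name": name,
        "type": type_,
        "key": key,
        "searchable": searchable,
        "retrievable": retrievable,
    }


def build_index(name: str) -> dict:
    """Return an index body with a blob key field plus enrichment output fields."""
    fields = [
        field("metadata_storage_path", key=True),
        field("merged_content"),
        # Assumed collection types: one entry per image found in a blob.
        field("text", "Collection(Edm.String)"),
        field("layoutText", "Collection(Edm.String)"),
        field("imageCaption", "Collection(Edm.String)"),
        field("imageTags", "Collection(Edm.String)"),
    ]
    keys = [f["name"] for f in fields if f["key"]]
    if len(keys) != 1:
        raise ValueError("An index needs exactly one document key field.")
    return {"name": name, "fields": fields}


index = build_index("signs-index")  # hypothetical index name
print(json.dumps(index, indent=2))
```

The single-key check mirrors the requirement that every document has one unique identifier, which is why the wizard picks *metadata_storage_path*.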
105+
106+
### Step 4 - Configure the indexer
107+
108+
The indexer is a high-level resource that drives the indexing process. It specifies the data source name, a target index, and frequency of execution. The **Import data** wizard creates several objects, and one of them is always an indexer that you can run repeatedly.
109+
110+
1. In the **Indexer** page, you can accept the default name and click the **Once** schedule option to run it immediately.
111+
112+
:::image type="content" source="media/cognitive-search-quickstart-blob/indexer-signs.png" alt-text="Indexer definition" border="true":::
113+
114+
1. Click **Submit** to create and simultaneously run the indexer.
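
For orientation, the pieces that an indexer ties together can be sketched as a small Python helper that assembles an indexer body. All object names below are hypothetical, and the sketch assumes that omitting the schedule corresponds to the **Once** option, with a repeating schedule expressed as an ISO 8601 duration.

```python
import json
from typing import Optional


def build_indexer(name: str, data_source: str, index: str, skillset: str,
                  interval: Optional[str] = None) -> dict:
    """Return an indexer body; with no interval, the indexer has no schedule."""
    body = {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
    }
    if interval is not None:
        # ISO 8601 duration, for example "PT2H" for every two hours.
        body["schedule"] = {"interval": interval}
    return body


# Hypothetical names standing in for the objects the wizard creates.
run_once = build_indexer("signs-indexer", "signs-datasource",
                         "signs-index", "signs-skillset")
recurring = build_indexer("signs-indexer", "signs-datasource",
                          "signs-index", "signs-skillset", interval="PT2H")
print(json.dumps(run_once, indent=2))
```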
115+
116+
## Monitor status
117+
118+
Cognitive skills indexing takes longer to complete than typical text-based indexing. To monitor progress, go to the Overview page and select the **Indexers** tab in the middle of the page.
119+
120+
:::image type="content" source="media/cognitive-search-quickstart-blob/indexer-status-signs.png" alt-text="Indexer status" border="true":::
121+
122+
To check details about execution status, select an indexer from the list.
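
The same details are available programmatically from the indexer status endpoint. The following sketch does not call the service. It only shows how a status payload might be summarized, assuming the response carries a `lastResult` object with `status`, `itemsProcessed`, and `itemsFailed` properties. The sample payload is made up for illustration.

```python
def summarize(status: dict) -> str:
    """Summarize the most recent indexer execution from a status payload."""
    last = status.get("lastResult")
    if last is None:
        return "The indexer has not run yet."
    state = last.get("status", "unknown")
    processed = last.get("itemsProcessed", 0)
    failed = last.get("itemsFailed", 0)
    return f"{state}: {processed} processed, {failed} failed"


# Hypothetical payload, shaped like a completed run over the sample files.
sample = {
    "status": "running",
    "lastResult": {"status": "success", "itemsProcessed": 10, "itemsFailed": 0},
}
print(summarize(sample))
```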
123+
124+
## Query in Search explorer
125+
126+
After an index is created, you can run queries to return results. In the portal, use **Search explorer** for this task.
127+
128+
1. On the search service dashboard page, click **Search explorer** on the command bar.
129+
130+
1. Select **Change Index** at the top to select the index you created.
131+
132+
1. In Query string, enter a search string to query the index, such as `search=sign&searchFields=imageTags&$select=text,imageCaption,imageTags&$count=true`, and then select **Search**.
133+
134+
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer-query-string-signs.png" alt-text="Query string in search explorer" border="true":::
135+
136+
Results are returned as JSON, which can be verbose and hard to read, especially in large documents originating from Azure blobs. Some tips for searching in this tool include the following techniques:
137+
138+
+ Append `$select` to specify which fields to include in results.
139+
+ Append `searchFields` to scope full text search to specific fields.
140+
+ Use CTRL-F to search within the JSON for specific properties or terms.
141+
142+
:::image type="content" source="media/cognitive-search-quickstart-blob/search-explorer-results-signs.png" alt-text="Search explorer example" border="true":::
143+
144+
Query strings are case-sensitive, so if you get an "unknown field" message, check **Fields** or **Index Definition (JSON)** to verify name and case.
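
Outside the portal, the same query string can be appended to a Search Documents request. The following sketch only builds the URL and does not send it. The service name, index name, and API version are placeholders that you would replace with your own values, and a real request also needs an `api-key` header.

```python
from urllib.parse import quote, urlencode


def build_query_string(params: dict) -> str:
    """Encode query parameters, keeping '$' and ',' readable as in Search explorer."""
    return urlencode(params, safe="$,", quote_via=quote)


def build_query_url(service: str, index: str, api_version: str, params: dict) -> str:
    """Return a Search Documents GET URL for the given service and index."""
    query = build_query_string({"api-version": api_version, **params})
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query}"


# The query from this quickstart: search tags for "sign", return three fields.
params = {
    "search": "sign",
    "searchFields": "imageTags",
    "$select": "text,imageCaption,imageTags",
    "$count": "true",
}

# "my-service", "my-index", and the API version are assumed placeholder values.
url = build_query_url("my-service", "my-index", "2020-06-30", params)
print(url)
```

Because field names are case-sensitive, keeping the parameters in a dictionary like this makes it easier to compare them against the index definition when a query fails.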
145+
146+
## Clean up resources
147+
148+
When you're working in your own subscription, it's a good idea at the end of a project to identify whether you still need the resources you created. Resources left running can cost you money. You can delete resources individually or delete the resource group to delete the entire set of resources.
149+
150+
You can find and manage resources in the portal, using the **All resources** or **Resource groups** link in the left-navigation pane.
151+
152+
If you are using a free service, remember that you are limited to three indexes, indexers, and data sources. You can delete individual items in the portal to stay under the limit.
153+
154+
## Next steps
155+
156+
Cognitive Search has other built-in skills that can be exercised in the Import data wizard. The next quickstart uses entity recognition, language detection, and text translation.
157+
158+
> [!div class="nextstepaction"]
159+
> [Quickstart: Translate text and recognize entities using the Import data wizard](cognitive-search-quickstart-blob.md)

articles/search/index.yml

Lines changed: 35 additions & 52 deletions
Original file line numberDiff line numberDiff line change
@@ -29,7 +29,7 @@ landingContent:
2929
url: semantic-search-overview.md
3030
- text: What is AI enrichment?
3131
url: cognitive-search-concept-intro.md
32-
- text: What is an indexer?
32+
- text: What are indexers?
3333
url: search-indexer-overview.md
3434
- text: What are Azure Applied AI Services?
3535
url: ../applied-ai-services/what-are-applied-ai-services.md
@@ -39,24 +39,24 @@ landingContent:
3939
url: https://youtu.be/M4Rf1s0WcWQ?t=19
4040

4141
# Card
42-
- title: Index data
42+
- title: Develop apps
4343
linkLists:
44-
- linkListType: overview
44+
- linkListType: tutorial
4545
links:
46-
- text: Data source gallery
47-
url: search-data-sources-gallery.md
48-
- text: Creating indexes
49-
url: search-what-is-an-index.md
50-
- text: Importing data
51-
url: search-what-is-data-import.md
46+
- text: Add search to web apps
47+
url: tutorial-csharp-overview.md
5248
- linkListType: how-to-guide
5349
links:
54-
- text: Index from Azure Blob Storage
55-
url: search-blob-storage-integration.md
56-
- text: Index from Azure SQL Database
57-
url: search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md
58-
- text: Index from Cosmos DB
59-
url: search-howto-index-cosmosdb.md
50+
- text: Develop in .NET
51+
url: search-howto-dotnet-sdk.md
52+
- text: Call REST APIs in Postman
53+
url: search-get-started-rest.md
54+
- linkListType: reference
55+
links:
56+
- text: Azure SDK for .NET
57+
url: /dotnet/api/overview/azure/search.documents-readme
58+
- text: Azure SDK for Python
59+
url: /python/api/overview/azure/search-documents-readme
6060

6161
# Card
6262
- title: Get started
@@ -69,32 +69,31 @@ landingContent:
6969
url: search-get-started-portal.md
7070
- text: Create a demo app
7171
url: search-create-app-portal.md
72-
- text: Extract text from images and app files
72+
- text: Translate text and recognize entities
7373
url: cognitive-search-quickstart-blob.md
74-
74+
- text: Apply OCR and image analysis
75+
url: cognitive-search-quickstart-ocr.md
76+
77+
7578
# Card
76-
- title: Develop apps
79+
- title: Index data
7780
linkLists:
78-
- linkListType: tutorial
81+
- linkListType: overview
7982
links:
80-
- text: Add search to web apps
81-
url: tutorial-csharp-overview.md
83+
- text: Data source gallery
84+
url: search-data-sources-gallery.md
85+
- text: Creating indexes
86+
url: search-what-is-an-index.md
87+
- text: Importing data
88+
url: search-what-is-data-import.md
8289
- linkListType: how-to-guide
8390
links:
84-
- text: Develop in .NET
85-
url: search-howto-dotnet-sdk.md
86-
- text: Call REST APIs in VS Code
87-
url: search-get-started-vs-code.md
88-
- text: Call REST APIs in Postman
89-
url: search-get-started-rest.md
90-
- linkListType: reference
91-
links:
92-
- text: Azure SDK for .NET
93-
url: /dotnet/api/overview/azure/search.documents-readme
94-
- text: Azure SDK for Python
95-
url: /python/api/overview/azure/search-documents-readme
96-
- text: REST API Reference
97-
url: /rest/api/searchservice/index
91+
- text: Index from Azure Blob Storage
92+
url: search-blob-storage-integration.md
93+
- text: Index from Azure SQL Database
94+
url: search-howto-connecting-azure-sql-database-to-azure-search-using-indexers.md
95+
- text: Index from Cosmos DB
96+
url: search-howto-index-cosmosdb.md
9897

9998
# Card
10099
- title: Enrich with AI
@@ -115,10 +114,6 @@ landingContent:
115114
url: cognitive-search-defining-skillset.md
116115
- text: Create a knowledge store
117116
url: knowledge-store-create-portal.md
118-
- linkListType: reference
119-
links:
120-
- text: Built-in skills
121-
url: cognitive-search-predefined-skills.md
122117

123118
# Card
124119
- title: Query data
@@ -142,16 +137,4 @@ landingContent:
142137
- text: OData language reference
143138
url: query-odata-filter-orderby-syntax.md
144139
- text: Search Documents (REST)
145-
url: /rest/api/searchservice/search-documents
146-
147-
# Card
148-
- title: Help and feedback
149-
linkLists:
150-
- linkListType: reference
151-
links:
152-
- text: FAQ
153-
url: search-faq-frequently-asked-questions.yml
154-
- text: Ask questions on Stack Overflow
155-
url: https://stackoverflow.com/questions/tagged/azure-search
156-
- text: Request a feature
157-
url: https://feedback.azure.com/forums/263029-azure-search
140+
url: /rest/api/searchservice/search-documents