Commit 3acff59

Image vector doc
1 parent 8d13f22 commit 3acff59

6 files changed

+38
-45
lines changed

articles/search/search-get-started-portal-image-search.md

Lines changed: 38 additions & 45 deletions
@@ -46,7 +46,7 @@ If you're starting with the free service, you're limited to three indexes, three
4646

4747
## Prepare sample data
4848

49-
1. Download the [unsplash-signs image folder](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/unsplash-images/jpg-signs) to a local folder or find some images of your own. On a free search service, keep the image files under 20 to stay under the free quota for enrichment procedssing.
49+
1. Download the [unsplash-signs image folder](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/unsplash-images/jpg-signs) to a local folder or find some images of your own. On a free search service, keep the image files under 20 to stay under the free quota for enrichment processing.
5050

5151
1. Sign in to the [Azure portal](https://portal.azure.com/) with your Azure account, and go to your Azure Storage account.
5252

@@ -80,93 +80,86 @@ The next step is to connect to a data source to use for the search index.
8080

8181
## Vectorize your text
8282

83-
If raw content includes text, this step specifies an embedding model that generates vectors for that content. Azure AI Vision model provides text embeddings, so we'll use that for this step.
83+
If raw content includes text, or if the skillset produces text, the wizard calls a text embedding model to generate vectors for that content. In this exercise, text will be produced from the Optical Character Recognition (OCR) skill that you add in the next step.
8484

85-
1. On the **Vectorize your text** page, select **AI Vision vectorization**. If it's not selectable, make sure Azure AI Search and Azure AI multiservice account are together in a region that [supports AI Vision multimodal APIs](/azure/ai-services/computer-vision/how-to/image-retrieval).
85+
Azure AI Vision model provides text embeddings, so we'll use that model for text vectorization.
8686

87-
:::image type="content" source="media/search-get-started-portal-images/vectorize-your-text.png" alt-text="Screenshot of the vectorize your text page in the wizard.":::
88-
89-
1. Select **Next**.
90-
91-
## Vectorize your images
92-
93-
Use Azure AI Vision to generate a vector representation of the image files.
94-
95-
96-
vectorize-enrich-images.png
87+
1. On the **Vectorize text** page, select **AI Vision vectorization**. If it's not selectable, make sure Azure AI Search and Azure AI multiservice account are together in a region that [supports AI Vision multimodal APIs](/azure/ai-services/computer-vision/how-to/image-retrieval).
9788

98-
99-
1. For AI Vision vectorization, select the account.
100-
1
101-
1. Select the checkbox acknowledging the billing impact of using these resources.
89+
:::image type="content" source="media/search-get-started-portal-images/vectorize-your-text.png" alt-text="Screenshot of the Vectorize your text page in the wizard.":::
10290

10391
1. Select **Next**.
10492

10593
## Vectorize and enrich your images
10694

107-
If your content includes images, you can apply AI in two ways:
108-
109-
+ Use a supported image embedding model from the catalog, or choose the Azure AI Vision multimodal embeddings API to vectorize images.
110-
+ Use OCR to recognize text in images.
95+
Use Azure AI Vision to generate a vector representation of the image files.
11196

112-
Azure AI Search and your Azure AI resource must be in the same region.
97+
In this step, you can also set enrichment options to extract text from images. The wizard uses OCR from Azure AI services to recognize text in image files. Two more outputs appear in the index when OCR is added to the workflow. First, the "chunk" field is populated with the OCR-generated string. Second, the "text_vector" field is populated with an embedding that represents the string. The inclusion of plain text in an index is useful if you want to use relevance features that operate on strings, such as semantic ranker and scoring profiles.
11398

114-
1. Specify the kind of connection the wizard should make. For image vectorization, it can connect to embedding models in Azure AI Studio or Azure AI Vision.
99+
1. On the **Vectorize images** page, select the **Vectorize images** checkbox, and then select **AI Vision vectorization**.
115100

116-
1. Specify the subscription.
101+
1. Select **Use same AI service selected for text vectorization**.
117102

118-
1. For Azure AI Studio model catalog, specify the project and deployment. See [Setting up an embedding model](#set-up-embedding-models) for details.
103+
1. In the enrichment section, select **Extract text from images**.
119104

120-
1. Optionally, you can crack binary images (for example, scanned document files) and [use OCR](cognitive-search-skill-ocr.md) to recognize text.
105+
1. Select **Use same AI service selected for image vectorization**.
121106

122-
1. Select the checkbox acknowledging the billing impact of using these resources.
107+
:::image type="content" source="media/search-get-started-portal-images/vectorize-enrich-images.png" alt-text="Screenshot of the Vectorize your images page in the wizard.":::
123108

124109
1. Select **Next**.
125110

126111
## Advanced settings
127112

128-
1. Optionally, specify a [run time schedule](search-howto-schedule-indexers.md) for the indexer.
113+
1. Specify a [run time schedule](search-howto-schedule-indexers.md) for the indexer. We recommend **Once** for this exercise, but for data sources where the underlying data is volatile, you can schedule indexing to pick up the changes.
114+
115+
:::image type="content" source="media/search-get-started-portal-images/run-once.png" alt-text="Screenshot of the Advanced settings page in the wizard.":::
129116

130117
1. Select **Next**.
131118

132119
## Run the wizard
133120

134-
1. On Review and create, specify a prefix for the objects created when the wizard runs. A common prefix helps you stay organized.
121+
1. On Review and create, specify a prefix for the objects created when the wizard runs. The wizard creates multiple objects. A common prefix helps you stay organized.
135122

136-
1. Select **Create** to run the wizard. This step creates the following objects:
123+
:::image type="content" source="media/search-get-started-portal-images/review-create.png" alt-text="Screenshot of the Review and create page in the wizard.":::
137124

138-
+ Data source connection.
125+
1. Select **Create** to run the wizard. This step creates the following objects:
139126

140-
+ Index with vector fields, vectorizers, vector profiles, vector algorithms. You aren't prompted to design or modify the default index during the wizard workflow. Indexes conform to the [2024-05-01-preview REST API](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true).
127+
+ Data source connection to blob storage.
141128

142-
+ Skillset with [Text Split skill](cognitive-search-skill-textsplit.md) for chunking and an embedding skill for vectorization. The embedding skill is either the [AzureOpenAIEmbeddingModel skill](cognitive-search-skill-azure-openai-embedding.md) for Azure OpenAI or [AML skill](cognitive-search-aml-skill.md) for Azure AI Studio model catalog.
129+
+ Index with vector fields, text fields, vectorizers, vector profiles, vector algorithms. You can't modify the default index during the wizard workflow. Indexes conform to the [2024-05-01-preview REST API](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2024-05-01-preview&preserve-view=true).
143130

144-
+ Indexer with field mappings and output field mappings (if applicable).
131+
+ Skillset with the following five skills:
145132

146-
If you can't select Azure AI Vision vectorizer, make sure you have an Azure AI Vision resource in a supported region, and that your search service managed identity has **Cognitive Services OpenAI User** permissions.
133+
+ [OCR skill](cognitive-search-skill-ocr.md) recognizes text in image files.
134+
+ [Text Merger skill](cognitive-search-skill-textmerger.md) reunites the various outputs of OCR processing.
135+
+ [Text Split skill](cognitive-search-skill-textsplit.md) adds data chunking. This skill is part of the wizard workflow, although for this data, chunking isn't technically necessary.
136+
+ [Azure AI Vision multimodal](cognitive-search-skill-vision-vectorize.md) is used to vectorize OCR-generated text.
137+
+ [Azure AI Vision multimodal](cognitive-search-skill-vision-vectorize.md) is called again to vectorize images.
147138

148-
If you can't progress through the wizard because other options aren't available (for example, you can't select a data source or an embedding model), revisit the role assignments. Error messages indicate that models or deployments don't exist, when in fact the real issue is that the search service doesn't have permission to access them.
139+
+ Indexer with field mappings and output field mappings.
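
As a rough illustration of the five-skill pipeline listed above, the following Python sketch lists the skills in order by their `@odata.type` identifiers. The `name` values and the `skill_types` helper are hypothetical, and the inputs, outputs, and contexts that the wizard generates are omitted, so treat this as an outline rather than the wizard's actual skillset definition.

```python
# Outline of the skill order described above. Only the "@odata.type" values
# are skill identifiers; the "name" values are made up for illustration, and
# the inputs/outputs/context that a real skillset requires are omitted.
SKILLS = [
    {"name": "ocr", "@odata.type": "#Microsoft.Skills.Vision.OcrSkill"},
    {"name": "merge", "@odata.type": "#Microsoft.Skills.Text.MergeSkill"},
    {"name": "split", "@odata.type": "#Microsoft.Skills.Text.SplitSkill"},
    {"name": "vectorize-text", "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill"},
    {"name": "vectorize-image", "@odata.type": "#Microsoft.Skills.Vision.VectorizeSkill"},
]


def skill_types(skills):
    """Return the short type name of each skill, in pipeline order."""
    return [s["@odata.type"].rsplit(".", 1)[-1] for s in skills]


print(skill_types(SKILLS))
```

The vectorize skill appears twice because it is called once for OCR-generated text and once for the images themselves.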
149140

150141
## Check results
151142

152-
Search explorer accepts text strings as input and then vectorizes the text for vector query execution.
143+
Search explorer accepts text, vectors, and images as query inputs. You can drag or select an image into the search area, and it will be vectorized for search. Image vectorization assumes that your index has a vectorizer definition, which the **Import and vectorize data** wizard creates using your selections.
153144

154-
1. In the Azure portal, under **Search Management** and **Indexes**, select the index your created.
145+
1. In the Azure portal, under **Search Management** and **Indexes**, select the index you created. An embedded Search Explorer is the first tab.
155146

156-
1. Optionally, select **Query options** and hide vector values in search results. This step makes your search results easier to read.
147+
1. Under **View**, select **Image view**.
157148

158-
:::image type="content" source="media/search-get-started-portal-import-vectors/query-options.png" alt-text="Screenshot of the query options button.":::
149+
:::image type="content" source="media/search-get-started-portal-images/select-image-view.png" alt-text="Screenshot of the query options button with image view.":::
159150

160-
1. Select **JSON view** so that you can enter text for your vector query in the **text** vector query parameter.
151+
1. Drag an image from the local folder that contains the sample image files. Or, open the file browser to select a local image file.
161152

153+
1. Select **Search** to run the query.
162154

163-
1. Replace the text `"*"` with a question related to health plans, such as *"which plan has the lowest deductible"*.
155+
:::image type="content" source="media/search-get-started-portal-images/image-search.png" alt-text="Screenshot of search results.":::
164156

165-
1. Select **Search** to run the query.
157+
The top match should be the image you searched for.
166158

167-
:::image type="content" source="media/search-get-started-portal-import-vectors/search-results.png" alt-text="Screenshot of search results.":::
159+
1. Try the query options to compare search outcomes:
168160

169-
You should see 5 matches, where each document is a chunk of the original PDF. The title field shows which PDF the chunk comes from.
161+
+ Hide vectors for more readable results.
162+
+ Select a vector field to query over. The default is text vectors, but you can specify the image vector to exclude text vectors from query execution.
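
The query options above correspond roughly to fields in a query request body. The following Python sketch builds such a body for a text-to-vector query. It assumes the `vectorQueries` shape of the REST API version referenced earlier (`kind`, `text`, `fields`, `k`), so verify it against the API reference before relying on it. The `build_text_vector_query` helper and the query string are hypothetical. The `chunk` and `text_vector` field names come from the OCR description in this article. To target the image vector instead, pass that field's name as `vector_field`. This article doesn't give that name.

```python
import json


def build_text_vector_query(text, vector_field="text_vector", k=5):
    """Build a request body for a query that the service vectorizes.

    Assumption: the index has a vectorizer, as created by the wizard, so
    the service can turn `text` into a vector at query time.
    """
    return {
        # Returning only the text field keeps vectors out of the results,
        # similar to the "hide vectors" option.
        "select": "chunk",
        "vectorQueries": [
            {"kind": "text", "text": text, "fields": vector_field, "k": k}
        ],
    }


# "stop sign" is an illustrative query string, not taken from the sample data.
body = build_text_vector_query("stop sign")
print(json.dumps(body, indent=2))
```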
170163

171164
## Clean up
172165
