Commit ae06b06

Multimodal tutorial vectorize text and images, updates per testing
1 parent f079bab commit ae06b06

4 files changed, +141 -68 lines changed

articles/search/tutorial-document-extraction-image-verbalization.md

Lines changed: 11 additions & 11 deletions
Original file line number | Diff line number | Diff line change
@@ -63,7 +63,7 @@ The following instructions apply to Azure Storage which provides the sample data
6363

6464
1. [Create role assignments and specify a managed identity in a connection string](search-howto-managed-identities-storage.md):
6565

66-
1. Assign **Storage Blob Data Reader** for data retrieval by the indexer and **Storage Blob Data Contributor** to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
66+
1. Assign **Storage Blob Data Reader** for data retrieval by the indexer. Assign **Storage Blob Data Contributor** and **Storage Table Data Contributor** to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
6767

6868
1. For connections made using a system-assigned managed identity, get a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string is similar to the following example:
6969

@@ -105,11 +105,11 @@ For more information, see [Role-based access control for Azure OpenAI in Azure A
105105

106106
For this tutorial, your local REST client connection to Azure AI Search requires an endpoint and an API key. You can get these values from the Azure portal. For alternative connection methods, see [Connect to a search service](search-get-started-rbac.md).
107107

108-
For other authenticated connections, the search service uses the role assignments you previously defined.
108+
For authenticated connections that occur during indexer and skillset processing, the search service uses the role assignments you previously defined.
109109

110110
1. Start Visual Studio Code and create a new file.
111111

112-
1. Provide values for variables used in the request.
112+
1. Provide values for variables used in the request. For `@storageConnection`, make sure your connection string doesn't have a trailing semicolon or quotation marks. For `@imageProjectionContainer`, provide a container name that's unique in blob storage. Azure AI Search creates this container for you during skills processing.
113113

114114
```http
115115
@searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
@@ -119,7 +119,7 @@ For other authenticated connections, the search service uses the role assignment
119119
@openAIKey = PUT-YOUR-OPENAI-KEY-HERE
120120
@chatCompletionResourceUri = PUT-YOUR-CHAT-COMPLETION-URI-HERE
121121
@chatCompletionKey = PUT-YOUR-CHAT-COMPLETION-KEY-HERE
122-
@imageProjectionContainer=PUT-YOUR-IMAGE-PROJECTION-CONTAINER-HERE (Azure AI Search creates this container for you during skills processing)
122+
@imageProjectionContainer=PUT-YOUR-IMAGE-PROJECTION-CONTAINER-HERE
123123
```
124124

125125
1. Save the file using a `.rest` or `.http` file extension. For help with the REST client, see [Quickstart: Full-text search using REST](search-get-started-text.md).
@@ -147,11 +147,11 @@ POST {{searchUrl}}/datasources?api-version=2025-05-01-preview HTTP/1.1
147147
"description": null,
148148
"type": "azureblob",
149149
"subtype": null,
150-
"credentials": {
151-
"connectionString": "{{storageConnection}}"
150+
"credentials":{
151+
"connectionString":"{{storageConnection}}"
152152
},
153153
"container": {
154-
"name": "doc-extraction-image-verbalization-container",
154+
"name": "sustainable-ai-pdf",
155155
"query": null
156156
},
157157
"dataChangeDetectionPolicy": null,
@@ -167,7 +167,7 @@ Send the request. The response should look like:
167167
HTTP/1.1 201 Created
168168
Transfer-Encoding: chunked
169169
Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
170-
Location: https://<YOUR-SEARCH-SERVICE-NAME>.search.windows-int.net:443/datasources('doc-extraction-image-verbalization-ds')?api-version=2025-05-01-preview -Preview
170+
Location: https://<YOUR-SEARCH-SERVICE-NAME>.search.windows-int.net:443/datasources('doc-extraction-multimodal-embedding-ds')?api-version=2025-05-01-preview -Preview
171171
Server: Microsoft-IIS/10.0
172172
Strict-Transport-Security: max-age=2592000, max-age=15724800; includeSubDomains
173173
Preference-Applied: odata.include-annotations="*"
@@ -178,16 +178,16 @@ Date: Sat, 26 Apr 2025 21:25:24 GMT
178178
Connection: close
179179

180180
{
181-
"name": "doc-extraction-image-verbalization-ds",
182-
"description": "A test datasource",
181+
"name": "doc-extraction-multimodal-embedding-ds",
182+
"description": null,
183183
"type": "azureblob",
184184
"subtype": null,
185185
"indexerPermissionOptions": [],
186186
"credentials": {
187187
"connectionString": null
188188
},
189189
"container": {
190-
"name": "doc-extraction-multimodality-container",
190+
"name": "sustainable-ai-pdf",
191191
"query": null
192192
},
193193
"dataChangeDetectionPolicy": null,
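
Reviewer note (illustrative, not part of the commit): the updated step in this file asks for a `@storageConnection` value with no trailing semicolon or quotation marks, built around a ResourceId that names the subscription, resource group, and storage account. The following is a minimal Python sketch of that cleanup. The helper names `build_resource_id_connection` and `normalize_connection_string` are hypothetical and don't appear in the tutorial. The ResourceId layout follows the example string shown elsewhere in this commit.

```python
def build_resource_id_connection(subscription_id: str, resource_group: str, account: str) -> str:
    """Assemble a managed-identity connection string (no account key or password)."""
    return (
        f"ResourceId=/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account}"
    )


def normalize_connection_string(raw: str) -> str:
    """Drop surrounding whitespace, quotation marks, and semicolons.

    Only the ends of the string are touched, so the path itself is preserved.
    """
    return raw.strip(" \t\r\n\"';")


# Example with placeholder values (hypothetical, not real resources).
connection = build_resource_id_connection(
    "00000000-0000-0000-0000-000000000000", "MY-DEMO-RESOURCE-GROUP", "mydemostorage"
)
cleaned = normalize_connection_string('"' + connection + ';"')
print(cleaned == connection)
```

Running a pasted value through a check like this before saving the `.rest` file may catch the quoting and semicolon problems the new guidance warns about.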

articles/search/tutorial-document-extraction-multimodal-embeddings.md

Lines changed: 38 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -39,15 +39,15 @@ This tutorial demonstrates a lower-cost approach for indexing multimodal content
3939

4040
+ [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills). This account provides access to the Azure AI Vision multimodal embedding model used in this tutorial. You must use an Azure AI multi-service account for skillset access to this resource.
4141

42-
+ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity for connections to Azure Storage and Azure AI Vision. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier. The search service must also be in the same region as your multi-service account.
42+
+ [Azure AI Search](search-create-service-portal.md). [Configure your search service](search-manage.md) for role-based access control and a managed identity for connections to Azure Storage and Azure AI Vision. Your service must be on the Basic tier or higher. This tutorial isn't supported on the Free tier.
4343

4444
+ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
4545

4646
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
4747

4848
## Limitations
4949

50-
+ The [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) also has limited regional availability. For an updated list of regions that provide multimodal embeddings, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
50+
+ The [Azure AI Vision multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) has limited regional availability. When you install the multi-service account, choose a region that provides multimodal embeddings. For an updated list of regions that provide multimodal embeddings, see the [Azure AI Vision documentation](/azure/ai-services/computer-vision/overview-image-analysis#region-availability).
5151

5252
## Prepare data
5353

@@ -61,7 +61,7 @@ The following instructions apply to Azure Storage which provides the sample data
6161

6262
1. [Create role assignments and specify a managed identity in a connection string](search-howto-managed-identities-storage.md):
6363

64-
1. Assign **Storage Blob Data Reader** for data retrieval by the indexer and **Storage Blob Data Contributor** to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
64+
1. Assign **Storage Blob Data Reader** for data retrieval by the indexer. Assign **Storage Blob Data Contributor** and **Storage Table Data Contributor** to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
6565

6666
1. For connections made using a system-assigned managed identity, get a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string is similar to the following example:
6767

@@ -101,20 +101,19 @@ This tutorial assumes you have an existing Azure AI multiservice account through
101101

102102
For this tutorial, your local REST client connection to Azure AI Search requires an endpoint and an API key. You can get these values from the Azure portal. For alternative connection methods, see [Connect to a search service](search-get-started-rbac.md).
103103

104-
For other authenticated connections, the search service uses the role assignments you previously defined.
104+
For authenticated connections that occur during indexer and skillset processing, the search service uses the role assignments you previously defined.
105105

106106
1. Start Visual Studio Code and create a new file.
107107

108-
1. Provide values for variables used in the request.
108+
1. Provide values for variables used in the request. For `@storageConnection`, make sure your connection string doesn't have a trailing semicolon or quotation marks. For `@imageProjectionContainer`, provide a container name that's unique in blob storage. Azure AI Search creates this container for you during skills processing.
109109

110110
```http
111-
@searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
112-
@searchsearchApiKey = PUT-YOUR-ADMIN-API-KEY-HERE
113-
@storageConnection = PUT-YOUR-STORAGE-CONNECTION-STRING-HERE
114-
@cognitiveServicesUrl = PUT-YOUR-COGNITIVE-SERVICES-URL-HERE
115-
@cognitiveServicesKey= PUT-YOUR-COGNITIVE-SERVICES-URL-KEY-HERE
116-
@modelVersion = PUT-YOUR-VECTORIZE-MODEL-VERSION-HERE
117-
@imageProjectionContainer=PUT-YOUR-IMAGE-PROJECTION-CONTAINER-HERE (Azure AI Search creates this container for you during skills processing)
111+
@searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
112+
@searchApiKey = PUT-YOUR-ADMIN-API-KEY-HERE
113+
@storageConnection = PUT-YOUR-STORAGE-CONNECTION-STRING-HERE
114+
@cognitiveServicesUrl = PUT-YOUR-AZURE-AI-MULTI-SERVICE-ENDPOINT-HERE
115+
@modelVersion = 2023-04-15
116+
@imageProjectionContainer=sustainable-ai-pdf-images
118117
```
119118

120119
1. Save the file using a `.rest` or `.http` file extension. For help with the REST client, see [Quickstart: Full-text search using REST](search-get-started-text.md).
@@ -132,28 +131,27 @@ To get the Azure AI Search endpoint and API key:
132131
[Create Data Source (REST)](/rest/api/searchservice/data-sources/create) creates a data source connection that specifies what data to index.
133132

134133
```http
135-
### Create a data source
136134
POST {{searchUrl}}/datasources?api-version=2025-05-01-preview HTTP/1.1
137135
Content-Type: application/json
138136
api-key: {{searchApiKey}}
139137
140-
{
141-
"name": "doc-extraction-multimodal-embedding-ds",
142-
"description": null,
143-
"type": "azureblob",
144-
"subtype": null,
145-
"credentials": {
146-
"connectionString": "{{storageConnection}}"
147-
},
148-
"container": {
149-
"name": "doc-extraction-multimodality-container",
150-
"query": null
151-
},
152-
"dataChangeDetectionPolicy": null,
153-
"dataDeletionDetectionPolicy": null,
154-
"encryptionKey": null,
155-
"identity": null
156-
}
138+
{
139+
"name":"doc-extraction-multimodal-embedding-ds",
140+
"description":null,
141+
"type":"azureblob",
142+
"subtype":null,
143+
"credentials":{
144+
"connectionString":"{{storageConnection}}"
145+
},
146+
"container":{
147+
"name":"sustainable-ai-pdf",
148+
"query":null
149+
},
150+
"dataChangeDetectionPolicy":null,
151+
"dataDeletionDetectionPolicy":null,
152+
"encryptionKey":null,
153+
"identity":null
154+
}
157155
```
158156

159157
Send the request. The response should look like:
@@ -174,15 +172,15 @@ Connection: close
174172

175173
{
176174
"name": "doc-extraction-multimodal-embedding-ds",
177-
"description": "A test datasource",
175+
"description": null,
178176
"type": "azureblob",
179177
"subtype": null,
180178
"indexerPermissionOptions": [],
181179
"credentials": {
182180
"connectionString": null
183181
},
184182
"container": {
185-
"name": "doc-extraction-multimodality-container",
183+
"name": "sustainable-ai-pdf",
186184
"query": null
187185
},
188186
"dataChangeDetectionPolicy": null,
@@ -288,7 +286,7 @@ POST {{searchUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
288286
{
289287
"name": "hnsw",
290288
"algorithm": "defaulthnsw",
291-
"vectorizer": "{{vectorizer}}"
289+
"vectorizer": "demo-vectorizer"
292290
}
293291
],
294292
"algorithms": [
@@ -304,11 +302,11 @@ POST {{searchUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
304302
],
305303
"vectorizers": [
306304
{
307-
"name": "{{ vectorizer }}",
305+
"name": "demo-vectorizer",
308306
"kind": "aiServicesVision",
309307
"aiServicesVisionParameters": {
310308
"resourceUri": "{{cognitiveServicesUrl}}",
311-
"searchApiKey": "{{cognitiveServicesKey}}",
309+
"authIdentity": null,
312310
"modelVersion": "{{modelVersion}}"
313311
}
314312
}
@@ -451,7 +449,7 @@ POST {{searchUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
451449
{
452450
"@odata.type": "#Microsoft.Skills.Util.ShaperSkill",
453451
"name": "shaper-skill",
454-
"description": "Shaper skill to reshape the data to fit the index schema"
452+
"description": "Shaper skill to reshape the data to fit the index schema",
455453
"context": "/document/normalized_images/*",
456454
"inputs": [
457455
{
@@ -493,9 +491,9 @@ POST {{searchUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
493491
}
494492
],
495493
"cognitiveServices": {
496-
"@odata.type": "#Microsoft.Azure.Search.AIServicesByKey",
494+
"@odata.type": "#Microsoft.Azure.Search.AIServicesByIdentity",
497495
"subdomainUrl": "{{cognitiveServicesUrl}}",
498-
"key": "{{cognitiveServicesKey}}"
496+
"identity": null
499497
},
500498
"indexProjections": {
501499
"selectors": [
@@ -548,6 +546,7 @@ POST {{searchUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
548546
},
549547
"knowledgeStore": {
550548
"storageConnectionString": "{{storageConnection}}",
549+
"identity": null,
551550
"projections": [
552551
{
553552
"files": [
@@ -583,6 +582,7 @@ POST {{searchUrl}}/indexers?api-version=2025-05-01-preview HTTP/1.1
583582
api-key: {{searchApiKey}}
584583
585584
{
585+
"name": "doc-extraction-multimodal-embedding-indexer",
586586
"dataSourceName": "doc-extraction-multimodal-embedding-ds",
587587
"targetIndexName": "doc-extraction-multimodal-embedding-index",
588588
"skillsetName": "doc-extraction-multimodal-embedding-skillset",
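
Reviewer note (illustrative, not part of the commit): this file's index change replaces the `{{vectorizer}}` placeholder with the literal name `demo-vectorizer` in two places, the vector profile and the `vectorizers` entry, and the two must agree. The following Python sketch checks that agreement on a parsed index body. The function name `missing_vectorizers` is hypothetical, and the `profiles` key is assumed from the Create Index REST schema because the hunk doesn't show the array's name.

```python
def missing_vectorizers(vector_search: dict) -> list[str]:
    """Return vectorizer names used by profiles but not defined under 'vectorizers'."""
    defined = {v["name"] for v in vector_search.get("vectorizers", [])}
    return [
        profile["vectorizer"]
        for profile in vector_search.get("profiles", [])
        if profile.get("vectorizer") and profile["vectorizer"] not in defined
    ]


# Values copied from the updated request body in this diff.
vector_search = {
    "profiles": [
        {"name": "hnsw", "algorithm": "defaulthnsw", "vectorizer": "demo-vectorizer"}
    ],
    "vectorizers": [
        {"name": "demo-vectorizer", "kind": "aiServicesVision"}
    ],
}

unresolved = missing_vectorizers(vector_search)
print(unresolved)  # an empty list means every profile points at a defined vectorizer
```

A check of this kind would also flag a leftover placeholder such as `{{vectorizer}}` that was updated in only one of the two places.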

articles/search/tutorial-document-layout-image-verbalization.md

Lines changed: 44 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -64,7 +64,7 @@ The following instructions apply to Azure Storage which provides the sample data
6464

6565
1. [Create role assignments and specify a managed identity in a connection string](search-howto-managed-identities-storage.md):
6666

67-
1. Assign **Storage Blob Data Reader** for data retrieval by the indexer and **Storage Blob Data Contributor** to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
67+
1. Assign **Storage Blob Data Reader** for data retrieval by the indexer. Assign **Storage Blob Data Contributor** and **Storage Table Data Contributor** to create and load the knowledge store. You can use either a system-assigned managed identity or a user-assigned managed identity for your search service role assignment.
6868

6969
1. For connections made using a system-assigned managed identity, get a connection string that contains a ResourceId, with no account key or password. The ResourceId must include the subscription ID of the storage account, the resource group of the storage account, and the storage account name. The connection string is similar to the following example:
7070

@@ -122,11 +122,12 @@ For more information, see [Role-based access control for Azure OpenAI in Azure A
122122

123123
For this tutorial, your local REST client connection to Azure AI Search requires an endpoint and an API key. You can get these values from the Azure portal. For alternative connection methods, see [Connect to a search service](search-get-started-rbac.md).
124124

125-
For other authenticated connections, the search service uses the role assignments you previously defined.
125+
For authenticated connections that occur during indexer and skillset processing, the search service uses the role assignments you previously defined.
126126

127127
1. Start Visual Studio Code and create a new file.
128128

129-
1. Provide values for variables used in the request.
129+
1. Provide values for variables used in the request. For `@storageConnection`, make sure your connection string doesn't have a trailing semicolon or quotation marks. For `@imageProjectionContainer`, provide a container name that's unique in blob storage. Azure AI Search creates this container for you during skills processing.
130+
130131
```http
131132
@searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
132133
@searchApiKey = PUT-YOUR-ADMIN-API-KEY-HERE
@@ -135,7 +136,7 @@ For other authenticated connections, the search service uses the role assignment
135136
@openAIKey = PUT-YOUR-OPENAI-KEY-HERE
136137
@chatCompletionResourceUri = PUT-YOUR-CHAT-COMPLETION-URI-HERE
137138
@chatCompletionKey = PUT-YOUR-CHAT-COMPLETION-KEY-HERE
138-
@imageProjectionContainer=PUT-YOUR-IMAGE-PROJECTION-CONTAINER-HERE (Azure AI Search creates this container for you during skills processing)
139+
@imageProjectionContainer=PUT-YOUR-IMAGE-PROJECTION-CONTAINER-HERE
139140
```
140141

141142
1. Save the file using a `.rest` or `.http` file extension. For help with the REST client, see [Quickstart: Full-text search using REST](search-get-started-text.md).
@@ -163,11 +164,11 @@ POST {{searchUrl}}/datasources?api-version=2025-05-01-preview HTTP/1.1
163164
"description": "A data source to store multi-modality documents",
164165
"type": "azureblob",
165166
"subtype": null,
166-
"credentials": {
167-
"connectionString": "ResourceId=/subscriptions/00000000-0000-0000-0000-00000000/resourceGroups/MY-DEMO-RESOURCE-GROUP/providers/Microsoft.Storage/storageAccounts/MY-DEMO-STORAGE-ACCOUNT/;"
167+
"credentials":{
168+
"connectionString":"{{storageConnection}}"
168169
},
169170
"container": {
170-
"name": "doc-intelligence-multimodality-container",
171+
"name": "sustainable-ai-pdf",
171172
"query": null
172173
},
173174
"dataChangeDetectionPolicy": null,
@@ -178,6 +179,42 @@ POST {{searchUrl}}/datasources?api-version=2025-05-01-preview HTTP/1.1
178179
179180
```
180181

182+
Send the request. The response should look like:
183+
184+
```json
185+
HTTP/1.1 201 Created
186+
Transfer-Encoding: chunked
187+
Content-Type: application/json; odata.metadata=minimal; odata.streaming=true; charset=utf-8
188+
Location: https://<YOUR-SEARCH-SERVICE-NAME>.search.windows-int.net:443/datasources('doc-extraction-multimodal-embedding-ds')?api-version=2025-05-01-preview -Preview
189+
Server: Microsoft-IIS/10.0
190+
Strict-Transport-Security: max-age=2592000, max-age=15724800; includeSubDomains
191+
Preference-Applied: odata.include-annotations="*"
192+
OData-Version: 4.0
193+
request-id: 4eb8bcc3-27b5-44af-834e-295ed078e8ed
194+
elapsed-time: 346
195+
Date: Sat, 26 Apr 2025 21:25:24 GMT
196+
Connection: close
197+
198+
{
199+
"name": "doc-extraction-multimodal-embedding-ds",
200+
"description": null,
201+
"type": "azureblob",
202+
"subtype": null,
203+
"indexerPermissionOptions": [],
204+
"credentials": {
205+
"connectionString": null
206+
},
207+
"container": {
208+
"name": "sustainable-ai-pdf",
209+
"query": null
210+
},
211+
"dataChangeDetectionPolicy": null,
212+
"dataDeletionDetectionPolicy": null,
213+
"encryptionKey": null,
214+
"identity": null
215+
}
216+
```
217+
181218
## Create an index
182219

183220
[Create Index (REST)](/rest/api/searchservice/indexes/create) creates a search index on your search service. An index specifies all the parameters and their attributes.
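
Reviewer note (illustrative, not part of the commit): the data source request in this file now takes its connection string from `{{storageConnection}}` and targets the `sustainable-ai-pdf` container. The following Python sketch builds an equivalent request with the standard library for readers who don't use the VS Code REST client. The function name, the `demo-ds` data source name, and the endpoint and key values are placeholders. The sketch only constructs the request and doesn't send it.

```python
import json
import urllib.request

API_VERSION = "2025-05-01-preview"  # version used throughout the tutorial requests


def build_datasource_request(search_url: str, api_key: str, storage_connection: str,
                             name: str, container: str) -> urllib.request.Request:
    """Build (but do not send) a Create Data Source request."""
    body = {
        "name": name,
        "type": "azureblob",
        "credentials": {"connectionString": storage_connection},
        "container": {"name": container, "query": None},
    }
    return urllib.request.Request(
        url=f"{search_url.rstrip('/')}/datasources?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder values only; substitute your own endpoint, key, and connection string.
req = build_datasource_request(
    "https://example.search.windows.net",
    "PUT-YOUR-ADMIN-API-KEY-HERE",
    "PUT-YOUR-STORAGE-CONNECTION-STRING-HERE",
    "demo-ds",
    "sustainable-ai-pdf",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it. A successful call is expected to return `201 Created`, as in the sample response above.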
