Commit 7fe360b

Merge pull request #6309 from HeidiSteen/heidist-freshness
multimodal RBAC for models
2 parents 719fb02 + f079bab commit 7fe360b

4 files changed: +220 -137 lines changed

articles/search/tutorial-document-extraction-image-verbalization.md

Lines changed: 54 additions & 36 deletions
Original file line number | Diff line number | Diff line change
@@ -43,15 +43,17 @@ This tutorial demonstrates a lower-cost approach for indexing multimodal content
4343

4444
+ [Azure Storage](/azure/storage/common/storage-account-create), used for storing sample data and for creating a [knowledge store](knowledge-store-concept-intro.md).
4545

46-
+ [Azure OpenAI](/azure/ai-foundry/openai/how-to/create-resource) with a deployment of a chat completion model hosted in Azure AI Foundry or another source. The model is used to verbalize image content. You provide the URI to the hosted model in the GenAI Prompt skill definition.
46+
+ [Azure OpenAI](/azure/ai-foundry/openai/how-to/create-resource) with a deployment of
4747

48-
+ A text embedding model deployed in Azure AI Foundry. The model is used to vectorize text content pull from source documents and the image descriptions generated by the chat completion model. For integrated vectorization, the embedding model must be located in Azure AI Foundry, and it must be either text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. If you want to use an external embedding model, use a custom skill instead of the Azure OpenAI embedding skill.
48+
+ A chat completion model hosted in Azure AI Foundry or another source. The model is used to verbalize image content. You provide the URI to the hosted model in the GenAI Prompt skill definition. You can use [any chat completion model](cognitive-search-skill-genai-prompt.md#supported-models).
49+
50+
+ A text embedding model deployed in Azure AI Foundry. The model is used to vectorize text content pulled from source documents and the image descriptions generated by the chat completion model. For integrated vectorization, the embedding model must be located in Azure AI Foundry, and it must be one of text-embedding-ada-002, text-embedding-3-large, or text-embedding-3-small. If you want to use an external embedding model, use a custom skill instead of the Azure OpenAI embedding skill.
4951

5052
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client).
5153

5254
## Prepare data
5355

54-
The following instructions apply to Azure Storage which provides the sample data and also hosts the knowledge store. A search service identity needs read access to Azure Storage to retrieve the sample data, and it needs write access to create the knowledge store. The search service creates the container for cropped images during skillset processing.
56+
The following instructions apply to Azure Storage, which provides the sample data and also hosts the knowledge store. A search service identity needs read access to Azure Storage to retrieve the sample data, and it needs write access to create the knowledge store. The search service creates the container for cropped images during skillset processing, using the name you provide in an environment variable.
5557

5658
1. Download the following sample PDF: [sustainable-ai-pdf](https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/msc/documents/presentations/CSR/Accelerating-Sustainability-with-AI-2025.pdf)
5759

@@ -83,25 +85,35 @@ The following instructions apply to Azure Storage which provides the sample data
8385
}
8486
```
8587

86-
### Copy a search service URL and API key
88+
## Prepare models
8789

88-
For this tutorial, your REST client connection to Azure AI Search requires an endpoint and an API key. You can get these values from the Azure portal. For alternative connection methods, see [Connect to a search service](search-get-started-rbac.md).
90+
This tutorial assumes you have an existing Azure OpenAI resource through which the skills call the text embedding and chat completion models. The search service connects to the models during skillset processing and during query execution using its managed identity. This section gives you guidance and links for assigning roles for authorized access.
8991

90-
1. Sign in to the [Azure portal](https://portal.azure.com), navigate to the search service **Overview** page, and copy the URL. An example endpoint might look like `https://mydemo.search.windows.net`.
92+
1. Sign in to the Azure portal (not the Foundry portal) and find the Azure OpenAI resource.
9193

92-
1. Under **Settings** > **Keys**, copy an admin key. Admin keys are used to add, modify, and delete objects. There are two interchangeable admin keys. Copy either one.
94+
1. Select **Access control (IAM)**.
9395

94-
:::image type="content" source="media/search-get-started-rest/get-url-key.png" alt-text="Screenshot of the URL and API keys in the Azure portal.":::
96+
1. Select **Add** and then **Add role assignment**.
97+
98+
1. Search for **Cognitive Services OpenAI User** and then select it.
99+
100+
1. Choose **Managed identity** and then assign your [search service managed identity](search-howto-managed-identities-data-sources.md).
101+
102+
For more information, see [Role-based access control for Azure OpenAI in Azure AI Foundry Models](/azure/ai-foundry/openai/how-to/role-based-access-control).
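
The portal steps above can also be scripted. The following is a minimal sketch, not part of the original tutorial, that only assembles the arguments for the Azure CLI `az role assignment create` command. The helper names `openai_resource_id` and `role_assignment_command` are hypothetical, and the subscription ID, resource group, account name, and principal ID are placeholders you would replace with your own values.

```python
from typing import List


def openai_resource_id(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the ARM resource ID of an Azure OpenAI (Cognitive Services) account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.CognitiveServices/accounts/{account_name}"
    )


def role_assignment_command(principal_id: str, scope: str) -> List[str]:
    """Return the CLI arguments that grant the search service identity access.

    principal_id is the object ID of the search service managed identity
    (placeholder value in this sketch). The command is built, not executed.
    """
    return [
        "az", "role", "assignment", "create",
        "--assignee-object-id", principal_id,
        "--assignee-principal-type", "ServicePrincipal",
        "--role", "Cognitive Services OpenAI User",
        "--scope", scope,
    ]


scope = openai_resource_id("SUBSCRIPTION-ID", "RESOURCE-GROUP", "OPENAI-ACCOUNT")
command = role_assignment_command("SEARCH-SERVICE-PRINCIPAL-ID", scope)
print(" ".join(command))
```

Building the argument list separately from running it keeps the sketch free of side effects. To apply the assignment, you could pass `command` to `subprocess.run` on a machine where the Azure CLI is signed in.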
95103

96104
## Set up your REST file
97105

106+
For this tutorial, your local REST client connection to Azure AI Search requires an endpoint and an API key. You can get these values from the Azure portal. For alternative connection methods, see [Connect to a search service](search-get-started-rbac.md).
107+
108+
For other authenticated connections, the search service uses the role assignments you previously defined.
109+
98110
1. Start Visual Studio Code and create a new file.
99111

100112
1. Provide values for variables used in the request.
101113

102114
```http
103-
@baseUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
104-
@apiKey = PUT-YOUR-ADMIN-API-KEY-HERE
115+
@searchUrl = PUT-YOUR-SEARCH-SERVICE-ENDPOINT-HERE
116+
@searchApiKey = PUT-YOUR-ADMIN-API-KEY-HERE
105117
@storageConnection = PUT-YOUR-STORAGE-CONNECTION-STRING-HERE
106118
@openAIResourceUri = PUT-YOUR-OPENAI-URI-HERE
107119
@openAIKey = PUT-YOUR-OPENAI-KEY-HERE
@@ -110,19 +122,25 @@ For this tutorial, your REST client connection to Azure AI Search requires an en
110122
@imageProjectionContainer=PUT-YOUR-IMAGE-PROJECTION-CONTAINER-HERE (Azure AI Search creates this container for you during skills processing)
111123
```
112124

113-
1. Save the file using a `.rest` or `.http` file extension.
125+
1. Save the file using a `.rest` or `.http` file extension. For help with the REST client, see [Quickstart: Full-text search using REST](search-get-started-text.md).
126+
127+
To get the Azure AI Search endpoint and API key:
128+
129+
1. Sign in to the [Azure portal](https://portal.azure.com), navigate to the search service **Overview** page, and copy the URL. An example endpoint might look like `https://mydemo.search.windows.net`.
114130

115-
For help with the REST client, see [Quickstart: Full-text search using REST](search-get-started-text.md).
131+
1. Under **Settings** > **Keys**, copy an admin key. Admin keys are used to add, modify, and delete objects. There are two interchangeable admin keys. Copy either one.
132+
133+
:::image type="content" source="media/search-get-started-rest/get-url-key.png" alt-text="Screenshot of the URL and API keys in the Azure portal.":::
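
For readers who prefer a script to the REST client, the sketch below shows how the `@searchUrl` and `@searchApiKey` values map onto a request. It is an illustration under assumptions, not part of the tutorial: `build_search_request` is a hypothetical helper, it uses only the Python standard library, and it builds the request without sending it.

```python
import json
import urllib.request

API_VERSION = "2025-05-01-preview"  # same api-version the tutorial requests use


def build_search_request(search_url, search_api_key, path, body=None, method="POST"):
    """Assemble a request against the search service (hypothetical helper)."""
    url = f"{search_url.rstrip('/')}/{path.lstrip('/')}?api-version={API_VERSION}"
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Content-Type", "application/json")
    request.add_header("api-key", search_api_key)  # admin key copied from the portal
    return request


request = build_search_request(
    "https://mydemo.search.windows.net",
    "PUT-YOUR-ADMIN-API-KEY-HERE",
    "datasources",
    {"name": "doc-extraction-image-verbalization-ds"},
)
print(request.get_method(), request.full_url)
```

Passing the result to `urllib.request.urlopen(request)` would send it. The body shown here is truncated to the data source name, so a real call needs the full definition from the tutorial.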
116134

117135
## Create a data source
118136

119137
[Create Data Source (REST)](/rest/api/searchservice/data-sources/create) creates a data source connection that specifies what data to index.
120138

121139
```http
122140
### Create a data source
123-
POST {{baseUrl}}/datasources?api-version=2025-05-01-preview HTTP/1.1
141+
POST {{searchUrl}}/datasources?api-version=2025-05-01-preview HTTP/1.1
124142
Content-Type: application/json
125-
api-key: {{apiKey}}
143+
api-key: {{searchApiKey}}
126144
127145
{
128146
"name": "doc-extraction-image-verbalization-ds",
@@ -187,9 +205,9 @@ For nested JSON, the index fields must be identical to the source fields. Curren
187205

188206
```http
189207
### Create an index
190-
POST {{baseUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
208+
POST {{searchUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
191209
Content-Type: application/json
192-
api-key: {{apiKey}}
210+
api-key: {{searchApiKey}}
193211
194212
{
195213
"name": "doc-extraction-image-verbalization-index",
@@ -296,7 +314,7 @@ POST {{baseUrl}}/indexes?api-version=2025-05-01-preview HTTP/1.1
296314
"azureOpenAIParameters": {
297315
"resourceUri": "{{openAIResourceUri}}",
298316
"deploymentId": "text-embedding-3-large",
299-
"apiKey": "{{openAIKey}}",
317+
"apiKey": "{{openAIKey}}",
300318
"modelName": "text-embedding-3-large"
301319
}
302320
}
@@ -339,9 +357,9 @@ The skillset also performs actions specific to images. It uses the GenAI Prompt
339357

340358
```http
341359
### Create a skillset
342-
POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
360+
POST {{searchUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
343361
Content-Type: application/json
344-
api-key: {{apiKey}}
362+
api-key: {{searchApiKey}}
345363
346364
{
347365
"name": "doc-extraction-image-verbalization-skillset",
@@ -419,7 +437,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
419437
],
420438
"resourceUri": "{{openAIResourceUri}}",
421439
"deploymentId": "text-embedding-3-large",
422-
"apiKey": "{{openAIKey}}",
440+
"apiKey": "{{openAIKey}}",
423441
"dimensions": 3072,
424442
"modelName": "text-embedding-3-large"
425443
},
@@ -429,7 +447,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
429447
"description": "GenAI Prompt skill for image verbalization",
430448
"uri": "{{chatCompletionResourceUri}}",
431449
"timeout": "PT1M",
432-
"apiKey": "{{chatCompletionKey}}",
450+
"apiKey": "{{chatCompletionKey}}",
433451
"context": "/document/normalized_images/*",
434452
"inputs": [
435453
{
@@ -472,7 +490,7 @@ POST {{baseUrl}}/skillsets?api-version=2025-05-01-preview HTTP/1.1
472490
],
473491
"resourceUri": "{{openAIResourceUri}}",
474492
"deploymentId": "text-embedding-3-large",
475-
"apiKey": "{{openAIKey}}",
493+
"apiKey": "{{openAIKey}}",
476494
"dimensions": 3072,
477495
"modelName": "text-embedding-3-large"
478496
},
@@ -606,9 +624,9 @@ Key points:
606624

607625
```http
608626
### Create and run an indexer
609-
POST {{baseUrl}}/indexers?api-version=2025-05-01-preview HTTP/1.1
627+
POST {{searchUrl}}/indexers?api-version=2025-05-01-preview HTTP/1.1
610628
Content-Type: application/json
611-
api-key: {{apiKey}}
629+
api-key: {{searchApiKey}}
612630
613631
{
614632
"dataSourceName": "doc-extraction-image-verbalization-ds",
@@ -638,9 +656,9 @@ You can start searching as soon as the first document is loaded.
638656

639657
```http
640658
### Query the index
641-
POST {{baseUrl}}/indexes/doc-extraction-image-verbalization-index/docs/search?api-version=2025-05-01-preview HTTP/1.1
659+
POST {{searchUrl}}/indexes/doc-extraction-image-verbalization-index/docs/search?api-version=2025-05-01-preview HTTP/1.1
642660
Content-Type: application/json
643-
api-key: {{apiKey}}
661+
api-key: {{searchApiKey}}
644662
645663
{
646664
"search": "*",
@@ -689,9 +707,9 @@ Here are some examples of other queries:
689707

690708
```http
691709
### Query for only images
692-
POST {{baseUrl}}/indexes/doc-extraction-image-verbalization-index/docs/search?api-version=2025-05-01-preview HTTP/1.1
710+
POST {{searchUrl}}/indexes/doc-extraction-image-verbalization-index/docs/search?api-version=2025-05-01-preview HTTP/1.1
693711
Content-Type: application/json
694-
api-key: {{apiKey}}
712+
api-key: {{searchApiKey}}
695713
696714
{
697715
"search": "*",
@@ -702,9 +720,9 @@ POST {{baseUrl}}/indexes/doc-extraction-image-verbalization-index/docs/search?ap
702720

703721
```http
704722
### Query for text or images with content related to energy, returning the id, parent document, and text (extracted text for text chunks and verbalized image text for images), and the content path where the image is saved in the knowledge store (only populated for images)
705-
POST {{baseUrl}}/indexes/doc-extraction-image-verbalization-index/docs/search?api-version=2025-05-01-preview HTTP/1.1
723+
POST {{searchUrl}}/indexes/doc-extraction-image-verbalization-index/docs/search?api-version=2025-05-01-preview HTTP/1.1
706724
Content-Type: application/json
707-
api-key: {{apiKey}}
725+
api-key: {{searchApiKey}}
708726
709727
{
710728
"search": "energy",
@@ -719,20 +737,20 @@ Indexers can be reset to clear the high-water mark, which allows a full rerun. T
719737

720738
```http
721739
### Reset the indexer
722-
POST {{baseUrl}}/indexers/doc-extraction-image-verbalization-indexer/reset?api-version=2025-05-01-preview HTTP/1.1
723-
api-key: {{apiKey}}
740+
POST {{searchUrl}}/indexers/doc-extraction-image-verbalization-indexer/reset?api-version=2025-05-01-preview HTTP/1.1
741+
api-key: {{searchApiKey}}
724742
```
725743

726744
```http
727745
### Run the indexer
728-
POST {{baseUrl}}/indexers/doc-extraction-image-verbalization-indexer/run?api-version=2025-05-01-preview HTTP/1.1
729-
api-key: {{apiKey}}
746+
POST {{searchUrl}}/indexers/doc-extraction-image-verbalization-indexer/run?api-version=2025-05-01-preview HTTP/1.1
747+
api-key: {{searchApiKey}}
730748
```
731749

732750
```http
733751
### Check indexer status
734-
GET {{baseUrl}}/indexers/doc-extraction-image-verbalization-indexer/status?api-version=2025-05-01-preview HTTP/1.1
735-
api-key: {{apiKey}}
752+
GET {{searchUrl}}/indexers/doc-extraction-image-verbalization-indexer/status?api-version=2025-05-01-preview HTTP/1.1
753+
api-key: {{searchApiKey}}
736754
```
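
The reset, run, and status requests above are typically used together: reset, run, then check status until the run finishes. The following sketch shows one way to poll, under stated assumptions. `wait_for_indexer` is a hypothetical helper, `fetch_status` is any callable you supply that returns the parsed JSON of the status request, and the `lastResult.status` field and its values (`inProgress`, `success`, `transientFailure`, `persistentFailure`) reflect my understanding of the status response and should be verified against the REST reference.

```python
import time

# Assumed terminal values of lastResult.status; verify against the REST reference.
TERMINAL_STATES = {"success", "transientFailure", "persistentFailure"}


def wait_for_indexer(fetch_status, poll_seconds=5.0, max_polls=60, sleep=time.sleep):
    """Poll until the last indexer run reaches a terminal state, then return it.

    fetch_status: callable returning the status payload as a dict.
    sleep: injectable so the loop can be exercised without real delays.
    """
    for _ in range(max_polls):
        payload = fetch_status()
        last_result = payload.get("lastResult") or {}  # may be missing right after a reset
        state = last_result.get("status")
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
    raise TimeoutError("Indexer did not finish within the polling budget")


# Usage with canned payloads standing in for real status responses.
canned = iter([
    {"lastResult": None},
    {"lastResult": {"status": "inProgress"}},
    {"lastResult": {"status": "success"}},
])
final_state = wait_for_indexer(lambda: next(canned), sleep=lambda seconds: None)
print(final_state)
```

Injecting `fetch_status` keeps the polling logic separate from the HTTP call, so the same loop works with whichever client you use to send the status request.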
737755

738756
## Clean up resources
