Commit 3c26025

Added how-to article for programmatic integrated vectorization
1 parent f6679b6 commit 3c26025

3 files changed: +259 -0 lines changed

Lines changed: 257 additions & 0 deletions
@@ -0,0 +1,257 @@
---
title: Integrated Vectorization Using REST APIs or Python
titleSuffix: Azure AI Search
description: Learn how to use supported data sources and embedding models for vectorization during indexing and queries in Azure AI Search.
manager: nitinme
author: haileytap
ms.author: haileytapia
ms.service: azure-ai-search
ms.topic: how-to
ms.date: 04/11/2025
---

# Set up integrated vectorization in Azure AI Search using REST or Python

In this article, you learn how to use an indexer and a skillset to chunk, vectorize, and index content from a [supported data source](#supported-data-sources). The skillset calls the [Text Split skill](cognitive-search-skill-textsplit.md) or [Document Layout skill](cognitive-search-skill-document-intelligence-layout.md) for chunking and an embedding skill that's attached to a [supported embedding model](#supported-embedding-models) for chunk vectorization.

You also learn how to perform [vector search](vector-search-overview.md) by assigning a vectorizer, which you define in the index schema, to a vector field. The vectorizer should match the embedding model that encodes your content. At query time, Azure AI Search uses the vectorizer to convert text queries into vectors automatically.

This article describes the end-to-end workflow for [integrated vectorization](vector-search-integrated-vectorization.md) using REST and Python. For portal-based instructions, see [Quickstart: Vectorize text and images in the Azure portal](search-get-started-portal-import-vectors.md).

## Prerequisites

+ An Azure account with an active subscription. [Create an account for free](https://azure.microsoft.com/free/?WT.mc_id=A261C142F).

+ An [Azure AI Search service](search-create-service-portal.md). We recommend the Basic tier or higher.

+ An [Azure AI services multi-service resource](/azure/ai-services/multi-service-resource) in the same region as your Azure AI Search service.

+ A [supported data source](#supported-data-sources).

+ A [supported embedding model](#supported-embedding-models).

+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) or the [Python extension](https://marketplace.visualstudio.com/items?itemName=ms-python.python).

### Supported data sources

Azure AI Search [supports various data sources](search-indexer-overview.md#supported-data-sources). This article covers only the data sources that work with whole files, which are described in the following table.

| Supported data source | Description |
|--|--|
| [Azure Blob Storage](search-howto-indexing-azure-blob-storage.md) | This data source works with blobs and tables. You must use a standard performance (general-purpose v2) account. Access tiers can be hot, cool, or cold. |
| [Azure Data Lake Storage (ADLS) Gen2](/azure/storage/blobs/create-data-lake-storage-account) | This is an Azure Storage account with a hierarchical namespace enabled. To confirm that you have Data Lake Storage, check the **Properties** tab on the **Overview** page.<br><br> :::image type="content" source="media/search-how-to-integrated-vectorization/data-lake-storage-account.png" alt-text="Screenshot of an Azure Data Lake Storage account in the Azure portal." border="true" lightbox="media/search-how-to-integrated-vectorization/data-lake-storage-account.png"::: |
| [OneLake lakehouse](search-how-to-index-onelake-files.md) | This data source is currently in preview. |

### Supported embedding models

Use an embedding model on an Azure AI platform in the [same region as Azure AI Search](search-create-service-portal.md#regions-with-the-most-overlap). For deployment instructions, see [Prepare your embedding model](#prepare-your-embedding-model).

| Provider | Supported models |
|--|--|
| [Azure OpenAI Service](https://aka.ms/oai/access) <sup>1, 2</sup> | <ul><li>text-embedding-ada-002</li><li>text-embedding-3-small</li><li>text-embedding-3-large</li></ul> |
| [Azure AI Foundry model catalog](/azure/ai-foundry/what-is-azure-ai-foundry) | For text:<ul><li>Cohere-embed-v3-english</li><li>Cohere-embed-v3-multilingual</li></ul>For images:<ul><li>Facebook-DinoV2-Image-Embeddings-ViT-Base</li><li>Facebook-DinoV2-Image-Embeddings-ViT-Giant</li></ul> |
| [Azure AI services multi-service account](/azure/ai-services/multi-service-resource#azure-ai-services-resource-for-azure-ai-search-skills) | For text and images:<ul><li>[Azure AI Vision multimodal](/azure/ai-services/computer-vision/how-to/image-retrieval) <sup>3</sup> (available in [select regions](/azure/ai-services/computer-vision/overview-image-analysis#region-availability))</li></ul> |

<sup>1</sup> If you're using Azure OpenAI Service, the endpoint must have a [custom subdomain](/azure/ai-services/cognitive-services-custom-subdomains), such as `https://my-unique-name.cognitiveservices.azure.com`. If you created your service in the [Azure portal](https://portal.azure.com/), this subdomain was automatically generated during service setup. Ensure that your service has a custom subdomain before you use it with the Azure AI Search integration.

<sup>2</sup> Azure OpenAI Service resources (with access to embedding models) that were created in the [Azure AI Foundry portal](https://ai.azure.com/) aren't supported. Only Azure OpenAI Service resources created in the Azure portal are compatible with the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) integration.

<sup>3</sup> Depending on how you [attach the multi-service resource](cognitive-search-attach-cognitive-services.md), your multi-service account might need to be in the same region as your Azure AI Search service.
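
To check the custom subdomain requirement in footnote 1 before you configure a skill, you can run a quick heuristic like the following sketch. The helper name `has_custom_subdomain` and the two accepted host suffixes are assumptions for illustration, not an official validation rule. Regional endpoints, such as `https://eastus.api.cognitive.microsoft.com`, are treated as lacking a custom subdomain.

```python
from urllib.parse import urlparse

# Assumption: custom-subdomain endpoints end with one of these host suffixes.
# Regional endpoints, such as https://eastus.api.cognitive.microsoft.com, don't.
CUSTOM_SUBDOMAIN_SUFFIXES = (".cognitiveservices.azure.com", ".openai.azure.com")


def has_custom_subdomain(endpoint: str) -> bool:
    """Heuristic: True if the endpoint host is <name> followed by an accepted suffix."""
    host = (urlparse(endpoint).hostname or "").lower()
    return any(
        host.endswith(suffix) and len(host) > len(suffix)
        for suffix in CUSTOM_SUBDOMAIN_SUFFIXES
    )


for url in (
    "https://my-unique-name.cognitiveservices.azure.com",
    "https://eastus.api.cognitive.microsoft.com",
):
    print(url, has_custom_subdomain(url))
```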

### Permissions

You can use Microsoft Entra ID with role assignments or key-based authentication with full-access connection strings. For Azure AI Search service connections to other resources, we recommend role assignments.

To configure role-based access:

1. On your search service, [enable roles](search-security-enable-roles.md) and [configure a managed identity](search-howto-managed-identities-data-sources.md#create-a-system-managed-identity).

1. On your data source platform and embedding model provider, create role assignments that allow your search service to access data and models. See [Prepare your data](#prepare-your-data) and [Prepare your embedding model](#prepare-your-embedding-model).

> [!NOTE]
> Free search services support role-based connections to Azure AI Search. However, they don't support managed identities on outbound connections to Azure Storage or Azure AI Vision. This lack of support requires key-based authentication on connections between free search services and other Azure resources.
>
> For more secure connections, use the Basic tier or higher. You can then enable roles and configure a managed identity for authorized access.

## Prepare your data

In this section, you prepare your data for integrated vectorization by uploading files to a [supported data source](#supported-data-sources) and assigning resource permissions.

### [Azure Blob Storage](#tab/prepare-data-storage)

1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure Storage account.

1. From the left pane, select **Data storage** > **Containers**.

1. Create a container or select an existing container, and then upload your files to the container.

1. To assign permissions:

    1. If you're using roles, select **Access control (IAM)** from the left pane, and then assign the [Storage Blob Data Reader](search-howto-managed-identities-data-sources.md#assign-a-role) role to your search service identity.

    1. If you're using key-based authentication, select **Security + networking** > **Access keys** from the left pane, and then copy a connection string for your storage account.

1. (Optional) Synchronize deletions in your container with deletions in the search index. To configure your indexer for deletion detection:

    1. [Enable soft delete](/azure/storage/blobs/soft-delete-blob-enable?tabs=azure-portal#enable-blob-soft-delete-hierarchical-namespace) on your storage account. If you're using [native soft delete](search-howto-index-changed-deleted-blobs.md#native-blob-soft-delete), you don't need the next step.

    1. [Add custom metadata](search-howto-index-changed-deleted-blobs.md#soft-delete-strategy-using-custom-metadata) that an indexer can scan to determine which blobs are marked for deletion. Give your custom property a descriptive name. For example, you can name the property "IsDeleted" and set it to false. Repeat this step for every blob in the container. When you want to delete the blob, change the property to true. For more information, see [Change and delete detection when indexing from Azure Storage](search-howto-index-changed-deleted-blobs.md).
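
Both deletion-detection strategies are configured on the data source definition through its `dataDeletionDetectionPolicy` property. The following sketch builds that property for either strategy. The function name is hypothetical, and the `IsDeleted` column and `"true"` marker value are placeholders that match the example above, so adjust them to your own metadata.

```python
import json


def deletion_detection_policy(native: bool, column: str = "IsDeleted",
                              marker: str = "true") -> dict:
    """Build a data source's dataDeletionDetectionPolicy (hypothetical helper)."""
    if native:
        # Native blob soft delete: the indexer detects blobs in the soft-deleted state.
        return {
            "@odata.type": "#Microsoft.Azure.Search.NativeBlobSoftDeleteDeletionDetectionPolicy"
        }
    # Custom metadata: blobs whose `column` metadata equals `marker` are removed from the index.
    return {
        "@odata.type": "#Microsoft.Azure.Search.SoftDeleteColumnDeletionDetectionPolicy",
        "softDeleteColumnName": column,
        "softDeleteMarkerValue": marker,
    }


# Attach the policy to a data source body (placeholder name and type).
data_source = {"name": "my-data-source", "type": "azureblob"}
data_source["dataDeletionDetectionPolicy"] = deletion_detection_policy(native=False)
print(json.dumps(data_source, indent=2))
```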

### [ADLS Gen2](#tab/prepare-data-adlsgen2)

1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure Storage account.

1. From the left pane, select **Data storage** > **Containers**.

1. Create a container or select an existing container, and then upload your files to the container.

1. To assign permissions:

    1. If you're using roles, select **Access control (IAM)** from the left pane, and then assign the [Storage Blob Data Reader](search-howto-managed-identities-data-sources.md#assign-a-role) role to your search service identity.

    1. If you're using key-based authentication, select **Security + networking** > **Access keys** from the left pane, and then copy a connection string for your storage account.

1. (Optional) Synchronize deletions in your container with deletions in the search index. To configure your indexer for deletion detection:

    1. [Enable soft delete](/azure/storage/blobs/soft-delete-blob-enable?tabs=azure-portal#enable-blob-soft-delete-hierarchical-namespace) on your storage account.

    1. [Add custom metadata](search-howto-index-changed-deleted-blobs.md#soft-delete-strategy-using-custom-metadata) that an indexer can scan to determine which blobs are marked for deletion. Give your custom property a descriptive name. For example, you can name the property "IsDeleted" and set it to false. Repeat this step for every blob in the container. When you want to delete the blob, change the property to true. For more information, see [Change and delete detection when indexing from Azure Storage](search-howto-index-changed-deleted-blobs.md).

### [OneLake](#tab/prepare-data-onelake)

1. Sign in to [Power BI](https://powerbi.com/) and [create a workspace](/fabric/data-engineering/tutorial-lakehouse-get-started).

1. From the left pane, select **Workspaces** and open your new workspace.

1. To assign permissions:

    1. In the upper-right corner, select **Manage access**.

    1. Select **Add people or groups**.

    1. Enter the name of your search service. For example, if the URL is `https://my-demo-service.search.windows.net`, the search service name is `my-demo-service`.

    1. Select a role. The default is **Viewer**, but you need **Contributor** to pull data into a search index.

1. To load your data:

    1. From the **Power BI** switcher in the lower-left corner, select **Data Engineering**.

    1. On the **Data Engineering** pane, select **Lakehouse** to create a lakehouse.

    1. Provide a name, and then select **Create** to create and open the new lakehouse.

    1. Select **Upload files** to upload your data.

1. To specify your lakehouse in REST or Python, copy the URL or get the workspace and lakehouse IDs. The URL has the following format: `https://msit.powerbi.com/groups/00000000-0000-0000-0000-000000000000/lakehouses/11111111-1111-1111-1111-111111111111?experience=power-bi`.
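
If you copied the URL, you can pull the workspace and lakehouse IDs out of it instead of looking them up separately. This minimal sketch assumes the `/groups/<workspace-id>/lakehouses/<lakehouse-id>` path shown in the example URL; `parse_lakehouse_url` is a hypothetical helper name.

```python
from typing import Tuple
from urllib.parse import urlparse


def parse_lakehouse_url(url: str) -> Tuple[str, str]:
    """Return (workspace_id, lakehouse_id) from a lakehouse URL.

    Assumes a path of the form /groups/<workspace-id>/lakehouses/<lakehouse-id>.
    """
    # Path segments, for example ["groups", "<workspace-id>", "lakehouses", "<lakehouse-id>"]
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    try:
        workspace_id = parts[parts.index("groups") + 1]
        lakehouse_id = parts[parts.index("lakehouses") + 1]
    except (ValueError, IndexError) as err:
        raise ValueError(f"Unexpected lakehouse URL: {url}") from err
    return workspace_id, lakehouse_id


url = (
    "https://msit.powerbi.com/groups/00000000-0000-0000-0000-000000000000"
    "/lakehouses/11111111-1111-1111-1111-111111111111?experience=power-bi"
)
print(parse_lakehouse_url(url))
```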

---

## Prepare your embedding model

In this section, you prepare your Azure AI resource for integrated vectorization by assigning resource permissions and deploying a [supported embedding model](#supported-embedding-models).

### [Azure OpenAI](#tab/prepare-model-aoai)

Azure AI Search supports text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large. Internally, Azure AI Search calls the [Azure OpenAI Embedding skill](cognitive-search-skill-azure-openai-embedding.md) to connect to Azure OpenAI.

1. Sign in to the [Azure portal](https://portal.azure.com/) and select your Azure OpenAI resource.

1. To assign permissions:

    1. If you're using roles, select **Access control (IAM)** from the left pane, and then assign the [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) role to your search service identity.

    1. If you're using key-based authentication, select **Resource Management** > **Keys and Endpoint** from the left pane, and then copy the endpoint and an API key for your Azure OpenAI resource.

1. From the left pane, select **Resource Management** > **Model deployments**, and then select **Manage Deployments** to open Azure AI Foundry.

1. Copy the deployment name of **text-embedding-ada-002** or another [supported embedding model](#supported-embedding-models). If you don't have an embedding model, deploy one now.

### [Azure AI Vision](#tab/prepare-model-ai-vision)

Azure AI Search supports Azure AI Vision image retrieval through multimodal embeddings (version 4.0). Internally, Azure AI Search calls the [multimodal embeddings skill](cognitive-search-skill-vision-vectorize.md) to connect to Azure AI Vision.

1. Sign in to the [Azure portal](https://portal.azure.com/) and [create an Azure AI Vision resource](/azure/ai-services/computer-vision/how-to/image-retrieval?tabs=csharp#prerequisites). Make sure your Azure AI Search service is in the same region.

1. To assign permissions:

    1. If you're using roles, select **Access control (IAM)** from the left pane, and then assign the **Cognitive Services User** role to your search service identity.

    1. If you're using key-based authentication, select **Resource Management** > **Keys and Endpoint** from the left pane, and then copy the endpoint and an API key for your Azure AI Vision resource.

1. Add deployment steps.

### [Azure AI Foundry model catalog](#tab/prepare-model-catalog)

Azure AI Search supports Azure, Cohere, and Facebook embedding models in the [Azure AI Foundry](https://ai.azure.com/) model catalog, but it doesn't currently support the OpenAI CLIP models. Internally, Azure AI Search calls the [AML skill](cognitive-search-aml-skill.md) to connect to the catalog.

1. To use the model catalog, you need an [Azure OpenAI resource](/azure/ai-services/openai/how-to/create-resource), a [hub in the Azure AI Foundry portal](/azure/ai-foundry/how-to/create-projects), and a [project](/azure/ai-foundry/how-to/create-projects). Hubs and projects with the same name can share connection information and permissions.

1. To deploy an embedding model from the model catalog to your project:

    1. Select **Models + Endpoints**, and then select **Deploy a model**.

    1. Select **Deploy base model**.

    1. Set the inference task filter to **Embeddings**.

    1. Deploy one of the [supported embedding models](#supported-embedding-models).

---

## Connect to your data

For indexer-based indexing, you must connect to a [supported data source](#supported-data-sources). Indexers require a data source that specifies the type, credentials, and container.

### [REST](#tab/connect-data-rest)

1. Use [Create Data Source](/rest/api/searchservice/data-sources/create) to define the data source.

    ```http
    POST https://my-search-service.search.windows.net/datasources?api-version=2024-07-01
    Content-Type: application/json
    api-key: {{admin-api-key}}

    {
      "name": "my-data-source",
      "description": null,
      "type": "azureblob",
      "subtype": null,
      "credentials": {
        "connectionString": "DefaultEndpointsProtocol=https;AccountName=my-account-name;AccountKey=my-account-key;"
      },
      "container": {
        "name": "my-container",
        "query": ""
      }
    }
    ```

1. Set `type` to your data source: `azureblob`, `adlsgen2`, or `onelake`. Because the OneLake data source is in preview, it requires a preview version of the REST API.

1. Set `credentials` to...

1. Set `container` to...

### [Python](#tab/connect-data-python)
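
The following is a minimal sketch that calls the same Create Data Source REST API from Python using only the standard library. The `azure-search-documents` SDK is an alternative. The service URL, admin key, subscription, resource group, account, and container names are placeholders, and `build_data_source_request` is a hypothetical helper. The request is built but not sent. The `ResourceId=...` connection string assumes the role-based setup described in [Permissions](#permissions). For key-based authentication, pass the full connection string that you copied instead.

```python
import json
import urllib.request
from typing import Optional

# Placeholders: replace with your own values.
SEARCH_ENDPOINT = "https://my-search-service.search.windows.net"
API_VERSION = "2024-07-01"
ADMIN_API_KEY = "my-admin-api-key"


def build_data_source_request(name: str, ds_type: str, connection_string: str,
                              container: str,
                              folder: Optional[str] = None) -> urllib.request.Request:
    """Build, but don't send, a Create Data Source request."""
    body = {
        "name": name,
        "type": ds_type,  # for example, "azureblob" or "adlsgen2"
        "credentials": {"connectionString": connection_string},
        "container": {"name": container, "query": folder},  # optional virtual folder
    }
    return urllib.request.Request(
        url=f"{SEARCH_ENDPOINT}/datasources?api-version={API_VERSION}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": ADMIN_API_KEY},
        method="POST",
    )


# Managed identity (role-based) connection string for a storage account.
mi_connection = (
    "ResourceId=/subscriptions/my-subscription-id/resourceGroups/my-resource-group"
    "/providers/Microsoft.Storage/storageAccounts/my-account-name;"
)
request = build_data_source_request("my-data-source", "azureblob", mi_connection, "my-container")
print(request.get_method(), request.full_url)
# After you fill in real values, urllib.request.urlopen(request) sends it.
```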

---

## Create a skillset

### Call a built-in skill to chunk your content
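
As a sketch of the chunking step, the following Python builds a Text Split skill definition for a skillset body. It assumes page-based splitting of the `/document/content` field that the indexer extracts from each file. The skill name, the 2,000-character page length, and the 500 characters of overlap are example values, not requirements.

```python
import json


def text_split_skill(max_page_length: int = 2000, overlap: int = 500) -> dict:
    """Text Split skill that chunks /document/content into /document/pages."""
    if not 0 <= overlap < max_page_length:
        raise ValueError("overlap must be non-negative and smaller than max_page_length")
    return {
        "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
        "name": "chunk-text",  # hypothetical name
        "context": "/document",  # run once per source document
        "textSplitMode": "pages",
        "maximumPageLength": max_page_length,  # characters per chunk
        "pageOverlapLength": overlap,  # characters shared by adjacent chunks
        "inputs": [{"name": "text", "source": "/document/content"}],
        "outputs": [{"name": "textItems", "targetName": "pages"}],
    }


print(json.dumps(text_split_skill(), indent=2))
```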
### Call an embedding skill to vectorize the chunks
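
The following sketch adds an Azure OpenAI Embedding skill that runs once per chunk. It assumes that an earlier chunking skill wrote its chunks to `/document/pages`, that you deployed text-embedding-3-small, and that you use role-based access, so no API key is set. The resource URI, deployment name, skill name, and `text_vector` output name are placeholders.

```python
import json

# Placeholders: your Azure OpenAI resource and embedding deployment.
AOAI_RESOURCE_URI = "https://my-unique-name.openai.azure.com"
DEPLOYMENT_ID = "my-embedding-deployment"
MODEL_NAME = "text-embedding-3-small"
DIMENSIONS = 1536  # default output size of text-embedding-3-small


def embedding_skill() -> dict:
    """Azure OpenAI Embedding skill that vectorizes each chunk under /document/pages."""
    return {
        "@odata.type": "#Microsoft.Skills.Text.AzureOpenAIEmbeddingSkill",
        "name": "embed-chunks",  # hypothetical name
        "context": "/document/pages/*",  # run once per chunk
        "resourceUri": AOAI_RESOURCE_URI,
        "deploymentId": DEPLOYMENT_ID,
        "modelName": MODEL_NAME,
        "dimensions": DIMENSIONS,
        # No apiKey: assumes the search service's managed identity has access.
        "inputs": [{"name": "text", "source": "/document/pages/*"}],
        "outputs": [{"name": "embedding", "targetName": "text_vector"}],
    }


skillset = {
    "name": "my-skillset",
    # Place the chunking skill before this one in the list.
    "skills": [embedding_skill()],
}
print(json.dumps(skillset, indent=2))
```

To get one search document per chunk instead of one per file, skillsets typically also define index projections, which this sketch doesn't show.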
## Create a vector index
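
The following is a minimal index sketch that assumes one search document per chunk. It defines a key field, a text field for the chunk, and a vector field whose `dimensions` value matches the embedding model (1,536 for text-embedding-3-small). The field, profile, and algorithm names are hypothetical. `check_vector_config` is an illustrative consistency check, not part of any SDK.

```python
import json

DIMENSIONS = 1536  # must equal the embedding model's output size

index = {
    "name": "my-vector-index",
    "fields": [
        {"name": "chunk_id", "type": "Edm.String", "key": True},
        {"name": "chunk", "type": "Edm.String", "searchable": True},
        {
            "name": "text_vector",
            "type": "Collection(Edm.Single)",  # vector fields are collections of floats
            "searchable": True,
            "retrievable": True,
            "dimensions": DIMENSIONS,
            "vectorSearchProfile": "my-vector-profile",
        },
    ],
    "vectorSearch": {
        "algorithms": [{"name": "my-hnsw", "kind": "hnsw"}],
        "profiles": [{"name": "my-vector-profile", "algorithm": "my-hnsw"}],
    },
}


def check_vector_config(index: dict) -> None:
    """Raise ValueError if a vector field references a missing profile or algorithm."""
    vector_search = index.get("vectorSearch", {})
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}
    for field in index["fields"]:
        profile_name = field.get("vectorSearchProfile")
        if profile_name is None:
            continue  # not a vector field
        if profile_name not in profiles:
            raise ValueError(f"{field['name']}: unknown profile {profile_name!r}")
        if profiles[profile_name]["algorithm"] not in algorithms:
            raise ValueError(f"{profile_name}: unknown algorithm")


check_vector_config(index)
print(json.dumps(index, indent=2))
```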
## Add a vectorizer to the index

See [Define a vectorizer and vector profile](vector-search-how-to-configure-vectorizer.md#define-a-vectorizer-and-vector-profile).
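
The following sketch builds a `vectorSearch` section with an Azure OpenAI vectorizer and points a profile at it, so any vector field that uses the profile can accept text queries. It assumes the same text-embedding-3-small deployment that encodes your content. The names and resource URI are placeholders, and `vectorizer_for_profile` is a hypothetical helper.

```python
import json

vectorizer = {
    "name": "my-openai-vectorizer",
    "kind": "azureOpenAI",
    "azureOpenAIParameters": {
        "resourceUri": "https://my-unique-name.openai.azure.com",  # placeholder
        "deploymentId": "my-embedding-deployment",  # placeholder
        "modelName": "text-embedding-3-small",  # should match the embedding skill's model
    },
}

vector_search = {
    "algorithms": [{"name": "my-hnsw", "kind": "hnsw"}],
    "profiles": [
        # The profile ties an algorithm to a vectorizer; vector fields reference the profile.
        {"name": "my-vector-profile", "algorithm": "my-hnsw", "vectorizer": vectorizer["name"]}
    ],
    "vectorizers": [vectorizer],
}


def vectorizer_for_profile(vector_search: dict, profile_name: str) -> dict:
    """Return the vectorizer definition that the named profile uses."""
    profile = next(p for p in vector_search["profiles"] if p["name"] == profile_name)
    return next(v for v in vector_search["vectorizers"] if v["name"] == profile["vectorizer"])


print(json.dumps(vector_search, indent=2))
```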
## Create an indexer
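
An indexer ties the pieces together. It reads from the data source, runs the skillset, and writes the results to the index. This sketch builds the indexer definition and the two URLs that you send POST requests to. All names are placeholders, and the configuration values are common choices for whole files rather than required settings.

```python
import json

SEARCH_ENDPOINT = "https://my-search-service.search.windows.net"  # placeholder
API_VERSION = "2024-07-01"


def indexer_definition(name: str, data_source: str, index: str, skillset: str) -> dict:
    """Indexer that pulls from the data source, runs the skillset, and writes to the index."""
    return {
        "name": name,
        "dataSourceName": data_source,
        "targetIndexName": index,
        "skillsetName": skillset,
        "parameters": {
            "configuration": {
                "dataToExtract": "contentAndMetadata",  # extract text plus file metadata
                "parsingMode": "default",  # treat each file as one source document
            }
        },
    }


indexer = indexer_definition("my-indexer", "my-data-source", "my-vector-index", "my-skillset")
create_url = f"{SEARCH_ENDPOINT}/indexers?api-version={API_VERSION}"  # POST the definition
run_url = f"{SEARCH_ENDPOINT}/indexers/{indexer['name']}/run?api-version={API_VERSION}"  # POST to rerun
print(json.dumps(indexer, indent=2))
```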
## Create vector queries

See [Create a vector query](vector-search-how-to-query.md).
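
When the vector field has a vectorizer, a query can send plain text and let the service embed it. The following sketch builds a search request body with a `text` vector query. The index name, field names, query text, and `k` value are placeholders.

```python
import json

SEARCH_ENDPOINT = "https://my-search-service.search.windows.net"  # placeholder
INDEX_NAME = "my-vector-index"  # placeholder
API_VERSION = "2024-07-01"


def text_vector_query(text: str, field: str = "text_vector", k: int = 5) -> dict:
    """Search body whose query text is vectorized by the field's vectorizer at query time."""
    return {
        "select": "chunk",  # return the chunk text, not the vector
        "vectorQueries": [
            {"kind": "text", "text": text, "fields": field, "k": k}  # k nearest neighbors
        ],
    }


search_url = f"{SEARCH_ENDPOINT}/indexes/{INDEX_NAME}/docs/search?api-version={API_VERSION}"
body = text_vector_query("How does integrated vectorization chunk documents?")
print("POST", search_url)
print(json.dumps(body, indent=2))
```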

## Related content

+ [Integrated vectorization sample](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/integrated-vectorization/azure-search-integrated-vectorization-sample.ipynb)

articles/search/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -347,6 +347,8 @@ items:
  href: search-how-to-semantic-chunking.md
- name: Generate embeddings
  href: vector-search-how-to-generate-embeddings.md
+- name: Set up integrated vectorization
+  href: search-how-to-integrated-vectorization.md
- name: Use embedding models from Azure AI Foundry
  href: vector-search-integrated-vectorization-ai-studio.md
- name: Reduce vector size
