Commit 3433254

Merge pull request #270448 from HeidiSteen/heidist-fix
[azure search] Configure vectorizer is query-specific
2 parents 1609e0a + 06df2cc commit 3433254

4 files changed

+131
-71
lines changed

articles/search/vector-search-how-to-configure-vectorizer.md

Lines changed: 127 additions & 67 deletions
Original file line number | Diff line number | Diff line change
@@ -9,46 +9,98 @@ ms.service: cognitive-search
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: how-to
12-
ms.date: 12/02/2023
12+
ms.date: 03/27/2024
1313
---
1414

1515
# Configure a vectorizer in a search index
1616

1717
> [!IMPORTANT]
1818
> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2023-10-01-preview&preserve-view=true) supports this feature.
1919
20-
A *vectorizer* is a component of a [search index](search-what-is-an-index.md) that specifies a vectorization agent, such as a deployed embedding model on Azure OpenAI that converts text to vectors. You can define a vectorizer once, and then reference it in the vector profile assigned to a vector field.
20+
In Azure AI Search, a *vectorizer* is software that performs vectorization, such as a deployed embedding model on Azure OpenAI that converts text to vectors during query execution.
2121

22-
A vectorizer is used for queries. It allows the search service to vectorize a text query on your behalf.
22+
A vectorizer is defined in a [search index](search-what-is-an-index.md), applies to searchable vector fields, and is used at query time to generate an embedding for a text query input. If you instead need to vectorize text as part of the indexing process, refer to [Integrated Vectorization (Preview)](vector-search-integrated-vectorization.md). For built-in vectorization during indexing, you can configure an indexer and a skillset that calls an Azure OpenAI embedding model for your raw text content.
2323

24-
If you need to vectorize data as part of the indexing process refer to [Integrated Vectorization (Preview)](vector-search-integrated-vectorization.md).
25-
26-
You can use the [**Import and vectorize data wizard**](search-get-started-portal-import-vectors.md), the [2023-10-01-Preview](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) REST APIs, or any Azure beta SDK package that's been updated to provide this feature.
24+
To add a vectorizer to a search index, you can use the index designer in the Azure portal, call the [Create or Update Index 2023-10-01-preview](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) REST API, or use any Azure beta SDK package that's updated to provide this feature.
2725

2826
## Prerequisites
2927

30-
+ A deployed embedding model on Azure OpenAI, or a custom skill that wraps an embedding model.
28+
+ [An index with searchable vector fields](vector-search-how-to-create-index.md) on Azure AI Search.
29+
30+
+ A deployed embedding model, such as **text-embedding-ada-002** on Azure OpenAI. It's used to vectorize a query. It must be identical to the model used to generate the embeddings in your index.
31+
32+
+ Permissions to use the embedding model. If you're using Azure OpenAI, the caller must have [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions. Or, you can provide an API key.
33+
34+
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) to send the query and accept a response.
35+
36+
We recommend that you enable diagnostic logging on your search service to confirm vector query execution.
3137

32-
+ Permissions to upload a payload to the embedding model. The connection to a vectorizer is specified in the skillset. If you're using Azure OpenAI, the caller must have [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions.
38+
## Try a vectorizer with sample data
3339

34-
+ A [supported data source](search-indexer-overview.md#supported-data-sources) and a [data source definition](search-howto-create-indexers.md#prepare-a-data-source) for your indexer.
40+
The [Import and vectorize data wizard](search-get-started-portal-import-vectors.md) reads files from Azure Blob storage, creates an index with chunked and vectorized fields, and adds a vectorizer. By design, the vectorizer that's created by the wizard is set to the same embedding model used to index the blob content.
3541

36-
+ A skillset that performs data chunking and vectorization of those chunks. You can omit a skillset if you only want integrated vectorization at query time, or if you don't need chunking or [index projections](index-projections-concept-intro.md) during indexing. This article assumes you already know how to [create a skillset](cognitive-search-defining-skillset.md).
42+
1. [Upload sample data files](/azure/storage/blobs/storage-quickstart-blobs-portal) to a container on Azure Storage. We used some [small text files from NASA's earth book](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/nasa-e-book/earth-txt-10) to test these instructions on a free search service.
43+
44+
1. Run the [Import and vectorize data wizard](search-get-started-portal-import-vectors.md), choosing the blob container for the data source.
3745

38-
+ An index that specifies vector and non-vector fields. This article assumes you already know how to [create a vector store](vector-search-how-to-create-index.md) and covers just the steps for adding vectorizers and field assignments.
46+
:::image type="content" source="media/vector-search-how-to-configure-vectorizer/connect-to-data.png" lightbox="media/vector-search-how-to-configure-vectorizer/connect-to-data.png" alt-text="Screenshot of the connect to your data page.":::
3947

40-
+ An [indexer](search-howto-create-indexers.md) that drives the pipeline.
48+
1. Choose an existing deployment of **text-embedding-ada-002**. This model generates embeddings during indexing and is also used to configure the vectorizer used during queries.
4149

42-
## Define a vectorizer
50+
:::image type="content" source="media/vector-search-how-to-configure-vectorizer/vectorize-enrich-data.png" lightbox="media/vector-search-how-to-configure-vectorizer/vectorize-enrich-data.png" alt-text="Screenshot of the vectorize and enrich data page.":::
4351

44-
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add a vectorizer.
52+
1. After the wizard is finished and all indexer processing is complete, you should have an index with a searchable vector field. The field's JSON definition looks like this:
4553

46-
1. Add the following JSON to your index definition. Provide valid values and remove any properties you don't need:
54+
```json
55+
{
56+
"name": "vector",
57+
"type": "Collection(Edm.Single)",
58+
"searchable": true,
59+
"retrievable": true,
60+
"dimensions": 1536,
61+
"vectorSearchProfile": "vector-nasa-ebook-text-profile"
62+
}
63+
```
64+
65+
1. You should also have a vector profile and a vectorizer, similar to the following example:
66+
67+
```json
68+
"profiles": [
69+
{
70+
"name": "vector-nasa-ebook-text-profile",
71+
"algorithm": "vector-nasa-ebook-text-algorithm",
72+
"vectorizer": "vector-nasa-ebook-text-vectorizer"
73+
}
74+
],
75+
"vectorizers": [
76+
{
77+
"name": "vector-nasa-ebook-text-vectorizer",
78+
"kind": "azureOpenAI",
79+
"azureOpenAIParameters": {
80+
"resourceUri": "https://my-fake-azure-openai-resource.openai.azure.com",
81+
"deploymentId": "text-embedding-ada-002",
82+
"apiKey": "0000000000000000000000000000000000000",
83+
"authIdentity": null
84+
},
85+
"customWebApiParameters": null
86+
}
87+
]
88+
```
89+
90+
1. Skip ahead to [test your vectorizer](#test-a-vectorizer) for text-to-vector conversion during query execution.
91+
92+
## Define a vectorizer and vector profile
93+
94+
This section explains the modifications to an index schema for defining a vectorizer manually.
95+
96+
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add `vectorizers` to a search index.
97+
98+
1. Add the following JSON to your index definition. The vectorizers section provides connection information to a deployed embedding model. This step shows two vectorizer examples so that you can compare an Azure OpenAI embedding model and a custom web API side by side.
4799

48100
```json
49101
"vectorizers": [
50102
{
51-
"name": "my_open_ai_vectorizer",
103+
"name": "my_azure_open_ai_vectorizer",
52104
"kind": "azureOpenAI",
53105
"azureOpenAIParameters": {
54106
"resourceUri": "https://url.openai.azure.com",
@@ -68,32 +120,19 @@ You can use the [**Import and vectorize data wizard**](search-get-started-portal
68120
]
69121
```
70122

71-
## Define a profile that includes a vectorizer
72-
73-
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add a profile.
74-
75-
1. Add a profiles section that specifies combinations of algorithms and vectorizers.
123+
1. In the same index, add a vector profiles section that specifies one of your vectorizers. Vector profiles also require a [vector search algorithm](vector-search-ranking.md) used to create navigation structures.
76124

77125
```json
78126
"profiles": [
79127
{
80-
"name": "my_open_ai_profile",
128+
"name": "my_vector_profile",
81129
"algorithm": "my_hnsw_algorithm",
82-
"vectorizer":"my_open_ai_vectorizer"
83-
},
84-
{
85-
"name": "my_custom_profile",
86-
"algorithm": "my_hnsw_algorithm",
87-
"vectorizer":"my_custom_vectorizer"
130+
"vectorizer":"my_azure_open_ai_vectorizer"
88131
}
89132
]
90133
```
91134

92-
## Assign a vector profile to a field
93-
94-
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add field attributes.
95-
96-
1. For each vector field in the fields collection, assign a profile.
135+
1. Assign a vector profile to a vector field. The following example shows a fields collection with the required key field, a title string field, and two vector fields with a vector profile assignment.
97136

98137
```json
99138
"fields": [
@@ -109,58 +148,79 @@ You can use the [**Import and vectorize data wizard**](search-get-started-portal
109148
            "type": "Edm.String"
110149
        },
111150
        {
112-
            "name": "synopsis",
151+
            "name": "vector",
113152
            "type": "Collection(Edm.Single)",
114153
            "dimensions": 1536,
115-
            "vectorSearchProfile": "my_open_ai_profile",
154+
            "vectorSearchProfile": "my_vector_profile",
116155
            "searchable": true,
117-
            "retrievable": true,
118-
            "filterable": false,
119-
            "sortable": false,
120-
            "facetable": false
156+
            "retrievable": true
121157
        },
122158
        {
123-
            "name": "reviews",
159+
            "name": "my-second-vector",
124160
            "type": "Collection(Edm.Single)",
125161
            "dimensions": 1024,
126-
            "vectorSearchProfile": "my_custom_profile",
162+
            "vectorSearchProfile": "my_vector_profile",
127163
            "searchable": true,
128-
            "retrievable": true,
129-
            "filterable": false,
130-
            "sortable": false,
131-
            "facetable": false
132-
        }
164+
            "retrievable": true
165+
}
133166
]
134167
```
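
The chain of references described in this section (vector field, then vector profile, then vectorizer) can be checked before you send the index definition to the service. The following Python snippet is a minimal sketch, not part of any Azure SDK: `resolve_vectorizer` and `sample_index` are hypothetical names, the `id` key field is made up for illustration, and the snippet assumes that `profiles` and `vectorizers` are nested under a `vectorSearch` property of the index definition.

```python
# Hypothetical helper for illustration only; not part of any Azure SDK.
def resolve_vectorizer(index: dict, field_name: str) -> dict:
    """Follow field -> vector profile -> vectorizer and return the vectorizer definition."""
    fields = {f["name"]: f for f in index.get("fields", [])}
    if field_name not in fields:
        raise KeyError(f"Field '{field_name}' is not in the index")

    profile_name = fields[field_name].get("vectorSearchProfile")
    if not profile_name:
        raise ValueError(f"Field '{field_name}' has no vectorSearchProfile")

    # Assumption: profiles and vectorizers live under "vectorSearch".
    vector_search = index.get("vectorSearch", {})
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}
    vectorizers = {v["name"]: v for v in vector_search.get("vectorizers", [])}

    if profile_name not in profiles:
        raise ValueError(f"Profile '{profile_name}' is not defined")

    vectorizer_name = profiles[profile_name].get("vectorizer")
    if vectorizer_name not in vectorizers:
        raise ValueError(f"Vectorizer '{vectorizer_name}' is not defined")

    return vectorizers[vectorizer_name]


# Trimmed index definition that reuses the names from the examples above.
sample_index = {
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},  # hypothetical key field
        {"name": "title", "type": "Edm.String"},
        {"name": "vector", "type": "Collection(Edm.Single)",
         "dimensions": 1536, "vectorSearchProfile": "my_vector_profile"},
        {"name": "my-second-vector", "type": "Collection(Edm.Single)",
         "dimensions": 1024, "vectorSearchProfile": "my_vector_profile"},
    ],
    "vectorSearch": {
        "profiles": [
            {"name": "my_vector_profile",
             "algorithm": "my_hnsw_algorithm",
             "vectorizer": "my_azure_open_ai_vectorizer"}
        ],
        "vectorizers": [
            {"name": "my_azure_open_ai_vectorizer", "kind": "azureOpenAI"}
        ],
    },
}

print(resolve_vectorizer(sample_index, "vector")["name"])
```

A check like this only confirms that the names line up. It doesn't confirm that the embedding model behind the vectorizer matches the one used to populate the field.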
135168

136169
## Test a vectorizer
137170

138-
1. [Run the indexer](search-howto-run-reset-indexers.md). When you run the indexer, the following operations occur:
171+
Use a search client to send a query through a vectorizer. This example assumes Visual Studio Code with a REST client and a [sample index](#try-a-vectorizer-with-sample-data).
139172

140-
+ Data retrieval from the supported data source
141-
+ Document cracking
142-
+ Skills processing for data chunking and vectorization
143-
+ Indexing to one or more indexes
173+
1. In Visual Studio Code, provide a search endpoint and [search query API key](search-security-api-keys.md#find-existing-keys):
144174

145-
1. [Query the vector field](vector-search-how-to-query.md) once the indexer is finished. In a query that uses integrated vectorization:
175+
```http
176+
@baseUrl = PUT-YOUR-SEARCH-SERVICE-URL-HERE
177+
@queryApiKey = 00000000000000000000000
178+
```
146179

147-
+ Set `"kind"` to `"text"`.
148-
+ Set `"text"` to the string to be vectorized.
180+
1. Paste in a [vector query request](vector-search-how-to-query.md). Be sure to use a preview REST API version.
149181

150-
```json
151-
"count": true,
152-
"select": "title",
153-
"vectorQueries": [
154-
{
155-
"kind": "text",
156-
"text": "story about horses set in Australia",
157-
"fields": "synopsis",
158-
"k": 5
159-
}
160-
]
161-
```
182+
```http
183+
### Run a query
184+
POST {{baseUrl}}/indexes/vector-nasa-ebook-txt/docs/search?api-version=2023-10-01-preview HTTP/1.1
185+
Content-Type: application/json
186+
api-key: {{queryApiKey}}
187+
188+
{
189+
"count": true,
190+
"select": "title,chunk",
191+
"vectorQueries": [
192+
{
193+
"kind": "text",
194+
"text": "what cloud formations exists in the troposphere",
195+
"fields": "vector",
196+
"k": 3,
197+
"exhaustive": true
198+
}
199+
]
200+
}
201+
```
202+
203+
Key points about the query include:
204+
205+
+ `"kind": "text"` tells the search engine that the input is a text string, and to use the vectorizer associated with the search field.
206+
207+
+ `"text": "what cloud formations exists in the troposphere"` is the text string to vectorize.
208+
209+
+ `"fields": "vector"` is the name of the field to query over. If you use the sample index produced by the wizard, the generated vector field is named `vector`.
210+
211+
1. Send the request. You should get three results (`k` is set to 3), with the most relevant result first.
212+
213+
Notice that there are no vectorizer properties to set at query time. The query reads the vectorizer properties from the vector profile that's assigned to the field in the index.
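
If you prefer a script over the REST client, the same request can be assembled with the Python standard library. This is a minimal sketch under stated assumptions: `build_text_vector_query` is a hypothetical helper, and the service URL and API key shown are placeholders. The sketch only builds the request object and doesn't send it.

```python
import json
import urllib.request


# Hypothetical helper for illustration only; not part of any Azure SDK.
def build_text_vector_query(base_url, index_name, api_key, text, field, k=3,
                            api_version="2023-10-01-preview"):
    """Build a POST request for a vector query that the service vectorizes."""
    body = {
        "count": True,
        "vectorQueries": [
            {
                "kind": "text",   # text input; the field's vectorizer converts it
                "text": text,     # the string to vectorize
                "fields": field,  # the vector field to query over
                "k": k,           # number of nearest neighbors to return
            }
        ],
    }
    url = (f"{base_url.rstrip('/')}/indexes/{index_name}"
           f"/docs/search?api-version={api_version}")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


# Placeholder endpoint and key; replace with your own values.
request = build_text_vector_query(
    base_url="https://example.search.windows.net",
    index_name="vector-nasa-ebook-txt",
    api_key="00000000000000000000000",
    text="what cloud formations exists in the troposphere",
    field="vector",
)
print(request.full_url)
```

To send the request, pass the object to `urllib.request.urlopen` and parse the JSON response. As in the REST example, the payload contains no vectorizer settings.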
214+
215+
## Check logs
216+
217+
If you enabled diagnostic logging for your search service, run a Kusto query to confirm query execution on your vector field:
162218

163-
There are no vectorizer properties to set at query time. The query uses the algorithm and vectorizer provided through the profile assignment in the index.
219+
```kusto
220+
OperationEvent
221+
| where TIMESTAMP > ago(30m)
222+
| where Name == "Query.Search" and AdditionalInfo["QueryMetadata"]["Vectors"] has "TextLength"
223+
```
164224

165225
## See also
166226

articles/search/vector-search-integrated-vectorization.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ ms.service: cognitive-search
99
ms.custom:
1010
- ignite-2023
1111
ms.topic: conceptual
12-
ms.date: 11/07/2023
12+
ms.date: 03/27/2024
1313
---
1414

1515
# Integrated data chunking and embedding in Azure AI Search
@@ -77,9 +77,9 @@ We recommend using the built-in vectorization support of Azure AI Studio. If thi
7777

7878
For query-only vectorization:
7979

80-
1. [Add a vectorizer](vector-search-how-to-configure-vectorizer.md#define-a-vectorizer) to an index. It should be the same embedding model used to generate vectors in the index.
81-
1. [Assign the vectorizer](vector-search-how-to-configure-vectorizer.md#assign-a-vector-profile-to-a-field) to the vector field.
82-
1. [Formulate a vector query](vector-search-how-to-query.md#query-with-integrated-vectorization-preview) that specifies the text string to vectorize.
80+
1. [Add a vectorizer](vector-search-how-to-configure-vectorizer.md#define-a-vectorizer-and-vector-profile) to an index. It should be the same embedding model used to generate vectors in the index.
81+
1. [Assign the vectorizer](vector-search-how-to-configure-vectorizer.md#define-a-vectorizer-and-vector-profile) to a vector profile, and then assign a vector profile to the vector field.
82+
1. [Formulate a vector query](vector-search-how-to-configure-vectorizer.md#test-a-vectorizer) that specifies the text string to vectorize.
8383

8484
A more common scenario - data chunking and vectorization during indexing:
8585
