> This feature is in public preview under [Supplemental Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). The [2023-10-01-Preview REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2023-10-01-preview&preserve-view=true) supports this feature.
19
19
20
-
A *vectorizer* is a component of a [search index](search-what-is-an-index.md) that specifies a vectorization agent, such as a deployed embedding model on Azure OpenAI that converts text to vectors. You can define a vectorizer once, and then reference it in the vector profile assigned to a vector field.
20
+
In Azure AI Search, a *vectorizer* is software that performs vectorization, such as a deployed embedding model on Azure OpenAI, that converts text to vectors during query execution.
21
21
22
-
A vectorizer is used for queries. It allows the search service to vectorize a text query on your behalf.
22
+
It's defined in a [search index](search-what-is-an-index.md), it applies to searchable vector fields, and it's used at query time to generate an embedding for a text query input. If instead you need to vectorize text as part of the indexing process, refer to [Integrated Vectorization (Preview)](vector-search-integrated-vectorization.md). For built-in vectorization during indexing, you can configure an indexer and skillset that calls an Azure OpenAI embedding model for your raw text content.
23
23
24
-
If you need to vectorize data as part of the indexing process refer to [Integrated Vectorization (Preview)](vector-search-integrated-vectorization.md).
25
-
26
-
You can use the [**Import and vectorize data wizard**](search-get-started-portal-import-vectors.md), the [2023-10-01-Preview](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) REST APIs, or any Azure beta SDK package that's been updated to provide this feature.
24
+
To add a vectorizer to a search index, you can use the index designer in the Azure portal, call the [Create or Update Index 2023-10-01-preview](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) REST API, or use any Azure beta SDK package that's updated to provide this feature.
27
25
28
26
## Prerequisites
29
27
30
-
+ A deployed embedding model on Azure OpenAI, or a custom skill that wraps an embedding model.
28
+
+ [An index with searchable vector fields](vector-search-how-to-create-index.md) on Azure AI Search.
29
+
30
+
+ A deployed embedding model, such as **text-embedding-ada-002** on Azure OpenAI. It's used to vectorize a query. It must be identical to the model used to generate the embeddings in your index.
31
+
32
+
+ Permissions to use the embedding model. If you're using Azure OpenAI, the caller must have [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions. Or, you can provide an API key.
33
+
34
+
+ [Visual Studio Code](https://code.visualstudio.com/download) with a [REST client](https://marketplace.visualstudio.com/items?itemName=humao.rest-client) to send the query and accept a response.
35
+
36
+
We recommend that you enable diagnostic logging on your search service to confirm vector query execution.
31
37
32
-
+ Permissions to upload a payload to the embedding model. The connection to a vectorizer is specified in the skillset. If you're using Azure OpenAI, the caller must have [Cognitive Services OpenAI User](/azure/ai-services/openai/how-to/role-based-access-control#azure-openai-roles) permissions.
38
+
## Try a vectorizer with sample data
33
39
34
-
+ A [supported data source](search-indexer-overview.md#supported-data-sources) and a [data source definition](search-howto-create-indexers.md#prepare-a-data-source) for your indexer.
40
+
The [Import and vectorize data wizard](search-get-started-portal-import-vectors.md) reads files from Azure Blob storage, creates an index with chunked and vectorized fields, and adds a vectorizer. By design, the vectorizer that's created by the wizard is set to the same embedding model used to index the blob content.
35
41
36
-
+ A skillset that performs data chunking and vectorization of those chunks. You can omit a skillset if you only want integrated vectorization at query time, or if you don't need chunking or [index projections](index-projections-concept-intro.md) during indexing. This article assumes you already know how to [create a skillset](cognitive-search-defining-skillset.md).
42
+
1. [Upload sample data files](/azure/storage/blobs/storage-quickstart-blobs-portal) to a container on Azure Storage. We used some [small text files from NASA's earth book](https://github.com/Azure-Samples/azure-search-sample-data/tree/main/nasa-e-book/earth-txt-10) to test these instructions on a free search service.
43
+
44
+
1. Run the [Import and vectorize data wizard](search-get-started-portal-import-vectors.md), choosing the blob container for the data source.
37
45
38
-
+ An index that specifies vector and non-vector fields. This article assumes you already know how to [create a vector store](vector-search-how-to-create-index.md) and covers just the steps for adding vectorizers and field assignments.
46
+
:::image type="content" source="media/vector-search-how-to-configure-vectorizer/connect-to-data.png" lightbox="media/vector-search-how-to-configure-vectorizer/connect-to-data.png" alt-text="Screenshot of the connect to your data page.":::
39
47
40
-
+ An [indexer](search-howto-create-indexers.md) that drives the pipeline.
48
+
1. Choose an existing deployment of **text-embedding-ada-002**. This model generates embeddings during indexing and is also used to configure the vectorizer used during queries.
41
49
42
-
## Define a vectorizer
50
+
:::image type="content" source="media/vector-search-how-to-configure-vectorizer/vectorize-enrich-data.png" lightbox="media/vector-search-how-to-configure-vectorizer/vectorize-enrich-data.png" alt-text="Screenshot of the vectorize and enrich data page.":::
43
51
44
-
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add a vectorizer.
52
+
1. After the wizard is finished and all indexer processing is complete, you should have an index with a searchable vector field. The field's JSON definition looks like this:
45
53
46
-
1. Add the following JSON to your index definition. Provide valid values and remove any properties you don't need:
1. Skip ahead to [test your vectorizer](#test-a-vectorizer) for text-to-vector conversion during query execution.
91
+
92
+
## Define a vectorizer and vector profile
93
+
94
+
This section explains the modifications to an index schema for defining a vectorizer manually.
95
+
96
+
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add `vectorizers` to a search index.
97
+
98
+
1. Add the following JSON to your index definition. The vectorizers section provides connection information to a deployed embedding model. This step shows two vectorizer examples so that you can compare an Azure OpenAI embedding model and a custom web API side by side.
47
99
48
100
```json
49
101
"vectorizers": [
50
102
{
51
-
"name": "my_open_ai_vectorizer",
103
+
"name": "my_azure_open_ai_vectorizer",
52
104
"kind": "azureOpenAI",
53
105
"azureOpenAIParameters": {
54
106
"resourceUri": "https://url.openai.azure.com",
@@ -68,32 +120,19 @@ You can use the [**Import and vectorize data wizard**](search-get-started-portal
68
120
]
69
121
```
70
122
71
-
## Define a profile that includes a vectorizer
72
-
73
-
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add a profile.
74
-
75
-
1. Add a profiles section that specifies combinations of algorithms and vectorizers.
123
+
1. In the same index, add a vector profiles section that specifies one of your vectorizers. Vector profiles also require a [vector search algorithm](vector-search-ranking.md) used to create navigation structures.
76
124
77
125
```json
78
126
"profiles": [
79
127
{
80
-
"name": "my_open_ai_profile",
128
+
"name": "my_vector_profile",
81
129
"algorithm": "my_hnsw_algorithm",
82
-
"vectorizer":"my_open_ai_vectorizer"
83
-
},
84
-
{
85
-
"name": "my_custom_profile",
86
-
"algorithm": "my_hnsw_algorithm",
87
-
"vectorizer":"my_custom_vectorizer"
130
+
"vectorizer":"my_azure_open_ai_vectorizer"
88
131
}
89
132
]
90
133
```
91
134
92
-
## Assign a vector profile to a field
93
-
94
-
1. Use [Create or Update Index (preview)](/rest/api/searchservice/indexes/create-or-update?view=rest-searchservice-2023-10-01-preview&preserve-view=true) to add field attributes.
95
-
96
-
1. For each vector field in the fields collection, assign a profile.
135
+
1. Assign a vector profile to a vector field. The following example shows a fields collection with the required key field, a title string field, and two vector fields with a vector profile assignment.
97
136
98
137
```json
99
138
"fields": [
@@ -109,58 +148,79 @@ You can use the [**Import and vectorize data wizard**](search-get-started-portal
109
148
"type": "Edm.String"
110
149
},
111
150
{
112
-
"name": "synopsis",
151
+
"name": "vector",
113
152
"type": "Collection(Edm.Single)",
114
153
"dimensions": 1536,
115
-
"vectorSearchProfile": "my_open_ai_profile",
154
+
"vectorSearchProfile": "my_vector_profile",
116
155
"searchable": true,
117
-
"retrievable": true,
118
-
"filterable": false,
119
-
"sortable": false,
120
-
"facetable": false
156
+
"retrievable": true
121
157
},
122
158
{
123
-
"name": "reviews",
159
+
"name": "my-second-vector",
124
160
"type": "Collection(Edm.Single)",
125
161
"dimensions": 1024,
126
-
"vectorSearchProfile": "my_custom_profile",
162
+
"vectorSearchProfile": "my_vector_profile",
127
163
"searchable": true,
128
-
"retrievable": true,
129
-
"filterable": false,
130
-
"sortable": false,
131
-
"facetable": false
132
-
}
164
+
"retrievable": true
165
+
}
133
166
]
134
167
```
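
The three fragments above refer to one another by name: each field's `vectorSearchProfile` must match a profile `name`, and each profile's `vectorizer` must match a vectorizer `name`. As a minimal sketch (not part of the search service or any SDK), the following Python check walks an index definition and reports names that don't resolve. The helper name `find_dangling_references`, the placement of `vectorizers` and `profiles` under a `vectorSearch` section, and the deliberately broken `my_missing_profile` field are assumptions made for illustration; adjust the lookups to match your actual index JSON.

```python
from typing import Any


def find_dangling_references(index: dict[str, Any]) -> list[str]:
    """Return a message for every profile or field that points at an undefined name."""
    # Assumption: vectorizers and profiles live under a "vectorSearch" section.
    vector_search = index.get("vectorSearch", {})
    vectorizer_names = {v["name"] for v in vector_search.get("vectorizers", [])}
    profile_names = {p["name"] for p in vector_search.get("profiles", [])}

    problems = []
    # Each profile may name a vectorizer; if it does, that name must be defined.
    for profile in vector_search.get("profiles", []):
        vectorizer = profile.get("vectorizer")
        if vectorizer is not None and vectorizer not in vectorizer_names:
            problems.append(
                f"profile '{profile['name']}' references unknown vectorizer '{vectorizer}'"
            )
    # Each vector field names a profile; that name must be defined too.
    for field in index.get("fields", []):
        profile_name = field.get("vectorSearchProfile")
        if profile_name is not None and profile_name not in profile_names:
            problems.append(
                f"field '{field['name']}' references unknown profile '{profile_name}'"
            )
    return problems


# Names mirror the fragments above; "broken_vector" is a hypothetical bad field.
example_index = {
    "fields": [
        {"name": "title", "type": "Edm.String"},
        {"name": "vector", "type": "Collection(Edm.Single)",
         "dimensions": 1536, "vectorSearchProfile": "my_vector_profile"},
        {"name": "broken_vector", "type": "Collection(Edm.Single)",
         "dimensions": 1536, "vectorSearchProfile": "my_missing_profile"},
    ],
    "vectorSearch": {
        "vectorizers": [{"name": "my_azure_open_ai_vectorizer", "kind": "azureOpenAI"}],
        "profiles": [{"name": "my_vector_profile",
                      "algorithm": "my_hnsw_algorithm",
                      "vectorizer": "my_azure_open_ai_vectorizer"}],
    },
}

problems = find_dangling_references(example_index)
for problem in problems:
    print(problem)
```

The check covers name resolution only. It doesn't verify that a field's `dimensions` value matches the output size of the embedding model behind the vectorizer, which you still need to confirm yourself.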
135
168
136
169
## Test a vectorizer
137
170
138
-
1. [Run the indexer](search-howto-run-reset-indexers.md). When you run the indexer, the following operations occur:
171
+
Use a search client to send a query through a vectorizer. This example assumes Visual Studio Code with a REST client and a [sample index](#try-a-vectorizer-with-sample-data).
139
172
140
-
+ Data retrieval from the supported data source
141
-
+ Document cracking
142
-
+ Skills processing for data chunking and vectorization
143
-
+ Indexing to one or more indexes
173
+
1. In Visual Studio Code, provide a search endpoint and [search query API key](search-security-api-keys.md#find-existing-keys):
144
174
145
-
1. [Query the vector field](vector-search-how-to-query.md) once the indexer is finished. In a query that uses integrated vectorization:
175
+
```http
176
+
@baseUrl:
177
+
@queryApiKey: 00000000000000000000000
178
+
```
146
179
147
-
+ Set `"kind"` to `"text"`.
148
-
+ Set `"text"` to the string to be vectorized.
180
+
1. Paste in a [vector query request](vector-search-how-to-query.md). Be sure to use a preview REST API version.
149
181
150
-
```json
151
-
"count": true,
152
-
"select": "title",
153
-
"vectorQueries": [
154
-
{
155
-
"kind": "text",
156
-
"text": "story about horses set in Australia",
157
-
"fields": "synopsis",
158
-
"k": 5
159
-
}
160
-
]
161
-
```
182
+
```http
183
+
### Run a query
184
+
POST {{baseUrl}}/indexes/vector-nasa-ebook-txt/docs/search?api-version=2023-10-01-preview HTTP/1.1
185
+
Content-Type: application/json
186
+
api-key: {{queryApiKey}}
187
+
188
+
{
189
+
"count": true,
190
+
"select": "title,chunk",
191
+
"vectorQueries": [
192
+
{
193
+
"kind": "text",
194
+
"text": "what cloud formations exists in the troposphere",
195
+
"fields": "vector",
196
+
"k": 3,
197
+
"exhaustive": true
198
+
}
199
+
]
200
+
}
201
+
```
202
+
203
+
Key points about the query include:
204
+
205
+
+ `"kind": "text"` tells the search engine that the input is a text string, and to use the vectorizer associated with the search field.
206
+
207
+
+ `"text": "what cloud formations exists in the troposphere"` is the text string to vectorize.
208
+
209
+
+ `"fields": "vector"` is the name of the field to query over. If you use the sample index produced by the wizard, the generated vector field is named `vector`.
210
+
211
+
1. Send the request. Because `k` is set to 3, you should get three results, with the most relevant result first.
212
+
213
+
Notice that there are no vectorizer properties to set at query time. The query reads the vectorizer properties, as per the vector profile field assignment in the index.
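
If you prefer a script to the REST client, the same request can be sketched in Python with only the standard library. This is an illustration under assumptions, not an official client: `build_text_vector_query` and `build_search_request` are hypothetical helper names, and the endpoint `https://example.search.windows.net` and the key are placeholders. The index name, field name, query text, and API version are copied from the request above.

```python
import json
import urllib.request


def build_text_vector_query(text, field, k=3, select="title,chunk"):
    """Build a search body that asks the service to vectorize `text` itself."""
    return {
        "count": True,
        "select": select,
        "vectorQueries": [
            {
                "kind": "text",      # text input; the field's vectorizer converts it
                "text": text,        # the string to vectorize
                "fields": field,     # the vector field to query over
                "k": k,              # number of nearest neighbors to return
                "exhaustive": True,
            }
        ],
    }


def build_search_request(base_url, index_name, api_key, body,
                         api_version="2023-10-01-preview"):
    """Wrap the body in a POST request without sending it."""
    url = f"{base_url}/indexes/{index_name}/docs/search?api-version={api_version}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": api_key},
        method="POST",
    )


body = build_text_vector_query(
    "what cloud formations exists in the troposphere", "vector"
)
request = build_search_request(
    "https://example.search.windows.net",  # placeholder endpoint
    "vector-nasa-ebook-txt",
    "<query-api-key>",                      # placeholder key
    body,
)
print(request.get_method(), request.full_url)
```

Nothing is sent until you pass `request` to `urllib.request.urlopen`. As in the REST example, the body carries no vectorizer settings, because the service resolves them from the vector profile assigned to the field.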
214
+
215
+
## Check logs
216
+
217
+
If you enabled diagnostic logging for your search service, run a Kusto query to confirm query execution on your vector field:
162
218
163
-
There are no vectorizer properties to set at query time. The query uses the algorithm and vectorizer provided through the profile assignment in the index.
219
+
```kusto
220
+
OperationEvent
221
+
| where TIMESTAMP > ago(30m)
222
+
| where Name == "Query.Search" and AdditionalInfo["QueryMetadata"]["Vectors"] has "TextLength"
articles/search/vector-search-integrated-vectorization.md
Lines changed: 4 additions & 4 deletions
@@ -9,7 +9,7 @@ ms.service: cognitive-search
9
9
ms.custom:
10
10
- ignite-2023
11
11
ms.topic: conceptual
12
-
ms.date: 11/07/2023
12
+
ms.date: 03/27/2024
13
13
---
14
14
15
15
# Integrated data chunking and embedding in Azure AI Search
@@ -77,9 +77,9 @@ We recommend using the built-in vectorization support of Azure AI Studio. If thi
77
77
78
78
For query-only vectorization:
79
79
80
-
1. [Add a vectorizer](vector-search-how-to-configure-vectorizer.md#define-a-vectorizer) to an index. It should be the same embedding model used to generate vectors in the index.
81
-
1. [Assign the vectorizer](vector-search-how-to-configure-vectorizer.md#assign-a-vector-profile-to-a-field) to the vector field.
82
-
1. [Formulate a vector query](vector-search-how-to-query.md#query-with-integrated-vectorization-preview) that specifies the text string to vectorize.
80
+
1. [Add a vectorizer](vector-search-how-to-configure-vectorizer.md#define-a-vectorizer-and-vector-profile) to an index. It should be the same embedding model used to generate vectors in the index.
81
+
1. [Assign the vectorizer](vector-search-how-to-configure-vectorizer.md#define-a-vectorizer-and-vector-profile) to a vector profile, and then assign a vector profile to the vector field.
82
+
1. [Formulate a vector query](vector-search-how-to-configure-vectorizer.md#test-a-vectorizer) that specifies the text string to vectorize.
83
83
84
84
A more common scenario - data chunking and vectorization during indexing: