articles/search/vector-search-how-to-create-index.md
Lines changed: 47 additions & 25 deletions
Original file line number
Diff line number
Diff line change
@@ -15,27 +15,21 @@ ms.date: 08/10/2023
15
15
> [!IMPORTANT]
16
16
> Vector search is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It's available through the Azure portal, preview REST API, and [beta client libraries](https://github.com/Azure/cognitive-search-vector-pr#readme).
17
17
18
-
In Azure Cognitive Search, vector data is indexed as *vector fields* in a [search index](search-what-is-an-index.md), using a *vector configuration* to specify the embedding space. Do this to create an index schema that contains vector data:
18
+
In Azure Cognitive Search, vector data is indexed as *vector fields* in a [search index](search-what-is-an-index.md), using a *vector configuration* to specify the embedding space. Follow these steps to index vector data:
19
19
20
-
+ Add one or more vector fields of type `Collection(Edm.Single)`. This type holds single-precision floating-point values. A field of this type also has a "dimensions" property and a "vectorConfiguration" property.
21
-
22
-
+ Add one or more vector configurations. A configuration specifies the algorithm and parameters used during indexing to create "nearest neighbor" information among the vector nodes. Currently, only Hierarchical Navigable Small World (HNSW) is supported.
23
-
24
-
During indexing, HNSW determines how closely the vectors match and stores the neighborhood information as a proximity graph in the index. You can have multiple configurations within an index if you want different HNSW parameter combinations. As long as the vector fields contain embeddings from the same model, having a different vector configuration per field has no effect on queries.
25
-
26
-
[Loading the index with vector data](#load-vector-data-for-indexing) is a separate step that can occur once the index definition is in place.
20
+
> [!div class="checklist"]
21
+
> + Add one or more vector fields to the index schema.
22
+
> + Add one or more vector configurations to the index schema.
23
+
> + Load the index with vector data [as a separate step](#load-vector-data-for-indexing), after the index schema is defined.
27
24
28
25
## Prerequisites
29
26
30
27
+ Azure Cognitive Search, in any region and on any tier. Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields fails on creation. In this situation, a new service must be created.
31
28
32
-
+ Pre-existing vector embeddings in your source documents. Cognitive Search doesn't generate vectors. We recommend [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models) but you can use any model for vectorization.
29
+
+ Pre-existing vector embeddings in your source documents. Cognitive Search doesn't generate vectors. We recommend [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models) but you can use any model for vectorization. For more information, see [Create and use embeddings for search queries and documents](vector-search-how-to-generate-embeddings.md).
33
30
34
31
+ You should know the dimensions limit of the model used to create the embeddings and how similarity is computed. In Azure OpenAI, for **text-embedding-ada-002**, the length of the numerical vector is 1536. Similarity is computed using `cosine`.
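As a rough illustration of what "similarity is computed using `cosine`" means, the following Python sketch computes cosine similarity between two vectors by hand. The function name and the toy three-dimensional vectors are made up for this example. Real **text-embedding-ada-002** vectors have 1536 values, and the search service performs this comparison for you.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors (illustrative only). query_vector points in the same
# direction as doc_vector, so the similarity is approximately 1.
doc_vector = [0.1, 0.3, 0.5]
query_vector = [0.2, 0.6, 1.0]
print(cosine_similarity(doc_vector, query_vector))
```

Because cosine similarity depends only on direction, vectors from different embedding models (or with different dimensions) aren't comparable, which is why the same model should be used for documents and queries.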
35
32
36
-
> [!NOTE]
37
-
> During query execution, your workflow must call an embedding model that converts the user's query string into a vector. Be sure to use the same embedding model for both queries and indexing. For more information, see [Create and use embeddings for search queries and documents](vector-search-how-to-generate-embeddings.md).
38
-
39
33
## Prepare documents for indexing
40
34
41
35
Prior to indexing, assemble a document payload that includes fields of vector and non-vector data. The document structure must conform to the index schema.
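One way to catch schema mismatches before upload is to check each document locally. The following Python sketch is a minimal example under stated assumptions: the key field name `id`, the vector field names, the 1536-dimension setting, and the `{"value": [...]}` wrapper are all illustrative choices, not taken from your index. Adjust them to match your own schema and the API you call.

```python
import json

# Assumed schema summary: vector field name -> expected number of dimensions.
VECTOR_DIMENSIONS = {"titleVector": 1536, "contentVector": 1536}
KEY_FIELD = "id"  # assumed name of the document key field


def validate_document(doc):
    """Raise ValueError if the document doesn't fit the assumed schema."""
    key = doc.get(KEY_FIELD)
    if not isinstance(key, str) or not key:
        raise ValueError("document key must be a non-empty string")
    for field, dims in VECTOR_DIMENSIONS.items():
        vector = doc.get(field)
        if vector is None:
            continue  # this sketch treats vector fields as optional
        if len(vector) != dims:
            raise ValueError(f"{field}: expected {dims} values, got {len(vector)}")
        if not all(isinstance(v, (int, float)) for v in vector):
            raise ValueError(f"{field}: every element must be a number")


def build_payload(docs):
    """Validate every document, then serialize them as one JSON payload."""
    for doc in docs:
        validate_document(doc)
    return json.dumps({"value": docs})


sample = {"id": "1", "title": "Azure App Service", "titleVector": [0.0] * 1536}
payload = build_payload([sample])
print(len(payload))
```

Validating the vector length up front surfaces a wrong-model or truncated-embedding problem before the indexing request is sent.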
@@ -56,13 +50,21 @@ A short example of a documents payload that includes vector and non-vector field
56
50
57
51
## Add a vector field to the fields collection
58
52
59
-
The schema must include a `vectorConfiguration`` section, a field for the document key, vector fields, and any other fields that you require for hybrid search scenarios.
53
+
The schema must include a `vectorConfiguration` section, a field for the document key, vector fields, and any other fields that you need for hybrid search scenarios.
54
+
55
+
+ `vectorConfiguration` specifies the algorithm and parameters used during indexing to create "nearest neighbor" information among the vector nodes. Currently, only Hierarchical Navigable Small World (HNSW) is supported.
56
+
57
+
+ Vector fields are of type `Collection(Edm.Single)`, which holds single-precision floating-point values. A field of this type also has a `dimensions` property and a `vectorConfiguration` property.
58
+
59
+
During indexing, HNSW determines how closely the vectors match and stores the neighborhood information as a proximity graph in the index. You can have multiple configurations within an index if you want different HNSW parameter combinations. As long as the vector fields contain embeddings from the same model, having a different vector configuration per field has no effect on queries.
60
+
61
+
You can use the Azure portal, REST APIs, or the beta packages of the Azure SDKs to index vectors.
60
62
61
63
### [**Azure portal**](#tab/portal-add-field)
62
64
63
-
You can use the index designer in the Azure portal to add vector field definitions. If the index doesn't have a vector configuration, you're prompted to create one when you add your first vector field to the index.
65
+
Use the index designer in the Azure portal to add vector field definitions. If the index doesn't have a vector configuration, you're prompted to create one when you add your first vector field to the index.
64
66
65
-
Although you can add a field definition, there's no portal support for loading vectors into fields. Use the REST APIs or an SDK for data import.
67
+
Although you can add a field to an index, there's no portal (Import data wizard) support for loading it with vector data. Instead, use the REST APIs or an SDK for data import.
66
68
67
69
1. [Sign in to the Azure portal](https://portal.azure.com) and open your search service page in a browser.
68
70
@@ -95,15 +97,15 @@ Although you can add a field definition, there's no portal support for loading v
95
97
+ "efSearch" default is 500. It's the number of nearest neighbors used during search.
96
98
+ "Similarity metric" should be "cosine" if you're using Azure OpenAI, otherwise use the similarity metric of the embedding model. Supported values are `cosine`, `dotProduct`, `euclidean`.
97
99
98
-
If you're familiar with HNSW parameters, you might be wondering about "k" number of nearest neighbors to return in the result. In Cognitive Search, that value is set on the query request.
100
+
If you're familiar with HNSW parameters, you might be wondering how to set "k", the number of nearest neighbors to return in the result. In Cognitive Search, that value is set on the [query request](vector-search-how-to-query.md).
99
101
100
102
1. Select **Save** to save the vector configuration and the field definition.
101
103
102
104
### [**REST API**](#tab/rest-add-field)
103
105
104
-
In the following example, "title" and "content" contain textual content used in full text search and semantic search, while "titleVector" and "contentVector" contain vector data.
106
+
Use the **2023-07-01-Preview** REST API for vector scenarios. If you're updating an existing index to include vector fields, make sure the `allowIndexDowntime` query parameter is set to `true`.
105
107
106
-
Updating an existing index with vector fields requires `allowIndexDowntime` query parameter to be `true`.
108
+
In the following REST API example, "title" and "content" contain textual content used in full text search and semantic search, while "titleVector" and "contentVector" contain vector data.
107
109
108
110
1. Use the [Create or Update Index Preview REST API](/rest/api/searchservice/preview-api/create-or-update-index) to create the index.
109
111
@@ -202,13 +204,31 @@ Updating an existing index with vector fields requires `allowIndexDowntime` quer
202
204
}
203
205
```
204
206
207
+
### [**.NET**](#tab/dotnet-add-field)
208
+
209
+
+ Use the [**Azure.Search.Documents 11.5.0-beta.4**](https://www.nuget.org/packages/Azure.Search.Documents/11.5.0-beta.4) package for vector scenarios.
210
+
211
+
+ See the [cognitive-search-vector-pr](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-dotnet) GitHub repository for .NET code samples.
212
+
213
+
### [**Python**](#tab/python-add-field)
214
+
215
+
+ Use the [**azure-search-documents 11.4.0b8**](https://pypi.org/project/azure-search-documents/11.4.0b8/) package for vector scenarios.
216
+
217
+
+ See the [cognitive-search-vector-pr](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python) GitHub repository for Python code samples.
218
+
219
+
### [**JavaScript**](#tab/js-add-field)
220
+
221
+
+ Use the [**@azure/search-documents 12.0.0-beta.2**](https://www.npmjs.com/package/@azure/search-documents/v/12.0.0-beta.2) package for vector scenarios.
222
+
223
+
+ See the [cognitive-search-vector-pr](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-javascript) GitHub repository for JavaScript code samples.
224
+
205
225
---
206
226
207
227
## Load vector data for indexing
208
228
209
-
Content that you provide for indexing must conform to the index schema and include a unique string value for the document key. Vector data is loaded into one or more vector fields, which can coexist with other fields containing alphanumeric text.
229
+
Content that you provide for indexing must conform to the index schema and include a unique string value for the document key. Vector data is loaded into one or more vector fields, which can coexist with other fields containing alphanumeric content.
210
230
211
-
You can use either [push or pull methodologies](search-what-is-data-import.md) for data ingestion. You can't use the portal for this step.
231
+
You can use either [push or pull methodologies](search-what-is-data-import.md) for data ingestion. You can't use the portal (Import data wizard) for this step.
212
232
213
233
### [**Push APIs**](#tab/push)
214
234
@@ -279,13 +299,15 @@ For validation purposes, you can query the index using Search Explorer in Azure
279
299
280
300
Fields must be attributed as "retrievable" to be included in the results.
281
301
282
-
### [**Azure portal**](#tab/portal-add-field)
302
+
### [**Azure portal**](#tab/portal-check-index)
283
303
284
-
You can use [Search Explorer](search-explorer.md) to query an index. Search explorer has two views: Query view (default) and JSON view. The default query view is for full text search only. You can issue an empty search (`search=*`) to return all fields, including vector fields, as a quick check to confirm the presence of vector content.
304
+
You can use [Search Explorer](search-explorer.md) to query an index. Search Explorer has two views: Query view (default) and JSON view.
285
305
286
-
If you want to execute a vector query, use the JSON view and paste in a JSON definition of a vector query. For more information, see [Query vector data in a search index](vector-search-how-to-query.md).
306
+
+ [Use the JSON view for vector queries](vector-search-how-to-query.md), pasting in a JSON definition of the vector query you want to execute.
287
307
288
-
### [**REST API**](#tab/rest-add-field)
308
+
+ Use the default Query view for a quick confirmation that the index contains vectors. The query view is for full text search. Although you can't use it for vector queries, you can send an empty search (`search=*`) to check for content. The content of all fields, including vector fields, is returned as plain text.
309
+
310
+
### [**REST API**](#tab/rest-check-index)
289
311
290
312
The following REST API example is a vector query, but it returns only non-vector fields (title, content, category). Only fields marked as "retrievable" can be returned in search results.
291
313
@@ -315,4 +337,4 @@ api-key: {{admin-api-key}}
315
337
316
338
As a next step, we recommend [Query vector data in a search index](vector-search-how-to-query.md).
317
339
318
-
You might also consider reviewing the demo code for [Python](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python) or [C#](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-dotnet).
340
+
You might also consider reviewing the demo code for [Python](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python), [C#](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-dotnet) or [JavaScript](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-javascript).
articles/search/vector-search-how-to-query.md
Lines changed: 26 additions & 4 deletions
Original file line number
Diff line number
Diff line change
@@ -15,9 +15,9 @@ ms.date: 08/10/2023
15
15
> [!IMPORTANT]
16
16
> Vector search is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It's available through the Azure portal, preview REST API, and [beta client libraries](https://github.com/Azure/cognitive-search-vector-pr#readme).
17
17
18
-
In Azure Cognitive Search, if you added vector fields to a search index, this article explains how to query those fields. It also explains how to combine vector queries with full text search and semantic search for hybrid query combination scenarios.
18
+
In Azure Cognitive Search, if you added vector fields to a search index, this article explains how to query those fields. It also explains how to combine vector queries with full text search and semantic search for *hybrid query* combination scenarios.
19
19
20
-
Query execution in Cognitive Search doesn't include vector conversion of the input string. Encoding (text-to-vector) of the query string requires that you pass the text to an embedding model for vectorization. You would then pass the output of the call to the embedding model to the search engine for similarity search over vector fields.
20
+
Cognitive Search doesn't provide built-in vectorization of the input string. Encoding (text-to-vector) of the query string requires that you pass the string to an embedding model for vectorization. You would then pass the output of the call to the embedding model to the search engine for similarity search over vector fields.
21
21
22
22
All results are returned in plain text, including vectors. If you use Search Explorer in the Azure portal to query an index that contains vectors, the numeric vectors are returned in plain text. Because numeric vectors aren't useful in search results, choose other fields in the index as a proxy for the vector match. For example, if an index has "descriptionVector" and "descriptionText" fields, the query can match on "descriptionVector" but the search result shows "descriptionText". Use the `select` parameter to specify only human-readable fields in the results.
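To make the "match on the vector, show the text" idea concrete, the following Python sketch assembles a query body that searches "descriptionVector" but selects only "descriptionText". The helper name is hypothetical, and the body shape (`vector` with `value`, `fields`, and `k`, plus `select`) is an assumption about the preview request format. Verify it against the REST reference for the API version you use.

```python
import json


def build_vector_query(query_vector, vector_field, select_fields, k=5):
    """Build a query body that matches on a vector field but returns text fields."""
    return {
        # Assumed preview request shape; confirm against the REST reference.
        "vector": {
            "value": list(query_vector),
            "fields": vector_field,
            "k": k,
        },
        # Return only human-readable fields, not the raw vector.
        "select": ", ".join(select_fields),
    }


# A toy 3-value vector stands in for a real 1536-value embedding.
body = build_vector_query(
    [0.01, -0.02, 0.03],
    vector_field="descriptionVector",
    select_fields=["descriptionText"],
)
print(json.dumps(body, indent=2))
```

Leaving the vector field out of `select` keeps responses small and readable, even when the field itself is marked as retrievable.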
23
23
@@ -45,7 +45,9 @@ You can also send an empty query (`search=*`) against the index. If the vector f
45
45
46
46
To query a vector field, the query itself must be a vector. To convert a text query string provided by a user into a vector representation, your application must call an embedding library that provides this capability. Use the same embedding library that you used to generate embeddings in the source documents.
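The conversion step can be sketched in Python using only the standard library. This example builds, but doesn't send, a request to an Azure OpenAI embeddings deployment. The service name, deployment name, API version, and key are placeholders, and the helper name is hypothetical.

```python
import json
import urllib.request


def build_embedding_request(service_name, deployment_name, api_version, api_key, text):
    """Build (without sending) a POST request that asks for an embedding of `text`."""
    url = (
        f"https://{service_name}.openai.azure.com/openai/deployments/"
        f"{deployment_name}/embeddings?api-version={api_version}"
    )
    body = json.dumps({"input": text}).encode("utf-8")
    headers = {"Content-Type": "application/json", "api-key": api_key}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder values; substitute your own service, deployment, version, and key.
request = build_embedding_request(
    service_name="my-openai-service",
    deployment_name="my-embedding-deployment",
    api_version="2023-05-15",
    api_key="<your-api-key>",
    text="what Azure services support full text search",
)
print(request.get_method(), request.full_url)

# To send the request, call urllib.request.urlopen(request) and parse the JSON
# response. The vector is expected under data[0]["embedding"] (an assumption
# to verify against the service's response).
```

Keeping request construction separate from sending makes it easy to confirm that queries use the same deployment that produced the document embeddings.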
47
47
48
-
Here's an example of a query string submitted to a deployment of an Azure OpenAI model:
48
+
You can find multiple instances of query string conversion in the [cognitive-search-vector-pr](https://github.com/Azure/cognitive-search-vector-pr/) repository for each of the Azure SDKs.
49
+
50
+
Here's a REST API example of a query string submitted to a deployment of an Azure OpenAI model:
49
51
50
52
```http
51
53
POST https://{{openai-service-name}}.openai.azure.com/openai/deployments/{{openai-deployment-name}}/embeddings?api-version={{openai-api-version}}
@@ -86,6 +88,8 @@ The actual response for this POST call to the deployment model includes 1536 emb
86
88
87
89
## Query syntax for vector search
88
90
91
+
You can use the Azure portal, REST APIs, or the beta packages of the Azure SDKs to query vectors.
92
+
89
93
### [**Azure portal**](#tab/portal-vector-query)
90
94
91
95
Be sure to use the **JSON view** and formulate the query in JSON. The search bar in **Query view** is for full text search and will treat any vector input as plain text.
@@ -100,7 +104,7 @@ Be sure to the **JSON view** and formulate the query in JSON. The search bar in
100
104
101
105
:::image type="content" source="media/vector-search-how-to-query/select-json-view.png" alt-text="Screenshot of the index list." border="true":::
102
106
103
-
1. By default, the search API is 2023-07-01-Preview. This is the correct API version for vector search.
107
+
1. By default, the search API is **2023-07-01-Preview**. This is the correct API version for vector search.
104
108
105
109
1. Paste in a JSON vector query, and then select **Search**. You can use the REST example as a template for your JSON query.
106
110
@@ -136,6 +140,24 @@ The response includes 5 matches, and each result provides a search score, title,
136
140
137
141
Notice that "select" returns textual fields from the index. Although the vector field is "retrievable" in this example, its content isn't usable as a search result.
138
142
143
+
### [**.NET**](#tab/dotnet-vector-query)
144
+
145
+
+ Use the [**Azure.Search.Documents 11.5.0-beta.4**](https://www.nuget.org/packages/Azure.Search.Documents/11.5.0-beta.4) package for vector scenarios.
146
+
147
+
+ See the [cognitive-search-vector-pr](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-dotnet) GitHub repository for .NET code samples.
148
+
149
+
### [**Python**](#tab/python-vector-query)
150
+
151
+
+ Use the [**azure-search-documents 11.4.0b8**](https://pypi.org/project/azure-search-documents/11.4.0b8/) package for vector scenarios.
152
+
153
+
+ See the [cognitive-search-vector-pr](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-python) GitHub repository for Python code samples.
154
+
155
+
### [**JavaScript**](#tab/js-vector-query)
156
+
157
+
+ Use the [**@azure/search-documents 12.0.0-beta.2**](https://www.npmjs.com/package/@azure/search-documents/v/12.0.0-beta.2) package for vector scenarios.
158
+
159
+
+ See the [cognitive-search-vector-pr](https://github.com/Azure/cognitive-search-vector-pr/tree/main/demo-javascript) GitHub repository for JavaScript code samples.