articles/search/vector-search-how-to-create-index.md
19 additions & 10 deletions
@@ -15,33 +15,40 @@ ms.date: 07/31/2023
> [!IMPORTANT]
> Vector search is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It's available through the Azure portal, preview REST API, and [alpha SDKs](https://github.com/Azure/cognitive-search-vector-pr#readme).
- In Azure Cognitive Search, vector data is represented in fields in a [search index](search-what-is-an-index.md). A vector field is of type `Collection(Edm.Single)` so that it can hold single-precision floating-point values. It also has a "dimensions" property and a "vectorConfiguration" property that names which configuration is used to create the embedding space.
+ In Azure Cognitive Search, vector data is indexed as *vector fields* within a [search index](search-what-is-an-index.md), using a *vector configuration* to create the embedding space.
+
+ + A vector field is of type `Collection(Edm.Single)` so that it can hold single-precision floating-point values. It also has a "dimensions" property and a "vectorConfiguration" property.
+
+ + A vector configuration specifies the algorithm and parameters used during query execution to find the nearest neighbors in a similarity search. You can have multiple configurations within an index, but each vector field is assigned to just one.
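For orientation, a single vector field definition in the index schema might look roughly like the fragment below. This is a sketch based on the 2023-07-01-Preview REST API: the field name, dimension count, and configuration name are placeholders, and in that API version the property that points to the configuration is spelled `vectorSearchConfiguration`.

```json
{
  "name": "contentVector",
  "type": "Collection(Edm.Single)",
  "searchable": true,
  "retrievable": true,
  "dimensions": 1536,
  "vectorSearchConfiguration": "my-vector-config"
}
```

Here, `my-vector-config` refers to an entry in the index's `vectorSearch` section, covered later in this article.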
## Prerequisites
- + Azure Cognitive Search, in any region and on any tier.
+ + Azure Cognitive Search, in any region and on any tier.
Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields fails on creation. In this situation, a new service must be created.
+ Pre-existing vector embeddings in your source documents. Cognitive Search doesn't generate vectors. We recommend [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models) but you can use any model for vectorization.
+ You should know the dimensions limit of the model used to create the embeddings and how similarity is computed. For **text-embedding-ada-002**, the length of the numerical vector is 1536. Similarity is computed using `cosine`.
- Be sure to use the same embedding model for both indexing and queries. During query execution, your workflow must call an embedding model that converts the user's query string into a vector. For more information, see [Create and use embeddings for search queries and documents](vector-search-how-to-generate-embeddings.md).
+ A short example of a documents payload that includes vector and non-vector fields is in the [load vector data](#load-vector-data-for-indexing) section of this article.
+
+ > [!NOTE]
+ > During query execution, your workflow must call an embedding model that converts the user's query string into a vector. Be sure to use the same embedding model for both queries and indexing. For more information, see [Create and use embeddings for search queries and documents](vector-search-how-to-generate-embeddings.md).
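To illustrate that query-time step, a call to an Azure OpenAI embeddings deployment over REST might look like the sketch below. The resource name, deployment name, key, and API version are placeholders (assumptions, not values taken from this article).

```http
POST https://{your-resource}.openai.azure.com/openai/deployments/{your-embedding-deployment}/embeddings?api-version=2023-05-15
Content-Type: application/json
api-key: {your-azure-openai-key}

{
  "input": "which azure services support vector search"
}
```

The response body includes a `data[0].embedding` array of floating-point values (1536 of them for **text-embedding-ada-002**), which becomes the query vector.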
## Prepare documents for indexing
- Prior to indexing, assemble a document payload that includes fields vector data. The document structure must conform to the index schema.
+ Prior to indexing, assemble a document payload that includes fields of vector data. The document structure must conform to the index schema.
Make sure your documents:
- 1. Provide a field or a metadata property that uniquely identifies each document. All search indexes require a document key. Your documents must have one field or property that can be mapped to type `Edm.String` and `key=true` in the search index.
+ 1. Provide a field or a metadata property that uniquely identifies each document. All search indexes require a document key. To satisfy document key requirements, your documents must have one field or property that can be mapped to type `Edm.String` and `key=true` in the search index.
1. Provide vector data (an array of single-precision floating point numbers) in source fields.
Vector fields contain numeric data generated by embedding models, one embedding per field. We recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as **text-embedding-ada-002** for text documents or the [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for images.
- 1. Provide other fields with alphanumeric content for the query response and for hybrid query scenarios that include full text search or semantic ranking in the same request.
+ 1. Provide other fields with human-readable alphanumeric content for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request.
Your search index should include fields and content for all of the query scenarios you want to support. Suppose you want to search or filter over product names, versions, metadata, or addresses. In this case, similarity search isn't especially helpful. Keyword search, geo-search, or filters would be a better choice. A search index that includes a comprehensive field collection of vector and non-vector data provides maximum flexibility for query construction and response composition.
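As a rough illustration of those three requirements (a document key, vector data, and human-readable fields), a documents payload might look like the sketch below. Field names and vector values are placeholders, and the vector is truncated to three values for readability; a real vector carries the full dimension count of your embedding model. The complete example is in the [load vector data](#load-vector-data-for-indexing) section.

```json
{
  "value": [
    {
      "@search.action": "upload",
      "id": "1",
      "title": "Azure Cognitive Search",
      "titleVector": [ -0.021, 0.104, 0.033 ]
    }
  ]
}
```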
@@ -50,11 +57,13 @@ Your search index should include fields and content for all of the query scenari
The schema must include fields for the document key, vector fields, and any other fields that you require for hybrid search scenarios.
> [!NOTE]
- > Vectors are added to fields in a search index. Internally, a *vector index* is created for each vector field, but indexing and queries target fields in a search index, and not the vector indexes directly.
+ > Vectors are added to fields in a search index. Internally, a *vector index* is created for each vector field, but those indexes are considered internal. In Cognitive Search, indexing and query workloads target the fields in a search index, and not the vector indexes directly.
### [**Azure portal**](#tab/portal-add-field)
- If the index doesn't have a vector configuration, you're prompted to create one when you add your first vector field to the index.
+ You can use the index designer in the Azure portal to add vector field definitions. If the index doesn't have a vector configuration, you're prompted to create one when you add your first vector field to the index.
+
+ Currently, there's no portal support for loading vector data into fields.
1. [Sign in to Azure portal](https://portal.azure.com) and open your search service page in a browser.
@@ -82,7 +91,7 @@ If the index doesn't have a vector configuration, you're prompted to create one
+ Name the configuration. The name must be unique within the index.
+ "hnsw" is the Approximate Nearest Neighbors (ANN) algorithm used to find similar vectors. Currently, only Hierarchical Navigable Small World (HNSW) is supported.
- + "Bi-directional link count" default is 4. The range is 2 to 100. Lower values (lower recall) should return less noise in the results.
+ + "Bi-directional link count" default is 4. The range is 2 to 100. Lower values should return less noise in the results.
+ "efConstruction" default is 400. It's the number of nearest neighbors used during indexing.
+ "efSearch default is 500. It's the number of nearest neighbors used during search.
+ "Similarity metric" should be "cosine" if you're using Azure OpenAI, otherwise use the similarity metric of the embedding model. Supported values are `cosine`, `dotProduct`, `euclidean`.
@@ -99,7 +108,7 @@ Updating an existing index with vector fields requires `allowIndexDowntime` quer
1. Use the [Create or Update Index Preview REST API](/rest/api/searchservice/preview-api/create-or-update-index) to create the index.
- 1. Add a `vectorSearch` section in the index that specifies the algorithm used to create the embedding space. Currently, only `"hnsw"` is supported. For "metric", valid values are `cosine`, `euclidean`, and `dotProduct`. The `cosine` metric is specified because it's the similarity metric that the Azure OpenAI models use to create embeddings.
+ 1. Add a `vectorSearch` section in the index that specifies the similarity algorithm used to create the embedding space. Currently, only `"hnsw"` is supported. For "metric", valid values are `cosine`, `euclidean`, and `dotProduct`. The `cosine` metric is specified because it's the similarity metric that the Azure OpenAI models use to create embeddings.
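Putting the pieces together, a create-or-update request with a vector field and a `vectorSearch` section might look like the sketch below. Treat it as illustrative only: the service name, index name, field names, and configuration name are placeholders, and you should compare it against the [Create or Update Index Preview REST API](/rest/api/searchservice/preview-api/create-or-update-index) reference.

```http
PUT https://{your-service}.search.windows.net/indexes/my-index?api-version=2023-07-01-Preview&allowIndexDowntime=true
Content-Type: application/json
api-key: {your-admin-key}

{
  "name": "my-index",
  "fields": [
    { "name": "id", "type": "Edm.String", "key": true, "filterable": true },
    { "name": "title", "type": "Edm.String", "searchable": true, "retrievable": true },
    {
      "name": "titleVector",
      "type": "Collection(Edm.Single)",
      "searchable": true,
      "retrievable": true,
      "dimensions": 1536,
      "vectorSearchConfiguration": "my-vector-config"
    }
  ],
  "vectorSearch": {
    "algorithmConfigurations": [
      { "name": "my-vector-config", "kind": "hnsw", "hnswParameters": { "metric": "cosine" } }
    ]
  }
}
```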