Commit 1570332

Merge pull request #246920 from HeidiSteen/heidist-vectors
Updates to vector index doc
2 parents 01c749a + 6693f08 commit 1570332

1 file changed: +19 -10 lines changed


articles/search/vector-search-how-to-create-index.md

Lines changed: 19 additions & 10 deletions
@@ -15,33 +15,40 @@ ms.date: 07/31/2023
 > [!IMPORTANT]
 > Vector search is in public preview under [supplemental terms of use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). It's available through the Azure portal, preview REST API, and [alpha SDKs](https://github.com/Azure/cognitive-search-vector-pr#readme).
 
-In Azure Cognitive Search, vector data is represented in fields in a [search index](search-what-is-an-index.md). A vector field is of type `Collection(Edm.Single)` so that it can hold single-precision floating-point values. It also has a "dimensions" property and a "vectorConfiguration" property that names which configuration is used to create the embedding space.
+In Azure Cognitive Search, vector data is indexed as *vector fields* within a [search index](search-what-is-an-index.md), using a *vector configuration* to create the embedding space.
+
++ A vector field is of type `Collection(Edm.Single)` so that it can hold single-precision floating-point values. It also has a "dimensions" property and a "vectorConfiguration" property.
+
++ A vector configuration specifies the algorithm and parameters used during query execution to find the nearest neighbors in a similarity search. You can have multiple configurations within an index, but each vector field is assigned to just one.
 
 ## Prerequisites
 
-+ Azure Cognitive Search, in any region and on any tier.
++ Azure Cognitive Search, in any region and on any tier.
 
 Most existing services support vector search. For a small subset of services created prior to January 2019, an index containing vector fields fails on creation. In this situation, a new service must be created.
 
 + Pre-existing vector embeddings in your source documents. Cognitive Search doesn't generate vectors. We recommend [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models) but you can use any model for vectorization.
 
 + You should know the dimensions limit of the model used to create the embeddings and how similarity is computed. For **text-embedding-ada-002**, the length of the numerical vector is 1536. Similarity is computed using `cosine`.
 
-Be sure to use the same embedding model for both indexing and queries. During query execution, your workflow must call an embedding model that converts the user's query string into a vector. For more information, see [Create and use embeddings for search queries and documents](vector-search-how-to-generate-embeddings.md).
+A short example of a documents payload that includes vector and non-vector fields is in the [load vector data](#load-vector-data-for-indexing) section of this article.
+
+> [!NOTE]
+> During query execution, your workflow must call an embedding model that converts the user's query string into a vector. Be sure to use the same embedding model for both queries and indexing. For more information, see [Create and use embeddings for search queries and documents](vector-search-how-to-generate-embeddings.md).
 
 ## Prepare documents for indexing
 
-Prior to indexing, assemble a document payload that includes fields vector data. The document structure must conform to the index schema.
+Prior to indexing, assemble a document payload that includes fields of vector data. The document structure must conform to the index schema.
 
 Make sure your documents:
 
-1. Provide a field or a metadata property that uniquely identifies each document. All search indexes require a document key. Your documents must have one field or property that can be mapped to type `Edm.String` and `key=true` in the search index.
+1. Provide a field or a metadata property that uniquely identifies each document. All search indexes require a document key. To satisfy document key requirements, your documents must have one field or property that can be mapped to type `Edm.String` and `key=true` in the search index.
 
 1. Provide vector data (an array of single-precision floating point numbers) in source fields.
 
 Vector fields contain numeric data generated by embedding models, one embedding per field. We recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as **text-embedding-ada-002** for text documents or the [Image Retrieval REST API](/rest/api/computervision/2023-02-01-preview/image-retrieval/vectorize-image) for images.
 
-1. Provide other fields with alphanumeric content for the query response and for hybrid query scenarios that include full text search or semantic ranking in the same request.
+1. Provide other fields with human-readable alphanumeric content for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request.
 
 Your search index should include fields and content for all of the query scenarios you want to support. Suppose you want to search or filter over product names, versions, metadata, or addresses. In this case, similarity search isn't especially helpful. Keyword search, geo-search, or filters would be a better choice. A search index that includes a comprehensive field collection of vector and non-vector data provides maximum flexibility for query construction and response composition.
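
Reviewer note: the document requirements in this hunk (a string document key, one array of numbers per vector field, and a length that matches the embedding model) could be checked before upload with something like the sketch below. It is only an illustration, not part of the article or of any SDK. The function name `validate_document` and the field names `id`, `title`, `content`, and `contentVector` are hypothetical; 1536 is the vector length of **text-embedding-ada-002**.

```python
from numbers import Real


def validate_document(doc, key_field, vector_field, dimensions):
    """Return a list of problems with one document; an empty list means none found."""
    problems = []

    # The document key must be mappable to Edm.String with key=true.
    key = doc.get(key_field)
    if not isinstance(key, str) or not key:
        problems.append(f"{key_field}: expected a non-empty string key")

    # The vector must be an array of numbers whose length matches the model.
    vector = doc.get(vector_field)
    if not isinstance(vector, list):
        problems.append(f"{vector_field}: expected an array of numbers")
    elif len(vector) != dimensions:
        problems.append(f"{vector_field}: got {len(vector)} values, expected {dimensions}")
    elif not all(isinstance(v, Real) and not isinstance(v, bool) for v in vector):
        problems.append(f"{vector_field}: every value must be a number")

    return problems


# Hypothetical document: a key, human-readable fields, and one embedding.
doc = {
    "id": "1",
    "title": "Example title",
    "content": "Example text for keyword or hybrid queries.",
    "contentVector": [0.0] * 1536,  # placeholder values, not a real embedding
}

print(validate_document(doc, "id", "contentVector", 1536))
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a batch before anything is sent to the service.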

@@ -50,11 +57,13 @@ Your search index should include fields and content for all of the query scenari
 The schema must include fields for the document key, vector fields, and any other fields that you require for hybrid search scenarios.
 
 > [!NOTE]
-> Vectors are added to fields in a search index. Internally, a *vector index* is created for each vector field, but indexing and queries target fields in a search index, and not the vector indexes directly.
+> Vectors are added to fields in a search index. Internally, a *vector index* is created for each vector field, but those indexes are considered internal. In Cognitive Search, indexing and query workloads target the fields in a search index, and not the vector indexes directly.
 
 ### [**Azure portal**](#tab/portal-add-field)
 
-If the index doesn't have a vector configuration, you're prompted to create one when you add your first vector field to the index.
+You can use the index designer in the Azure portal to add vector field definitions. If the index doesn't have a vector configuration, you're prompted to create one when you add your first vector field to the index.
+
+Currently, there's no portal support for loading vector data into fields.
 
 1. [Sign in to Azure portal](https://portal.azure.com) and open your search service page in a browser.
 
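
Reviewer note: the schema rules stated at the top of this hunk and in the introduction (one document key of type `Edm.String`, vector fields of type `Collection(Edm.Single)` with a "dimensions" property, and each vector field assigned to exactly one configuration) might be sanity-checked as sketched below. This is an assumption-laden illustration: `check_schema` is a hypothetical helper, the field names are made up, and the property name `vectorConfiguration` follows the article's wording, so the exact spelling in the REST API should be confirmed against the API reference.

```python
def check_schema(fields, config_names):
    """Return a list of schema problems; an empty list means none found."""
    problems = []

    # Every search index requires a document key of type Edm.String.
    key_fields = [f for f in fields if f.get("key")]
    if len(key_fields) != 1:
        problems.append(f"expected exactly one key field, found {len(key_fields)}")
    for f in key_fields:
        if f.get("type") != "Edm.String":
            problems.append(f"{f['name']}: key field must be Edm.String")

    # Treat any field that declares "dimensions" as a vector field.
    for f in fields:
        if "dimensions" not in f:
            continue
        if f.get("type") != "Collection(Edm.Single)":
            problems.append(f"{f['name']}: vector field must be Collection(Edm.Single)")
        if f.get("vectorConfiguration") not in config_names:
            problems.append(f"{f['name']}: unknown vector configuration")

    return problems


# Hypothetical schema: key, text field for hybrid queries, one vector field.
fields = [
    {"name": "id", "type": "Edm.String", "key": True},
    {"name": "content", "type": "Edm.String"},
    {
        "name": "contentVector",
        "type": "Collection(Edm.Single)",
        "dimensions": 1536,
        "vectorConfiguration": "my-vector-config",
    },
]

print(check_schema(fields, {"my-vector-config"}))
```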

@@ -82,7 +91,7 @@ If the index doesn't have a vector configuration, you're prompted to create one
 
 + Name the configuration. The name must be unique within the index.
 + "hnsw" is the Approximate Nearest Neighbors (ANN) algorithm used to find similar vectors. Currently, only Hierarchical Navigable Small World (HNSW) is supported.
-+ "Bi-directional link count" default is 4. The range is 2 to 100. Lower values (lower recall) should return less noise in the results.
++ "Bi-directional link count" default is 4. The range is 2 to 100. Lower values should return less noise in the results.
 + "efConstruction" default is 400. It's the number of nearest neighbors used during indexing.
 + "efSearch" default is 500. It's the number of nearest neighbors used during search.
 + "Similarity metric" should be "cosine" if you're using Azure OpenAI, otherwise use the similarity metric of the embedding model. Supported values are `cosine`, `dotProduct`, `euclidean`.
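
Reviewer note: the configuration settings listed in this hunk could be captured in a small helper like the one below, which applies the stated defaults and rejects values outside the documented range. It is a sketch under assumptions: `build_hnsw_config` and the dictionary layout are hypothetical and are not the REST payload shape, and `m` is used as the key for "Bi-directional link count" only because that is the usual HNSW parameter name. No range is enforced for `efConstruction` or `efSearch` because the text doesn't give one.

```python
SUPPORTED_METRICS = {"cosine", "dotProduct", "euclidean"}

# Defaults as stated in the list above.
DEFAULTS = {"m": 4, "efConstruction": 400, "efSearch": 500}


def build_hnsw_config(name, metric, **overrides):
    """Build a configuration dict, applying defaults and validating inputs."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")

    params = {**DEFAULTS, **overrides, "metric": metric}

    # "Bi-directional link count" must fall in the documented 2 to 100 range.
    if not 2 <= params["m"] <= 100:
        raise ValueError("m must be between 2 and 100")

    # The metric should match the one used by the embedding model.
    if metric not in SUPPORTED_METRICS:
        raise ValueError(f"metric must be one of {sorted(SUPPORTED_METRICS)}")

    # "hnsw" is currently the only supported algorithm.
    return {"name": name, "kind": "hnsw", "parameters": params}


# Azure OpenAI embeddings use cosine similarity, so pass "cosine" here.
config = build_hnsw_config("my-vector-config", "cosine")
print(config)
```
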
@@ -99,7 +108,7 @@ Updating an existing index with vector fields requires `allowIndexDowntime` quer
 
 1. Use the [Create or Update Index Preview REST API](/rest/api/searchservice/preview-api/create-or-update-index) to create the index.
 
-1. Add a `vectorSearch` section in the index that specifies the algorithm used to create the embedding space. Currently, only `"hnsw"` is supported. For "metric", valid values are `cosine`, `euclidean`, and `dotProduct`. The `cosine` metric is specified because it's the similarity metric that the Azure OpenAI models use to create embeddings.
+1. Add a `vectorSearch` section in the index that specifies the similarity algorithm used to create the embedding space. Currently, only `"hnsw"` is supported. For "metric", valid values are `cosine`, `euclidean`, and `dotProduct`. The `cosine` metric is specified because it's the similarity metric that the Azure OpenAI models use to create embeddings.
 
 ```json
 "vectorSearch": {
