articles/search/vector-search-how-to-create-index.md
Lines changed: 8 additions & 10 deletions
@@ -14,7 +14,7 @@ ms.date: 02/14/2025
14
14
15
15
# Create a vector index
16
16
17
-
In Azure AI Search, you can store vectors in a search index. A vector store on Azure AI Search has an index schema that defines both vector and nonvector fields. It also has a vector configuration for algorithms used to create and compress the embedding space.
17
+
In Azure AI Search, you can store vectors in a search index and send vector queries to match on semantic similarity. A vector store in Azure AI Search has an index schema that defines both vector and nonvector fields. It also has a vector configuration for algorithms used to create and compress the embedding space.
18
18
19
19
[Create or Update Index API](/rest/api/searchservice/indexes/create-or-update) creates the vector store. Follow these steps to index vectors in Azure AI Search:
20
20
@@ -54,18 +54,18 @@ Make sure your documents provide the following content:
54
54
| Content | Description |
55
55
|---------|-------------|
56
56
| Unique identifier | A field or a metadata property that uniquely identifies each document. All search indexes require a document key. To satisfy document key requirements, a source document must have one field or property that uniquely identifies it in the index. If you're indexing blobs, it might be the `metadata_storage_path` that uniquely identifies each blob. If you're indexing from a database, it might be the primary key. This source field must be mapped to an index field of type `Edm.String` and `key=true` in the search index. |
57
-
| Non-vector content | Provide other fields with human-readable content. Human readable content is useful for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request. If you're using a chat completion model, the content you provide to the model is in plain text. |
57
+
| Non-vector content | Provide other fields with human-readable content. Human-readable content is useful for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request. If you're using a chat completion model, provide plain text, because most models, such as ChatGPT, don't accept raw vectors as input. |
58
58
| Vector content | A vectorized version of your non-vector content. A vector is an array of single-precision floating point numbers generated by an embedding model. Each vector field contains one such array, one embedding per field, where the field is a top-level field (not part of a nested or complex type). For the simplest integration, we recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as a **text-embedding-3** model for text documents or the [Image Retrieval REST API](/rest/api/computervision/image-retrieval/vectorize-image) for images. <br><br>If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md) that encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, which are converted to vector arrays during indexing. |
59
59
60
60
Your search index should include fields and content for all of the query scenarios you want to support. Suppose you want to search or filter over product names, versions, metadata, or addresses. In this case, vector similarity search isn't especially helpful. Keyword search, geo-search, or filters would be a better choice. A search index that includes a comprehensive collection of both vector and nonvector fields provides maximum flexibility for query construction and response composition.
61
61
62
62
A short example of a documents payload that includes vector and nonvector fields is in the [load vector data](#load-vector-data-for-indexing) section of this article.
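As a rough illustration of the content requirements above, the following Python sketch builds one document that carries a key, human-readable text, and a vector. The field names (`id`, `content`, `contentVector`) and the four-element vector are hypothetical; a real embedding has as many elements as the embedding model outputs. The `value` array and `@search.action` property follow the general shape of a documents payload, so check the exact request body against the API version you target.

```python
import json
import struct


def to_single_precision(values):
    """Round-trip each number through a 32-bit float, mimicking single-precision storage."""
    return [struct.unpack("f", struct.pack("f", v))[0] for v in values]


def build_document(doc_id, text, embedding):
    """Combine a key, plain text, and its vector into one document (hypothetical field names)."""
    return {
        "@search.action": "upload",
        "id": str(doc_id),  # document keys are strings (Edm.String)
        "content": text,  # human-readable content for responses and hybrid queries
        "contentVector": to_single_precision(embedding),  # one embedding per vector field
    }


# Toy four-element vector for illustration only; not the output of a real model.
payload = {
    "value": [
        build_document(1, "Sample chunk of text from a PDF.", [0.25, -0.5, 0.125, 1.0])
    ]
}
print(json.dumps(payload, indent=2))
```

Keeping the plain text next to its vector in the same document is what lets one index serve vector, keyword, and hybrid queries.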
63
63
64
-
## Create a basic index
64
+
## Start with a basic index
65
65
66
66
Start with a minimum schema so that you have a definition to work with before adding a vector configuration and vector fields. A simple index might look like the following example. You can learn more about an index schema in [Create a search index](search-how-to-create-search-index.md).
67
67
68
-
Notice that it has a required name, a required document key (`"key": true`), and human readable content in plain text. It's common to have a human-readable version of whatever content you intend to vectorize. For example, if you have a chunk of text from a PDF file, your index schema should have the plain text content, coupled with a vector field equivalent that you add in a later step.
68
+
Notice that it has a required name, a required document key (`"key": true`), and fields for human-readable content in plain text. It's common to have a human-readable version of whatever content you intend to vectorize. For example, if you have a chunk of text from a PDF file, your index schema should have the plain text equivalent of the vectorized text.
69
69
70
70
```http
71
71
POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
@@ -84,29 +84,27 @@ POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
84
84
85
85
## Add a vector search configuration
86
86
87
-
In this step, add a vector search configuration to your schema. In the schema, you might add it after "suggesters", "scoringProfiles", or "analyzers".
87
+
Next, add a vector search configuration to your schema. Define the configuration before you define vector fields, because each vector field specifies a configuration as part of its definition. In the schema, the vector configuration is typically added after the fields collection, perhaps after `"suggesters"`, `"scoringProfiles"`, or `"analyzers"`.
88
88
89
-
A vector configuration specifies the parameters used during indexing to create "nearest neighbor" information among the vector nodes:
89
+
A vector configuration specifies the parameters used during indexing to create "nearest neighbor" information among the vector nodes. Algorithms include:
90
90
91
91
+ Hierarchical Navigable Small World (HNSW)
92
92
+ Exhaustive k-Nearest Neighbor (KNN)
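The two algorithm choices could be declared roughly as follows. This is a minimal sketch in Python that builds the `algorithms` array as plain dictionaries. The names `hnsw-example` and `eknn-example` are hypothetical, and the HNSW parameter values are illustrative rather than recommendations; confirm property names and allowed ranges against the API version you use.

```python
import json

# Sketch of a vectorSearch.algorithms array. Names are hypothetical labels
# that profiles refer to later; parameter values are illustrative only.
algorithms = [
    {
        "name": "hnsw-example",
        "kind": "hnsw",
        "hnswParameters": {
            "m": 4,  # bi-directional links created per node
            "efConstruction": 400,  # neighbors considered while building the graph
            "efSearch": 500,  # neighbors considered at query time
            "metric": "cosine",
        },
    },
    {
        "name": "eknn-example",
        "kind": "exhaustiveKnn",
        "exhaustiveKnnParameters": {"metric": "cosine"},
    },
]

print(json.dumps({"algorithms": algorithms}, indent=2))
```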
93
93
94
-
If you choose HNSW on a field, you can opt in for exhaustive KNN at query time. But the other direction doesn’t work: if you choose exhaustive, you can’t later request HNSW search because the extra data structures that enable approximate search don’t exist.
94
+
If you choose HNSW on a field, you can opt in for exhaustive KNN at query time. But the other direction doesn’t work: if you choose exhaustive for indexing, you can’t later request HNSW search because the extra data structures that enable approximate search don’t exist.
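To make the query-time opt-in concrete, here is a hedged Python sketch that builds a vector query body and sets an `exhaustive` flag on it. The helper name `build_vector_query`, the field name `contentVector`, and the toy vector are assumptions for illustration; verify the `vectorQueries` properties against the query API reference for your version.

```python
import json


def build_vector_query(vector, field, k, exhaustive=False):
    """Build a query body with one vector query (hypothetical helper).

    Setting exhaustive=True requests exhaustive KNN at query time, which is
    the opt-in described above for a field indexed with HNSW.
    """
    query = {"kind": "vector", "vector": list(vector), "fields": field, "k": k}
    if exhaustive:
        query["exhaustive"] = True
    return {"vectorQueries": [query]}


# Toy vector for illustration only.
approximate = build_vector_query([0.25, -0.5, 0.125, 1.0], "contentVector", k=5)
exact = build_vector_query([0.25, -0.5, 0.125, 1.0], "contentVector", k=5, exhaustive=True)
print(json.dumps(exact, indent=2))
```

The reverse case has no equivalent switch: a field indexed with exhaustive KNN has no HNSW graph to search.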
95
95
96
96
Optionally, a vector configuration also specifies quantization methods for reducing vector size:
97
97
98
98
+ Scalar
99
99
+ Binary (available in 2024-07-01 only and in newer Azure SDK packages)
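A compression section covering both quantization methods might look like the following Python sketch. The names `scalar-example` and `binary-example` are hypothetical, and the oversampling value is illustrative; treat the property names as a sketch of the 2024-07-01 shape and confirm them in the API reference before relying on them.

```python
import json

# Sketch of a vectorSearch.compressions array with one entry per method.
compressions = [
    {
        "name": "scalar-example",
        "kind": "scalarQuantization",
        "rerankWithOriginalVectors": True,  # rescore candidates with uncompressed vectors
        "defaultOversampling": 10.0,  # illustrative value
        "scalarQuantizationParameters": {"quantizedDataType": "int8"},
    },
    {
        "name": "binary-example",
        "kind": "binaryQuantization",
        "rerankWithOriginalVectors": True,
        "defaultOversampling": 10.0,
    },
]

print(json.dumps({"compressions": compressions}, indent=2))
```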
100
100
101
-
For instructions on how to migrate to the latest version, see [Upgrade REST API](search-api-migration.md).
102
-
103
101
### [**2024-07-01**](#tab/config-2024-07-01)
104
102
105
103
[**2024-07-01**](/rest/api/searchservice/search-service-api-versions#2024-07-01) is generally available. It supports a vector configuration having:
106
104
107
105
+ `vectorSearch.algorithms` supports HNSW and exhaustive KNN.
108
106
+ `vectorSearch.compressions` supports scalar and binary quantization, oversampling, and reranking with original vectors.
109
-
+`vectorSearch.profiles`provide for multiple combinations of algorithm and compression configurations.
107
+
+ `vectorSearch.profiles` supports multiple combinations of algorithm and compression configurations.
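The three sections connect by name: a profile names an algorithm and, optionally, a compression, and a vector field names a profile. The Python sketch below shows that wiring and adds a small local check for dangling names. Every name here (`hnsw-example`, `scalar-example`, `profile-compressed`, `profile-plain`, `contentVector`), the `dimensions` value, and the helper `unresolved_references` are assumptions for illustration; the helper is a local sanity check, not part of any SDK or service behavior.

```python
import json

vector_search = {
    "algorithms": [{"name": "hnsw-example", "kind": "hnsw"}],
    "compressions": [{"name": "scalar-example", "kind": "scalarQuantization"}],
    "profiles": [
        # Two combinations built from the same algorithm.
        {"name": "profile-compressed", "algorithm": "hnsw-example", "compression": "scalar-example"},
        {"name": "profile-plain", "algorithm": "hnsw-example"},
    ],
}

# A vector field picks one profile. Dimensions must match your embedding model;
# 1536 is only an example value.
vector_field = {
    "name": "contentVector",
    "type": "Collection(Edm.Single)",
    "searchable": True,
    "dimensions": 1536,
    "vectorSearchProfile": "profile-compressed",
}


def unresolved_references(vector_search, fields):
    """Return a list of names that are referenced but never defined (hypothetical helper)."""
    problems = []
    algorithm_names = {a["name"] for a in vector_search.get("algorithms", [])}
    compression_names = {c["name"] for c in vector_search.get("compressions", [])}
    profile_names = set()
    for profile in vector_search.get("profiles", []):
        profile_names.add(profile["name"])
        if profile["algorithm"] not in algorithm_names:
            problems.append(f"profile {profile['name']}: unknown algorithm {profile['algorithm']}")
        compression = profile.get("compression")
        if compression is not None and compression not in compression_names:
            problems.append(f"profile {profile['name']}: unknown compression {compression}")
    for field in fields:
        profile = field.get("vectorSearchProfile")
        if profile is not None and profile not in profile_names:
            problems.append(f"field {field['name']}: unknown profile {profile}")
    return problems


print(unresolved_references(vector_search, [vector_field]))
print(json.dumps({"vectorSearch": vector_search}, indent=2))
```

Running a check like this before you send the schema can surface a mistyped profile or algorithm name early.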
110
108
111
109
Be sure to have a strategy for [vectorizing your content](vector-search-how-to-generate-embeddings.md). We recommend [integrated vectorization](vector-search-integrated-vectorization.md) and [query-time vectorizers](vector-search-how-to-configure-vectorizer.md) for built-in encoding.