
Commit de82193

edits for readability
1 parent 0ac281b commit de82193


1 file changed (+8, -10 lines)


articles/search/vector-search-how-to-create-index.md

@@ -14,7 +14,7 @@ ms.date: 02/14/2025
 
 # Create a vector index
 
-In Azure AI Search, you can store vectors in a search index. A vector store on Azure AI Search has an index schema that defines both vector and nonvector fields. It also has a vector configuration for algorithms used to create and compress the embedding space.
+In Azure AI Search, you can store vectors in a search index and send vector queries to match on semantic similarity. A vector store in Azure AI Search has an index schema that defines both vector and nonvector fields. It also has a vector configuration for algorithms used to create and compress the embedding space.
 
 [Create or Update Index API](/rest/api/searchservice/indexes/create-or-update) creates the vector store. Follow these steps to index vectors in Azure AI Search:
 
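The added intro sentence says vector queries "match on semantic similarity." As a rough illustration of what that means, the following pure-Python sketch scores a query vector against a few stored vectors with cosine similarity and keeps the top k, which is essentially what an exhaustive (brute-force) nearest-neighbor query does. The document keys and three-dimensional vectors are invented for illustration; real embeddings have hundreds or thousands of dimensions, and Azure AI Search does this scoring server side using the metric set in the vector configuration.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def top_k(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Exhaustive nearest neighbors: score every stored vector, keep the best k."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical stored embeddings, keyed by document key.
stored = {
    "doc-1": [0.9, 0.1, 0.0],
    "doc-2": [0.1, 0.9, 0.0],
    "doc-3": [0.7, 0.3, 0.1],
}

if __name__ == "__main__":
    for doc_id, score in top_k([1.0, 0.0, 0.0], stored, k=2):
        print(doc_id, round(score, 3))
```

Scoring every vector is exact but grows linearly with the number of documents, which is why approximate structures such as HNSW exist.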
@@ -54,18 +54,18 @@ Make sure your documents provide the following content:
 | Content | Description |
 |---------|-------------|
 | Unique identifier | A field or a metadata property that uniquely identifies each document. All search indexes require a document key. To satisfy document key requirements, a source document must have one field or property uniquely identifies it in the index. If you're indexing blobs, it might be the metadata_storage_path that uniquely identifies each blob. If you're indexing from a database, it might be primary key. This source field must be mapped to an index field of type `Edm.String` and `key=true` in the search index.|
-| Non-vector content | Provide other fields with human-readable content. Human readable content is useful for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request. If you're using a chat completion model, the content you provide to the model is in plain text. |
+| Non-vector content | Provide other fields with human-readable content. Human readable content is useful for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request. If you're using a chat completion model, most models like ChatGPT don't accept raw vectors as input. |
 | Vector content| A vectorized version of your non-vector content. A vector is an array of single-precision floating point numbers generated by an embedding model. Each vector field contains an array generated by an embedding model, one embedding per field, where the field is a top-level field (not part of a nested or complex type). For the simplest integration, we recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as an **text-embedding-3** model for text documents or the [Image Retrieval REST API](/rest/api/computervision/image-retrieval/vectorize-image) for images. <br><br>If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md) that encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, which are converted to vector arrays during indexing. |
 
 Your search index should include fields and content for all of the query scenarios you want to support. Suppose you want to search or filter over product names, versions, metadata, or addresses. In this case, vector similarity search isn't especially helpful. Keyword search, geo-search, or filters would be a better choice. A search index that includes a comprehensive collection of both vector and nonvector fields provides maximum flexibility for query construction and response composition.
 
 A short example of a documents payload that includes vector and nonvector fields is in the [load vector data](#load-vector-data-for-indexing) section of this article.
 
-## Create a basic index
+## Start with a basic index
 
 Start with a minimum schema so that you have a definition to work with before adding a vector configuration and vector fields. A simple index might look the following example. You can learn more about an index schema in [Create a search index](search-how-to-create-search-index.md).
 
-Notice that it has a required name, a required document key (`"key": true`), and human readable content in plain text. It's common to have a human-readable version of whatever content you intend to vectorize. For example, if you have a chunk of text from a PDF file, your index schema should have the plain text content, coupled with a vector field equivalent that you add in a later step.
+Notice that it has a required name, a required document key (`"key": true`), and fields for human readable content in plain text. It's common to have a human-readable version of whatever content you intend to vectorize. For example, if you have a chunk of text from a PDF file, your index schema should have the plain text equivalent of the vectorized text.
 
 ```http
 POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
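The content table in this hunk lists what each document should carry: a string document key, human-readable nonvector fields, and top-level vector fields holding one single-precision embedding each. The sketch below assembles a hypothetical upload batch with those parts and checks the vector rules before anything is sent. The field names (`id`, `content`, `contentVector`) and the 3-dimension size are placeholders to replace with your own schema; the `value` array with `@search.action` follows the batch shape of the documents index REST operation, but no request is made here.

```python
import json
import math
from typing import Any, Dict, List

FLOAT32_MAX = 3.4028234663852886e38  # largest finite single-precision value

# Hypothetical schema facts; replace with values from your own index definition.
KEY_FIELD = "id"
VECTOR_FIELD = "contentVector"
DIMENSIONS = 3


def check_document(doc: Dict[str, Any]) -> None:
    """Raise ValueError if a document breaks the key or vector content rules."""
    key = doc.get(KEY_FIELD)
    if not isinstance(key, str) or not key:
        raise ValueError(f"'{KEY_FIELD}' must be a non-empty string (Edm.String key)")
    vector = doc.get(VECTOR_FIELD)
    # One embedding per field: a flat list of numbers, not nested lists.
    if not isinstance(vector, list) or any(isinstance(x, list) for x in vector):
        raise ValueError(f"'{VECTOR_FIELD}' must be a flat array of numbers")
    if len(vector) != DIMENSIONS:
        raise ValueError(f"expected {DIMENSIONS} dimensions, got {len(vector)}")
    for x in vector:
        if not isinstance(x, (int, float)) or not math.isfinite(x) or abs(x) > FLOAT32_MAX:
            raise ValueError(f"{x!r} is not representable as a single-precision float")


def build_upload_payload(docs: List[Dict[str, Any]]) -> str:
    """Validate documents, then wrap them in an upload batch body."""
    for doc in docs:
        check_document(doc)
    return json.dumps({"value": [{"@search.action": "upload", **doc} for doc in docs]})


docs = [
    {
        "id": "1",
        "content": "Plain-text chunk kept for the response and for hybrid queries.",
        "contentVector": [0.013, -0.094, 0.221],
    }
]

if __name__ == "__main__":
    print(build_upload_payload(docs))
```

Keeping the plain-text `content` next to `contentVector` reflects the table's point that chat completion models and hybrid queries need readable text, not raw vectors.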
@@ -84,29 +84,27 @@ POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
 
 ## Add a vector search configuration
 
-In this step, add a vector search configuration to your schema. In the schema, you might add it after "suggesters", "scoringProfiles", or "analyzers".
+Next, add a vector search configuration to your schema. Configuration occurs before field definitions because you specify one on each field as part of its definition. In the schema, vector configuration is typically added after the fields collection, perhaps after `"suggesters"`, `"scoringProfiles"`, or `"analyzers"`.
 
-A vector configuration specifies the parameters used during indexing to create "nearest neighbor" information among the vector nodes:
+A vector configuration specifies the parameters used during indexing to create "nearest neighbor" information among the vector nodes. Algorithms include:
 
 + Hierarchical Navigable Small World (HNSW)
 + Exhaustive k-Nearest Neighbor (KNN)
 
-If you choose HNSW on a field, you can opt in for exhaustive KNN at query time. But the other direction doesn’t work: if you choose exhaustive, you can’t later request HNSW search because the extra data structures that enable approximate search don’t exist.
+If you choose HNSW on a field, you can opt in for exhaustive KNN at query time. But the other direction doesn’t work: if you choose exhaustive for indexing, you can’t later request HNSW search because the extra data structures that enable approximate search don’t exist.
 
 Optionally, a vector configuration also specifies quantization methods for reducing vector size:
 
 + Scalar
 + Binary (available in 2024-07-01 only and in newer Azure SDK packages)
 
-For instructions on how to migrate to the latest version, see [Upgrade REST API](search-api-migration.md).
-
 ### [**2024-07-01**](#tab/config-2024-07-01)
 
 [**2024-07-01**](/rest/api/searchservice/search-service-api-versions#2024-07-01) is generally available. It supports a vector configuration having:
 
 + `vectorSearch.algorithms` support HNSW and exhaustive KNN.
 + `vectorSearch.compressions` support scalar and binary quantization, oversampling, and reranking with original vectors.
-+ `vectorSearch.profiles` provide for multiple combinations of algorithm and compression configurations.
++ `vectorSearch.profiles` for specifying multiple combinations of algorithm and compression configurations.
 
 Be sure to have a strategy for [vectorizing your content](vector-search-how-to-generate-embeddings.md). We recommend [integrated vectorization](vector-search-integrated-vectorization.md) and [query-time vectorizers](vector-search-how-to-configure-vectorizer.md) for built-in encoding.
 
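This hunk describes the 2024-07-01 vector configuration as three collections: `vectorSearch.algorithms`, `vectorSearch.compressions`, and `vectorSearch.profiles`, with each vector field naming a profile. The sketch below writes one possible configuration as a Python dictionary so the cross-references can be checked before the schema is sent (serialize it with `json.dumps` for a REST body). The names (`hnsw-1`, `eknn-1`, `sq-1`, `hnsw-sq`, `eknn`, `contentVector`) and all parameter values are arbitrary examples, not recommendations, and the property names are written from memory of the 2024-07-01 schema, so verify them against the REST reference. The helper only checks that each profile points at a defined algorithm and compression and that each vector field points at a defined profile.

```python
from typing import Any, Dict, List

# Example values only; every name and number here is a placeholder.
vector_search: Dict[str, Any] = {
    "algorithms": [
        {
            "name": "hnsw-1",
            "kind": "hnsw",
            "hnswParameters": {"m": 4, "efConstruction": 400, "efSearch": 500, "metric": "cosine"},
        },
        {
            "name": "eknn-1",
            "kind": "exhaustiveKnn",
            "exhaustiveKnnParameters": {"metric": "cosine"},
        },
    ],
    "compressions": [
        {
            "name": "sq-1",
            "kind": "scalarQuantization",
            "scalarQuantizationParameters": {"quantizedDataType": "int8"},
            "rerankWithOriginalVectors": True,
            "defaultOversampling": 10,
        }
    ],
    "profiles": [
        # HNSW graph plus scalar quantization; exhaustive KNN can still be requested at query time.
        {"name": "hnsw-sq", "algorithm": "hnsw-1", "compression": "sq-1"},
        # Exhaustive KNN only: fields using this profile can't later be searched with HNSW.
        {"name": "eknn", "algorithm": "eknn-1"},
    ],
}

fields: List[Dict[str, Any]] = [
    {"name": "id", "type": "Edm.String", "key": True},
    {
        "name": "contentVector",
        "type": "Collection(Edm.Single)",
        "searchable": True,
        "retrievable": True,
        "dimensions": 1536,
        "vectorSearchProfile": "hnsw-sq",
    },
]


def check_references(vs: Dict[str, Any], fields: List[Dict[str, Any]]) -> List[str]:
    """Return dangling references; an empty list means the names all resolve."""
    algorithms = {a["name"] for a in vs.get("algorithms", [])}
    compressions = {c["name"] for c in vs.get("compressions", [])}
    profiles = {p["name"] for p in vs.get("profiles", [])}
    problems: List[str] = []
    for p in vs.get("profiles", []):
        if p["algorithm"] not in algorithms:
            problems.append(f"profile {p['name']}: unknown algorithm {p['algorithm']}")
        if "compression" in p and p["compression"] not in compressions:
            problems.append(f"profile {p['name']}: unknown compression {p['compression']}")
    for f in fields:
        profile = f.get("vectorSearchProfile")
        if profile is not None and profile not in profiles:
            problems.append(f"field {f['name']}: unknown profile {profile}")
    return problems


if __name__ == "__main__":
    print(check_references(vector_search, fields) or "all references resolve")
```

Putting algorithm and compression choices in named profiles lets several vector fields share one configuration, or use different ones, without repeating parameters on each field.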