Commit 0904cec

edits for readability
1 parent 9d73332 commit 0904cec

1 file changed: +32 additions, -22 deletions

articles/search/vector-search-how-to-create-index.md

Lines changed: 32 additions & 22 deletions
@@ -14,17 +14,19 @@ ms.date: 02/14/2025
1414

1515
# Create a vector index
1616

17-
In Azure AI Search, you can store vectors in a search index. Your vector store has an index schema that defines both vector and nonvector fields, a vector configuration for algorithms used to create and compress the embedding space, and settings on vector field definitions used in query requests.
17+
In Azure AI Search, you can store vectors in a search index. A vector store on Azure AI Search has an index schema that defines both vector and nonvector fields. It also has a vector configuration for algorithms used to create and compress the embedding space.
1818

19-
The [Create or Update Index API](/rest/api/searchservice/indexes/create-or-update) creates the vector store. Follow these steps to index vectors in Azure AI Search:
19+
The [Create or Update Index API](/rest/api/searchservice/indexes/create-or-update) creates the vector store. Follow these steps to index vectors in Azure AI Search:
2020

2121
> [!div class="checklist"]
2222
> + Start with a basic schema definition
2323
> + Add vector algorithms and optional compression
2424
> + Add vector field definitions
2525
> + Load prevectorized data [as a separate step](#load-vector-data-for-indexing), or use [integrated vectorization](vector-search-integrated-vectorization.md) for data chunking and encoding during indexing
2626
27-
This article explains the workflow and uses REST for illustration. Once you understand the basic workflow, continue with the Azure SDK code samples in the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repository for guidance on using these features in test and production code.
27+
This article explains the workflow and uses REST for illustration.
28+
29+
Once you understand the basic workflow, continue with the Azure SDK code samples in the [azure-search-vector-samples](https://github.com/Azure/azure-search-vector-samples) repository for guidance on using vectors in test and production code.
2830

2931
> [!TIP]
3032
> You can also use the Azure portal to [create a vector index](search-get-started-portal-import-vectors.md) and try out integrated data chunking and vectorization.
@@ -33,52 +35,48 @@ This article explains the workflow and uses REST for illustration. Once you unde
3335

3436
+ Azure AI Search, in any region and on any tier. If you plan to use [integrated vectorization](vector-search-integrated-vectorization.md), Azure AI Search must be in the same region as the embedding models hosted on Azure OpenAI or in Azure AI Vision.
3537

36-
+ If you aren't using integrated vectorization, your source documents must have [vector embeddings](vector-search-how-to-generate-embeddings.md) to upload to the index.
38+
+ Your source documents must contain [vector embeddings](vector-search-how-to-generate-embeddings.md) before you upload them to the index. Alternatively, use [integrated vectorization](vector-search-integrated-vectorization.md) to generate the embeddings during indexing.
3739

38-
+ You should know the dimensions limit of the model used to create the embeddings so that you can assign that limit to the vector field. Integrated vectorization supports a finite number of embedding models. For **text-embedding-ada-002**, dimensions are fixed at 1536. For **text-embedding-3-small** or **text-embedding-3-large**, the vector length ranges from 1 to 1536 and 3072, respectively.
40+
+ You should know the dimensions limit of the model used to create the embeddings so that you can assign that limit to the vector field. For **text-embedding-ada-002**, dimensions are fixed at 1536. For **text-embedding-3-small** or **text-embedding-3-large**, dimensions range from 1 to 1536 and from 1 to 3072, respectively.
3941

4042
+ You should also know what similarity metric to use. For embedding models on Azure OpenAI, similarity is [computed using `cosine`](/azure/ai-services/openai/concepts/understand-embeddings#cosine-similarity).
4143

4244
+ You should be familiar with [creating an index](search-how-to-create-search-index.md). The schema must include a field for the document key, other fields you want to search or filter, and other configurations for behaviors needed during indexing and queries.
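
To make the dimension and similarity prerequisites concrete, here's a minimal Python sketch. It encodes the dimension ranges listed above for the three Azure OpenAI models and computes cosine similarity locally. `MODEL_DIMENSIONS`, `check_dimensions`, and `cosine_similarity` are hypothetical helpers, not part of any Azure SDK. Azure AI Search computes similarity for you at query time, so the function only illustrates what the `cosine` metric measures.

```python
import math

# Dimension ranges for Azure OpenAI embedding models, as listed in the
# prerequisites. MODEL_DIMENSIONS is a hypothetical helper table.
MODEL_DIMENSIONS = {
    "text-embedding-ada-002": (1536, 1536),  # fixed length
    "text-embedding-3-small": (1, 1536),
    "text-embedding-3-large": (1, 3072),
}


def check_dimensions(model: str, dimensions: int) -> None:
    """Raise ValueError if `dimensions` is outside the model's supported range."""
    low, high = MODEL_DIMENSIONS[model]
    if not low <= dimensions <= high:
        raise ValueError(f"{model} supports {low}-{high} dimensions, got {dimensions}")


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


check_dimensions("text-embedding-3-large", 3072)  # within range, no error
print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

The value you assign to `dimensions` on the vector field should be the length your embedding model actually produces. A mismatch is easier to catch before upload than after indexing fails.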
4345

4446
## Limitations
4547

46-
+ For search services created before January 2019, there's a small subset that can't create a vector index. If this applies to you, create a new service to use vectors.
48+
A small subset of search services created before January 2019 can't create a vector index. If this applies to you, create a new service to use vectors.
4749

4850
## Prepare documents for indexing
4951

5052
Before indexing, assemble a document payload that includes fields of vector and nonvector data. The document structure must conform to the index schema.
5153

52-
Make sure your documents:
53-
54-
1. Provide a field or a metadata property that uniquely identifies each document. All search indexes require a document key. To satisfy document key requirements, a source document must have one field or property uniquely identifies it in the index. If you're indexing blobs, it might be the metadata_storage_path. This source field must be mapped to an index field of type `Edm.String` and `key=true` in the search index.
55-
56-
1. Provide vector data (an array of single-precision floating point numbers) in source fields.
57-
58-
Vector fields contain an array generated by an embedding model, one embedding per field, where the field is a top-level field (not part of a nested or complex type). For the simplest integration, we recommend the embedding models in [Azure OpenAI](https://aka.ms/oai/access), such as an **text-embedding-3** model for text documents or the [Image Retrieval REST API](/rest/api/computervision/image-retrieval/vectorize-image) for images.
59-
60-
If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md) that encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, which are converted to vector arrays during indexing.
54+
Make sure your documents provide the following content:
6155

62-
1. Provide other fields with human-readable content for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request.
56+
| Content | Description |
57+
|---------|-------------|
58+
| Unique identifier | A field or metadata property that uniquely identifies each document. All search indexes require a document key, so each source document must have one field or property that uniquely identifies it in the index. If you're indexing blobs, it might be `metadata_storage_path`. This source field must be mapped to an index field of type `Edm.String` with `key=true` in the search index. |
59+
| Nonvector content | Other fields with human-readable content. Human-readable content is useful for the query response, and for hybrid query scenarios that include full text search or semantic ranking in the same request. If you're using a chat completion model, the content you provide to the model is plain text. |
60+
| Vector content | A vectorized version of your nonvector content. A vector is an array of single-precision floating-point numbers generated by an embedding model. Each vector field holds one embedding and must be a top-level field (not part of a nested or complex type). For the simplest integration, we recommend an [Azure OpenAI](https://aka.ms/oai/access) embedding model, such as a **text-embedding-3** model, for text documents, or the [Image Retrieval REST API](/rest/api/computervision/image-retrieval/vectorize-image) for images. <br><br>If you can take a dependency on indexers and skillsets, consider using [integrated vectorization](vector-search-integrated-vectorization.md), which encodes images and textual content during indexing. Your field definitions are for vector fields, but incoming source data can be text or images, which are converted to vector arrays during indexing. |
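
The following minimal Python sketch assembles documents with the three kinds of content described in the table into a request body. It assumes the `{"value": [...]}` body with a per-document `"@search.action": "upload"` that the Index Documents REST operation accepts, and it reuses field names from this article's example index (`documentId`, `myHumanReadableNameField`, `myHumanReadableContentField`, `contentVector`). `make_upload_payload` and `is_number` are hypothetical helpers. The sketch checks only for a non-empty string key and a flat, top-level numeric vector; it doesn't validate the key's allowed characters or the vector's length.

```python
import json

# Field names mirror this article's example index; they are illustrative.
KEY_FIELD = "documentId"
VECTOR_FIELD = "contentVector"


def is_number(x) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def make_upload_payload(docs: list[dict]) -> dict:
    """Wrap documents in an Index Documents request body (hypothetical helper)."""
    value = []
    for doc in docs:
        key = doc.get(KEY_FIELD)
        if not isinstance(key, str) or not key:
            raise ValueError(f"each document needs a non-empty string '{KEY_FIELD}'")
        vector = doc.get(VECTOR_FIELD)
        # One embedding per top-level vector field: a flat, non-empty list of numbers.
        if not isinstance(vector, list) or not vector or not all(is_number(x) for x in vector):
            raise ValueError(f"'{VECTOR_FIELD}' must be a flat list of numbers")
        value.append({"@search.action": "upload", **doc})
    return {"value": value}


payload = make_upload_payload([
    {
        "documentId": "1",
        "myHumanReadableNameField": "Example document",
        "myHumanReadableContentField": "Plain text that was sent to the embedding model.",
        # Shortened for readability; a real vector has as many numbers as the field's dimensions.
        "contentVector": [0.012, -0.034, 0.056],
    }
])
print(json.dumps(payload, indent=2))
```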
6361

64-
Your search index should include fields and content for all of the query scenarios you want to support. Suppose you want to search or filter over product names, versions, metadata, or addresses. In this case, similarity search isn't especially helpful. Keyword search, geo-search, or filters would be a better choice. A search index that includes a comprehensive field collection of vector and nonvector data provides maximum flexibility for query construction and response composition.
62+
Your search index should include fields and content for all of the query scenarios you want to support. Suppose you want to search or filter over product names, versions, metadata, or addresses. In this case, vector similarity search isn't especially helpful. Keyword search, geo-search, or filters would be a better choice. A search index that includes a comprehensive collection of both vector and nonvector fields provides maximum flexibility for query construction and response composition.
6563

6664
A short example of a documents payload that includes vector and nonvector fields is in the [load vector data](#load-vector-data-for-indexing) section of this article.
6765

6866
## Create a basic index
6967

7068
Start with a minimum schema so that you have a definition to work with before adding a vector configuration and vector fields. A simple index might look like the following example. You can learn more about an index schema in [Create a search index](search-how-to-create-search-index.md).
7169

72-
Notice that it has a required name, a required document key, and human readable content. It's common to have a human-readable version of whatever content you intend to vectorize. For example, if you have a chunk of text from a PDF file, your index schema should have the plain text content, coupled with a vector field equivalent that you add in a later step.
70+
Notice that it has a required name, a required document key (`"key": true`), and human-readable content in plain text. It's common to have a human-readable version of whatever content you intend to vectorize. For example, if you have a chunk of text from a PDF file, your index schema should have the plain text content, coupled with a vector field equivalent that you add in a later step.
7371

7472
```http
7573
POST https://[servicename].search.windows.net/indexes?api-version=[api-version]
7674
{
7775
"name": "example-index",
7876
"fields": [
7977
{ "name": "documentId", "type": "Edm.String", "key": true, "retrievable": true, "searchable": true, "filterable": true },
80-
{ "name": "humanReadableNameField", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": false, "sortable": true, "facetable": false },
81-
{ "name": "humanReadableContentField", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft" },
78+
{ "name": "myHumanReadableNameField", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": false, "sortable": true, "facetable": false },
79+
{ "name": "myHumanReadableContentField", "type": "Edm.String", "retrievable": true, "searchable": true, "filterable": false, "sortable": false, "facetable": false, "analyzer": "en.microsoft" }
8280
],
8381
"suggesters": [ ],
8482
"scoringProfiles": [ ],
@@ -298,7 +296,19 @@ For more information about new preview features, see [What's new in Azure AI Sea
298296

299297
The fields collection must include a field for the document key, vector fields, and any other fields that you need for hybrid search scenarios.
300298

301-
Vector fields are characterized by [their data type](/rest/api/searchservice/supported-data-types#edm-data-types-for-vector-fields), a `dimensions` property based on the embedding model used to output the vectors, and a vector profile.
299+
Vector fields are characterized by [their data type](/rest/api/searchservice/supported-data-types#edm-data-types-for-vector-fields), a `dimensions` property based on the embedding model used to output the vectors, and a vector profile that you created in a previous step.
300+
301+
```json
302+
{
303+
"name": "contentVector",
304+
"type": "Collection(Edm.Single)",
305+
"searchable": true,
306+
"retrievable": false,
307+
"stored": false,
308+
"dimensions": 1536,
309+
"vectorSearchProfile": "vector-profile-1"
310+
}
311+
```
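
To see how the vector field ties back to the vector configuration, here's a Python sketch that builds a complete index body as a dictionary and checks the references between fields, profiles, and algorithms. The `vectorSearch` layout (an `algorithms` list and a `profiles` list, where each profile names an algorithm) is an assumption based on the 2024-07-01 REST shape, so confirm it against the API version you target. `build_index` and `check_vector_fields` are hypothetical helpers, `hnsw-1` and `vector-profile-1` are illustrative names, and only `Collection(Edm.Single)` fields are inspected even though other vector data types exist.

```python
# Hypothetical helpers; "hnsw-1" and "vector-profile-1" are illustrative names.
def build_index(name: str, dimensions: int) -> dict:
    """Return an index body with one vector field and a matching vector configuration."""
    return {
        "name": name,
        "fields": [
            {"name": "documentId", "type": "Edm.String", "key": True,
             "retrievable": True, "searchable": True, "filterable": True},
            {"name": "myHumanReadableContentField", "type": "Edm.String",
             "retrievable": True, "searchable": True, "analyzer": "en.microsoft"},
            {"name": "contentVector", "type": "Collection(Edm.Single)",
             "searchable": True, "retrievable": False, "stored": False,
             "dimensions": dimensions, "vectorSearchProfile": "vector-profile-1"},
        ],
        "vectorSearch": {
            "algorithms": [{"name": "hnsw-1", "kind": "hnsw"}],
            "profiles": [{"name": "vector-profile-1", "algorithm": "hnsw-1"}],
        },
    }


def check_vector_fields(index: dict) -> list[str]:
    """Return problems with vector field references (an empty list means none found)."""
    vector_search = index.get("vectorSearch", {})
    algorithms = {a["name"] for a in vector_search.get("algorithms", [])}
    profiles = {p["name"]: p for p in vector_search.get("profiles", [])}
    problems = []
    for profile in profiles.values():
        if profile.get("algorithm") not in algorithms:
            problems.append(f"profile '{profile['name']}': unknown algorithm")
    for field in index.get("fields", []):
        # Only single-precision vector fields are inspected in this sketch.
        if field.get("type") != "Collection(Edm.Single)":
            continue
        if field.get("vectorSearchProfile") not in profiles:
            problems.append(f"field '{field['name']}': unknown vector profile")
        dims = field.get("dimensions")
        if not isinstance(dims, int) or isinstance(dims, bool) or dims < 1:
            problems.append(f"field '{field['name']}': dimensions must be a positive integer")
        if not field.get("searchable", False):
            problems.append(f"field '{field['name']}': vector fields must be searchable")
    return problems


index = build_index("example-index", 1536)
print(check_vector_fields(index))  # []
```

Keeping the profile name in one place (the `profiles` list) and referencing it from each vector field means that switching algorithms later only touches the vector configuration, not every field definition.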
302312

303313
### [**2024-07-01**](#tab/rest-2024-07-01)
304314

@@ -636,7 +646,7 @@ Key points include:
636646
+ A few modifications can be made with no rebuild requirement:
637647

638648
+ Add new fields to a fields collection.
639-
+ Add new vector configurations, assigned to new fields but not existing fields that have already been vectorized.
649+
+ Add new vector configurations, assigned to new fields but not existing fields that are already vectorized.
640650
+ Change "retrievable" (values are true or false) on an existing field. Vector fields must be searchable and are typically retrievable, but if you want to disable access to a vector field in situations where drop and rebuild isn't feasible, you can set retrievable to false.
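
The update rules in this list can be expressed as a conservative check to run before you send a schema update. This Python sketch compares the old and new field collections. Added fields and changes to `retrievable` pass, while removed fields or any other attribute change are reported as needing a drop and rebuild. `needs_rebuild` is a hypothetical helper that ignores vector configuration changes outside the fields collection, so treat it as a pre-flight aid rather than the service's own validation.

```python
def needs_rebuild(old_fields: list[dict], new_fields: list[dict]) -> list[str]:
    """Return reasons an update needs a drop and rebuild (empty list if none).

    Hypothetical helper encoding only the rules listed above: new fields and
    changes to "retrievable" are allowed; anything else is flagged.
    """
    old = {f["name"]: f for f in old_fields}
    new = {f["name"]: f for f in new_fields}
    reasons = []
    for name, old_field in old.items():
        if name not in new:
            reasons.append(f"field '{name}' was removed")
            continue
        new_field = new[name]
        for attr in sorted(set(old_field) | set(new_field)):
            if attr == "retrievable":
                continue  # can change in place per the rules above
            if old_field.get(attr) != new_field.get(attr):
                reasons.append(f"field '{name}': '{attr}' changed")
    # Names that appear only in new_fields are additions, which are allowed.
    return reasons


old = [{"name": "contentVector", "type": "Collection(Edm.Single)",
        "dimensions": 1536, "retrievable": True}]
new = [dict(old[0], retrievable=False),
       {"name": "titleVector", "type": "Collection(Edm.Single)", "dimensions": 1536}]
print(needs_rebuild(old, new))  # []
```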
641651

642652
## Next steps
