articles/search/vector-search-how-to-create-index.md
+7 -7 (7 additions & 7 deletions)
@@ -1,5 +1,5 @@
 ---
-title: Add vector search
+title: Create a vector store
 titleSuffix: Azure AI Search
 description: Create or update a search index to include vector fields.
 
@@ -9,17 +9,17 @@ ms.service: cognitive-search
 ms.custom:
 - ignite-2023
 ms.topic: how-to
-ms.date: 11/27/2023
+ms.date: 01/29/2024
 ---
 
-# Add vector fields to a search index
+# Create a vector store
 
-In Azure AI Search, vector data is indexed as *vector fields* in a [search index](search-what-is-an-index.md).
+In Azure AI Search, vector data is indexed and stored as *vector fields* in a [search index](search-what-is-an-index.md).
 
 Follow these steps to index vector data:
 
 > [!div class="checklist"]
-> + Add one or more vector configurations to an index schema.
+> + Define a schema with one or more vector configurations that specify algorithms for indexing and search.
 > + Add one or more vector fields.
 > + Load the index with vector data [as a separate step](#load-vector-data-for-indexing), or use [integrated vectorization (preview)](vector-search-integrated-vectorization.md) for data chunking and encoding during indexing.
 
@@ -30,9 +30,9 @@ This article applies to the generally available, non-preview version of [vector
 
 ## Prerequisites
 
-+ Azure AI Search, in any region and on any tier. Most existing services support vector search. For services created prior to January 2019, there's a small subset that support vector search. If an index containing vector fields fails to be created or updated, this is an indicator. In this situation, a new service must be created.
++ Azure AI Search, in any region and on any tier. Most existing services support vector search. For services created prior to January 2019, there's a small subset that don't support vector search. If an index containing vector fields fails to be created or updated, this is an indicator. In this situation, a new service must be created.
 
-+ Pre-existing vector embeddings in your source documents. Azure AI Search doesn't generate vectors in the generally available version of vector search. We recommend [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models) but you can use any model for vectorization. For more information, see [Generate embeddings](vector-search-how-to-generate-embeddings.md).
++ Pre-existing vector embeddings in your source documents. Azure AI Search doesn't generate vectors in the generally available version of the Azure SDKs and REST APIs. We recommend [Azure OpenAI embedding models](/azure/ai-services/openai/concepts/models#embeddings-models) but you can use any model for vectorization. For more information, see [Generate embeddings](vector-search-how-to-generate-embeddings.md).
 
 + You should know the dimensions limit of the model used to create the embeddings and how similarity is computed. In Azure OpenAI, for **text-embedding-ada-002**, the length of the numerical vector is 1536. Similarity is computed using `cosine`.
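Illustration only, not part of this commit: the last prerequisite above (vector length and similarity metric) can be sketched in plain Python. The names `cosine_similarity`, `check_dimensions`, and `EXPECTED_DIMENSIONS` are hypothetical helpers, not Azure SDK calls. Azure AI Search computes similarity on the service side, so this only shows what the `cosine` metric measures and how a client might check a vector's length before uploading it.

```python
import math

# Vector length for text-embedding-ada-002, as stated in the prerequisite above.
EXPECTED_DIMENSIONS = 1536


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} != {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def check_dimensions(vector, expected=EXPECTED_DIMENSIONS):
    """Raise ValueError if the embedding length differs from the expected one."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector
```

Cosine similarity depends only on direction, so `[1.0, 0.0]` and `[2.0, 0.0]` score as identical, while orthogonal vectors score 0.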
articles/search/vector-search-overview.md
+22 -17 (22 additions & 17 deletions)
@@ -9,49 +9,54 @@ ms.service: cognitive-search
 ms.custom:
 - ignite-2023
 ms.topic: conceptual
-ms.date: 12/05/2023
+ms.date: 01/29/2024
 ---
 
-# Vector search in Azure AI Search
+# Vector stores and vector search in Azure AI Search
 
-Vector search is an approach in information retrieval that uses numeric representations of content for search scenarios. Because the content is numeric rather than plain text, the search engine matches on vectors that are the most similar to the query, with no requirement for matching on exact terms.
+Vector search is an approach in information retrieval that stores numeric representations of content for search scenarios. Because the content is numeric rather than plain text, the search engine matches on vectors that are the most similar to the query, with no requirement for matching on exact terms.
 
 This article is a high-level introduction to vector support in Azure AI Search. It also explains integration with other Azure services and covers [terminology and concepts](#vector-search-concepts) related to vector search development.
 
 We recommend this article for background, but if you'd rather get started, follow these steps:
 
 > [!div class="checklist"]
-> + [Generate vector embeddings](vector-search-how-to-generate-embeddings.md) before you start, or try out [integrated vectorization (preview)](vector-search-integrated-vectorization.md).
-> + [Add vector fields to an index](vector-search-how-to-create-index.md).
-> + [Load vector data](search-what-is-data-import.md) into an index using push or pull methodologies.
-> + [Query vector data](vector-search-how-to-query.md) using the Azure portal, REST APIs, or Azure SDK packages.
+> + [Provide embeddings](vector-search-how-to-generate-embeddings.md) or [generate embeddings (preview)](vector-search-integrated-vectorization.md)
+> + [Create a vector store](vector-search-how-to-create-index.md)
 You could also begin with the [vector quickstart](search-get-started-vector.md) or the [code samples on GitHub](https://github.com/Azure/azure-search-vector-samples).
 
-Vector search is in the Azure portal and the Azure SDKs for [.NET](https://www.nuget.org/packages/Azure.Search.Documents), [Python](https://pypi.org/project/azure-search-documents), and [JavaScript](https://www.npmjs.com/package/@azure/search-documents/v/12.0.0-beta.2).
-
 ## What's vector search in Azure AI Search?
 
-Vector search is a new capability for indexing, storing, and retrieving vector embeddings from a search index. You can use it to power similarity search, multi-modal search, recommendations engines, or applications implementing the [Retrieval Augmented Generation (RAG) architecture](https://aka.ms/what-is-rag).
+Vector search is a new capability for indexing, storing, and querying vector embeddings from a search index. You can use it to power similarity search, multimodal search, recommendations engines, or applications implementing the [Retrieval Augmented Generation (RAG) architecture](https://aka.ms/what-is-rag).
 
 The following diagram shows the indexing and query workflows for vector search.
 
 :::image type="content" source="media/vector-search-overview/vector-search-architecture-diagram-3.svg" alt-text="Architecture of vector search workflow." border="false" lightbox="media/vector-search-overview/vector-search-architecture-diagram-3-high-res.png":::
 
-On the indexing side, Azure AI Search takes vector embeddings and uses a [nearest neighbors algorithm](vector-search-ranking.md) to co-locate similar vectors together in the search index (vectors about popular movies are closer than vectors about popular dog breeds).
+On the indexing side, Azure AI Search takes vector embeddings and uses a [nearest neighbors algorithm](vector-search-ranking.md) to place similar vectors close together in an index.
 
-How you get embeddings from your source content depends on your approach and whether you can use preview features. You can vectorize or generate embeddings using models from OpenAI, Azure OpenAI, and any number of providers, over a wide range of source content including text, images, and other content types supported by the models. You can then push pre-vectorized content to [vector fields](vector-search-how-to-create-index.md) in a search index. That's the generally available approach. If you can use preview features, Azure AI Search provides [integrated data chunking and vectorization](vector-search-integrated-vectorization.md) in an indexer pipeline. You still provide the resources (endpoints and connection information), but Azure AI Search makes all of the calls and handles the transitions.
+How you get embeddings from your source content depends on your approach and whether you can use preview features. You can vectorize or generate embeddings using models from OpenAI, Azure OpenAI, and any number of providers, over a wide range of source content including text, images, and other content types supported by the models. You can then push pre-vectorized content to [vector fields](vector-search-how-to-create-index.md) in a vector store. That's the generally available approach. If you can use preview features, Azure AI Search offers [integrated data chunking and vectorization](vector-search-integrated-vectorization.md) in an indexer pipeline. You still provide the resources (endpoints and connection information to Azure OpenAI), but Azure AI Search makes all of the calls and handles the transitions.
 
-On the query side, in your client application, collect the query input from a user. You can then add an encoding step that converts the input into a vector, and then send the vector query to your index on Azure AI Search for a similarity search. As with indexing, you can deploy the [integrated vectorization (preview)](vector-search-integrated-vectorization.md) to convert text inputs to a vector. For either approach, Azure AI Search returns documents with the requested `k` nearest neighbors (kNN) in the results.
+On the query side, in your client application, you collect the query input from a user, usually through a prompt workflow. You can then add an encoding step that converts the input into a vector, and then send the vector query to your index on Azure AI Search for a similarity search. As with indexing, you can deploy the [integrated vectorization (preview)](vector-search-integrated-vectorization.md) to convert the question into a vector. For either approach, Azure AI Search returns documents with the requested `k` nearest neighbors (kNN) in the results.
 
-Azure AI Search supports [hybrid scenarios](hybrid-search-overview.md). You can index vector data as fields in documents alongside alphanumeric content. Vector queries can be issued singly or in combination with filters and other query types, including term queries and semantic ranking in the same search request.
+Azure AI Search supports [hybrid scenarios](hybrid-search-overview.md) that run vector and keyword search in parallel, returning a unified result set that often provides better results than just vector or keyword search alone. For hybrid, vector and non-vector content is ingested into the same index, for queries that run side by side.
 
 ## Availability and pricing
 
 Vector search is available as part of all Azure AI Search tiers in all regions at no extra charge.
 
 Newer services created after July 1, 2023 support [higher quotas for vector indexes](vector-search-index-size.md).
 
+Vector search is available in:
+
++ Azure portal using the [Import and vectorize data wizard](search-get-started-portal-import-vectors.md)
++ Azure OpenAI Studio, see this [Quickstart](search-get-started-retrieval-augmented-generation.md)
++ Azure SDKs for [.NET](https://www.nuget.org/packages/Azure.Search.Documents), [Python](https://pypi.org/project/azure-search-documents), and [JavaScript](https://www.npmjs.com/package/@azure/search-documents/v/12.0.0-beta.2)
+
 > [!NOTE]
 > Some older search services created before January 1, 2019 are deployed on infrastructure that doesn't support vector workloads. If you try to add a vector field to a schema and get an error, it's a result of outdated services. In this situation, you must create a new search service to try out the vector feature.
 
@@ -115,13 +120,13 @@ For example, documents that talk about different species of dogs would be cluste
 
 ### Nearest neighbors search
 
-In vector search, the search engine searches through the vectors within the embedding space to identify those that are near to the query vector. This technique is called [*nearest neighbor search*](https://en.wikipedia.org/wiki/Nearest_neighbor_search). Nearest neighbors help quantify the similarity between items. A high degree of vector similarity indicates that the original data was similar too. To facilitate fast nearest neighbor search, the search engine will perform optimizations or employ data structures or data partitioning to reduce the search space. Each vector search algorithm will have different approaches to this problem, trading off different characteristics such as latency, throughput, recall, and memory. To compute similarity, similarity metrics provide the mechanism for computing this distance.
+In vector search, the search engine searches through the vectors within the embedding space to identify those that are near to the query vector. This technique is called [*nearest neighbor search*](https://en.wikipedia.org/wiki/Nearest_neighbor_search). Nearest neighbors help quantify the similarity between items. A high degree of vector similarity indicates that the original data was similar too. To facilitate fast nearest neighbor search, the search engine performs optimizations or employs data structures or data partitioning to reduce the search space. Each vector search algorithm has different approaches to this problem, trading off different characteristics such as latency, throughput, recall, and memory. To compute similarity, similarity metrics provide the mechanism for computing this distance.
 
 Azure AI Search currently supports the following algorithms:
 
 + Hierarchical Navigable Small World (HNSW): HNSW is a leading ANN algorithm optimized for high-recall, low-latency applications where data distribution is unknown or can change frequently. It organizes high-dimensional data points into a hierarchical graph structure that enables fast and scalable similarity search while allowing a tunable trade-off between search accuracy and computational cost. Because the algorithm requires all data points to reside in memory for fast random access, this algorithm consumes [vector storage](vector-search-index-size.md) quota.
 
-+ Exhaustive K-nearest neighbors (KNN): Calculates the distances between the query vector and all data points. It's computationally intensive, so it works best for smaller datasets. Because the algorithm doesn't require fast random access of data points, this algorithm doesn't consume vector storage quota. However, this algorithm will provide the global set of nearest neighbors.
++ Exhaustive K-nearest neighbors (KNN): Calculates the distances between the query vector and all data points. It's computationally intensive, so it works best for smaller datasets. Because the algorithm doesn't require fast random access of data points, this algorithm doesn't consume vector storage quota. However, this algorithm provides the global set of nearest neighbors.
 
 Within an index definition, you can specify one or more algorithms, and then for each vector field specify which algorithm to use:
 
@@ -131,7 +136,7 @@ Within an index definition, you can specify one or more algorithms, and then for
 
 Algorithm parameters that are used to initialize the index during index creation are immutable and can't be changed after the index is built. However, parameters that affect the query-time characteristics (`efSearch`) can be modified.
 
-In addition, fields that specify HNSW algorithm also support exhaustive KNN search using the [query request](vector-search-how-to-query.md) parameter `"exhaustive": true`. The opposite isn't true however. If a field is indexed for `exhaustiveKnn`, you can't use HNSW in the query because the additional data structures that enable efficient search don’t exist.
+In addition, fields that specify HNSW algorithm also support exhaustive KNN search using the [query request](vector-search-how-to-query.md) parameter `"exhaustive": true`. The opposite isn't true however. If a field is indexed for `exhaustiveKnn`, you can't use HNSW in the query because the extra data structures that enable efficient search don’t exist.
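Illustration only, not part of this commit: a minimal sketch of what an exhaustive KNN scan does, as described in the text above. It scores the query against every stored vector and keeps the top `k`, so no extra index structure is needed and the cost grows with the number of vectors. The function names and the in-memory `documents` dictionary are assumptions for the example; this is not how Azure AI Search is implemented, and it uses cosine similarity only because that metric is mentioned in the changed docs.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exhaustive_knn(query, documents, k):
    """Score every document against the query and return the k best.

    documents maps a document id to its vector (hypothetical in-memory store).
    Returns a list of (doc_id, score) pairs, most similar first.
    """
    scored = (
        (cosine_similarity(query, vector), doc_id)
        for doc_id, vector in documents.items()
    )
    # nlargest visits every score once, which is the exhaustive part.
    top = heapq.nlargest(k, scored)
    return [(doc_id, score) for score, doc_id in top]


documents = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = exhaustive_knn([1.0, 0.1], documents, k=2)
```

Because every vector is compared, the result is the true global top `k`. An approximate method such as HNSW trades some of that certainty for speed.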