The SQL Database Engine can store and query any kind of data, structured and unstructured, and can perform vector search on that data. It's a good choice when you need to search across all of this data together and don't want to add a separate search service that would complicate your architecture.
> [!NOTE]
> - Vector support is currently in preview and subject to change. Be sure to read preview usage terms in [Service Level Agreements (SLA) for Online Services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services).
> - Vector features are available in Azure SQL Managed Instance configured with the [Always-up-to-date](/azure/azure-sql/managed-instance/update-policy#always-up-to-date-update-policy) update policy.
## Vectors
@@ -71,13 +70,13 @@ SELECT
CAST(@v AS JSON) AS j
```
### Exact search and vector distance (exact nearest neighbors)
Exact search, also known as k-nearest neighbor (k-NN) search, involves calculating the distance between a given query vector and every vector in a dataset, sorting the results by that distance, and selecting the k closest neighbors based on a specified distance metric. This method guarantees precise retrieval of the nearest neighbors but can be computationally intensive, especially for large datasets.
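
A minimal sketch of exact k-NN search, assuming a hypothetical table `dbo.demo_vectors` that holds tiny three-dimensional vectors (real embeddings have hundreds or thousands of dimensions). The query computes the cosine distance from the query vector to every row, sorts by that distance, and keeps the `@k` closest rows with `TOP`. The table name, data, and `@k` value are illustrative assumptions.

```sql
-- Hypothetical demo table; name and data are illustrative only
DROP TABLE IF EXISTS dbo.demo_vectors;
CREATE TABLE dbo.demo_vectors (
    id INT NOT NULL PRIMARY KEY,
    embedding VECTOR(3) NOT NULL
);

-- JSON array strings convert implicitly to VECTOR(3)
INSERT INTO dbo.demo_vectors (id, embedding) VALUES
    (1, '[1, 0, 0]'),
    (2, '[0, 1, 0]'),
    (3, '[0, 0, 1]'),
    (4, '[0.9, 0.1, 0]');

DECLARE @query VECTOR(3) = '[1, 0.1, 0]';
DECLARE @k INT = 2;

-- Exact k-NN: score every row, sort ascending by distance, keep the k closest
SELECT TOP (@k)
    id,
    VECTOR_DISTANCE('cosine', @query, embedding) AS distance
FROM dbo.demo_vectors
ORDER BY distance;
```

With this data, row 4 points in almost the same direction as the query vector and comes first, followed by row 1. Because every row is scored, the cost of the query grows linearly with the number of vectors in the table.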
Vector distance functions measure how close two vectors are to each other. Common distance metrics include Euclidean distance, cosine similarity (usually expressed as cosine distance, which is 1 minus the similarity), and dot product. Because k-NN search ranks vectors by these distances, the choice of metric determines which neighbors are returned.
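
To compare the metrics side by side, the following sketch calls `VECTOR_DISTANCE` with each of the `'cosine'`, `'euclidean'`, and `'dot'` metric names on two hypothetical two-dimensional vectors. Both vectors have length 1, so cosine and dot-product rankings agree here; for vectors of different lengths they can differ.

```sql
-- Two hypothetical unit-length vectors (both have magnitude 1)
DECLARE @a VECTOR(2) = '[1, 0]';
DECLARE @b VECTOR(2) = '[0.6, 0.8]';

SELECT
    VECTOR_DISTANCE('cosine', @a, @b)    AS cosine_distance,    -- 1 minus cosine similarity
    VECTOR_DISTANCE('euclidean', @a, @b) AS euclidean_distance, -- straight-line (L2) distance
    VECTOR_DISTANCE('dot', @a, @b)       AS dot_distance;       -- negative dot product
```

For these inputs, the results are approximately 0.4 (1 minus a cosine similarity of 0.6), 0.894 (the square root of 0.8), and -0.6 (the dot distance is the negated dot product, so a smaller value still means closer).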
Exact nearest neighbor (ENN) vector search performs an exhaustive distance calculation across all vectors to guarantee the retrieval of the closest neighbors based on a specified distance metric. This method is precise but resource-intensive, making it suitable for smaller datasets or scenarios where accuracy is paramount.
In the SQL Database Engine, you can perform k-NN searches with the [VECTOR_DISTANCE](../../t-sql/functions/vector-distance-transact-sql.md) function, which calculates the distance between two vectors by using a specified metric; sorting rows by that distance retrieves the nearest neighbors.
@@ -92,23 +91,26 @@ ORDER BY distance
Exact search is recommended when you have relatively few vectors to search (fewer than 50,000 vectors, as a general guideline). The table can contain many more vectors, as long as your search predicates reduce the number of candidate vectors for the neighbor search to 50,000 or fewer.
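
A sketch of the second case, assuming a hypothetical `dbo.demo_products` table with a `category` column: the `WHERE` predicate narrows the candidate rows, and distances are ranked only among the rows that qualify. The table, the index name `ix_demo_products_category`, and the data are illustrative; in a real table, a regular index on the filter column can help the engine find the qualifying rows without reading the whole table.

```sql
-- Hypothetical table: the WHERE clause below narrows the rows considered for neighbor search
DROP TABLE IF EXISTS dbo.demo_products;
CREATE TABLE dbo.demo_products (
    id INT NOT NULL PRIMARY KEY,
    category NVARCHAR(50) NOT NULL,
    embedding VECTOR(3) NOT NULL
);

-- Regular (non-vector) index that can support the filter predicate
CREATE INDEX ix_demo_products_category ON dbo.demo_products (category);

INSERT INTO dbo.demo_products (id, category, embedding) VALUES
    (1, N'books', '[1, 0, 0]'),
    (2, N'books', '[0, 1, 0]'),
    (3, N'music', '[0.9, 0.1, 0]'),
    (4, N'music', '[0, 0, 1]');

DECLARE @query VECTOR(3) = '[1, 0.1, 0]';

-- Exact search restricted to the filtered subset of rows
SELECT TOP (1)
    id,
    VECTOR_DISTANCE('cosine', @query, embedding) AS distance
FROM dbo.demo_products
WHERE category = N'books'
ORDER BY distance;
```

Row 3 is the closest vector overall, but because it's in the `music` category, the filtered query returns row 1 instead.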
### Approximate vector index and vector search (approximate nearest neighbors)
> [!NOTE]
> Approximate vector index and vector search are in preview and currently only available in [!INCLUDE [sssql25-md](../../includes/sssql25-md.md)].
Identifying all vectors close to a given query vector requires substantial resources, because the distance between the query vector and every vector stored in the table must be calculated. This involves a complete scan of the table and significant CPU usage. Such a query is called a "k-nearest neighbors" or "k-NN" query, and it returns the "k" closest vectors.
Vectors are used to find similar data for AI models to answer user queries. This involves querying the database for the "k" vectors nearest to the query vector using distance metrics like dot (inner) product, cosine similarity, or Euclidean distance.
k-NN queries don't scale well to large datasets, so in many cases it's acceptable to trade off some accuracy, particularly recall, for significant speed gains. This approach is known as approximate nearest neighbors (ANN).
Recall is an important concept for everyone using, or planning to use, vectors and embeddings. Recall measures the proportion of the exact nearest neighbors (the ones an exhaustive search would return) that the approximate algorithm also returns. It's therefore a good measure of the quality of the approximation the algorithm makes. A perfect recall, which is equivalent to no approximation, is 1.
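
For a single query, recall can be computed as the number of approximate results that also appear among the exact results, divided by k. The sketch below uses two hypothetical table variables, `@exact` and `@approx`, filled with made-up IDs for k = 5; in practice you might populate them from an exact `VECTOR_DISTANCE` query and an approximate `VECTOR_SEARCH` query for the same query vector.

```sql
-- Hypothetical result sets for k = 5 (IDs are made up for illustration):
-- @exact holds the IDs an exhaustive k-NN search returned,
-- @approx holds the IDs an approximate (ANN) search returned for the same query
DECLARE @exact TABLE (id INT PRIMARY KEY);
DECLARE @approx TABLE (id INT PRIMARY KEY);

INSERT INTO @exact (id) VALUES (1), (2), (3), (4), (5);
INSERT INTO @approx (id) VALUES (1), (2), (3), (4), (9);

-- k is the number of exact nearest neighbors
DECLARE @k INT = (SELECT COUNT(*) FROM @exact);
DECLARE @recall FLOAT;

-- Recall = (approximate results that are also exact results) / k
SELECT @recall = CAST(COUNT(*) AS FLOAT) / @k
FROM @approx AS a
INNER JOIN @exact AS e ON e.id = a.id;

SELECT @recall AS recall;
```

Four of the five exact neighbors appear in the approximate results, so the recall is 4 / 5 = 0.8. Averaging this value over a representative set of query vectors gives a more reliable picture than a single query.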
For AI applications, this trade-off is usually reasonable. Because vector embeddings are already approximate representations of concepts, using ANN doesn't significantly affect the results, provided the recall is close to 1. In that case, the returned results are very similar to those from k-NN, while performance improves greatly and resource usage drops significantly, which is highly beneficial for operational databases.
It's important to understand that the term "index", when it refers to a [vector index](../../t-sql/statements/create-vector-index-transact-sql.md), has a different meaning from the indexes you're used to working with in relational databases: a vector index returns approximate results.
In the SQL Database Engine, vector indexes are based on the [DiskANN](https://www.microsoft.com/en-us/research/publication/diskann-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node) algorithm. DiskANN is a graph-based system for indexing and searching large sets of vector data with limited computational resources: it builds a graph that can be navigated quickly through the indexed vectors to find the closest matches to a given vector. It uses SSDs efficiently and needs only minimal memory, so it can handle significantly more data than in-memory indexes while maintaining high queries per second (QPS) and low latency, balancing memory, CPU, and I/O usage against search performance.
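
As a sketch of how a DiskANN-based index is declared, the following creates a vector index over a hypothetical `dbo.demo_documents` table with 1,536-dimensional embeddings. The table shape is an assumption: vector indexes are expected to require a single-column integer primary key as the clustered index, but because the feature is in preview, check the CREATE VECTOR INDEX reference for the current requirements and options.

```sql
-- Hypothetical table; name, columns, and dimension count are illustrative
DROP TABLE IF EXISTS dbo.demo_documents;
CREATE TABLE dbo.demo_documents (
    id INT NOT NULL PRIMARY KEY CLUSTERED,  -- assumed requirement: single-column integer clustered key
    content NVARCHAR(MAX) NOT NULL,
    embedding VECTOR(1536) NOT NULL
);

-- Build a DiskANN graph index over the embedding column, using cosine distance
CREATE VECTOR INDEX vec_idx_demo_documents
ON dbo.demo_documents (embedding)
WITH (METRIC = 'cosine', TYPE = 'diskann');
```

Choose `METRIC` to match the distance you intend to search with, because the graph is built around that distance.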
To run an approximate nearest neighbors search, first create a vector index by using the [CREATE VECTOR INDEX](../../t-sql/statements/create-vector-index-transact-sql.md) T-SQL statement, and then use the [VECTOR_SEARCH](../../t-sql/functions/vector-search-transact-sql.md) T-SQL function to run the approximate search.
```sql
DECLARE @qv VECTOR(1536) = AI_GENERATE_EMBEDDING(N'Pink Floyd music style' USE MODEL Ada2Embeddings);