Commit 1d84a76

Add vector FAQ
1 parent 1ab12ea commit 1d84a76

3 files changed

+133
-9
lines changed

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
---
2+
title: Vector & Embeddings Frequently Asked Questions (FAQ)
3+
description: Answers to common questions about vector search and vector indexes in SQL Server.
4+
author: yorek
5+
ms.author: damauri
6+
ms.reviewer: damauri
7+
ms.date: 07/17/2025
8+
ms.service: sql
9+
ms.topic: language-reference
10+
ms.collection:
11+
- ce-skilling-ai-copilot
12+
ms.custom:
13+
- intro-quickstart
14+
helpviewer_keywords:
15+
- "Vectors"
16+
- "Vectors, built-in support"
17+
monikerRange: "=sql-server-ver17 || =sql-server-linux-ver17 || =azuresqldb-current || =azuresqldb-mi-current || =fabric"
18+
---
19+
20+
# Vector and embeddings: Frequently asked questions (FAQ)
21+
22+
[!INCLUDE [sqlserver2025-asdb-asmi-fabricsqldb](../../includes/applies-to-version/sqlserver2025-asdb-asmi-fabricsqldb.md)]
23+
24+
> [!NOTE]
25+
> Vector features are available in Azure SQL Managed Instance configured with the [Always-up-to-date](/azure/azure-sql/managed-instance/update-policy#always-up-to-date-update-policy) policy.
26+
27+
## How do I keep embeddings up to date?
28+
29+
Update embeddings every time the underlying data that they represent changes. This is especially important for scenarios where the data is dynamic, such as user-generated content or frequently updated databases. To find out more about several strategies to keep embeddings up to date, see [Database and AI: solutions for keeping embeddings updated](https://devblogs.microsoft.com/azure-sql/database-and-ai-solutions-for-keeping-embeddings-updated/).
30+
31+
## What is the storage and processing overhead for vector search?
32+
33+
The overhead for vector search primarily involves the storage of the vector data type and the computational resources required for indexing and searching. The `VECTOR` data type is designed to be efficient in terms of storage, but the exact overhead varies with the size (the number of dimensions) of the vectors stored.
34+
35+
For more information about how to choose the right vector size, review [Embedding models and dimensions: optimizing the performance-resource usage ratio](https://devblogs.microsoft.com/azure-sql/embedding-models-and-dimensions-optimizing-the-performance-resource-usage-ratio/).
36+
37+
A SQL Server data page can hold up to 8,060 bytes, so the size of the vector affects how many vectors can be stored in a single page. For example, if you have a vector with 1,024 dimensions, and each dimension is a **float** (4 bytes), the total size of the vector is 4,104 bytes (4,096 bytes of payload + 8 bytes of header). At that size, only one vector fits in a single page.
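
The arithmetic above can be reproduced with a short query. This is a minimal sketch that assumes 4 bytes per dimension and the 8-byte header described above. It ignores any other per-row overhead, so treat the result as an upper bound rather than an exact figure.

```sql
-- Assumption: 4 bytes per dimension plus an 8-byte header, as described above.
DECLARE @dimensions INT = 1024;
DECLARE @vector_bytes INT = @dimensions * 4 + 8;

SELECT
    @vector_bytes AS vector_bytes,              -- 4104
    8060 / @vector_bytes AS vectors_per_page;   -- 1 (integer division)
```

Changing `@dimensions` shows how smaller embeddings allow more vectors per page.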
38+
39+
## What embedding model should I use, and when?
40+
41+
There are many embedding models available, and the choice of which one to use depends on the specific use case and the type of data being processed. Some models support multiple languages, while others support multimodal data (text, images, etc.). Some are available only online; others can be run locally.
42+
43+
In addition to the model itself, consider the size of the model and the number of dimensions it produces. Larger models might provide better accuracy, but they require more computational resources and storage space. For many common use cases, adding more dimensions doesn't noticeably improve the quality of the results.
44+
45+
For more information about how to choose the right embedding model, see [Embedding models and dimensions: optimizing the performance-resource usage ratio](https://devblogs.microsoft.com/azure-sql/embedding-models-and-dimensions-optimizing-the-performance-resource-usage-ratio/).
46+
47+
## What about sparse vectors?
48+
49+
At this time, the **vector** data type in SQL Server is designed for dense vectors, which are arrays of floating-point numbers where most of the elements are non-zero. Sparse vectors, which contain a significant number of zero elements, aren't natively supported.
50+
51+
## What are some performance benchmarks for SQL vector search?
52+
53+
Performance can vary widely based on the specific use case, the size of the dataset, and the complexity of the queries. However, SQL Server's vector search capabilities are designed to be efficient and scalable, leveraging indexing techniques to optimize search performance.
54+
55+
## What if I have more than one column that I would like to use for generating embeddings?
56+
57+
If you have multiple columns that you want to use for generating embeddings, you have two main options:
58+
59+
- Create one embedding for each column, or
60+
- Concatenate the values of multiple columns into a single string and then generate a single embedding for that concatenated string.
61+
62+
For more information about the two options and the related database design considerations, see [Efficiently and Elegantly Modeling Embeddings in Azure SQL and SQL Server](https://devblogs.microsoft.com/azure-sql/efficiently-and-elegantly-modeling-embeddings-in-azure-sql-and-sql-server/).
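
The two options can be sketched as follows. The table `dbo.products`, its columns, and the external model name `MyEmbeddingModel` are hypothetical and used only for illustration. The sketch assumes that the model has already been registered and that the embedding columns are declared with a matching number of dimensions.

```sql
-- Assumptions: dbo.products, its columns, and MyEmbeddingModel are hypothetical.
-- The external model must already exist, and each embedding column must be a
-- VECTOR column whose dimensions match the model output.

-- Option 1: one embedding per source column.
UPDATE dbo.products
SET title_embedding = AI_GENERATE_EMBEDDINGS(title USE MODEL MyEmbeddingModel),
    details_embedding = AI_GENERATE_EMBEDDINGS(product_details USE MODEL MyEmbeddingModel);

-- Option 2: a single embedding for the concatenated text.
UPDATE dbo.products
SET combined_embedding = AI_GENERATE_EMBEDDINGS(
        CONCAT_WS(' ', title, product_details) USE MODEL MyEmbeddingModel);
```

Option 1 lets you search each column independently at the cost of more storage. Option 2 stores one vector per row but blends the meaning of all columns.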
63+
64+
## What about re-ranking?
65+
66+
Re-ranking is a technique used to improve the relevance of search results by re-evaluating the initial results based on additional criteria or models. In SQL Server, you can implement re-ranking by combining vector search with full-text search (which provides BM25 ranking), or by using additional SQL queries or machine learning models to refine the results based on specific business logic or user preferences.
67+
68+
For more information, review [Enhancing Search Capabilities in SQL Server and Azure SQL with Hybrid Search and RRF Re-Ranking](https://devblogs.microsoft.com/azure-sql/enhancing-search-capabilities-in-sql-server-and-azure-sql-with-hybrid-search-and-rrf-re-ranking/).
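
As an illustration of the Reciprocal Rank Fusion (RRF) approach named in the linked article, the following sketch merges two ranked result lists. The table variables and their contents are hypothetical stand-ins for the output of a vector search and a full-text search. The constant `60` is a commonly used default, not a value taken from this article.

```sql
-- Hypothetical ranked results from two searches over the same documents.
DECLARE @vector_results TABLE (doc_id INT PRIMARY KEY, rank_position INT);
DECLARE @fulltext_results TABLE (doc_id INT PRIMARY KEY, rank_position INT);

INSERT INTO @vector_results VALUES (1, 1), (2, 2), (3, 3);
INSERT INTO @fulltext_results VALUES (3, 1), (1, 2), (4, 3);

DECLARE @k INT = 60; -- assumption: commonly used RRF smoothing constant

-- RRF score = sum over result lists of 1 / (k + rank in that list).
-- A document missing from a list contributes 0 for that list.
SELECT
    COALESCE(v.doc_id, f.doc_id) AS doc_id,
    COALESCE(1.0 / (@k + v.rank_position), 0.0)
        + COALESCE(1.0 / (@k + f.rank_position), 0.0) AS rrf_score
FROM @vector_results AS v
FULL OUTER JOIN @fulltext_results AS f
    ON v.doc_id = f.doc_id
ORDER BY rrf_score DESC;
```

Documents that appear in both lists accumulate two terms, so they tend to rise above documents found by only one search.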
69+
70+
## When should I use AI Search (now AI Foundry) versus SQL for vector search scenarios?
71+
72+
AI Search (now AI Foundry) is a specialized service designed for advanced search scenarios, including vector search, natural language processing, and AI-driven insights. It provides a comprehensive set of features for building intelligent search applications, such as built-in support for various AI models, advanced ranking algorithms, and integration with other AI services.
73+
74+
Azure SQL and SQL Server can store any kind of data, structured and unstructured, run any kind of query over it, and perform vector search on that data. They're a good choice when you need to search across all of this data together and you don't want a separate search service that would complicate your architecture. Azure SQL and SQL Server also offer critical enterprise security features to make sure data is always protected, such as row-level security (RLS), dynamic data masking (DDM), Always Encrypted, immutable ledger tables, and transparent data encryption (TDE).
75+
76+
Here's an example of a single query that can be run in Azure SQL or SQL Server and that combines vector, geospatial, structured, and unstructured data all at once. The sample query retrieves the top 50 most relevant restaurant reviews based on the description, the location of the restaurant, and the user's preferences. It uses vector search for the description and geospatial search for the location, and it also filters by star rating, number of reviews, category, and so on:
77+
78+
```sql
-- Search origin and the embedding of the user's request.
DECLARE @p GEOGRAPHY = GEOGRAPHY::Point(47.6694141, -122.1238767, 4326);
DECLARE @e VECTOR(1536) = AI_GENERATE_EMBEDDINGS('I want to eat a good focaccia' USE MODEL Text3Embedding);

SELECT TOP (50)
    b.id AS business_id,
    b.name AS business_name,
    r.id AS review_id,
    r.stars,
    r.review,
    VECTOR_DISTANCE('cosine', re.embedding, @e) AS semantic_distance,
    @p.STDistance(b.geo_location) AS geo_distance
FROM dbo.reviews AS r
INNER JOIN dbo.reviews_embeddings AS re
    ON r.id = re.review_id
INNER JOIN dbo.business AS b
    ON r.business_id = b.id
WHERE b.city = 'Redmond'
    AND @p.STDistance(b.geo_location) < 5000 -- 5 km
    AND r.stars >= 4
    AND b.reviews >= 30
    AND JSON_VALUE(b.custom_attributes, '$.local_recommended') = 'true'
    AND VECTOR_DISTANCE('cosine', re.embedding, @e) < 0.2
ORDER BY semantic_distance ASC; -- smallest distance first: most similar reviews
```
103+
104+
In the previous sample, Exact Nearest Neighbor (ENN) search is used to find the most relevant reviews based on the semantic distance of the embeddings, while also filtering by geospatial distance and other business attributes. This query demonstrates the power of combining vector search with traditional SQL capabilities to create a rich and efficient search experience.
105+
106+
If you want to use Approximate Nearest Neighbor (ANN) search, you can create a vector index on the `reviews_embeddings` table and use the `VECTOR_SEARCH` function to perform the search.
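
A minimal sketch of that approach might look like the following. It reuses the `dbo.reviews_embeddings` table and the `Text3Embedding` model from the previous sample. The index name is hypothetical. Vector indexes and `VECTOR_SEARCH` are recent features, so check the linked reference articles for the current syntax and limitations before relying on this.

```sql
-- Assumption: the index name is hypothetical; the table and model come from the sample above.
CREATE VECTOR INDEX vec_idx_reviews_embedding
ON dbo.reviews_embeddings (embedding)
WITH (METRIC = 'cosine', TYPE = 'DiskANN');
GO

DECLARE @e VECTOR(1536) = AI_GENERATE_EMBEDDINGS('I want to eat a good focaccia' USE MODEL Text3Embedding);

-- Approximate search for the 50 nearest review embeddings.
SELECT
    t.review_id,
    s.distance
FROM VECTOR_SEARCH(
        TABLE = dbo.reviews_embeddings AS t,
        COLUMN = embedding,
        SIMILAR_TO = @e,
        METRIC = 'cosine',
        TOP_N = 50
    ) AS s
ORDER BY s.distance;
```

The metric used in the query should match the metric the index was created with.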
107+
108+
## Where can I find a self-paced lab to learn more about embeddings and vector search?
109+
110+
Review the self-paced [Azure SQL Cryptozoology AI Embeddings](https://devblogs.microsoft.com/azure-sql/azure-sql-cryptozoology-ai-embeddings-lab-now-available/) lab.
111+
112+
## Related content
113+
114+
- [Vector data type](../../t-sql/data-types/vector-data-type.md)
115+
- [Vector functions](../../t-sql/functions/vector-functions-transact-sql.md)
116+
- [VECTOR_DISTANCE (Transact-SQL)](../../t-sql/functions/vector-distance-transact-sql.md)
117+
- [VECTOR_SEARCH (Transact-SQL)](../../t-sql/functions/vector-search-transact-sql.md)
118+
- [CREATE VECTOR INDEX (Transact-SQL)](../../t-sql/statements/create-vector-index-transact-sql.md)
119+
- [Azure SQL Database Vector Search Samples](https://github.com/Azure-Samples/azure-sql-db-vector-search)
120+
- [Intelligent applications with Azure SQL Database](/azure/azure-sql/database/ai-artificial-intelligence-intelligent-applications)

docs/relational-databases/vectors/vectors-sql-server.md

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -4,14 +4,13 @@ description: How to create, manage, and search vectors in the SQL Database Engin
44
author: WilliamDAssafMSFT
55
ms.author: wiassaf
66
ms.reviewer: damauri, pookam, jovanpop, randolphwest
7-
ms.date: 05/06/2025
7+
ms.date: 07/17/2025
88
ms.service: sql
99
ms.topic: language-reference
1010
ms.collection:
1111
- ce-skilling-ai-copilot
1212
ms.custom:
1313
- intro-quickstart
14-
- build-2025
1514
helpviewer_keywords:
1615
- "Vectors"
1716
- "Vectors, built-in support"
@@ -22,12 +21,13 @@ monikerRange: "=sql-server-ver17 || =sql-server-linux-ver17 || =azuresqldb-curre
2221

2322
[!INCLUDE [sqlserver2025-asdb-asmi-fabricsqldb](../../includes/applies-to-version/sqlserver2025-asdb-asmi-fabricsqldb.md)]
2423

25-
Vectors are ordered arrays of numbers (typically floats) that can represent information about some data. For example, an image can be represented as a vector of pixel values, or a string of text can be represented as a vector of ASCII values. The process to turn data into a vector is called vectorization.
26-
2724
> [!NOTE]
28-
> - Vector support in preview and is subject to change. Make sure to read preview usage terms in [Service Level Agreements (SLA) for Online Services](https://www.microsoft.com/licensing/docs/view/Service-Level-Agreements-SLA-for-Online-Services).
2925
> - Vector features are available in Azure SQL Managed Instance configured with the [Always-up-to-date](/azure/azure-sql/managed-instance/update-policy#always-up-to-date-update-policy) policy.
3026
27+
## Vectors
28+
29+
Vectors are ordered arrays of numbers (typically floats) that can represent information about some data. For example, an image can be represented as a vector of pixel values, or a string of text can be represented as a vector of ASCII values. The process to turn data into a vector is called vectorization. The **[vector](../../t-sql/data-types/vector-data-type.md)** data type in SQL Server is designed to store these arrays of numbers efficiently.
30+
3131
## Embeddings
3232

3333
Embeddings are vectors that represent important features of data. Embeddings are often learned by using a deep learning model, and machine learning and AI models utilize them as features. Embeddings can also capture semantic similarity between similar concepts. For example, in generating an embedding for the words `person` and `human`, we would expect their embeddings (vector representation) to be similar in value since the words are also semantically similar.

docs/toc.yml

Lines changed: 8 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -2217,8 +2217,12 @@ items:
22172217
href: relational-databases/xml/xml-system-stored-procedures.md
22182218
- name: Use the value() & nodes() Methods with OPENXML
22192219
href: relational-databases/xml/use-the-value-and-nodes-methods-with-openxml.md
2220-
- name: Vectors and Embeddings
2221-
href: relational-databases/vectors/vectors-sql-server.md
2220+
- name: Vectors and embeddings
2221+
items:
2222+
- name: Vectors and embeddings
2223+
href: relational-databases/vectors/vectors-sql-server.md
2224+
- name: Vectors and embeddings FAQ
2225+
href: relational-databases/vectors/vectors-faq.md
22222226
- name: Development
22232227
items:
22242228
- name: Code a client program >
@@ -9042,7 +9046,7 @@ items:
90429046
href: linux/sql-server-linux-editions-and-components-2019.md
90439047
- name: SQL Server 2017
90449048
href: linux/sql-server-linux-editions-and-components-2017.md
9045-
- name: FAQ
9049+
- name: Linux FAQ
90469050
href: linux/sql-server-linux-faq.yml
90479051
- name: Known issues
90489052
href: linux/sql-server-linux-known-issues.md
@@ -9469,7 +9473,7 @@ items:
94699473
href: sql-server/azure-arc/migration-inventory.md
94709474
- name: Migrate to Azure SQL Managed Instance
94719475
href: sql-server/azure-arc/migrate-to-azure-sql-managed-instance.md
9472-
- name: FAQ
9476+
- name: SQL Server enabled by Azure Arc FAQ
94739477
href: sql-server/azure-arc/faq.yml
94749478
- name: Data collection & reporting
94759479
href: sql-server/azure-arc/data-collection.md
