Commit cebcd36

clean up - removing commented out content
1 parent fe57572 commit cebcd36

File tree

2 files changed: +9 −44 lines changed


articles/search/vector-search-how-to-quantization.md

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ The generalized process for rescoring is:
 1. Oversampled k candidates are rescored using either the uncompressed original vectors for scalar quantization, or the dot product of binary quantization.
 1. After rescoring, results are adjusted so that more relevant matches appear first.
 
-Oversampling for scalar quantized vectors requires the availability of the original full precision vectors. Oversampling for binary quantized vectors can use either full precision vectors (`preserveOriginals`) or the dot product of the binary vector (`discardOriginals`). If you're optimizing vector storage, make sure to keep the full precision vectors in the index for rescoring purposes. For more information, see [Eliminate optional vector instances from storage](vector-search-how-to-storage-options.md).
+Oversampling for scalar quantized vectors requires the availability of the original full precision vectors. Oversampling for binary quantized vectors can use either full precision vectors (`preserveOriginals`) or the dot product of the binary vector (`discardOriginals`). If you're optimizing vector storage, make sure to keep the full precision vectors in the index if you need them for rescoring purposes. For more information, see [Eliminate optional vector instances from storage](vector-search-how-to-storage-options.md).
 
 ## Add "compressions" to a search index

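The `preserveOriginals` and `discardOriginals` behaviors described in this file are set under `rescoringOptions` in a `vectorSearch.compressions` entry. As a rough illustration only (the compression name is hypothetical and the exact schema depends on your API version), a binary quantization configuration with rescoring enabled might look like:

```json
{
  "vectorSearch": {
    "compressions": [
      {
        "name": "my-binary-compression",
        "kind": "binaryQuantization",
        "rescoringOptions": {
          "enableRescoring": true,
          "defaultOversampling": 10,
          "rescoreStorageMethod": "discardOriginals"
        }
      }
    ]
  }
}
```

With `discardOriginals`, rescoring for binary quantization falls back to the dot product of the binary embeddings, as the changed line above notes.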
articles/search/vector-search-how-to-storage-options.md

Lines changed: 8 additions & 43 deletions
@@ -18,7 +18,7 @@ Azure AI Search stores multiple copies of vector fields that are used in specifi
 
 Use cases where an extra copy is used include:
 
-- Returning raw vectors in a query response, which you might want to do if you have downstream processes that consume vectors.
+- Returning raw vectors in a query response or supporting incremental updates to vector content.
 - Rescoring compressed (quantized) vectors as a query optimization technique.
 
 Removing storage is irreversible and requires reindexing if you want it back.
@@ -31,42 +31,19 @@ Removing storage is irreversible and requires reindexing if you want it back.
 
 | Instance | Usage | Required for search | How removed |
 |----------|-------|---------------------|-------------|
+| Vectors in the [HNSW graph for Approximate Nearest Neighbors (ANN) search](vector-search-overview.md) (HNSW graph) or vectors for exhaustive K-Nearest Neighbors (eKNN index) | Used for query execution. Consists of either full-precision vectors (when no compression is applied) or quantized vectors. | Essential | There are no parameters for removing this instance. |
 | Source vectors received during document indexing (JSON data) | Used for incremental data refresh with `merge` or `mergeOrUpload` indexing actions. Also used to return "retrievable" vectors in the query response. | No | Set `stored` property to false. |
 | Original full-precision vectors (binary data) <sup>1</sup> | For compressed vectors, it's used for `preserveOriginals` rescoring on an oversampled candidate set of results from ANN search. This applies to vector fields that undergo [scalar or binary quantization](vector-search-how-to-quantization.md), and it applies to queries using the HNSW graph. If you're using eKNN, all vectors are in scope for the query, so rescoring has no effect and thus isn't supported. | No | Set `rescoringOptions.rescoreStorageMethod` property to `discardOriginals` in `vectorSearch.compressions`. |
-| Vectors in the [HNSW graph for Approximate Nearest Neighbors (ANN) search](vector-search-overview.md) (HNSW graph) or vectors for exhaustive K-Nearest Neighbors (eKNN index) | Used for query execution. Consists of either full-precision vectors (when no compression is applied) or quantized vectors. | Essential | There are no parameters for removing this instance. |
 
 <sup>1</sup> This copy is also for internal index operations and for exhaustive KNN search in older API versions, on indexes created using the 2023 APIs. On newer indexes, an eKNN-configured field consists of full-precision vectors so no extra copy is needed.
 
-<!--
-Depending on when your index was created, for every vector field, there can be up to three copies of the vectors, each serving a different purpose.
-
-If you created an index using the 2024-11-01-preview or later, you only have two copies of vector data. For older indexes, you might have up to three.
-
-| Instance | Usage | Controlled using |
-|----------|-------|------------------|
-| Source vectors received during document indexing (JSON data) | Used for incremental data refresh with `merge` or `mergeOrUpload` indexing action. Also used to return "retrievable" vectors in the query response. | `stored` property on vector fields |
-| Original full-precision vectors (binary data) | Used for internal index operations and for exhaustive KNN search in older API versions. For compressed vectors, it's also used for `preserveOriginals` rescoring on an oversampled candidate set of results from ANN search. This applies to vector fields that undergo [scalar or binary quantization](vector-search-how-to-quantization.md). | `rescoringOptions.rescoreStorageMethod` property in `vectorSearch.compressions`. |
-| Vectors in the [HNSW graph for Approximate Nearest Neighbors (ANN) search](vector-search-overview.md) (HNSW graph) or vectors for exhaustive K-Nearest Neighbors (eKNN index) | Used for query execution. Consists of either full-precision vectors (when no compression is applied) or quantized vectors. | Essential. There are no parameters for removing this instance. |
-
-You can set properties that permanently discard the first two instances (JSON data and binary data) from vector storage, but not the last instance.
-
-To offset lossy compression for HNSW, you can keep the second instance (binary data) for rescoring purposes to improve ANN search quality. For eKNN, only scalar quantization is supported, and rescoring isn't an option. In newer API versions like the latest preview, the second instance isn't kept for eKNN because the third instance provides full-precision vectors in an eKNN index.
-
-### Indexes created with 2024-11-01-preview or later API versions
-
-For indexes created with the 2024-11-01-preview or a later API with uncompressed vector fields, the second and third instances (binary data and HNSW graph) are combined as part of our cost reduction investments, reducing overall storage. A newer generation index with consolidated vectors is functionally equivalent to older indexes, but uses less storage. Physical data structures are established on a Create Index request, so you must delete and recreate the index to realize the storage reductions.
-
-If you choose [vector compression](vector-search-how-to-configure-compression-storage.md), AI Search compresses (quantizes) the in-memory portion of the vector index. Since memory is often a primary constraint for vector indexes, this practice allows you to store more vectors within the same search service. However, lossy compression equates to less information in the index, which can affect search quality.
-
-To mitigate the loss in information, you can [enable "rescoring" and "oversampling" options](vector-search-how-to-quantization.md#supported-rescoring-techniques) to help maintain quality. The effect is retrieval of a larger set of candidate documents from the compressed index, with recomputation of similarity scores using the original vectors or the dot product. For rescoring to work, original vectors must be retained in storage for certain scenarios. As a result, while quantization reduces memory usage (vector index size usage), it slightly increases storage requirements since both compressed and original vectors are stored. The extra storage is approximately equal to the size of the compressed index. -->
 
 ## Remove source vectors (JSON data)
 
 In a vector field definition, `stored` is a boolean property that determines whether storage is allocated for retrievable vector content obtained during indexing (the source instance). By default, `stored` is set to `true`. If you don't need raw vector content in a query response, changing `stored` to `false` can save up to 50% storage per field.
 
 Considerations for setting `"stored": false`:
 
-- Because vectors aren't human readable, you can omit them from results sent to LLMs in RAG scenarios or from results rendered on a search page. However, keep them if you're using vectors in a downstream process that consumes vector content.
+- Because vectors aren't human readable, you can generally omit them from results sent to LLMs in RAG scenarios or from results rendered on a search page. However, you should keep them if you're using vectors in a downstream process that consumes vector content.
 
 - If your indexing strategy uses [partial document updates](search-howto-reindex.md#update-content), such as `merge` or `mergeOrUpload` on an existing document, setting `"stored": false` prevents content updates to those fields during the merge. You must include the entire vector field (and nonvector fields you're updating) in each reindexing operation. Otherwise, the vector data is lost without an error or warning. To avoid this risk altogether, set `"stored": true`.

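To make the `stored` considerations concrete, here's a rough sketch of a vector field definition with retrievable source vectors disabled (field name, dimensions, and profile name are hypothetical; the exact attribute set depends on your API version):

```json
{
  "name": "contentVector",
  "type": "Collection(Edm.Single)",
  "searchable": true,
  "retrievable": false,
  "stored": false,
  "dimensions": 1536,
  "vectorSearchProfile": "my-vector-profile"
}
```

With this configuration, the field is still searchable, but the source copy is never stored, so `merge` and `mergeOrUpload` operations must resend the full vector content.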
@@ -109,19 +86,15 @@ PUT https://[service-name].search.windows.net/indexes/demo-index?api-version=202
 
 Original full-precision vectors are used in rescoring operations over compressed (quantized) vectors. The intent of rescoring is to mitigate the loss in information due to compression. The effect of rescoring is retrieval of a larger set of candidate documents from the compressed index, with recomputation of similarity scores using the original vectors or the dot product. For rescoring to work, original vectors must be retained in storage for certain scenarios. As a result, while quantization reduces memory usage (vector index size usage), it slightly increases storage requirements since both compressed and original vectors are stored. The extra storage is approximately equal to the size of the compressed index.
 
-Rescoring of scalar quantized vectors requires retention of the original full-precision vectors.
-
-Rescoring of binary quantized vectors can use original full-precision vectors, or the dot product of the binary embedding, which produces high quality search results, without having to reference full-precision vectors in the index.
+Rescoring requirements by quantization approach:
 
-<!-- If you choose [vector compression](vector-search-how-to-configure-compression-storage.md), AI Search compresses (quantizes) the in-memory portion of the vector index. Since memory is often a primary constraint for vector indexes, this practice allows you to store more vectors within the same search service. However, lossy compression equates to less information in the index, which can affect search quality.
+- Rescoring of scalar quantized vectors requires retention of the original full-precision vectors.
 
-To mitigate the loss in information, you can [enable "rescoring" and "oversampling" options](vector-search-how-to-quantization.md#supported-rescoring-techniques) to help maintain quality. The effect is retrieval of a larger set of candidate documents from the compressed index, with recomputation of similarity scores using the original vectors or the dot product. For rescoring to work, original vectors must be retained in storage for certain scenarios. As a result, while quantization reduces memory usage (vector index size usage), it slightly increases storage requirements since both compressed and original vectors are stored. The extra storage is approximately equal to the size of the compressed index. -->
+- Rescoring of binary quantized vectors can use original full-precision vectors, or the dot product of the binary embedding, which produces high-quality search results without having to reference full-precision vectors in the index.
 
-<!-- When you compress vectors using either scalar or binary quantization, query execution is over the quantized vectors. In this case, you only need the original full-precision vectors (binary data) if you want to rescore. For rescoring, the effect is retrieval of a larger set of candidate documents from the compressed index, with recomputation of similarity scores using the original vectors or the dot product. For rescoring to work, original vectors must be retained in storage for certain scenarios. As a result, while quantization reduces memory usage (vector index size usage), it slightly increases storage requirements since both compressed and original vectors are stored. The extra storage is approximately equal to the size of the compressed index.
+The `rescoreStorageMethod` property controls whether full-precision vectors are stored.
 
-If you use newer APIs *and* binary quantization, you can safely discard full-precision vectors because rescoring strategies now use the dot product of a binary embedding, which produces high quality search results, without having to reference full-precision vectors in the index. -->
-
-The `rescoreStorageMethod` property controls whether full-precision vectors are stored. The guidance for whether to retain full-precision vectors is:
+Recommendations:
 
 - For scalar quantization, preserve original full-precision vectors in the index because they're required for rescore.
 
@@ -130,14 +103,6 @@ The `rescoreStorageMethod` property controls whether full-precision vectors are
 
 > [!NOTE]
 > Vector storage strategies have been evolving over the last several releases. Index creation date and API version determine your storage options. For example, in the 2024-11-01-preview, if you set discardOriginals to remove full-precision vectors, there was no rescoring for binary quantization because the dot product approach wasn't available. We recommend using the latest APIs for the best mitigation options.
 
-<!-- | API version | Applies to | Remove full-precision vectors |
-|--|--|--|
-| 2024-07-01 and earlier | Not applicable. | There's no mechanism for removing full-precision vectors. |
-| 2024-11-01-preview | Binary embeddings | Use `rescoreStorageMethod.discardOriginals` to remove full-precision vectors, but doing so prevents rescoring. `enableRescoring` must be false if originals are gone.|
-| 2025-03-01-preview | Binary embeddings | Use `rescoreStorageMethod.discardOriginals` to remove full-precision vectors in the index while still retaining rescore options. In this preview, rescoring is possible because the technique changed. The dot product of the binary embeddings is used on the rescore, producing high quality search results equivalent to or better than earlier techniques based on full-precision vectors. |
-
-Notice that scalar isn't listed in the table. If you use scalar quantization, you must retain original full-precision vectors if you want to rescore. -->
-
 In `vectorSearch.compressions`, the `rescoreStorageMethod` property is set to `preserveOriginals` by default, which retains full-precision vectors for [oversampling and rescoring capabilities](vector-search-how-to-quantization.md#add-compressions-to-a-search-index) to reduce the effect of lossy compression on the HNSW graph. If you don't need rescoring, or if you used binary quantization and the dot product for rescoring, you can reduce vector storage by setting `rescoreStorageMethod` to `discardOriginals`.
 
 > [!IMPORTANT]
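Since `rescoreStorageMethod` lives on the compression object inside the index definition, setting it means updating the index via the Create or Update Index request shown in the hunk header above. A rough sketch (service name, index name, and the elided `api-version` are placeholders; the schema may vary by API version):

```http
PUT https://[service-name].search.windows.net/indexes/demo-index?api-version=[your-api-version]
Content-Type: application/json
api-key: [admin-key]

{
  "name": "demo-index",
  "fields": [ ... ],
  "vectorSearch": {
    "compressions": [
      {
        "name": "my-compression",
        "kind": "scalarQuantization",
        "rescoringOptions": {
          "enableRescoring": true,
          "rescoreStorageMethod": "preserveOriginals"
        }
      }
    ]
  }
}
```

Here `preserveOriginals` is kept because the compression kind is scalar quantization, which requires full-precision vectors for rescoring; physical storage choices like this are fixed at index creation, so changing them later requires rebuilding the index.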
