**articles/search/vector-search-how-to-configure-compression-storage.md** (4 additions, 6 deletions)
@@ -21,11 +21,9 @@ This article enumerates all of optimization techniques in Azure AI Search that c
 
 Vector optimization settings are specified in vector field definitions in a search index. Most of the features described in this article are generally available in [2024-07-01 REST API](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-07-01&preserve-view=true) and in the Azure SDK packages targeting that version. The [latest preview version](/rest/api/searchservice/operation-groups?view=rest-searchservice-2024-09-01-preview&preserve-view=true) adds support for truncated dimensions if you're using text-embedding-3-large or text-embedding-3-small for vectorization.
 
-[An example](#example-vector-compression-techniques) at the end of this article shows the variations in vector size for each of the approaches described in this article.
-
 ## Evaluate the options
 
-Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](#example-vector-compression-techniques).
+Review the approaches in Azure AI Search for reducing the amount of storage used by vector fields. These approaches aren't mutually exclusive and can be combined for [maximum reduction in vector size](vector-search-how-to-quantization.md#example-vector-compression-techniques).
 
 We recommend built-in quantization because it compresses vector size in memory *and* on disk with minimal effort, and that tends to provide the most benefit in most scenarios. In contrast, narrow types (except for float16) require a special effort into making them, and `stored` saves on disk storage, which isn't as expensive as memory.
@@ -385,7 +383,7 @@ The following example shows the fields collection of a search index. Set `stored
 
 Defaults are `stored` set to true and `retrievable` set to false. In a default configuration, a retrievable copy is stored, but it's not automatically returned in results. When `stored` is true, you can toggle `retrievable` between true and false at any time without having to rebuild an index. When `stored` is false, `retrievable` must be false and can't be changed. -->
 
-## Example: vector compression techniques
+<!--## Example: vector compression techniques
 
 Here's Python code that demonstrates quantization, narrow data types, and use of the stored property: [Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md).
@@ -420,7 +418,7 @@ Search APIs report storage and vector size at the index level, so indexes and no
 
 Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling or reranking with original vectors.
 
-Recall that the [vector compression definition](#add-compressions-to-a-search-index) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
+Recall that the [vector compression definition](vector-search-how-to-quantization.md) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
 
 You can set the oversampling parameter even if the index doesn't explicitly have a `rerankWithOriginalVectors` or `defaultOversampling` definition. Providing `oversampling` at query time overrides the index settings for that query and executes the query with an effective `rerankWithOriginalVectors` as true.
@@ -446,7 +444,7 @@ POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?ap
 
 - Applies to vector fields that undergo vector compression, per the vector profile assignment.
 
-- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.
+- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.-->
**articles/search/vector-search-how-to-quantization.md** (72 additions, 7 deletions)
@@ -10,7 +10,7 @@ ms.topic: how-to
 ms.date: 11/04/2024
 ---
 
-##Use scalar or binary quantization to compress vector size
+# Use scalar or binary quantization to compress vector size
 
 Quantization is recommended for reducing vector size because it lowers both memory and disk storage requirements for float16 and float32 embeddings. To offset the effects of a smaller index, you can add oversampling and reranking over uncompressed vectors.
@@ -32,9 +32,9 @@ To use built-in quantization, follow these steps:
 > - Create a new vector profile that uses the named configuration
 > - Create a new vector field having the new vector profile
 > - Load the index with float32 or float16 data that's quantized during indexing with the configuration you defined
-> - Optionally, [query quantized data](#query-a-quantized-vector-field-using-oversampling) using the oversampling parameter if you want to override the default
+> - Optionally, [query quantized data](#) using the oversampling parameter if you want to override the default
 
-###Add "compressions" to a search index
+## Add "compressions" to a search index
 
 The following example shows a partial index definition with a fields collection that includes a vector field, and a `vectorSearch.compressions` section.
@@ -84,7 +84,7 @@ POST https://[servicename].search.windows.net/indexes?api-version=2024-07-01
 
 - `quantizedDataType` is optional and applies to scalar quantization only. If you add it, it must be set to `int8`. This is the only primitive data type supported for scalar quantization at this time. Default is `int8`.
 
-###Add the HNSW algorithm
+## Add the HNSW algorithm
 
 Make sure your index has the Hierarchical Navigable Small Worlds (HNSW) algorithm. Built-in quantization isn't supported with exhaustive KNN.
@@ -107,7 +107,7 @@ Make sure your index has the Hierarchical Navigable Small Worlds (HNSW) algorith
     }
 }
 ```
 
-###Create and assign a new vector profile
+## Create and assign a new vector profile
 
 To use a new quantization configuration, you must create a *new* vector profile. Creation of a new vector profile is necessary for building compressed indexes in memory. Your new profile uses HNSW.
@@ -151,14 +151,79 @@ To use a new quantization configuration, you must create a *new* vector profile.
 
 1. [Load the index](search-what-is-data-import.md) using indexers for pull model indexing, or APIs for push model indexing.
 
-###How scalar quantization works in Azure AI Search
+## How scalar quantization works in Azure AI Search
 
 Scalar quantization reduces the resolution of each number within each vector embedding. Instead of describing each number as a 16-bit or 32-bit floating point number, it uses an 8-bit integer. It identifies a range of numbers (typically 99th percentile minimum and maximum) and divides them into a finite number of levels or bins, assigning each bin an identifier. In 8-bit scalar quantization, there are 2^8, or 256, possible bins.
 
 Each component of the vector is mapped to the closest representative value within this set of quantization levels in a process akin to rounding a real number to the nearest integer. In the quantized 8-bit vector, the identifier number stands in place of the original value. After quantization, each vector is represented by an array of identifiers for the bins to which its components belong. These quantized vectors require far fewer bits to store compared to the original vector, thus reducing storage requirements and memory footprint.
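The bin-mapping described above can be sketched in a few lines of NumPy. This is an illustration of the technique as the text explains it, not the service's internal code: the function names are made up, and bin identifiers are held as `uint8` here for simplicity, whereas the service's `quantizedDataType` is `int8`.

```python
import numpy as np

def scalar_quantize(vectors: np.ndarray):
    """Map each float32 component to one of 256 bins spanning the
    1st-99th percentile range (illustrative sketch, not the service's code)."""
    lo, hi = np.percentile(vectors, [1, 99])
    # Clip outliers, then scale the [lo, hi] range onto bin identifiers 0..255.
    scaled = (np.clip(vectors, lo, hi) - lo) / (hi - lo) * 255
    return np.round(scaled).astype(np.uint8), lo, hi

def dequantize(q: np.ndarray, lo: float, hi: float) -> np.ndarray:
    # Recover an approximation of the original floats from the bin identifiers.
    return q.astype(np.float32) / 255 * (hi - lo) + lo

vectors = np.random.default_rng(0).normal(size=(1000, 8)).astype(np.float32)
q, lo, hi = scalar_quantize(vectors)
print(q.nbytes, "vs", vectors.nbytes)  # → 8000 vs 32000: one quarter the memory
```

The clipping and rounding are lossy, which is why reranking with original vectors exists as a mitigation.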
-###How binary quantization works in Azure AI Search
+## How binary quantization works in Azure AI Search
 
 Binary quantization compresses high-dimensional vectors by representing each component as a single bit, either 0 or 1. This method drastically reduces the memory footprint and accelerates vector comparison operations, which are crucial for search and retrieval tasks. Benchmark tests show up to 96% reduction in vector index size.
 
 It's particularly effective for embeddings with dimensions greater than 1024. For smaller dimensions, we recommend testing the quality of binary quantization, or trying scalar instead. Additionally, we've found BQ performs very well when embeddings are centered around zero. Most popular embedding models such as OpenAI, Cohere, and Mistral are centered around zero.
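Keeping one bit per component amounts to keeping only the sign of each value. Here's a minimal sketch under that reading; the helper names are invented, and Hamming distance stands in for the comparison step (a common choice for packed bit vectors, not a documented service internal).

```python
import numpy as np

def binary_quantize(vectors: np.ndarray) -> np.ndarray:
    """Illustrative binary quantization: one bit per component
    (1 for positive values, 0 otherwise), packed 8 components per byte."""
    return np.packbits((vectors > 0).astype(np.uint8), axis=-1)

def hamming_distance(a: np.ndarray, b: np.ndarray) -> int:
    # Compare packed bit vectors: XOR, then count the differing bits.
    return int(np.unpackbits(np.bitwise_xor(a, b)).sum())

rng = np.random.default_rng(1)
vecs = rng.normal(size=(2, 1536)).astype(np.float32)  # zero-centered, like many embedding models
packed = binary_quantize(vecs)
print(vecs.nbytes, "->", packed.nbytes)  # → 12288 -> 384: a 32x reduction
```

Because only the signs survive, zero-centered embeddings lose the least information, which matches the guidance above.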
+
+## Example: vector compression techniques
+
+Here's Python code that demonstrates quantization, [narrow data types](vector-search-how-to-assign-narrow-data-types.md), and use of the [stored property](vector-search-how-to-storage-options.md).
+
+This code is borrowed from [Code sample: Vector quantization and storage options using Python](https://github.com/Azure/azure-search-vector-samples/blob/main/demo-python/code/vector-quantization-and-storage/README.md).
+
+This code creates and compares storage and vector index size for each option.
+
+```bash
+****************************************
+Index Name: compressiontest-baseline
+Storage Size: 21.3613MB
+Vector Size: 4.8277MB
+****************************************
+Index Name: compressiontest-compression
+Storage Size: 17.7604MB
+Vector Size: 1.2242MB
+****************************************
+Index Name: compressiontest-narrow
+Storage Size: 16.5567MB
+Vector Size: 2.4254MB
+****************************************
+Index Name: compressiontest-no-stored
+Storage Size: 10.9224MB
+Vector Size: 4.8277MB
+****************************************
+Index Name: compressiontest-all-options
+Storage Size: 4.9192MB
+Vector Size: 1.2242MB
+```
+
+Search APIs report storage and vector size at the index level, so indexes and not fields must be the basis of comparison. Use the [GET Index Statistics](/rest/api/searchservice/indexes/get-statistics) or an equivalent API in the Azure SDKs to obtain vector size.
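A per-index comparison like the one above boils down to one statistics call per index. Here's a standard-library sketch of that call; the helper names are invented, while the endpoint shape and the `storageSize`/`vectorIndexSize` response fields follow the GET Index Statistics REST API.

```python
import json
import urllib.request

API_VERSION = "2024-07-01"

def stats_url(service: str, index: str) -> str:
    """Endpoint for GET Index Statistics, which reports documentCount,
    storageSize, and vectorIndexSize (in bytes) for one index."""
    return (f"https://{service}.search.windows.net/indexes/{index}/stats"
            f"?api-version={API_VERSION}")

def get_vector_size_mb(service: str, index: str, api_key: str) -> float:
    # The admin api-key goes in a request header; sizes come back in bytes.
    req = urllib.request.Request(stats_url(service, index),
                                 headers={"api-key": api_key})
    with urllib.request.urlopen(req) as resp:
        stats = json.load(resp)
    return stats["vectorIndexSize"] / 2**20
```

Calling `get_vector_size_mb` for each of the five test indexes reproduces the "Vector Size" column of the comparison.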
+
+## Query a quantized vector field using oversampling
+
+Query syntax for a compressed or quantized vector field is the same as for noncompressed vector fields, unless you want to override parameters associated with oversampling or reranking with original vectors.
+
+Recall that the [vector compression definition](vector-search-how-to-quantization.md) in the index has settings for `rerankWithOriginalVectors` and `defaultOversampling` to mitigate the effects of a smaller vector index. You can override the default values to vary the behavior at query time. For example, if `defaultOversampling` is 10.0, you can change it to something else in the query request.
+
+You can set the oversampling parameter even if the index doesn't explicitly have a `rerankWithOriginalVectors` or `defaultOversampling` definition. Providing `oversampling` at query time overrides the index settings for that query and executes the query with an effective `rerankWithOriginalVectors` as true.
+
+```http
+POST https://[service-name].search.windows.net/indexes/demo-index/docs/search?api-version=2024-07-01
+Content-Type: application/json
+api-key: [admin key]
+
+{
+    "vectorQueries": [
+        {
+            "kind": "vector",
+            "vector": [8, 2, 3, 4, 3, 5, 2, 1],
+            "fields": "myvector",
+            "oversampling": 12.0,
+            "k": 5
+        }
+    ]
+}
+```
+
+**Key points**:
+
+- Applies to vector fields that undergo vector compression, per the vector profile assignment.
+
+- Overrides the `defaultOversampling` value or introduces oversampling at query time, even if the index's compression configuration didn't specify oversampling or reranking options.