Merge pull request #272819 from HeidiSteen/heidist-vectors

American-Dipper · web-flow · commit ec42a4d04c8d · 2024-04-19T17:00:05.000-07:00
[azure search] vector index size edits
diff --git a/articles/search/vector-search-index-size.md b/articles/search/vector-search-index-size.md
@@ -9,35 +9,33 @@ ms.service: cognitive-search
 ms.custom:
   - ignite-2023
 ms.topic: conceptual
-ms.date: 04/03/2024
+ms.date: 04/19/2024
 ---
 
 # Vector index size and staying under limits
 
 For each vector field, Azure AI Search constructs an internal vector index using the algorithm parameters specified on the field. Because Azure AI Search imposes quotas on vector index size, you should know how to estimate and monitor vector size to ensure you stay under the limits.
 
 > [!NOTE]
-> A note about terminology. Internally, the physical data structures of a search index include raw content (used for retrieval patterns requiring non-tokenized content), inverted indexes (used for searchable text fields), and vector indexes (used for searchable vector fields). This article explains the limits for the physical vector indexes that back each of your vector fields. 
+> A note about terminology. Internally, the physical data structures of a search index include raw content (used for retrieval patterns requiring non-tokenized content), inverted indexes (used for searchable text fields), and vector indexes (used for searchable vector fields). This article explains the limits for the internal vector indexes that back each of your vector fields. 
 
 > [!TIP]
-> [Vector quantization and storage configuration](vector-search-how-to-configure-compression-storage.md) is now in preview. You can use narrow data types, apply scalar quantization, and eliminate some storage requirements if you don't need the data.
+> [Vector quantization and storage configuration](vector-search-how-to-configure-compression-storage.md) is now in preview. Use capabilities like narrow data types, scalar quantization, and elimination of redundant storage to stay under vector quota and storage quota.
 
 ## Key points about quota and vector index size
 
 + Vector index size is measured in bytes.
 
-+ There's no quota at the search index level. Instead vector quotas are enforced service-wide at the partition level. Quota varies by service tier (or `SKU`) and the service creation date, with newer services having much higher quotas per partition.
++ Vector quotas are based on memory constraints. All searchable vector indexes must be loaded into memory. At the same time, there must also be sufficient memory for other runtime operations. Vector quotas exist to ensure that the overall system remains stable and balanced for all workloads.
+
++ Vector indexes are also subject to disk quota, in the sense that all indexes, vector and nonvector, are subject disk quota. There's no separate disk quota for vector indexes.
+
++ Vector quotas are enforced on the search service as a whole, per partition, meaning that if you add partitions, vector quota goes up. Per-partition vector quotas are higher on newer services:
 
   + [Vector quota for services created after April 3, 2024](search-limits-quotas-capacity.md#vector-limits-on-services-created-after-april-3-2024-in-supported-regions)
   + [Vector quota for services created between July 1, 2023 and April 3, 2024](search-limits-quotas-capacity.md#vector-limits-on-services-created-between-july-1-2023-and-april-3-2024)
   + [Vector quota for services created before July 1, 2023](search-limits-quotas-capacity.md#vector-limits-on-services-created-before-july-1-2023)
 
-+ Vector quotas are primarily designed around memory constraints. All searchable vector indexes must be loaded into memory. At the same time, there must also be sufficient memory for other runtime operations. Vector quotas exist to ensure that the overall system remains stable and balanced for all workloads.
-
-+ Vector quotas are expressed in terms of physical storage, and physical storage is contingent upon partition size and quantity. Each tier offers increasingly powerful and larger partitions. Higher tiers and more partitions give you more vector quota to work with. In [service limits](search-limits-quotas-capacity.md#service-limits), maximum vector quotas are based on the maximum amount of physical space that all vector indexes can consume collectively, assuming all partitions are in use for that service.
-
-  For example, on new services in a supported region, the sum total of all vector indexes on a Basic search service can't be more than 15 GB because Basic can have up to three partitions (5-GB quota per partition). On S1, which can have up to 12 partitions, the quota for vector data is 35 GB per partition, or up to 160 GB if you allocate all 12 partitions.
-
 ## How to check partition size and quantity
 
 If you aren't sure what your search service limits are, here are two ways to get that information:
@@ -78,31 +76,38 @@ A request for vector metrics is a data plane operation. You can use the Azure po
 
 Usage information can be found on the **Overview** page's **Usage** tab. Portal pages refresh every few minutes so if you recently updated an index, wait a bit before checking results.
 
-The following screenshot is for a Standard 1 (S1) tier, configured for one partition and one replica. Vector index quota, measured in megabytes, refers to the internal vector indexes created for each vector field. Overall, indexes consume almost 460 megabytes of available storage, but the vector index component takes up just 93 megabytes of the 460 used on this search service.
+The following screenshot is for an older Standard 1 (S1) search service, configured for one partition and one replica. 
+
++ Storage quota is a disk constraint, and it's inclusive of all indexes (vector and nonvector) on a search service.
++ Vector index size quota is a memory constraint. It's the amount of memory required to load all internal vector indexes created for each vector field on a search service. 
+
+The screenshot indicates that indexes (vector and nonvector) consume almost 460 megabytes of available disk storage. Vector indexes consume almost 93 megabytes of memory at the service level. 
 
 :::image type="content" source="media/vector-search-index-size/portal-vector-index-size.png" lightbox="media/vector-search-index-size/portal-vector-index-size.png" alt-text="Screenshot of the Overview page's usage tab showing vector index consumption against quota.":::
 
-The tile on the Usage tab tracks vector index consumption at the search service level. If you increase or decrease search service capacity, the tile reflects the changes accordingly.
+Quotas for both storage and vector index size increase or decrease as you add or remove partitions. If you change the partition count, the tile shows a corresponding change in storage and vector quota. 
+
+> [!NOTE]
+> On disk, vector indexes aren't 93 megabytes. Vector indexes on disk take up about three times more space than vector indexes in memory. See [How vector fields affect disk storage](#how-vector-fields-affect-disk-storage) for details.
 
 ### [**REST**](#tab/rest-vector-quota)
 
 Use the following data plane REST APIs (version 2023-10-01-preview, 2023-11-01, and later) for vector usage statistics:
 
++ [GET Service Statistics](/rest/api/searchservice/get-service-statistics/get-service-statistics) returns quota and usage for the search service all-up. 
 + [GET Index Statistics](/rest/api/searchservice/indexes/get-statistics) returns usage for a given index.
-+ [GET Service Statistics](/rest/api/searchservice/get-service-statistics/get-service-statistics) returns quota and usage for the search service all-up.
 
-For a visual, here's the sample response for a Basic search service that has the quickstart vector search index. `storageSize` and `vectorIndexSize` are reported in bytes. 
+Usage and quota are reported in bytes. 
 
-```json
-{
-    "@odata.context": "https://my-demo.search.windows.net/$metadata#Microsoft.Azure.Search.V2023_11_01.IndexStatistics",
-    "documentCount": 108,
-    "storageSize": 5853396,
-    "vectorIndexSize": 1342756
-}
+Here's GET Service Statistics:
+
+```http
+GET {{baseUrl}}/servicestats?api-version=2023-11-01  HTTP/1.1
+    Content-Type: application/json
+    api-key: {{apiKey}}
 ```
 
-Return service statistics to compare usage against available quota at the service level:
+Response includes metrics for `storageSize`, which doesn't distinguish between vector and nonvector indexes. The `vectorIndexSize` statistic shows usage and quota at the service level.  
 
 ```json
 {
@@ -136,6 +141,24 @@ Return service statistics to compare usage against available quota at the servic
 }
 ```
 
+You can also send a GET Index Statistics to get the physical size of the index on disk, plus the in-memory size of the vector fields. 
+
+```http
+GET {{baseUrl}}/indexes/vector-healthplan-idx/stats?api-version=2023-11-01  HTTP/1.1
+    Content-Type: application/json
+    api-key: {{apiKey}}
+```
+
+Response includes usage information at the index level. This example is based on the index created in the [integrated vectorization quickstart](search-get-started-portal-import-vectors.md) that chunks and vectorizes health plan PDFs. Each chunk contributes to `documentCount`.
+
+```json
+{
+    "@odata.context": "https://my-demo.search.windows.net/$metadata#Microsoft.Azure.Search.V2023_11_01.IndexStatistics",
+    "documentCount": 147,
+    "storageSize": 4592870,
+    "vectorIndexSize": 915484
+}
+```
 ---
 
 ## Factors affecting vector index size
@@ -206,8 +229,4 @@ To obtain the **vector index size**, multiply this **raw_size** by the **algorit
 
 ## How vector fields affect disk storage
 
-Disk storage overhead of vector data is roughly three times the size of vector index size.
-
-### Storage vs. vector index size quotas
-
-Service storage and vector index size quotas aren't separate quotas. Vector indexes contribute to the [storage quota for the search service](search-limits-quotas-capacity.md#service-limits) as a whole. For example, if your storage quota is exhausted but there's remaining vector quota, you can't index any more documents, regardless if they're vector documents, until you scale up in partitions to increase storage quota or delete documents (either text or vector) to reduce storage usage. Similarly, if vector quota is exhausted but there's remaining storage quota, further indexing attempts fail until vector quota is freed, either by deleting some vector documents or by scaling up in partitions.
+Most of this article provides information about the size of vectors in memory. If you want to know about vector size on disk, the disk consumption for vector data is roughly three times the size of the vector index in memory. For example, if your `vectorIndexSize` usage is at 100 megabytes (10 million bytes), you would have used least 300 megabytes of `storageSize` quota to accommodate your vector indexes
diff --git a/includes/azure-search-limits-per-service.md b/includes/azure-search-limits-per-service.md
@@ -40,7 +40,7 @@ Currently, there's no in-place upgrade. You should [create a new search service]
 | Resource | Free | Basic  | S1 | S2 | S3 | S3&nbsp;HD | L1 | L2 |
 |----------|------|--------|----|----|----|------------|----|----|
 | Service level agreement (SLA) | No |Yes |Yes |Yes |Yes |Yes |Yes |Yes |
-| Storage (partition size)  | 50&nbsp;MB | 15&nbsp;GB | 160&nbsp;GB | 350&nbsp;GB | 700&nbsp;GB |200&nbsp;GB| 1&nbsp;TB | 2&nbsp;TB  |
+| Storage (partition size)  | 50&nbsp;MB | 15&nbsp;GB | 160&nbsp;GB | 350&nbsp;GB | 700&nbsp;GB |700&nbsp;GB| 1&nbsp;TB | 2&nbsp;TB  |
 | Partitions | N/A |3 |12 |12 |12 |3 |12 |12 |
 | Replicas | N/A | 3 |12 |12 |12 |12 |12 |12 |