Merge pull request #261993 from jcocchi/add-bypass-cache

prmerger-automator[bot] · web-flow · commit 68e775498e64 · 2023-12-27T22:17:10.000Z
Cosmos DB: Add bypass integrated cache section
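
For review context, the caching rules described in the two articles touched by this change (LRU eviction shared by items and queries, per-request `MaxIntegratedCacheStaleness`, and the new bypass option) can be modeled with a small in-memory sketch. This is a toy model under stated assumptions, not the dedicated gateway's implementation: `ToyIntegratedCache`, `CachedResult`, `read`, and the explicit `nowMillis` clock are hypothetical names introduced only for illustration, and the sketch assumes that a read which misses (including one with a staleness of 0) repopulates the cache.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Toy model of the integrated cache; every name here is hypothetical. */
class ToyIntegratedCache {
    /** A cached value plus the time (ms) at which it was fetched from the backend. */
    private static final class CachedResult {
        final String value;
        final long cachedAtMillis;

        CachedResult(String value, long cachedAtMillis) {
            this.value = value;
            this.cachedAtMillis = cachedAtMillis;
        }
    }

    private final Map<String, CachedResult> entries;
    private final Function<String, String> backend;
    private int backendReads = 0;

    ToyIntegratedCache(int capacity, Function<String, String> backend) {
        this.backend = backend;
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry gives LRU eviction once capacity is exceeded.
        this.entries = new LinkedHashMap<String, CachedResult>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedResult> eldest) {
                return size() > capacity;
            }
        };
    }

    /** The key is an item id (point read) or query text (query); both share one capacity. */
    String read(String key, long maxStalenessMillis, boolean bypassCache, long nowMillis) {
        if (bypassCache) {
            // Bypass: served from the backend, and nothing is stored in the cache.
            backendReads++;
            return backend.apply(key);
        }
        CachedResult cached = entries.get(key);
        if (cached != null && nowMillis - cached.cachedAtMillis < maxStalenessMillis) {
            return cached.value; // cache hit: entry is younger than the requested staleness
        }
        // Miss or too stale for this request: read the backend and replace the entry.
        backendReads++;
        String fresh = backend.apply(key);
        entries.put(key, new CachedResult(fresh, nowMillis));
        return fresh;
    }

    int backendReads() {
        return backendReads;
    }

    boolean isCached(String key) {
        return entries.containsKey(key);
    }
}
```

In this model, staleness is checked when an entry is read rather than when it is stored, so the same entry can be a hit for a request that tolerates 1 hour of staleness and a miss for a request that tolerates 1 second.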
diff --git a/articles/cosmos-db/how-to-configure-integrated-cache.md b/articles/cosmos-db/how-to-configure-integrated-cache.md
@@ -68,7 +68,7 @@ You must ensure the request consistency is session or eventual. If not, the requ
 
 ## Adjust MaxIntegratedCacheStaleness
 
-Configure `MaxIntegratedCacheStaleness`, which is the maximum time in which you are willing to tolerate stale cached data. It is recommended to set the `MaxIntegratedCacheStaleness` as high as possible because it will increase the likelihood that repeated point reads and queries can be cache hits. If you set `MaxIntegratedCacheStaleness` to 0, your read request will **never** use the integrated cache, regardless of the consistency level. When not configured, the default `MaxIntegratedCacheStaleness` is 5 minutes.
+Configure `MaxIntegratedCacheStaleness`, which is the maximum staleness you're willing to tolerate for cached data. Set `MaxIntegratedCacheStaleness` as high as your requirements allow, because a higher value increases the likelihood that repeated point reads and queries are cache hits. If you set `MaxIntegratedCacheStaleness` to 0, your read request will **never** use the integrated cache, regardless of the consistency level. When not configured, the default `MaxIntegratedCacheStaleness` is 5 minutes.
 
 >[!NOTE]
 > The `MaxIntegratedCacheStaleness` can be set as high as 10 years. In practice, this value is the maximum staleness and the cache may be reset sooner due to node restarts which may occur. 
@@ -133,6 +133,54 @@ container.query_items(
 ---
 
 
+## Bypass the integrated cache (Preview)
+
+Use the `BypassIntegratedCache` request option to control which requests use the integrated cache. Writes, point reads, and queries that bypass the integrated cache won't use cache storage, saving space for other items. Requests that bypass the cache are still routed through the dedicated gateway. These requests are served from the backend and cost RUs.
+
+Bypassing the cache is supported in these versions of each SDK:
+
+| SDK | Supported versions |
+| --- | ------------------ |
+| **.NET SDK v3** | *>= 3.35.0-preview* |
+| **Java SDK v4** | *>= 4.49.0* |
+| **Node.js SDK** | Not supported |
+| **Python SDK**  | Not supported |
+
+### [.NET](#tab/dotnet)
+
+```csharp
+// Route this query through the dedicated gateway but skip the integrated cache.
+FeedIterator<MyClass> myQuery = container.GetItemQueryIterator<MyClass>(
+    new QueryDefinition("SELECT * FROM c"),
+    requestOptions: new QueryRequestOptions
+    {
+        DedicatedGatewayRequestOptions = new DedicatedGatewayRequestOptions { BypassIntegratedCache = true }
+    }
+);
+```
+
+### [Java](#tab/java)
+
+```java
+DedicatedGatewayRequestOptions dgOptions = new DedicatedGatewayRequestOptions()
+   .setIntegratedCacheBypassed(true);
+CosmosQueryRequestOptions queryOptions = new CosmosQueryRequestOptions()
+   .setDedicatedGatewayRequestOptions(dgOptions);
+
+CosmosPagedFlux<MyClass> pagedFluxResponse = container.queryItems(
+        "SELECT * FROM c", queryOptions, MyClass.class);
+```
+
+### [Node.js](#tab/nodejs)
+
+The bypass integrated cache request option isn't available in the Node.js SDK.
+
+### [Python](#tab/python)
+
+The bypass integrated cache request option isn't available in the Python SDK.
+
+---
+
 ## Verify cache hits
 
 Finally, you can restart your application and verify integrated cache hits for repeated point reads or queries by seeing if the request charge is 0. Once you’ve modified your `CosmosClient` to use the dedicated gateway endpoint, all requests will be routed through the dedicated gateway.
diff --git a/articles/cosmos-db/integrated-cache.md b/articles/cosmos-db/integrated-cache.md
@@ -6,7 +6,7 @@ ms.service: cosmos-db
 ms.subservice: nosql
 ms.custom: ignite-2022
 ms.topic: conceptual
-ms.date: 03/15/2023
+ms.date: 12/27/2023
 ms.author: sidandrews
 ms.reviewer: jucocchi
 ---
@@ -21,7 +21,7 @@ An integrated cache is automatically configured within the dedicated gateway. Th
 * An item cache for point reads 
 * A query cache for queries
 
-The integrated cache is a read-through, write-through cache with a Least Recently Used (LRU) eviction policy. The item cache and query cache share the same capacity within the integrated cache and the LRU eviction policy applies to both. Data is evicted from the cache strictly based on when it was least recently used, regardless of whether it's a point read or query. The cached data within each node depends on the data that was recently [written or read](integrated-cache.md#item-cache) through that specific node. If an item or query is cached on one node, it isn't necessarily cached on the others.
+The integrated cache is a read-through, write-through cache with a Least Recently Used (LRU) eviction policy. The item and query caches share the same capacity within the integrated cache and the LRU eviction policy applies to both. Data is evicted from the cache strictly based on when it was least recently used, regardless of whether it's a point read or query. The cached data within each node depends on the data that was recently [written or read](integrated-cache.md#item-cache) through that specific node. If an item or query is cached on one node, it isn't necessarily cached on the others.
 
 > [!NOTE]
 > Do you have any feedback about the integrated cache? We want to hear it! Feel free to share feedback directly with the Azure Cosmos DB engineering team:
@@ -68,12 +68,12 @@ Because each node has an independent cache, it's possible items are invalidated
 
 ## Query cache
 
-The query cache is used to cache queries. The query cache transforms a query into a key/value lookup where the key is the query text and the value is the query results. The integrated cache doesn't have a query engine, it only stores the key/value lookup for each query. Query results are stored as a set, and the cache doesn't keep track of individual items. A given item can be stored in the query cache multiple times if it appears in the result set of multiple queries. Updates to the underlying items won't be reflected in query results unless the [max integrated cache staleness](#maxintegratedcachestaleness) for the query is reached and the query is served from the backend database.
+The query cache is used to cache queries. The query cache transforms a query into a key/value lookup where the key is the query text and the value is the query results. The integrated cache doesn't have a query engine; it only stores the key/value lookup for each query. Query results are stored as a set, and the cache doesn't keep track of individual items. A given item can be stored in the query cache multiple times if it appears in the result set of multiple queries. Updates to the underlying items aren't reflected in query results unless the [max integrated cache staleness](#maxintegratedcachestaleness) for the query is reached and the query is served from the backend database.
 
 ### Populating the query cache
 
 - If the cache doesn't have a result for that query (cache miss) on the node it was routed through, the query is sent to the backend. After the query is run, the cache will store the results for that query
-- Queries with the same shape but different parameters or request options that affect the results (ex. max item count) will be stored as their own key/value pair
+- Queries with the same shape but different parameters or request options that affect the results (for example, max item count) are stored as separate key/value pairs
 
 ### Query cache eviction
 
@@ -108,7 +108,7 @@ The easiest way to configure either session or eventual consistency for all read
 
 The `MaxIntegratedCacheStaleness` is the maximum acceptable staleness for cached point reads and queries, regardless of the selected consistency. The `MaxIntegratedCacheStaleness` is configurable at the request-level. For example, if you set a `MaxIntegratedCacheStaleness` of 2 hours, your request will only return cached data if the data is less than 2 hours old. To increase the likelihood of repeated reads utilizing the integrated cache, you should set the `MaxIntegratedCacheStaleness` as high as your business requirements allow.
 
-It's important to understand that the `MaxIntegratedCacheStaleness`, when configured on a request that ends up populating the cache, doesn't affect how long that request is cached. `MaxIntegratedCacheStaleness` enforces consistency when you try to use cached data. There's no global TTL or cache retention setting, so data is only evicted from the cache if either the integrated cache is full or a new read is run with a lower `MaxIntegratedCacheStaleness` than the age of the current cached entry.
+The `MaxIntegratedCacheStaleness`, when configured on a request that ends up populating the cache, doesn't affect how long that request is cached. `MaxIntegratedCacheStaleness` enforces consistency when you try to read cached data. There's no global TTL or cache retention setting, so data is only evicted from the cache if either the integrated cache is full or a new read is run with a lower `MaxIntegratedCacheStaleness` than the age of the current cached entry.
 
 This is an improvement from how most caches work and allows for the following other customizations:
 
@@ -133,6 +133,12 @@ To better understand the `MaxIntegratedCacheStaleness` parameter, consider the f
 
 [Learn to configure the `MaxIntegratedCacheStaleness`.](how-to-configure-integrated-cache.md#adjust-maxintegratedcachestaleness)
 
+## Bypass the integrated cache (Preview)
+
+The integrated cache has a limited storage capacity determined by the dedicated gateway SKU provisioned. By default, all requests from clients configured with the dedicated gateway connection string go through the integrated cache and take up cache space. You can control which items and queries are cached with the bypass integrated cache request option, currently in preview. This request option is useful for item writes or read requests that aren't expected to be frequently repeated. Bypassing the integrated cache for infrequently accessed items saves cache space for items that are read repeatedly, which increases potential RU savings and reduces evictions. Requests that bypass the cache are still routed through the dedicated gateway. These requests are served from the backend and cost RUs.
+
+[Learn to bypass the integrated cache.](how-to-configure-integrated-cache.md#bypass-the-integrated-cache-preview)
+
 ## Metrics
 
 It's helpful to monitor some key metrics for the integrated cache. These metrics include:
@@ -161,11 +167,11 @@ The below examples show how to debug some common scenarios:
 
 ### I can’t tell if my application is using the dedicated gateway
 
-Check the `DedicatedGatewayRequests`. This metric includes all requests that use the dedicated gateway, regardless of whether they hit the integrated cache. If your application uses the standard gateway or direct mode with your original connection string, you won't see an error message, but the `DedicatedGatewayRequests` will be zero. If your application uses direct mode with your dedicated gateway connection string, you may still see a small number of `DedicatedGatewayRequests`.
+Check the `DedicatedGatewayRequests` metric. This metric includes all requests that use the dedicated gateway, regardless of whether they hit the integrated cache. If your application uses the standard gateway or direct mode with your original connection string, you will not see an error message, but `DedicatedGatewayRequests` will be zero. If your application uses direct mode with your dedicated gateway connection string, you may still see a few `DedicatedGatewayRequests`.
 
 ### I can’t tell if my requests are hitting the integrated cache
 
-Check the `IntegratedCacheItemHitRate` and `IntegratedCacheQueryHitRate`. If both of these values are zero, then requests aren't hitting the integrated cache. Check that you're using the dedicated gateway connection string, [connecting with gateway mode](nosql/sdk-connection-modes.md), and [have set session or eventual consistency](consistency-levels.md#configure-the-default-consistency-level).
+Check the `IntegratedCacheItemHitRate` and `IntegratedCacheQueryHitRate`. If both of these values are zero, then requests aren't hitting the integrated cache. Check that you're using the dedicated gateway connection string, [connecting with gateway mode](nosql/sdk-connection-modes.md), and [using session or eventual consistency](consistency-levels.md#configure-the-default-consistency-level).
 
 ### I want to understand if my dedicated gateway is too small