
Commit 2623e91

Merge pull request #219532 from TheovanKraay/patch-22
Update throughput-control-spark.md
2 parents e6ce1f3 + d9edd92

File tree

1 file changed: +4 −1 lines changed


articles/cosmos-db/nosql/throughput-control-spark.md

Lines changed: 4 additions & 1 deletion
@@ -15,6 +15,9 @@ ms.author: thvankra
 
 The [Spark Connector](quickstart-spark.md) allows you to communicate with Azure Cosmos DB using [Apache Spark](https://spark.apache.org/). This article describes how the throughput control feature works. Check out our [Spark samples in GitHub](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3_2-12/Samples) to get started using throughput control.
 
+> [!TIP]
+> This article documents the use of global throughput control groups in the Azure Cosmos DB Spark Connector, but the functionality is also available in the [Java SDK](/azure/cosmos-db/nosql/sdk-java-v4). In the SDK, you can also use local throughput control groups to limit the RU consumption in the context of a single client connection instance. For example, you can apply this to different operations within a single microservice, or to a single data loading program. See [this code snippet](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos/src/samples/java/com/azure/cosmos/ThroughputControlCodeSnippet.java) for how to build a CosmosAsyncClient with both local and global control groups.
+
 ## Why is throughput control important?
 
 Having throughput control helps to isolate the performance needs of applications running against a container, by limiting the amount of [request units](../request-units.md) that can be consumed by a given Spark client.
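
For context on the tip added above, here is a minimal sketch of wiring up both a local and a global control group with the azure-cosmos Java SDK 4.x. This is not the linked sample verbatim: the group names, threshold values, endpoint, key, and database/container names are illustrative placeholders, and exact import packages can vary by SDK version.

```java
import java.time.Duration;

import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosAsyncContainer;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.GlobalThroughputControlConfig;
import com.azure.cosmos.models.ThroughputControlGroupConfig;
import com.azure.cosmos.models.ThroughputControlGroupConfigBuilder;

public class ThroughputControlSketch {
    public static void main(String[] args) {
        // Placeholder endpoint and key; a real client reads these from configuration.
        CosmosAsyncClient client = new CosmosClientBuilder()
            .endpoint("<account-endpoint>")
            .key("<account-key>")
            .buildAsyncClient();

        CosmosAsyncContainer container = client
            .getDatabase("SampleDatabase")      // placeholder database name
            .getContainer("SampleContainer");   // placeholder container name

        // Local group: the RU budget is tracked only inside this client instance.
        ThroughputControlGroupConfig localGroup = new ThroughputControlGroupConfigBuilder()
            .groupName("localControlGroup")
            .targetThroughputThreshold(0.1)     // cap at 10% of provisioned throughput
            .build();
        container.enableLocalThroughputControlGroup(localGroup);

        // Global group: participating clients share the RU budget by coordinating
        // state through a dedicated control container.
        ThroughputControlGroupConfig globalGroup = new ThroughputControlGroupConfigBuilder()
            .groupName("globalControlGroup")
            .targetThroughputThreshold(0.25)    // cap at 25%, shared across all clients
            .build();
        GlobalThroughputControlConfig globalConfig = client
            .createGlobalThroughputControlConfigBuilder("SampleDatabase", "ThroughputControl")
            .setControlItemRenewInterval(Duration.ofSeconds(5))
            .setControlItemExpireInterval(Duration.ofSeconds(11))
            .build();
        container.enableGlobalThroughputControlGroup(globalGroup, globalConfig);

        client.close();
    }
}
```

The design difference is scope: a local group caps one client instance in isolation, while a global group persists shared state in the control container so that several clients split the same budget.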
@@ -76,7 +79,7 @@ In the above example, the `targetThroughputThreshold` is defined as **0.95**, so
 }
 ```
 > [!NOTE]
-> Throughput control does not do RU pre-calculation of each operation. Instead, it tracks the RU usages after the operation based on the response header. As such, throughput control is based on an approximation - and does not guarantee that amount of throughput will be available for the group at any given time.
+> Throughput control doesn't do RU pre-calculation of each operation. Instead, it tracks the RU usage *after* the operation, based on the response header. As such, throughput control is based on an approximation and **does not guarantee** that the configured amount of throughput is available for the group at any given time. For example, if the configured RU limit is so low that a single operation can use it all, throughput control can't prevent the RU usage from exceeding the limit. Throughput control therefore works best when the configured limit is higher than the cost of any single operation that a client in the control group can execute.
 
 > [!WARNING]
 > The `targetThroughputThreshold` is **immutable**. If you change the target throughput threshold value, a new throughput control group is created (with version 4.10.0 or later, it can have the same name). If you want to ensure that all Spark jobs using the group consume the new threshold immediately, you need to restart them; otherwise, they pick up the new threshold after their next restart.
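
For context on the note and warning above, here is a minimal sketch of how a Spark job opts into a throughput control group via connector options, assuming Spark 3.x with the Azure Cosmos DB Spark Connector on the classpath; the endpoint, key, and database/container names are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ThroughputControlConfigSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("cosmos-throughput-control")
            .getOrCreate();

        Map<String, String> config = new HashMap<>();
        config.put("spark.cosmos.accountEndpoint", "<account-endpoint>"); // placeholder
        config.put("spark.cosmos.accountKey", "<account-key>");           // placeholder
        config.put("spark.cosmos.database", "SampleDatabase");            // placeholder
        config.put("spark.cosmos.container", "SampleContainer");          // placeholder

        // Throughput control: cap this client's consumption at 95% of the
        // container's provisioned RUs. The threshold is immutable, so changing
        // 0.95 here effectively defines a new control group.
        config.put("spark.cosmos.throughputControl.enabled", "true");
        config.put("spark.cosmos.throughputControl.name", "SourceContainerThroughputControl");
        config.put("spark.cosmos.throughputControl.targetThroughputThreshold", "0.95");
        // Dedicated container that participating clients use to share the budget.
        config.put("spark.cosmos.throughputControl.globalControl.database", "SampleDatabase");
        config.put("spark.cosmos.throughputControl.globalControl.container", "ThroughputControl");

        // Any read (or write) through the connector with these options is then
        // rate-limited as part of the named control group.
        Dataset<Row> df = spark.read().format("cosmos.oltp").options(config).load();
        df.show(10);

        spark.stop();
    }
}
```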
