
Commit 2623e91

Merge pull request #219532 from TheovanKraay/patch-22
Update throughput-control-spark.md
2 parents e6ce1f3 + d9edd92

File tree

1 file changed: +4 −1 lines changed


articles/cosmos-db/nosql/throughput-control-spark.md

Lines changed: 4 additions & 1 deletion
@@ -15,6 +15,9 @@ ms.author: thvankra
 
 The [Spark Connector](quickstart-spark.md) allows you to communicate with Azure Cosmos DB using [Apache Spark](https://spark.apache.org/). This article describes how the throughput control feature works. Check out our [Spark samples in GitHub](https://github.com/Azure/azure-sdk-for-java/tree/main/sdk/cosmos/azure-cosmos-spark_3_2-12/Samples) to get started using throughput control.
 
+> [!TIP]
+> This article documents the use of global throughput control groups in the Azure Cosmos DB Spark Connector, but the functionality is also available in the [Java SDK](/azure/cosmos-db/nosql/sdk-java-v4). In the SDK, you can also use local throughput control groups to limit the RU consumption in the context of a single client connection instance. For example, you can apply this to different operations within a single microservice, or to a single data loading program. See [this code snippet](https://github.com/Azure/azure-sdk-for-java/blob/main/sdk/cosmos/azure-cosmos/src/samples/java/com/azure/cosmos/ThroughputControlCodeSnippet.java) for how to build a CosmosAsyncClient with both local and global control groups.
+
 ## Why is throughput control important?
 
 Having throughput control helps to isolate the performance needs of applications running against a container, by limiting the amount of [request units](../request-units.md) that can be consumed by a given Spark client.
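
For context on the tip added above, here is a minimal sketch of wiring up both a local and a global control group with the azure-cosmos Java SDK 4.x. This is not the linked sample verbatim: the group names, threshold values, endpoint, key, and database/container names are illustrative placeholders, and exact import packages can vary by SDK version.

```java
import java.time.Duration;

import com.azure.cosmos.CosmosAsyncClient;
import com.azure.cosmos.CosmosAsyncContainer;
import com.azure.cosmos.CosmosClientBuilder;
import com.azure.cosmos.GlobalThroughputControlConfig;
import com.azure.cosmos.models.ThroughputControlGroupConfig;
import com.azure.cosmos.models.ThroughputControlGroupConfigBuilder;

public class ThroughputControlSketch {
    public static void main(String[] args) {
        // Placeholder endpoint and key; a real client reads these from configuration.
        CosmosAsyncClient client = new CosmosClientBuilder()
            .endpoint("<account-endpoint>")
            .key("<account-key>")
            .buildAsyncClient();

        CosmosAsyncContainer container = client
            .getDatabase("SampleDatabase")      // placeholder database name
            .getContainer("SampleContainer");   // placeholder container name

        // Local group: the RU budget is tracked only inside this client instance.
        ThroughputControlGroupConfig localGroup = new ThroughputControlGroupConfigBuilder()
            .groupName("localControlGroup")
            .targetThroughputThreshold(0.1)     // cap at 10% of provisioned throughput
            .build();
        container.enableLocalThroughputControlGroup(localGroup);

        // Global group: participating clients share the RU budget by coordinating
        // state through a dedicated control container.
        ThroughputControlGroupConfig globalGroup = new ThroughputControlGroupConfigBuilder()
            .groupName("globalControlGroup")
            .targetThroughputThreshold(0.25)    // cap at 25%, shared across all clients
            .build();
        GlobalThroughputControlConfig globalConfig = client
            .createGlobalThroughputControlConfigBuilder("SampleDatabase", "ThroughputControl")
            .setControlItemRenewInterval(Duration.ofSeconds(5))
            .setControlItemExpireInterval(Duration.ofSeconds(11))
            .build();
        container.enableGlobalThroughputControlGroup(globalGroup, globalConfig);

        client.close();
    }
}
```

The design difference is scope: a local group caps one client instance in isolation, while a global group persists shared state in the control container so that several clients split the same budget.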
@@ -76,7 +79,7 @@ In the above example, the `targetThroughputThreshold` is defined as **0.95**, so
 }
 ```
 > [!NOTE]
-> Throughput control does not do RU pre-calculation of each operation. Instead, it tracks the RU usages after the operation based on the response header. As such, throughput control is based on an approximation - and does not guarantee that amount of throughput will be available for the group at any given time.
+> Throughput control doesn't do RU pre-calculation of each operation. Instead, it tracks the RU usage *after* the operation, based on the response header. As such, throughput control is based on an approximation and **does not guarantee** that the configured amount of throughput is available for the group at any given time. For example, if the configured RU limit is so low that a single operation can use it all, throughput control can't prevent the RU usage from exceeding the limit. Throughput control therefore works best when the configured limit is higher than the cost of any single operation that a client in the control group can execute.
 
 > [!WARNING]
 > The `targetThroughputThreshold` is **immutable**. If you change the target throughput threshold value, a new throughput control group is created (with version 4.10.0 or later, it can have the same name). If you want to ensure that all Spark jobs using the group consume the new threshold immediately, you need to restart them; otherwise, they pick up the new threshold after their next restart.
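
For context on the note and warning above, here is a minimal sketch of how a Spark job opts into a throughput control group via connector options, assuming Spark 3.x with the Azure Cosmos DB Spark Connector on the classpath; the endpoint, key, and database/container names are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ThroughputControlConfigSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("cosmos-throughput-control")
            .getOrCreate();

        Map<String, String> config = new HashMap<>();
        config.put("spark.cosmos.accountEndpoint", "<account-endpoint>"); // placeholder
        config.put("spark.cosmos.accountKey", "<account-key>");           // placeholder
        config.put("spark.cosmos.database", "SampleDatabase");            // placeholder
        config.put("spark.cosmos.container", "SampleContainer");          // placeholder

        // Throughput control: cap this client's consumption at 95% of the
        // container's provisioned RUs. The threshold is immutable, so changing
        // 0.95 here effectively defines a new control group.
        config.put("spark.cosmos.throughputControl.enabled", "true");
        config.put("spark.cosmos.throughputControl.name", "SourceContainerThroughputControl");
        config.put("spark.cosmos.throughputControl.targetThroughputThreshold", "0.95");
        // Dedicated container that participating clients use to share the budget.
        config.put("spark.cosmos.throughputControl.globalControl.database", "SampleDatabase");
        config.put("spark.cosmos.throughputControl.globalControl.container", "ThroughputControl");

        // Any read (or write) through the connector with these options is then
        // rate-limited as part of the named control group.
        Dataset<Row> df = spark.read().format("cosmos.oltp").options(config).load();
        df.show(10);

        spark.stop();
    }
}
```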
