articles/managed-instance-apache-cassandra/best-practice-performance.md
---
title: Best practices for optimal performance in Azure Managed Instance for Apache Cassandra
description: Learn about best practices to ensure optimal performance from Azure Managed Instance for Apache Cassandra.
author: IriaOsara
ms.service: managed-instance-apache-cassandra
ms.topic: how-to
---
### Optimizing for analytical workloads

We recommend customers apply the following `cassandra.yaml` settings for analytical workloads (see [Update Cassandra configuration](create-cluster-portal.md#update-cassandra-configuration) for how to apply them).
### Optimizing for low latency

Our default settings are already suitable for low latency workloads. To ensure best performance for tail latencies, we highly recommend using a client driver that supports [speculative execution](https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/speculative_execution/) and configuring your client accordingly. For the Java V4 driver, you can find a demo illustrating how this works and how to enable the policy [here](https://github.com/Azure-Samples/azure-cassandra-mi-java-v4-speculative-execution).
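For the Java 4.x driver, the policy can be enabled through the driver configuration file; the following `application.conf` fragment is a minimal sketch (the delay and execution count are illustrative values, not recommendations from this article):

```
datastax-java-driver.advanced.speculative-execution-policy {
  class = ConstantSpeculativeExecutionPolicy
  # Allow up to 3 executions in total, starting an extra one every 100 ms
  max-executions = 3
  delay = 100 milliseconds
}
```

Note that the driver only sends speculative executions for statements that are marked as idempotent.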
:::image type="content" source="./media/best-practice-performance/metrics.png" alt-text="Screenshot of CPU metrics." lightbox="./media/best-practice-performance/metrics.png" border="true":::

If the CPU is permanently above 80% for most nodes, the database becomes overloaded, manifesting in multiple client timeouts. In this scenario, we recommend taking the following actions:

* Vertically scale up to a SKU with more CPU cores (especially if there are only 8 cores or fewer).
* Horizontally scale by adding more nodes (as mentioned earlier, the number of nodes should be a multiple of the replication factor).
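As an illustration of keeping the node count a multiple of the replication factor when scaling out, a small helper can suggest valid targets (the function and its inputs are purely illustrative, not part of the service):

```python
def next_valid_node_counts(current_nodes: int, rf: int, options: int = 3) -> list[int]:
    """Suggest scale-out targets that keep the node count a multiple
    of the replication factor, as recommended above."""
    # Start from the next multiple of rf strictly above the current size.
    start = (current_nodes // rf + 1) * rf
    return [start + i * rf for i in range(options)]

# Example: a 9-node data center with RF 3 can grow to 12, 15, or 18 nodes.
print(next_valid_node_counts(9, 3))  # → [12, 15, 18]
```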
If your IOPS max out what your SKU supports, you can:

* [Scale up the data center(s)](create-cluster-portal.md#scale-a-datacenter) by adding more nodes.

For more information, refer to [Virtual Machine and disk performance](../virtual-machines/disks-performance.md).

### Network performance
:::image type="content" source="./media/best-practice-performance/metrics-network.png" alt-text="Screenshot of network metrics." lightbox="./media/best-practice-performance/metrics-network.png" border="true":::

If you only see the network elevated for a few nodes, you might have a hot partition and need to review your data distribution and/or access patterns for a potential skew.

* Vertically scale up to a different SKU supporting more network I/O.
* Horizontally scale up the cluster by adding more nodes.
### Disk space

In most cases, there's sufficient disk space, as default deployments are optimized for IOPS, which leads to low utilization of the disk. Nevertheless, we advise occasionally reviewing disk space metrics. Cassandra accumulates a lot of disk space and then reduces it when compaction is triggered. Hence, it's important to review disk usage over longer periods to establish trends, like compaction being unable to recoup space.

> [!NOTE]
> In order to ensure available space for compaction, disk utilization should be kept to around 50%.
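The guidance in the note can be expressed as a quick check (a sketch only; the 0.5 threshold reflects the ~50% guidance above, and the function name is illustrative):

```python
def has_compaction_headroom(used_bytes: int, capacity_bytes: int,
                            max_utilization: float = 0.5) -> bool:
    """Return True if disk utilization stays at or below the ~50%
    level recommended so compaction has room to rewrite SSTables."""
    return used_bytes / capacity_bytes <= max_utilization

print(has_compaction_headroom(400 * 1024**3, 1024 * 1024**3))  # 400 GiB of 1 TiB → True
print(has_compaction_headroom(700 * 1024**3, 1024 * 1024**3))  # 700 GiB of 1 TiB → False
```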
### JVM memory

Our default formula assigns half the VM's memory to the JVM, with an upper limit of 31 GB, which in most cases is a good balance between performance and memory. Some workloads, especially ones that have frequent cross-partition reads or range scans, might be memory challenged.
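The default formula described above can be sketched as follows (illustrative only; the actual allocation is handled by the service):

```python
def default_jvm_heap_gb(vm_memory_gb: float) -> float:
    """Approximate the default heap assignment: half the VM's memory,
    with an upper limit of 31 GB, as described above."""
    return min(vm_memory_gb / 2, 31.0)

print(default_jvm_heap_gb(16))   # 8 GB heap on a 16 GB VM
print(default_jvm_heap_gb(128))  # capped at 31 GB on a large VM
```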
In most cases, memory gets reclaimed effectively by the Java garbage collector, but especially if the CPU is often above 80%, there aren't enough CPU cycles left for the garbage collector. So any CPU performance problems should be addressed before memory problems.

If the CPU hovers below 70% and the garbage collection isn't able to reclaim memory, you might need more JVM memory. This is especially the case if you're on a SKU with limited memory. In most cases, you need to review your queries and client settings and reduce `fetch_size` along with what is chosen in `limit` within your CQL query.

If you indeed need more memory, you can:
### Tombstones

We run repairs every seven days with Reaper, which removes rows whose TTL has expired (called "tombstones"). Some workloads have more frequent deletes and see warnings like `Read 96 live rows and 5035 tombstone cells for query SELECT ...; token <token> (see tombstone_warn_threshold)` in the Cassandra logs, or even errors indicating that a query couldn't be fulfilled due to excessive tombstones.

A short-term mitigation if queries don't get fulfilled is to increase the `tombstone_failure_threshold` in the [Cassandra config](create-cluster-portal.md#update-cassandra-configuration) from the default 100,000 to a higher value.
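For illustration, the relevant settings in `cassandra.yaml` look like this (the raised value of 150,000 is an example, not a recommendation; only the 100,000 default comes from this article):

```yaml
# Warn when a single query reads many tombstones (Cassandra default: 1000)
tombstone_warn_threshold: 1000
# Abort queries that scan more tombstones than this (default: 100000);
# raising it is a short-term mitigation only
tombstone_failure_threshold: 150000
```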
## Specialized optimizations

### Compression

Cassandra allows the selection of an appropriate compression algorithm when a table is created (see [Compression](https://cassandra.apache.org/doc/latest/cassandra/operating/compression.html)). The default is LZ4, which is excellent for throughput and CPU but consumes more space on disk. Using Zstd (Cassandra 4.0 and up) saves about 12% space with minimal CPU overhead.
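For example, Zstd can be selected when creating a table (the keyspace and table below are hypothetical):

```sql
-- Cassandra 4.0+: pick Zstd instead of the LZ4 default
CREATE TABLE ks.events (
    id uuid PRIMARY KEY,
    payload text
) WITH compression = {'class': 'ZstdCompressor'};
```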
240
239
241
240
### Optimizing memtable heap space
242
241
Our default is to use 1/4 of the JVM heap for [memtable_heap_space](https://cassandra.apache.org/doc/latest/cassandra/configuration/cass_yaml_file.html#memtable_heap_space)
243
242
in the cassandra.yaml. For write oriented application and/or on SKUs with small memory
244
243
this can lead to frequent flushing and fragmented sstables thus requiring more compaction.
245
-
In such cases increasing it to at least 4048 might be beneficial but requires careful benchmarking
246
-
to make sure other operations (e.g. reads) aren't affected.
244
+
In such cases increasing, it to at least 4048 might be beneficial but requires careful benchmarking
245
+
to make sure other operations (for example, reads) aren't affected.
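In `cassandra.yaml` terms, this corresponds to a fragment like the following (a sketch; the exact key name varies by Cassandra version, and the 4048 MB value is only the floor suggested above):

```yaml
# Cassandra 4.0 and earlier; newer versions name this setting memtable_heap_space
memtable_heap_space_in_mb: 4048
```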