You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: articles/managed-instance-apache-cassandra/best-practice-performance.md
+19-6Lines changed: 19 additions & 6 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -147,7 +147,7 @@ For more information refer to [Virtual Machine and disk performance](../virtual-
147
147
148
148
### Network performance
149
149
150
-
In most cases network performance is sufficient. However, if you are frequently streaming data (such as frequent horizontal scale-up/scale down) or there are huge ingress/egress data movements, this can become a problem. You may need to evaluate the network performance of your SKU. For example, the `Standard_DS14_v2` SKU supports 12,000 Mb/s, compare this to the byte-in/out in the metrics:
150
+
In most cases network performance is sufficient. However, if you're frequently streaming data (such as frequent horizontal scale-up/scale down) or there are huge ingress/egress data movements, this can become a problem. You may need to evaluate the network performance of your SKU. For example, the `Standard_DS14_v2` SKU supports 12,000 Mb/s, compare this to the byte-in/out in the metrics:
151
151
152
152
153
153
:::image type="content" source="./media/best-practice-performance/metrics-network.png" alt-text="Screenshot of network metrics." lightbox="./media/best-practice-performance/metrics-network.png" border="true":::
@@ -162,14 +162,14 @@ If you only see the network elevated for a small number of nodes, you might have
162
162
163
163
### Too many connected clients
164
164
165
-
Deployments should be planned and provisioned to support the maximum number of parallel requests required for the desired latency of an application. For a given deployment, introducing more load to the system above a minimum threshold increases overall latency. Monitor the number of connected clients to ensure this does not exceed tolerable limits.
165
+
Deployments should be planned and provisioned to support the maximum number of parallel requests required for the desired latency of an application. For a given deployment, introducing more load to the system above a minimum threshold increases overall latency. Monitor the number of connected clients to ensure this doesn't exceed tolerable limits.
166
166
167
167
:::image type="content" source="./media/best-practice-performance/metrics-connections.png" alt-text="Screenshot of connected client metrics." lightbox="./media/best-practice-performance/metrics-connections.png" border="true":::
168
168
169
169
170
170
### Disk space
171
171
172
-
In most cases, there is sufficient disk space as default deployments are optimized for IOPS, which leads to low utilization of the disk. Nevertheless, we advise occasionally reviewing disk space metrics. Cassandra accumulates a lot of disk and then reduces it when compaction is triggered. Hence it is important to review disk usage over longer periods to establish trends - like compaction unable to recoup space.
172
+
In most cases, there's sufficient disk space as default deployments are optimized for IOPS, which leads to low utilization of the disk. Nevertheless, we advise occasionally reviewing disk space metrics. Cassandra accumulates a lot of disk and then reduces it when compaction is triggered. Hence it is important to review disk usage over longer periods to establish trends - like compaction unable to recoup space.
173
173
174
174
> [!NOTE]
175
175
> In order to ensure available space for compaction, disk utilization should be kept to around 50%.
@@ -188,7 +188,7 @@ Our default formula assigns half the VM's memory to the JVM with an upper limit
188
188
189
189
In most cases memory gets reclaimed effectively by the Java garbage collector, but especially if the CPU is often above 80% there aren't enough CPU cycles for the garbage collector left. So any CPU performance problems should be addresses before memory problems.
190
190
191
-
If the CPU hovers below 70%, and the garbage collection isn't able to reclaim memory, you might need more JVM memory. This is especially the case if you are on a SKU with limited memory. In most cases, you will need to review your queries and client settings and reduce `fetch_size` along with what is chosen in `limit` within your CQL query.
191
+
If the CPU hovers below 70%, and the garbage collection isn't able to reclaim memory, you might need more JVM memory. This is especially the case if you're on a SKU with limited memory. In most cases, you'll need to review your queries and client settings and reduce `fetch_size` along with what is chosen in `limit` within your CQL query.
192
192
193
193
If you indeed need more memory, you can:
194
194
@@ -222,11 +222,24 @@ You might encounter this warning in the [CassandraLogs](monitor-clusters.md#crea
222
222
223
223
`Writing large partition <table> (105.426MiB) to sstable <file>`
224
224
225
-
This indicates a problem in the data model. Here is a [stack overflow article](https://stackoverflow.com/questions/74024443/how-do-i-analyse-and-solve-writing-large-partition-warnings-in-cassandra) that goes into more detail. This can cause severe performance issues and needs to be addressed.
225
+
This indicates a problem in the data model. Here's a [stack overflow article](https://stackoverflow.com/questions/74024443/how-do-i-analyse-and-solve-writing-large-partition-warnings-in-cassandra) that goes into more detail. This can cause severe performance issues and needs to be addressed.
226
+
227
+
## Specialized optimizations
228
+
### Compression
229
+
Cassandra allows the selection of an appropriate compression algorithm when a table is created (see [Compression](https://cassandra.apache.org/doc/latest/cassandra/operating/compression.html)) The default is LZ4 which is excellent
230
+
for throughput and CPU but consumes more space on disk. Using Zstd (Cassandra 4.0 and up) saves about ~12% space with
231
+
minimal CPU overhead.
232
+
233
+
### Optimizing memtable heap space
234
+
Our default is to use 1/4 of the JVM heap for [memtable_heap_space](https://cassandra.apache.org/doc/latest/cassandra/configuration/cass_yaml_file.html#memtable_heap_space)
235
+
in the cassandra.yaml. For write oriented application and/or on SKUs with small memory
236
+
this can lead to frequent flushing and fragmented sstables thus requiring more compaction.
237
+
In such cases increasing it to at least 4048 might be beneficial but requires careful benchmarking
238
+
to make sure other operations (e.g. reads) aren't affected.
226
239
227
240
## Next steps
228
241
229
242
In this article, we laid out some best practices for optimal performance. You can now start working with the cluster:
230
243
231
244
> [!div class="nextstepaction"]
232
-
> [Create a cluster using Azure Portal](create-cluster-portal.md)
245
+
> [Create a cluster using Azure Portal](create-cluster-portal.md)
0 commit comments