articles/managed-instance-apache-cassandra/best-practice-performance.md (18 additions, 17 deletions)
@@ -1,11 +1,11 @@
---
title: Best practices for optimal performance in Azure Managed Instance for Apache Cassandra
description: Learn about best practices to ensure optimal performance from Azure Managed Instance for Apache Cassandra.
author: IriaOsara
ms.service: managed-instance-apache-cassandra
ms.topic: how-to
ms.date: 04/05/2023
ms.author: iriaosara
keywords: azure performance cassandra
---
@@ -36,7 +36,7 @@ Transactional workloads typically need a data center optimized for low latency,

### Optimizing for analytical workloads

We recommend customers apply the following `cassandra.yaml` settings for analytical workloads (see [here](create-cluster-portal.md#update-cassandra-configuration) for how to apply them).
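The settings table itself is not shown in this excerpt. Purely as an illustration of the format (the keys are standard `cassandra.yaml` options, but the values below are assumptions, not the article's recommendations), analytical tuning typically relaxes server-side timeouts so long scans and aggregations aren't cut off:

```yaml
# Illustrative only - not the article's recommended values.
read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 20000
```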
@@ -78,7 +78,7 @@ We recommend boosting Cassandra client driver timeouts in accordance with the ti

### Optimizing for low latency

Our default settings are already suitable for low-latency workloads. To ensure the best performance for tail latencies, we highly recommend using a client driver that supports [speculative execution](https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/speculative_execution/) and configuring your client accordingly. For the Java v4 driver, you can find a demo illustrating how this works and how to enable the policy [here](https://github.com/Azure-Samples/azure-cassandra-mi-java-v4-speculative-execution).
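With the Java v4 driver, speculative execution is typically enabled through the driver configuration file. The following `application.conf` fragment is a sketch based on the driver's reference configuration; the threshold values are placeholder assumptions, not recommendations:

```
# application.conf for the DataStax Java driver v4 (values are placeholders)
datastax-java-driver.advanced.speculative-execution-policy {
  class = ConstantSpeculativeExecutionPolicy
  # send up to 3 executions total, one extra every 100 ms without a response
  max-executions = 3
  delay = 100 milliseconds
}
```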
@@ -93,7 +93,7 @@ Like every database system, Cassandra works best if the CPU utilization is aroun

:::image type="content" source="./media/best-practice-performance/metrics.png" alt-text="Screenshot of CPU metrics." lightbox="./media/best-practice-performance/metrics.png" border="true":::

If the CPU is permanently above 80% for most nodes, the database becomes overloaded, manifesting in multiple client timeouts. In this scenario, we recommend taking the following actions:
* vertically scale up to a SKU with more CPU cores (especially if you only have 8 cores or fewer).
* horizontally scale by adding more nodes (as mentioned earlier, the number of nodes should be a multiple of the replication factor).
@@ -103,7 +103,7 @@ If the CPU is only high for a few nodes, but low for the others, it indicates a

> [!NOTE]
> Changing the SKU is supported via the Azure portal, Azure CLI, and ARM template deployment. You can deploy/edit the ARM template and replace the SKU with one of the following.
>
> - Standard_E8s_v4
> - Standard_E16s_v4
@@ -120,6 +120,8 @@ If the CPU is only high for a few nodes, but low for the others, it indicates a
> - Standard_L8as_v3
> - Standard_L16as_v3
> - Standard_L32as_v3
>
> Currently, we don't support transitioning across SKU families. For instance, if you have a `Standard_DS13_v2` and want to upgrade to a larger SKU such as `Standard_DS14_v2`, this option isn't available. However, you can open a support ticket to request an upgrade to the higher SKU.
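As a sketch of where the SKU sits in an ARM template (the resource name, API version, and values here are illustrative assumptions, not from the article), the `sku` property belongs to the data center resource:

```json
{
  "type": "Microsoft.DocumentDB/cassandraClusters/dataCenters",
  "apiVersion": "2023-04-15",
  "name": "[concat(parameters('clusterName'), '/dc1')]",
  "properties": {
    "dataCenterLocation": "[resourceGroup().location]",
    "nodeCount": 3,
    "sku": "Standard_E16s_v4"
  }
}
```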
@@ -149,7 +151,7 @@ If your IOPS max out what your SKU supports, you can:
* [Scale up the data center(s)](create-cluster-portal.md#scale-a-datacenter) by adding more nodes.
For more information, refer to [Virtual Machine and disk performance](../virtual-machines/disks-performance.md).
### Network performance
@@ -159,7 +161,7 @@ In most cases network performance is sufficient. However, if you're frequently s

:::image type="content" source="./media/best-practice-performance/metrics-network.png" alt-text="Screenshot of network metrics." lightbox="./media/best-practice-performance/metrics-network.png" border="true":::

If you only see the network elevated for a few nodes, you might have a hot partition and need to review your data distribution and/or access patterns for a potential skew.
* Vertically scale up to a different SKU supporting more network I/O.
* Horizontally scale up the cluster by adding more nodes.
@@ -175,7 +177,7 @@ Deployments should be planned and provisioned to support the maximum number of p

### Disk space

In most cases, there's sufficient disk space, as default deployments are optimized for IOPS, which leads to low utilization of the disk. Nevertheless, we advise occasionally reviewing disk space metrics. Cassandra accumulates a lot of disk space and then reduces it when compaction is triggered. Hence it's important to review disk usage over longer periods to establish trends, such as compaction being unable to recoup space.
> [!NOTE]
> In order to ensure available space for compaction, disk utilization should be kept to around 50%.
@@ -190,11 +192,11 @@ If you only see this behavior for a few nodes, you might have a hot partition an

### JVM memory

Our default formula assigns half the VM's memory to the JVM, with an upper limit of 31 GB, which in most cases is a good balance between performance and memory. Some workloads, especially ones that have frequent cross-partition reads or range scans, might be memory challenged.
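The sizing rule just described can be sketched as a quick calculation (a minimal illustration of the stated formula, not the service's actual provisioning code):

```python
def default_jvm_heap_gb(vm_memory_gb: float) -> float:
    """Default rule described above: half the VM's memory, capped at 31 GB."""
    return min(vm_memory_gb / 2, 31.0)

# A 32 GB VM gets 16 GB of heap; larger VMs hit the 31 GB cap.
print(default_jvm_heap_gb(32), default_jvm_heap_gb(128))
```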
In most cases, memory is reclaimed effectively by the Java garbage collector, but if the CPU is often above 80%, there aren't enough CPU cycles left for the garbage collector. Any CPU performance problems should therefore be addressed before memory problems.
If the CPU hovers below 70% and garbage collection isn't able to reclaim memory, you might need more JVM memory. This is especially the case if you're on a SKU with limited memory. In most cases, you need to review your queries and client settings, and reduce `fetch_size` along with the value chosen for `LIMIT` in your CQL query.
If you indeed need more memory, you can:
@@ -204,7 +206,7 @@ If you indeed need more memory, you can:
### Tombstones
We run repairs every seven days with reaper, which removes rows whose TTL has expired (called a "tombstone"). Some workloads have more frequent deletes and see warnings like `Read 96 live rows and 5035 tombstone cells for query SELECT ...; token <token> (see tombstone_warn_threshold)` in the Cassandra logs, or even errors indicating that a query couldn't be fulfilled due to excessive tombstones.
A short-term mitigation, if queries don't get fulfilled, is to increase the `tombstone_failure_threshold` in the [Cassandra config](create-cluster-portal.md#update-cassandra-configuration) from the default 100,000 to a higher value.
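Applied via the Cassandra configuration, this is a single-key change; the value below is only an example of "a higher value", not a recommendation:

```yaml
# cassandra.yaml fragment - raise with care; persistent high tombstone
# counts usually indicate a data-model problem rather than a tuning one.
tombstone_failure_threshold: 500000
```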
@@ -232,16 +234,15 @@ This indicates a problem in the data model. Here's a [stack overflow article](ht
## Specialized optimizations
### Compression
Cassandra allows the selection of an appropriate compression algorithm when a table is created (see [Compression](https://cassandra.apache.org/doc/latest/cassandra/operating/compression.html)). The default is LZ4, which is excellent for throughput and CPU but consumes more space on disk. Using Zstd (Cassandra 4.0 and up) saves about 12% space with minimal CPU overhead.
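For instance, an existing table can be switched to Zstd with an `ALTER TABLE` (the keyspace and table names are hypothetical; `ZstdCompressor` requires Cassandra 4.0+):

```sql
ALTER TABLE my_keyspace.my_table
  WITH compression = {'class': 'ZstdCompressor', 'chunk_length_in_kb': 64};
```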
### Optimizing memtable heap space
Our default is to use 1/4 of the JVM heap for [memtable_heap_space](https://cassandra.apache.org/doc/latest/cassandra/configuration/cass_yaml_file.html#memtable_heap_space) in the cassandra.yaml. For write-oriented applications, and/or on SKUs with small memory, this can lead to frequent flushing and fragmented sstables, thus requiring more compaction. In such cases, increasing it to at least 4048 might be beneficial, but this requires careful benchmarking to make sure other operations (for example, reads) aren't affected.
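As a sketch, the corresponding `cassandra.yaml` change could look like the following (key name per the linked configuration reference; verify the syntax against your Cassandra version):

```yaml
# cassandra.yaml fragment - illustrative; in Cassandra versions before 4.1
# the key is memtable_heap_space_in_mb with a plain integer value.
memtable_heap_space: 4048MiB
```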
articles/managed-instance-apache-cassandra/manage-resources-cli.md (9 additions, 1 deletion)
@@ -164,8 +164,16 @@ az managed-cassandra datacenter create \
> - Standard_DS14_v2
> - Standard_D8s_v4
> - Standard_D16s_v4
> - Standard_D32s_v4
> - Standard_L8s_v3
> - Standard_L16s_v3
> - Standard_L32s_v3
> - Standard_L8as_v3
> - Standard_L16as_v3
> - Standard_L32as_v3
>
> Currently, we don't support transitioning across SKU families. For instance, if you have a `Standard_DS13_v2` and want to upgrade to a larger SKU such as `Standard_DS14_v2`, this option isn't available. However, you can open a support ticket to request an upgrade to the higher SKU.
>
> Note also that `--availability-zone` is set to `false`. To enable availability zones, set this to `true`. Availability zones increase the availability SLA of the service. For more details, review the full SLA [here](https://azure.microsoft.com/support/legal/sla/managed-instance-apache-cassandra/v1_0/).
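A sketch of the corresponding `az managed-cassandra datacenter create` call (resource names and values are placeholders; check `az managed-cassandra datacenter create --help` for the exact parameters in your CLI version):

```azurecli
az managed-cassandra datacenter create \
  --resource-group "cassandra-mi-rg" \
  --cluster-name "cassandra-mi-cluster" \
  --data-center-name "dc1" \
  --data-center-location "eastus2" \
  --delegated-subnet-id "$SUBNET_ID" \
  --node-count 3 \
  --sku "Standard_D16s_v4" \
  --availability-zone true
```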