Commit 07482f3

Merge pull request #267267 from iriaosara/issue_120204
Updating doc
2 parents 9539118 + d508c4d commit 07482f3

2 files changed: +27 -18 lines changed

articles/managed-instance-apache-cassandra/best-practice-performance.md

Lines changed: 18 additions & 17 deletions
@@ -1,11 +1,11 @@
 ---
 title: Best practices for optimal performance in Azure Managed Instance for Apache Cassandra
-description: Learn about best practices to ensure optimal performance from Azure Managed Instance for Apache Cassandra
-author: TheovanKraay
+description: Learn about best practices to ensure optimal performance from Azure Managed Instance for Apache Cassandra.
+author: IriaOsara
 ms.service: managed-instance-apache-cassandra
 ms.topic: how-to
 ms.date: 04/05/2023
-ms.author: thvankra
+ms.author: iriaosara
 keywords: azure performance cassandra
 ---

@@ -36,7 +36,7 @@ Transactional workloads typically need a data center optimized for low latency,
 
 ### Optimizing for analytical workloads
 
-We recommend customers apply the following `cassandra.yaml` settings for analytical workloads (see [here](create-cluster-portal.md#update-cassandra-configuration) on how to apply)
+We recommend customers apply the following `cassandra.yaml` settings for analytical workloads (see [here](create-cluster-portal.md#update-cassandra-configuration) on how to apply).
 
 
 
@@ -78,7 +78,7 @@ We recommend boosting Cassandra client driver timeouts in accordance with the ti
 
 ### Optimizing for low latency
 
-Our default settings are already suitable for low latency workloads. To ensure best performance for tail latencies we highly recommend using a client driver that supports [speculative execution](https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/speculative_execution/) and configuring your client accordingly. For Java V4 driver, you can find a demo illustrating how this works and how to enable the policy [here](https://github.com/Azure-Samples/azure-cassandra-mi-java-v4-speculative-execution).
+Our default settings are already suitable for low latency workloads. To ensure best performance for tail latencies, we highly recommend using a client driver that supports [speculative execution](https://docs.datastax.com/en/developer/java-driver/4.10/manual/core/speculative_execution/) and configuring your client accordingly. For the Java V4 driver, you can find a demo illustrating how this works and how to enable the policy [here](https://github.com/Azure-Samples/azure-cassandra-mi-java-v4-speculative-execution).
 
 
 
@@ -93,7 +93,7 @@ Like every database system, Cassandra works best if the CPU utilization is aroun
 :::image type="content" source="./media/best-practice-performance/metrics.png" alt-text="Screenshot of CPU metrics." lightbox="./media/best-practice-performance/metrics.png" border="true":::
 
 
-If the CPU is permanently above 80% for most nodes the database will become overloaded manifesting in multiple client timeouts. In this scenario, we recommend taking the following actions:
+If the CPU is permanently above 80% for most nodes, the database becomes overloaded, manifesting in multiple client timeouts. In this scenario, we recommend taking the following actions:
 
 * vertically scale up to a SKU with more CPU cores (especially if the cores are only 8 or less).
 * horizontally scale by adding more nodes (as mentioned earlier, the number of nodes should be a multiple of the replication factor).
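The "multiple of the replication factor" rule for horizontal scaling can be sketched as a small helper. This is an illustration only; the function name and the round-up behavior are assumptions, not part of the service or its API:

```python
def next_valid_node_count(desired_nodes: int, replication_factor: int) -> int:
    """Smallest node count >= desired_nodes that is a multiple of the replication factor."""
    if desired_nodes <= replication_factor:
        return replication_factor
    remainder = desired_nodes % replication_factor
    if remainder == 0:
        return desired_nodes
    # Round up to the next multiple so every token range is evenly replicated.
    return desired_nodes + (replication_factor - remainder)

print(next_valid_node_count(10, 3))  # -> 12
print(next_valid_node_count(9, 3))   # -> 9
```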
@@ -103,7 +103,7 @@ If the CPU is only high for a few nodes, but low for the others, it indicates a
 
 
 > [!NOTE]
-> Currently changing SKU is only supported via ARM template deployment. You can deploy/edit ARM template, and replace SKU with one of the following.
+> Changing the SKU is supported via the Azure portal, Azure CLI, and ARM template deployment. You can deploy/edit the ARM template and replace the SKU with one of the following.
 >
 > - Standard_E8s_v4
 > - Standard_E16s_v4
@@ -120,6 +120,8 @@ If the CPU is only high for a few nodes, but low for the others, it indicates a
 > - Standard_L8as_v3
 > - Standard_L16as_v3
 > - Standard_L32as_v3
+>
+> Note that we currently don't support transitioning across SKU families. For instance, if you currently have a `Standard_DS13_v2` and are interested in upgrading to a larger SKU such as `Standard_DS14_v2`, this option isn't available. However, you can open a support ticket to request an upgrade to the higher SKU.
 
 
 
@@ -149,7 +151,7 @@ If your IOPS max out what your SKU supports, you can:
 * [Scale up the data center(s)](create-cluster-portal.md#scale-a-datacenter) by adding more nodes.
 
 
-For more information refer to [Virtual Machine and disk performance](../virtual-machines/disks-performance.md).
+For more information, refer to [Virtual Machine and disk performance](../virtual-machines/disks-performance.md).
 
 ### Network performance
 
@@ -159,7 +161,7 @@ In most cases network performance is sufficient. However, if you're frequently s
 :::image type="content" source="./media/best-practice-performance/metrics-network.png" alt-text="Screenshot of network metrics." lightbox="./media/best-practice-performance/metrics-network.png" border="true":::
 
 
-If you only see the network elevated for a small number of nodes, you might have a hot partition and need to review your data distribution and/or access patterns for a potential skew.
+If you only see the network elevated for a few nodes, you might have a hot partition and need to review your data distribution and/or access patterns for a potential skew.
 
 * Vertically scale up to a different SKU supporting more network I/O.
 * Horizontally scale up the cluster by adding more nodes.
@@ -175,7 +177,7 @@ Deployments should be planned and provisioned to support the maximum number of p
 
 ### Disk space
 
-In most cases, there's sufficient disk space as default deployments are optimized for IOPS, which leads to low utilization of the disk. Nevertheless, we advise occasionally reviewing disk space metrics. Cassandra accumulates a lot of disk and then reduces it when compaction is triggered. Hence it is important to review disk usage over longer periods to establish trends - like compaction unable to recoup space.
+In most cases, there's sufficient disk space as default deployments are optimized for IOPS, which leads to low utilization of the disk. Nevertheless, we advise occasionally reviewing disk space metrics. Cassandra accumulates a lot of disk space and then reduces it when compaction is triggered. Hence it's important to review disk usage over longer periods to establish trends - like compaction being unable to recoup space.
 
 > [!NOTE]
 > In order to ensure available space for compaction, disk utilization should be kept to around 50%.
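The ~50% headroom guidance above can be expressed as a trivial check, for example in a monitoring script. This is a sketch under the stated assumption (utilization at or below 50% leaves room for compaction); the function name is illustrative, not an Azure API:

```python
def has_compaction_headroom(used_bytes: int, total_bytes: int,
                            max_utilization: float = 0.5) -> bool:
    """True if disk utilization is at or below the recommended ~50% ceiling,
    leaving room for Cassandra compaction to rewrite sstables."""
    return used_bytes / total_bytes <= max_utilization

print(has_compaction_headroom(400, 1000))  # -> True (40% used)
print(has_compaction_headroom(700, 1000))  # -> False (70% used)
```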
@@ -190,11 +192,11 @@ If you only see this behavior for a few nodes, you might have a hot partition an
 
 ### JVM memory
 
-Our default formula assigns half the VM's memory to the JVM with an upper limit of 31 GB - which in most cases is a good balance between performance and memory. Some workloads, especially ones which have frequent cross-partition reads or range scans might be memory challenged.
+Our default formula assigns half the VM's memory to the JVM with an upper limit of 31 GB - which in most cases is a good balance between performance and memory. Some workloads, especially ones that have frequent cross-partition reads or range scans, might be memory challenged.
 
 In most cases memory gets reclaimed effectively by the Java garbage collector, but especially if the CPU is often above 80% there aren't enough CPU cycles left for the garbage collector. So any CPU performance problems should be addressed before memory problems.
 
-If the CPU hovers below 70%, and the garbage collection isn't able to reclaim memory, you might need more JVM memory. This is especially the case if you're on a SKU with limited memory. In most cases, you'll need to review your queries and client settings and reduce `fetch_size` along with what is chosen in `limit` within your CQL query.
+If the CPU hovers below 70%, and the garbage collection isn't able to reclaim memory, you might need more JVM memory. This is especially the case if you're on a SKU with limited memory. In most cases, you need to review your queries and client settings and reduce `fetch_size` along with what is chosen in `limit` within your CQL query.
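The default heap sizing described above (half the VM's memory, capped at 31 GB) can be sketched as follows. The exact provisioning logic is internal to the service; this only illustrates the stated rule:

```python
def default_jvm_heap_gb(vm_memory_gb: float) -> float:
    """Half the VM's memory, with an upper limit of 31 GB, per the stated default."""
    return min(vm_memory_gb / 2, 31.0)

print(default_jvm_heap_gb(32))   # -> 16.0
print(default_jvm_heap_gb(128))  # -> 31.0 (the cap applies)
```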
 
 If you indeed need more memory, you can:
 
@@ -204,7 +206,7 @@ If you indeed need more memory, you can:
 
 ### Tombstones
 
-We run repairs every seven days with reaper which removes rows whose TTL has expired (called "tombstone"). Some workloads have more frequent deletes and see warnings like `Read 96 live rows and 5035 tombstone cells for query SELECT ...; token <token> (see tombstone_warn_threshold)` in the Cassandra logs, or even errors indicating that a query couldn't be fulfilled due to excessive tombstones.
+We run repairs every seven days with reaper, which removes rows whose TTL has expired (called a "tombstone"). Some workloads have more frequent deletes and see warnings like `Read 96 live rows and 5035 tombstone cells for query SELECT ...; token <token> (see tombstone_warn_threshold)` in the Cassandra logs, or even errors indicating that a query couldn't be fulfilled due to excessive tombstones.
 
 A short-term mitigation if queries don't get fulfilled is to increase the `tombstone_failure_threshold` in the [Cassandra config](create-cluster-portal.md#update-cassandra-configuration) from the default 100,000 to a higher value.
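As a sketch, that mitigation amounts to raising the threshold in the `cassandra.yaml` fragment applied through the cluster's Cassandra configuration. The value 150000 below is an arbitrary illustration, not a recommendation:

```yaml
# Applied via the managed instance's update-Cassandra-configuration flow.
# Treat a raised failure threshold as a stopgap while you address the
# underlying delete/TTL patterns in the data model.
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 150000
```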
 
@@ -232,16 +234,15 @@ This indicates a problem in the data model. Here's a [stack overflow article](ht
 
 ## Specialized optimizations
 ### Compression
-Cassandra allows the selection of an appropriate compression algorithm when a table is created (see [Compression](https://cassandra.apache.org/doc/latest/cassandra/operating/compression.html)) The default is LZ4 which is excellent
-for throughput and CPU but consumes more space on disk. Using Zstd (Cassandra 4.0 and up) saves about ~12% space with
+Cassandra allows the selection of an appropriate compression algorithm when a table is created (see [Compression](https://cassandra.apache.org/doc/latest/cassandra/operating/compression.html)). The default is LZ4, which is excellent for throughput and CPU but consumes more space on disk. Using Zstd (Cassandra 4.0 and up) saves about 12% space with
 minimal CPU overhead.
 
 ### Optimizing memtable heap space
 Our default is to use 1/4 of the JVM heap for [memtable_heap_space](https://cassandra.apache.org/doc/latest/cassandra/configuration/cass_yaml_file.html#memtable_heap_space)
 in the cassandra.yaml. For write-oriented applications and/or on SKUs with small memory,
 this can lead to frequent flushing and fragmented sstables, thus requiring more compaction.
-In such cases increasing it to at least 4048 might be beneficial but requires careful benchmarking
-to make sure other operations (e.g. reads) aren't affected.
+In such cases, increasing it to at least 4048 might be beneficial but requires careful benchmarking
+to make sure other operations (for example, reads) aren't affected.
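The memtable sizing above (1/4 of the JVM heap, with 4048 MB as a suggested floor for write-heavy workloads) can be sketched numerically. The helper name is an assumption for illustration; only the ratio and the floor come from the text:

```python
def default_memtable_heap_mb(jvm_heap_mb: int) -> int:
    """Default memtable_heap_space: one quarter of the JVM heap, per the text."""
    return jvm_heap_mb // 4

heap_mb = 8192  # e.g. a small SKU with an 8 GB JVM heap
memtable_mb = default_memtable_heap_mb(heap_mb)
print(memtable_mb)          # -> 2048
print(memtable_mb >= 4048)  # -> False: a candidate for raising memtable_heap_space
```

Any such increase should be benchmarked against read latency before rollout, as the text notes.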
 
 ## Next steps
 
articles/managed-instance-apache-cassandra/manage-resources-cli.md

Lines changed: 9 additions & 1 deletion
@@ -164,8 +164,16 @@ az managed-cassandra datacenter create \
 > - Standard_DS14_v2
 > - Standard_D8s_v4
 > - Standard_D16s_v4
-> - Standard_D32s_v4
+> - Standard_D32s_v4
+> - Standard_L8s_v3
+> - Standard_L16s_v3
+> - Standard_L32s_v3
+> - Standard_L8as_v3
+> - Standard_L16as_v3
+> - Standard_L32as_v3
 >
+> Currently, we don't support transitioning across SKU families. For instance, if you currently have a `Standard_DS13_v2` and are interested in upgrading to a larger SKU such as `Standard_DS14_v2`, this option isn't available. However, you can open a support ticket to request an upgrade to the higher SKU.
+>
 > Note also that `--availability-zone` is set to `false`. To enable availability zones, set this to `true`. Availability zones increase the availability SLA of the service. For more details, review the full SLA details [here](https://azure.microsoft.com/support/legal/sla/managed-instance-apache-cassandra/v1_0/).
 
 > [!WARNING]
