---
title: Frequently asked questions about the Azure Cosmos DB API for Cassandra
description: Get answers to frequently asked questions about the Azure Cosmos DB API for Cassandra.
author: TheovanKraay
ms.service: cosmos-db
ms.topic: conceptual
ms.date: 04/09/2020
ms.author: thvankra
---
# Frequently asked questions about the Azure Cosmos DB API for Cassandra

## What are some key differences between Apache Cassandra and the Cassandra API?

- Apache Cassandra recommends a 100-MB limit on the size of a partition. The Cassandra API allows up to 10 GB per partition.
- Apache Cassandra allows you to disable durable commits, that is, to skip writing to the commit log and go directly to the memtables. This can lead to data loss if the node goes down before memtables are flushed to SSTables on disk. Azure Cosmos DB always does durable commits, so you will never lose data this way.
- Apache Cassandra can see diminished read performance if the workload involves many replaces or deletes, because the read path has to skip over tombstones to fetch the latest data. The Cassandra API doesn't see diminished read performance when the workload has many replaces or deletes.
- During heavy replace workloads, Apache Cassandra needs to run compaction to merge SSTables on disk. (The merge is needed because Apache Cassandra's writes are append only, so multiple updates are stored as individual SSTable entries that must be periodically merged.) Reads can also slow down while compaction runs. This doesn't occur in the Cassandra API, because it doesn't implement compaction.
- Setting a replication factor of 1 is possible with Apache Cassandra, but it leads to low availability if the only node with the data goes down. This isn't an issue with the Azure Cosmos DB Cassandra API, because there is always a replication factor of 4 (quorum of 3).
- Adding or removing nodes in Apache Cassandra requires a lot of manual intervention, and causes high CPU on the new node while existing nodes move some of their token ranges to it. The same is true when decommissioning an existing node. In the Azure Cosmos DB Cassandra API, scaling out happens seamlessly under the hood, without any impact observed in the service or application.
- There is no need to set `num_tokens` on each node in the cluster as in Apache Cassandra. Nodes and token ranges are fully managed by Azure Cosmos DB.
- The Azure Cosmos DB Cassandra API is fully managed, so you don't need the `nodetool` commands such as `repair` and `decommission` that are used in Apache Cassandra.

## Other frequently asked questions

### What is the protocol version supported by the Azure Cosmos DB Cassandra API? Is there a plan to support other protocols?

The Azure Cosmos DB Cassandra API supports CQL version 3.x. Its CQL compatibility is based on the public [Apache Cassandra GitHub repository](https://github.com/apache/cassandra/blob/trunk/doc/cql3/CQL.textile). If you have feedback about supporting other protocols, let us know via [user voice feedback](https://feedback.azure.com/forums/263030-azure-cosmos-db) or send an email to [[email protected]](mailto:[email protected]).

### Why is choosing a throughput for a table a requirement?

Azure Cosmos DB sets the default throughput for your container based on where you create the table from: the portal or CQL.
Azure Cosmos DB provides guarantees for performance and latency, with upper bounds on operations. These guarantees are possible because the engine can enforce governance on the tenant's operations. Setting throughput ensures that you get the guaranteed throughput and latency, because the platform reserves this capacity and guarantees operation success.
You can [elastically change throughput](manage-scale-cassandra.md) to benefit from the seasonality of your application and save costs.

The throughput concept is explained in the [Request Units in Azure Cosmos DB](request-units.md) article. The throughput for a table is distributed equally across the underlying physical partitions.
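The equal distribution described above means each physical partition gets a fixed share of the table's provisioned throughput. A minimal sketch of that arithmetic (the partition count here is a made-up example, not something the service exposes directly):

```python
def throughput_per_physical_partition(total_rus: int, physical_partitions: int) -> float:
    """Provisioned RU/s is split equally across the physical partitions."""
    return total_rus / physical_partitions

# For example, 10,000 RU/s on a table with 5 physical partitions
# gives each partition a budget of 2,000 RU/s.
print(throughput_per_physical_partition(10_000, 5))
```

This is why a single hot partition can get throttled even when the table as a whole is under its provisioned throughput: the hot partition only has its own share to spend.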

### What is the default RU/s of a table when created through CQL? What if I need to change it?

Azure Cosmos DB uses request units per second (RU/s) as the currency for providing throughput. Tables created through CQL have 400 RU/s. You can change the RU/s from the portal.

CQL

```shell
CREATE TABLE keyspaceName.tablename (user_id int PRIMARY KEY, lastname text) WITH cosmosdb_provisioned_throughput=1200
```

.NET

```csharp
int provisionedThroughput = 400;
var simpleStatement = new SimpleStatement($"CREATE TABLE {keyspaceName}.{tableName} (user_id int PRIMARY KEY, lastname text)");

// Pass the provisioned throughput to the service as a custom payload on the statement.
var outgoingPayload = new Dictionary<string, byte[]>();
outgoingPayload["cosmosdb_provisioned_throughput"] = Encoding.UTF8.GetBytes(provisionedThroughput.ToString());
simpleStatement.SetOutgoingPayload(outgoingPayload);
```
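The same idea can be sketched with the DataStax Python driver (`cassandra-driver`), which accepts a custom payload on `Session.execute`. The payload key and value encoding mirror the .NET example above; the connection details are placeholders:

```python
def provisioned_throughput_payload(ru: int) -> dict:
    """Build the custom payload that requests a given provisioned throughput (RU/s)."""
    return {"cosmosdb_provisioned_throughput": str(ru).encode("utf-8")}

payload = provisioned_throughput_payload(1200)

# With a live session, you would pass the payload alongside the CREATE TABLE statement:
# from cassandra.cluster import Cluster
# session = Cluster(["<host>"]).connect()
# session.execute(
#     "CREATE TABLE ks.t (user_id int PRIMARY KEY, lastname text)",
#     custom_payload=payload,
# )
print(payload)
```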

### What happens when throughput is used up?

Azure Cosmos DB provides guarantees for performance and latency, with upper bounds on operations. These guarantees are possible because the engine can enforce governance on the tenant's operations, based on the throughput setting: the platform reserves the capacity you set and guarantees operation success within it.
When you go over this capacity, you get an overloaded error message indicating your capacity was used up:
`0x1001 Overloaded: the request can't be processed because "Request Rate is large"`. At this point, it's essential to see which operations, and what volume of them, cause the issue. You can use the metrics on the portal to see whether consumed capacity is going over the provisioned capacity. Then you need to ensure that capacity is consumed nearly equally across all underlying partitions. If most of the throughput is consumed by one partition, your workload is skewed.
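A common way for clients to handle the overloaded error is to retry with exponential backoff, giving the reserved capacity time to replenish. A minimal sketch, where the operation callable and error type are stand-ins for whatever your driver raises:

```python
import time

class OverloadedError(Exception):
    """Stand-in for the driver error raised on 0x1001 Overloaded."""

def execute_with_backoff(operation, max_retries=5, base_delay=0.1):
    """Retry an operation with exponential backoff when the request rate is too large."""
    for attempt in range(max_retries):
        try:
            return operation()
        except OverloadedError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))  # 0.1 s, 0.2 s, 0.4 s, ...
```

Backoff only papers over a sustained shortfall, though; if the error persists, increase the provisioned throughput or fix the workload skew.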

Metrics are available that show you how throughput is used over hours, days, and per seven days, across partitions or in aggregate. For more information, see [Monitoring and debugging with metrics in Azure Cosmos DB](use-metrics.md).

Diagnostic logs are explained in the [Azure Cosmos DB diagnostic logging](logging.md) article.

### Does the primary key map to the partition key concept of Azure Cosmos DB?

Yes, the partition key is used to place the entity in the right location. In Azure Cosmos DB, it's used to find the right logical partition, which is stored on a physical partition. The partitioning concept is explained in the [Partition and scale in Azure Cosmos DB](partition-data.md) article. The essential takeaway here is that a logical partition shouldn't go over the 10-GB limit today.
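Conceptually, the service hashes the partition key to decide which physical partition holds each logical partition. A toy illustration of that mapping (the hash function and partition count here are illustrative only, not the service's actual algorithm):

```python
import hashlib

def physical_partition_for(partition_key: str, physical_partitions: int) -> int:
    """Toy hash-based mapping from a partition key to a physical partition index."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_partitions

# All rows sharing a partition key land on the same physical partition,
# which is why one logical partition can't be split across machines.
assert physical_partition_for("user_42", 5) == physical_partition_for("user_42", 5)
```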

### What happens when I get a "quota full" notification indicating that a partition is full?

Azure Cosmos DB is an SLA-based system that provides unlimited scale, with guarantees for latency, throughput, availability, and consistency. This unlimited storage is based on horizontal scale-out of data, using partitioning as the key concept. The partitioning concept is explained in the [Partition and scale in Azure Cosmos DB](partition-data.md) article.

You should adhere to the 10-GB limit on the number of entities or items per logical partition. To ensure that your application scales well, we recommend that you *not* create a hot partition by storing all information in one partition and querying it. This error can only occur if your data is skewed: that is, you have a lot of data for one partition key (more than 10 GB). You can find the distribution of data by using the storage portal. The way to fix this error is to re-create the table and choose a more granular primary (partition) key, which allows better distribution of data.
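A quick way to reason about skew before it triggers this error is to total the stored bytes per partition key and look for an outlier. A sketch with made-up sample data:

```python
from collections import Counter

def partition_sizes(rows):
    """Sum stored bytes per partition key to spot hot partitions."""
    sizes = Counter()
    for partition_key, size_bytes in rows:
        sizes[partition_key] += size_bytes
    return sizes

# (partition_key, size_bytes) pairs; "us" holds most of the data.
rows = [("us", 7_000), ("us", 6_000), ("eu", 1_500), ("apac", 500)]
sizes = partition_sizes(rows)
hottest = max(sizes, key=sizes.get)
print(hottest, sizes[hottest])  # → us 13000
```

If one key dominates like this, a more granular partition key (for example, appending a bucket or ID component) spreads the same data across many logical partitions.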

### Is it possible to use the Cassandra API as a key-value store with millions or billions of individual partition keys?

Azure Cosmos DB can store unlimited data by scaling out the storage. This is independent of the throughput. Yes, you can always use the Cassandra API to store and retrieve keys and values by specifying the right primary/partition key. These individual keys get their own logical partition and sit atop a physical partition without issues.

### Is it possible to create more than one table with the Apache Cassandra API of Azure Cosmos DB?

Yes, it's possible to create more than one table with the Apache Cassandra API. Each of those tables is treated as a unit for throughput and storage.

### Is it possible to create more than one table in succession?

Azure Cosmos DB is a resource-governed system for both data and control plane activities. Containers, like collections and tables, are runtime entities that are provisioned for a given throughput capacity. The creation of these containers in quick succession isn't an expected activity and might be throttled. If you have tests that drop and create tables immediately, try to space them out.

### What is the maximum number of tables that can be created?

There's no physical limit on the number of tables. Send an email to [[email protected]](mailto:[email protected]) if you have a large number of tables (where the total steady size goes over 10 TB of data) that need to be created, beyond the usual tens or hundreds.

### What is the maximum number of keyspaces that we can create?

There's no physical limit on the number of keyspaces, because they're metadata containers. Send an email to [[email protected]](mailto:[email protected]) if you have a large number of keyspaces for some reason.

### Is it possible to bring in a lot of data after starting from a normal table?

Yes, assuming uniformly distributed partitions, the storage capacity is automatically managed and increases as you push in more data. So you can confidently import as much data as you need without managing and provisioning nodes. However, if you're anticipating a lot of immediate data growth, it makes more sense to directly [provision for the anticipated throughput](set-throughput.md) rather than starting lower and increasing it immediately.

### Is it possible to supply YAML file settings to configure the behavior of the Apache Cassandra API of Azure Cosmos DB?

The Apache Cassandra API of Azure Cosmos DB is a platform service. It provides protocol-level compatibility for executing operations, and it hides away the complexity of management, monitoring, and configuration. As a developer/user, you don't need to worry about availability, tombstones, key cache, row cache, bloom filters, and a multitude of other settings. The Apache Cassandra API focuses on providing the read and write performance that you require, without the overhead of configuration and management.

### Will the Apache Cassandra API for Azure Cosmos DB support node addition, cluster status, and node status commands?

The Apache Cassandra API is a platform service that makes capacity planning, and responding to the elasticity demands for throughput and storage, a breeze. With Azure Cosmos DB, you provision the throughput you need. Then you can scale it up and down any number of times through the day, without worrying about adding, deleting, or managing nodes. This means you don't need to use node and cluster management tools either.

### What happens with respect to various config settings for keyspace creation like simple/network?

Azure Cosmos DB provides global distribution out of the box for availability and low-latency reasons. You don't need to set up replicas or other settings. All writes are always durably quorum committed in any region where you write, while providing performance guarantees.

### What happens with respect to various settings for table metadata like bloom filter, caching, read repair chance, gc_grace, compression, memtable_flush_period, and more?

Azure Cosmos DB provides performance for reads, writes, and throughput without the need to touch any of these configuration settings or accidentally manipulate them.

### Is time-to-live (TTL) supported for Cassandra tables?

Yes, TTL is supported.
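As a sketch of the standard CQL syntax (assuming your keyspace and table names), TTL can be set as a table default or on an individual write:

```shell
-- Default TTL of one day (86,400 seconds) for all rows in the table
CREATE TABLE keyspaceName.events (id int PRIMARY KEY, payload text) WITH default_time_to_live = 86400;

-- Or a TTL on an individual write
INSERT INTO keyspaceName.events (id, payload) VALUES (1, 'hello') USING TTL 3600;
```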

### Is it possible to monitor node status, replica status, gc, and OS parameters earlier with various tools? What needs to be monitored now?

Azure Cosmos DB is a platform service that helps you increase productivity, without worrying about managing and monitoring infrastructure. You just need to take care of the throughput, which is available in the portal metrics, to find whether you're getting throttled, and then increase or decrease that throughput.
- Monitor [SLAs](monitor-accounts.md).
- Use [metrics](use-metrics.md).
- Use [diagnostic logs](logging.md).

### Which client SDKs can work with the Apache Cassandra API of Azure Cosmos DB?

Apache Cassandra client drivers that use CQLv3 can be used with client programs. If you use other drivers, or if you're facing issues, send mail to [[email protected]](mailto:[email protected]).

### Is a composite partition key supported?

Yes, you can use the regular syntax to create a composite partition key.
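For example, with standard CQL syntax (table and column names here are hypothetical), the inner parentheses in the `PRIMARY KEY` clause define the composite partition key:

```shell
-- (user_id, tenant_id) is the composite partition key; created_at is a clustering column
CREATE TABLE keyspaceName.orders (
    user_id int,
    tenant_id int,
    created_at timestamp,
    amount decimal,
    PRIMARY KEY ((user_id, tenant_id), created_at)
);
```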

### Can I use sstableloader for data loading?

No, sstableloader isn't supported.

### Can an on-premises Apache Cassandra cluster be paired with Azure Cosmos DB's Cassandra API?

At present, Azure Cosmos DB has an optimized experience for the cloud environment, without the overhead of operations. If you require pairing, send mail to [[email protected]](mailto:[email protected]) with a description of your scenario. We are working on an offering to help pair on-premises or other-cloud Cassandra clusters to Cosmos DB's Cassandra API.

### Does the Cassandra API provide full backups?

Azure Cosmos DB provides two free full backups taken at four-hour intervals today, across all APIs. This ensures you don't need to set up a backup schedule.
If you want to modify retention and frequency, send an email to [[email protected]](mailto:[email protected]) or raise a support case. Information about backup capability is provided in the [Automatic online backup and restore with Azure Cosmos DB](../synapse-analytics/sql-data-warehouse/backup-and-restore.md) article.

### How does the Cassandra API account handle failover if a region goes down?

The Azure Cosmos DB Cassandra API borrows from the globally distributed platform of Azure Cosmos DB. To ensure that your application can tolerate datacenter downtime, enable at least one more region for the account in the Azure portal. You can also set the priority of the regions by using the portal. For details, see [Developing with multi-region Azure Cosmos DB accounts](high-availability.md).

You can add as many regions as you want for the account and control where it can fail over to by providing a failover priority. To use the database, you need to deploy your application there too. When you do so, your customers won't experience downtime.

### Does the Apache Cassandra API index all attributes of an entity by default?

No. The Cassandra API supports [secondary indexes](cassandra-secondary-index.md), which behave in a similar way to Apache Cassandra. It does not index every attribute by default.

### Can I use the new Cassandra API SDK locally with the emulator?

Yes, this is supported. You can find details of how to enable this in the [local emulator documentation](local-emulator.md#cassandra-api).

### How can I migrate data from existing Apache Cassandra clusters to Cosmos DB?

You can read about the [migration options](cassandra-import-data.md).

### Feature x of regular Cassandra API isn't working today. Where can feedback be provided?

Provide feedback via [user voice feedback](https://feedback.azure.com/forums/263030-azure-cosmos-db).

## Next steps

- Get started with [elastically scaling an Azure Cosmos DB Cassandra API account](manage-scale-cassandra.md).