
Commit 6d0c41b

cleanup-questions
1 parent 0458578 commit 6d0c41b

1 file changed: 40 additions and 22 deletions

modules/use-cases-architectures/pages/change-data-capture/questions-and-patterns.adoc

@@ -5,78 +5,96 @@ David Dieruf <[email protected]>
:title: CDC questions and patterns
:navtitle: CDC questions and patterns

We have collected common questions and patterns from our customers who are using CDC. We hope this will help you get the most out of this feature. Please also refer to the https://docs.datastax.com/en/cdc-for-cassandra/docs/2.2.2/faqs.html[CDC for Cassandra FAQs] in the official documentation for more information.

.How do I know if CDC is enabled on a table?
[%collapsible]
====
You can check the CDC status of a table by running the following CQL query:

`SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'keyspace_name' AND table_name = 'table_name';`

If the CDC status is `enabled`, then CDC is enabled on the table. If the CDC status is `disabled` or `null`, then CDC is not enabled on the table.

If the CDC status is `disabled` or `null`, you can enable CDC on the table by running the following CQL query:

`ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': true};`

If the CDC status is `enabled`, you can disable CDC on the table by running the following CQL query:

`ALTER TABLE keyspace_name.table_name WITH cdc = {'enabled': false};`
====
.How do I know if the Cassandra agent is running?
[%collapsible]
====
You can check the status of the Cassandra agent by running the following CQL query:

`SELECT * FROM system_distributed.cdc_local WHERE keyspace_name = 'cdc' AND table_name = 'raw_cdc';`

The `status` column will be `running` if the agent is running. If the `status` column is `stopped` or `null`, then the agent is not running.

If the `status` column is `stopped` or `null`, you can start the agent by running the following CQL query:

`ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': true};`

If the `status` column is `running`, you can stop the agent by running the following CQL query:

`ALTER TABLE cdc.raw_cdc WITH cdc = {'enabled': false};`
====
.What happens to unacknowledged event messages the Cassandra agent can’t deliver?
[%collapsible]
====
Unacknowledged messages mean the CDC agent was not able to produce the event message to Pulsar. If this is the case, the table row mutation will fail and the Cassandra client will see an exception, so no data will be committed to Cassandra and no event will be created.

Another scenario might be that the Pulsar broker is too busy to process messages and a backlog has been created. In this case, Pulsar's backlog policies take effect and event messages are handled accordingly. The data will be committed to Cassandra, but there might be some additional latency before the event message is created.
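
Backlog policies are set on the namespace that holds the CDC topics. As a minimal sketch using the Pulsar admin Java client (Pulsar 2.8 or later; the service URL and namespace below are placeholder values), you could cap the backlog at 10 GB and have the broker hold new produce requests, rather than discard events, once that quota is reached:

[source,java]
----
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.BacklogQuota;

public class SetCdcBacklogQuota {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build();

        // Cap the backlog for the namespace holding the CDC topics at 10 GB and
        // hold new produce requests (instead of dropping data) once it is reached.
        admin.namespaces().setBacklogQuota("public/default", // placeholder namespace
                BacklogQuota.builder()
                        .limitSize(10L * 1024 * 1024 * 1024)
                        .retentionPolicy(BacklogQuota.RetentionPolicy.producer_request_hold)
                        .build());

        admin.close();
    }
}
----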

The design of CDC in Cassandra assumes that when table changes are synced to the raw_cdc log, another process will be draining that log. There is a max log size setting that will disable writes to the table when the threshold is reached. If a connection to the Pulsar cluster is needed for the log to be drained, and the cluster is not responsive, the log will begin to fill, which can impact a table's write availability.

For more, see the https://docs.datastax.com/en/cdc-for-cassandra/docs/2.2.2/install.html#scaling-up-your-configuration[Scaling up your configuration] section in the official documentation.
====
.Does the Cassandra Source Connector use a dead-letter topic?
[%collapsible]
====
A dead letter topic is used when a message can't be delivered to a consumer. This can happen when the message acknowledgment time expires (no consumer acknowledged receipt of the message), a consumer negatively acknowledges the message, or a retry letter topic is in use and its retries are exhausted.

The Cassandra Source Connector creates a consumer to receive new event messages from the CDC agent, but does not configure a dead letter topic. It is assumed that parallel instances, broker compute, and function worker compute will be sized to handle the workload.
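
While the Source Connector itself does not configure one, your own downstream consumers of the CDC data topic can opt in to a dead letter topic. A minimal sketch with the Pulsar Java client (topic names, subscription name, and redelivery count are placeholders):

[source,java]
----
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.*;

public class DataTopicConsumer {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder broker URL
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://public/default/data-ks1.table1") // placeholder data topic
                .subscriptionName("downstream-processor")
                .subscriptionType(SubscriptionType.Shared) // required for a dead letter policy
                .negativeAckRedeliveryDelay(30, TimeUnit.SECONDS)
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(5)
                        .deadLetterTopic("persistent://public/default/data-ks1.table1-DLQ")
                        .build())
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        try {
            // ... process the change event ...
            consumer.acknowledge(msg);
        } catch (Exception e) {
            consumer.negativeAcknowledge(msg); // after 5 failed redeliveries the message lands in the DLQ
        }

        client.close();
    }
}
----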
====
.How do I scale CDC to handle my production loads?
[%collapsible]
====
There are three areas of scalability to focus on. The first is the hosts in the Cassandra cluster. The CDC agent runs on each host in its own JVM. If you are administering your own Cassandra cluster, you can tune the JVM compute properties to handle the appropriate workload. If you are using Cassandra in a serverless environment, the JVM is already set to handle significant load.

The second area of focus is the number of Cassandra Source Connector instances running. This is initially set when the Source Connector is created, and can be updated throughout the life of the running connector. Depending on your Pulsar configuration, an instance can represent a process thread on the broker or a function worker. If you are using Kubernetes, it could be a pod. Each represents a different scaling strategy, such as increasing compute, adding more workers, or adding more K8s nodes.
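
For example, with the Pulsar admin Java client you could raise the parallelism of a connector that is already running. The tenant, namespace, and connector name below are placeholders:

[source,java]
----
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.io.SourceConfig;

public class ScaleCassandraSource {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build();

        // Fetch the existing connector definition and raise its parallelism.
        SourceConfig source = admin.sources()
                .getSource("public", "default", "cassandra-source-ks1-table1"); // placeholder names
        source.setParallelism(3); // run three instances of the connector

        admin.sources().updateSource(source, null); // null keeps the existing connector archive
        admin.close();
    }
}
----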

Finally, the third area focuses on managing the broker backlog size and throughput tolerances. There are potentially a large number of messages being created, so you must ensure the Pulsar cluster is sized correctly. Our Luna Streaming xref:luna-streaming:install-upgrade:production-cluster-sizing.adoc[] guide can help you understand this better.
====
.I want to filter table data by column
[%collapsible]
====
Transformation functions are a great way to manipulate CDC data messages with no code required. Put them inline to watch the data topic and write the results to a different topic. Give the new topic a memorable name, like "filtered-data".
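
A minimal sketch of such a function in Java, assuming (for illustration only) that the change events arrive as JSON strings and that Gson is available on the classpath; the column name is a placeholder:

[source,java]
----
import com.google.gson.JsonObject;
import com.google.gson.JsonParser;
import org.apache.pulsar.functions.api.Context;
import org.apache.pulsar.functions.api.Function;

// Strips one column from each CDC event before it reaches the output topic.
public class ColumnFilterFunction implements Function<String, String> {

    private static final String COLUMN_TO_DROP = "email"; // placeholder column name

    @Override
    public String process(String input, Context context) {
        JsonObject event = JsonParser.parseString(input).getAsJsonObject();
        event.remove(COLUMN_TO_DROP); // drop the unwanted column if it is present
        return event.toString();      // the returned value is published to the output topic
    }
}
----

Deploy the function with the CDC data topic as its input and the "filtered-data" topic as its output, then point consumers at the filtered topic.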
Learn more about transformation functions xref:streaming-learning:functions:index.adoc[here].
====
.Multi-region CDC using the Cassandra sink
[%collapsible]
====
One of the requirements of CDC is that both the Cassandra and Pulsar clusters need to be in the same cloud region (or on-premises data center). If you are using geo-replication, you need the change data to be replicated across multiple clusters. The most manageable way to handle this is to use Pulsar's Cassandra sink to "watch" the CDC data topic and write the change to a different Cassandra table (in another Org).

The Cassandra sink requires the following provisions:

- Use the CDC data topic as its source of messages
- Provide a secure bundle (creds) to another Cassandra cluster
@@ -88,5 +106,5 @@ The Cassandra sink has the provisions needed:
.Migrating table data using CDC
[%collapsible]
====
Migrating data between tables solves quite a few different challenges. The basic approach is to use a Cassandra sink to watch the Cassandra source and write to another table while mapping columns appropriately. As the original table is phased out, the number of messages will decrease to none while consumers watch the new table's CDC data topic. Refer to the "Multi-region CDC" question above for more detail.
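
A rough sketch of creating such a sink with the Pulsar admin Java client follows. The archive name, topic, and configuration keys are illustrative placeholders only; check your Cassandra sink's documentation for the exact settings it expects:

[source,java]
----
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.io.SinkConfig;

public class CreateMigrationSink {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin endpoint
                .build();

        // Illustrative settings only; the keys your sink expects may differ.
        Map<String, Object> configs = new HashMap<>();
        configs.put("cloud.secureConnectBundle", "/path/to/secure-connect-target.zip"); // placeholder creds bundle
        configs.put("topic.data-ks1.table1.ks1.table1_v2.mapping",
                "id=key.id, name=value.name"); // hypothetical column mapping

        SinkConfig sink = SinkConfig.builder()
                .tenant("public")
                .namespace("default")
                .name("table-migration-sink")
                .archive("builtin://cassandra-enhanced") // placeholder sink archive name
                .inputs(List.of("persistent://public/default/data-ks1.table1")) // the CDC data topic
                .configs(configs)
                .parallelism(1)
                .build();

        admin.sinks().createSink(sink, null); // null uses the builtin archive
        admin.close();
    }
}
----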
====
