9 changes: 4 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ The DataStax CDC for Apache Cassandra requires:

Supported streaming platform:
* Apache Pulsar 2.8.1+
* DataStax Luna Streaming 2.8.0.1.1.40+
* IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) 2.8.0.1.1.40+

Supported Cassandra version:
* Cassandra 3.11+
@@ -25,9 +25,9 @@ Note: Only Cassandra 4.0 and DSE 6.8.16+ support the near realtime CDC allowing

## Documentation

All documentation is available online [here](https://docs.datastax.com/en/cdc-for-cassandra/docs/latest/index.html).
To get started, see [QUICKSTART.md](QUICKSTART.md).

See the [QUICKSTART.md](QUICKSTART.md) page.
For the complete documentation, see the [CDC for Apache Cassandra documentation](https://docs.datastax.com/en/cdc-for-cassandra/docs/latest/index.html).

## Demo

@@ -37,7 +37,6 @@ Cassandra data replicated to Elasticsearch:
* Deploy a Cassandra source and an Elasticsearch sink into Apache Pulsar
* Writes into Cassandra are replicated to Elasticsearch.


[![asciicast](https://asciinema.org/a/kiEYzHQrPWhJR19nZ7tbqrDIX.png)](https://asciinema.org/a/kiEYzHQrPWhJR19nZ7tbqrDIX?speed=2&theme=tango)
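The demo's source-and-sink deployment can be sketched with `pulsar-admin`. The connector name, topic names, and config keys below are illustrative assumptions, not the demo's exact values; the script only assembles and prints the source config, since the actual `pulsar-admin` calls (shown in comments) need a running Pulsar cluster.

```shell
# Hypothetical source config for the Cassandra source connector (keys and topic names are assumptions).
SOURCE_CONFIG=$(cat <<'EOF'
{
  "keyspace": "ks1",
  "table": "table1",
  "events.topic": "persistent://public/default/events-ks1.table1",
  "data.topic": "persistent://public/default/data-ks1.table1",
  "contactPoints": "localhost",
  "port": 9042
}
EOF
)
echo "$SOURCE_CONFIG"
# With a running cluster, the config would be passed to pulsar-admin, e.g.:
#   pulsar-admin source create --name cassandra-source-ks1-table1 \
#     --archive pulsar-cassandra-source-<version>.nar \
#     --destination-topic-name data-ks1.table1 \
#     --source-config "$SOURCE_CONFIG"
#   pulsar-admin sinks create --sink-type elastic_search --name es-sink \
#     --inputs data-ks1.table1 \
#     --sink-config '{"elasticSearchUrl":"http://localhost:9200"}'
```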

## Monitoring
@@ -90,4 +89,4 @@ Note: Artifacts for DSE agent are excluded by default. To build the `agent-dse4`
## Acknowledgments

Apache Cassandra, Apache Pulsar, Cassandra and Pulsar are trademarks of the Apache Software Foundation.
Elasticsearch, is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
Elasticsearch, is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
4 changes: 2 additions & 2 deletions agent-c3/src/test/resources/cassandra/cassandra.yaml
@@ -1148,9 +1148,9 @@ transparent_data_encryption_options:
# will use them to make sure other replicas also know about the deleted rows.
# With workloads that generate a lot of tombstones, this can cause performance
# problems and even exaust the server heap.
# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
# See https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
# Adjust the thresholds here if you understand the dangers and want to
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# using the StorageService mbean.
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
6 changes: 3 additions & 3 deletions agent-c4/src/test/resources/cassandra/cassandra.yaml
@@ -1241,10 +1241,10 @@ transparent_data_encryption_options:
# tombstones seen in memory so we can return them to the coordinator, which
# will use them to make sure other replicas also know about the deleted rows.
# With workloads that generate a lot of tombstones, this can cause performance
# problems and even exaust the server heap.
# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
# problems and even exhaust the server heap.
# See https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
# Adjust the thresholds here if you understand the dangers and want to
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# using the StorageService mbean.
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
4 changes: 2 additions & 2 deletions agent-dse4/src/test/resources/cassandra/cassandra.yaml
@@ -1320,9 +1320,9 @@ nodesync:
# will use them to make sure other replicas also know about the deleted rows.
# With workloads that generate a lot of tombstones, this can cause performance
# problems and even exhaust the server heap.
# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
# See https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
# Adjust the thresholds here if you understand the dangers and want to
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# using the StorageService mbean.
#
# Default tombstone_warn_threshold is 1000, may differ if emulate_dbaas_defaults is enabled
2 changes: 1 addition & 1 deletion backfill-cli/README.md
@@ -16,7 +16,7 @@ At a high level, the backfill CLI works as follows:
## Requirements
On the CLI machine side:
* Java 8 or Java 11 Runtime Environment (JRE)
* Optional: If running the CLI as a Pulsar Admin Extension, Datastax Luna Streaming 2.10_3.1 or later required. It is enough to download the standalone Luna Streaming Shell found on https://github.com/datastax/pulsar/releases.
* Optional: If running the CLI as a Pulsar Admin Extension, IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) 2.10_3.1 or later is required. Downloading the standalone Luna Streaming Shell from https://github.com/datastax/pulsar/releases is sufficient.

On the Cassandra/Pulsar side:
* DataStax CDC for Apache Cassandra 2.2.5 or later
2 changes: 1 addition & 1 deletion backfill-cli/src/main/resources/driver-reference.conf
@@ -994,7 +994,7 @@ dsbulk {
#
# Only applicable for unloads, and only if this feature is available in the remote cluster, ignored otherwise.
#
# This section is deprecated except for `continuousPaging.enabled`. Please configure other continuous paging options (such as page size, maximum pages, etc.) directly in the driver configuration section. See [DSE continuous paging tuning and support guide](https://www.datastax.com/blog/2017/04/dse-continuous-paging-tuning-and-support-guide) for more information.
# This section is deprecated except for `continuousPaging.enabled`. Please configure other continuous paging options (such as page size, maximum pages, etc.) directly in the driver configuration section.
continuousPaging {

# Enable or disable continuous paging. If the target cluster does not support continuous paging or if `driver.query.consistency` is not `ONE` or `LOCAL_ONE`, traditional paging will be used regardless of this setting.
4 changes: 2 additions & 2 deletions backfill-cli/src/test/resources/c3/cassandra.yaml
@@ -1148,9 +1148,9 @@ transparent_data_encryption_options:
# will use them to make sure other replicas also know about the deleted rows.
# With workloads that generate a lot of tombstones, this can cause performance
# problems and even exaust the server heap.
# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
# See https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
# Adjust the thresholds here if you understand the dangers and want to
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# using the StorageService mbean.
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
4 changes: 2 additions & 2 deletions backfill-cli/src/test/resources/c4/cassandra.yaml
@@ -1242,9 +1242,9 @@ transparent_data_encryption_options:
# will use them to make sure other replicas also know about the deleted rows.
# With workloads that generate a lot of tombstones, this can cause performance
# problems and even exaust the server heap.
# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
# See https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
# Adjust the thresholds here if you understand the dangers and want to
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# using the StorageService mbean.
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000
4 changes: 2 additions & 2 deletions backfill-cli/src/test/resources/dse4/cassandra.yaml
@@ -1320,9 +1320,9 @@ nodesync:
# will use them to make sure other replicas also know about the deleted rows.
# With workloads that generate a lot of tombstones, this can cause performance
# problems and even exhaust the server heap.
# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
# See https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
# Adjust the thresholds here if you understand the dangers and want to
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# using the StorageService mbean.
#
# Default tombstone_warn_threshold is 1000, may differ if emulate_dbaas_defaults is enabled
4 changes: 2 additions & 2 deletions connector/src/test/resources/cassandra/cassandra.yaml
@@ -1242,9 +1242,9 @@ transparent_data_encryption_options:
# will use them to make sure other replicas also know about the deleted rows.
# With workloads that generate a lot of tombstones, this can cause performance
# problems and even exaust the server heap.
# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
# See https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
# Adjust the thresholds here if you understand the dangers and want to
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# scan more tombstones anyway. These thresholds may also be adjusted at runtime
# using the StorageService mbean.
tombstone_warn_threshold: 1000
tombstone_failure_threshold: 100000

This file was deleted.

2 changes: 0 additions & 2 deletions docs/modules/ROOT/examples/extension-start.sh

This file was deleted.

2 changes: 0 additions & 2 deletions docs/modules/ROOT/examples/java-start.sh

This file was deleted.

34 changes: 27 additions & 7 deletions docs/modules/ROOT/pages/backfill-cli.adoc
@@ -11,7 +11,7 @@ Developers can also use the backfill CLI to trigger change events for downstream
== Installation

The CDC backfill CLI is distributed both as a JAR file and as a Pulsar-admin extension NAR file.
The Pulsar-admin extension is packaged with the DataStax Luna Streaming distribution in the /cliextensions folder, so you don't need to build from source unless you want to make changes to the code.
The Pulsar-admin extension is packaged with the IBM Elite Support for Apache Pulsar distribution in the `/cliextensions` folder, so you don't need to build from source unless you want to make changes to the code.

Both artifacts are built with Gradle.
To build the CLI, run the following commands:
@@ -52,17 +52,23 @@ Java standalone::
--
[source,shell,subs="attributes+"]
----
include::example$java-start.sh[]
java -jar backfill-cli/build/libs/backfill-cli-{version}-all.jar --data-dir target/export --export-host 127.0.0.1:9042 \
--export-username cassandra --export-password cassandra --keyspace ks1 --table table1
----
--

Pulsar-admin extension::
+
--
include::partial$extension.adoc[]
The Pulsar-admin extension is packaged with the IBM Elite Support for Apache Pulsar (formerly DataStax Luna Streaming) distribution in the /cliextensions folder, so you don't need to build from source unless you want to make changes to the code.

. Move the generated NAR archive to the /cliextensions folder of your Pulsar installation (e.g. /pulsar/cliextensions).
. Modify the client.conf file of your Pulsar installation to include: `customCommandFactories=cassandra-cdc`.
. Run the following command (this assumes the https://docs.datastax.com/en/installing/docs/installTARdse.html[default installation] of DSE Cassandra):
+
----
include::example$extension-start.sh[]
./bin/pulsar-admin cassandra-cdc backfill --data-dir target/export --export-host 127.0.0.1:9042 \
--export-username cassandra --export-password cassandra --keyspace ks1 --table table1
----
--
====
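Step 2 of the extension setup above edits `client.conf`. A minimal sketch of the relevant lines (the service URLs are placeholder values for a local installation; only the `customCommandFactories` line is the setting the step actually adds):

```
webServiceUrl=http://localhost:8080
brokerServiceUrl=pulsar://localhost:6650
customCommandFactories=cassandra-cdc
```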
@@ -255,64 +261,78 @@ be exported in subdirectories of the data directory specified here;
there will be one subdirectory per keyspace inside the data
directory, then one subdirectory per table inside each keyspace
directory.

|--help, -h
|Displays this help message

|--dsbulk-log-dir=PATH, -l
|The directory where DSBulk should store its logs. The default is a
'logs' subdirectory in the current working directory. This
subdirectory will be created if it does not exist. Each DSBulk
operation will create a subdirectory inside the log directory
specified here. This option is not available in the Pulsar-admin extension.

|--export-bundle=PATH
|The path to a secure connect bundle to connect to the Cassandra
cluster, if that cluster is a DataStax Astra cluster. Options
--export-host and --export-bundle are mutually exclusive.
|The path to a Secure Connect Bundle (SCB) to connect to an Astra DB database. Options --export-host and --export-bundle are mutually exclusive.

|--export-consistency=CONSISTENCY
|The consistency level to use when exporting data. The default is
LOCAL_QUORUM.

|--export-max-concurrent-files=NUM\|AUTO
|The maximum number of concurrent files to write to. Must be a positive
number or the special value AUTO. The default is AUTO.

|--export-max-concurrent-queries=NUM\|AUTO
|The maximum number of concurrent queries to execute. Must be a
positive number or the special value AUTO. The default is AUTO.

|--export-splits=NUM\|NC
|The maximum number of token range queries to generate. Use the NC
syntax to specify a multiple of the number of available cores, e.g.
8C = 8 times the number of available cores. The default is 8C. This
is an advanced setting; you should rarely need to modify the default
value.

|--export-dsbulk-option=OPT=VALUE
|An extra DSBulk option to use when exporting. Any valid DSBulk option
can be specified here, and it will be passed as-is to the DSBulk
process. DSBulk options, including driver options, must be passed as
'--long.option.name=<value>'. Short options are not supported. For more DSBulk options, see https://docs.datastax.com/en/dsbulk/docs/reference/commonOptions.html[here].

|--export-host=HOST[:PORT]
|The host name or IP and, optionally, the port of a node from the
Cassandra cluster. If the port is not specified, it will default to
9042. This option can be specified multiple times. Options
--export-host and --export-bundle are mutually exclusive.

|--export-password
|The password to use to authenticate against the origin cluster.
Options --export-username and --export-password must be provided
together, or not at all. Omit the parameter value to be prompted for
the password interactively.

|--export-protocol-version=VERSION
|The protocol version to use to connect to the Cassandra cluster, e.g.
'V4'. If not specified, the driver will negotiate the highest
version supported by both the client and the server.

|--export-username=STRING
|The username to use to authenticate against the origin cluster.
Options --export-username and --export-password must be provided
together, or not at all.

|--keyspace=<keyspace>, -k
|The name of the keyspace where the table to be exported exists

|--max-rows-per-second=NUM
|The maximum number of rows per second to read from the Cassandra
table. Setting this option to any negative value or zero will
disable it. The default is -1.

|--table=<table>, -t
|The name of the table to export data from for CDC backfilling

|--version, -v
|Displays version info.
|===
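As a worked example, several of the options documented above can be combined into one invocation. All values here are placeholders (a local cluster with default credentials, keyspace `ks1`, table `table1`) and the JAR name is abbreviated; the script only assembles and prints the command rather than running it, since execution needs a live Cassandra/Pulsar deployment.

```shell
# Assemble an illustrative backfill invocation from the documented options (all values are placeholders).
CMD="java -jar backfill-cli-all.jar \
  --data-dir target/export \
  --export-host 127.0.0.1:9042 \
  --export-username cassandra --export-password cassandra \
  --export-consistency LOCAL_QUORUM \
  --export-splits 8C \
  --max-rows-per-second 1000 \
  --keyspace ks1 --table table1"
echo "$CMD"
```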
2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/cdc-cassandra-events.adoc
@@ -1,6 +1,6 @@
= CDC for Cassandra Events

The DataStax CDC for Cassandra agent pushes the mutation primary key for the CDC-enabled table into the Apache Pulsar events topic (also called the dirty topic). The messages in the data topic (or clean topic) are keyed messages where both the key and the payload are https://avro.apache.org/docs/current/spec.html#schema_record[AVRO records]: +
The {cdc_cass_first} agent pushes the mutation primary key for the CDC-enabled table into the Apache Pulsar events topic (also called the dirty topic). The messages in the data topic (or clean topic) are keyed messages where both the key and the payload are https://avro.apache.org/docs/current/spec.html#schema_record[AVRO records]: +

* The message key is an AVRO record including all the primary key columns of your Cassandra table.
* The message payload is an AVRO record including regular columns from your Cassandra table.
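For a hypothetical table `CREATE TABLE ks1.table1 (a int PRIMARY KEY, b text)`, the key/payload split described above could look like the following two AVRO record schemas. This is an illustration only — the actual record names and nullability follow the agent's schema mapping, which is not spelled out here.

```shell
# Print illustrative AVRO schemas for the message key and payload (hypothetical table ks1.table1).
SCHEMAS=$(cat <<'EOF'
key:     {"type":"record","name":"table1_key","fields":[{"name":"a","type":"int"}]}
payload: {"type":"record","name":"table1_value","fields":[{"name":"b","type":["null","string"]}]}
EOF
)
echo "$SCHEMAS"
```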
2 changes: 1 addition & 1 deletion docs/modules/ROOT/pages/cdcExample.adoc
@@ -11,7 +11,7 @@ This installation requires the following. Latest version artifacts are available
** DSE - use `agent-dse4-<version>-all.jar`
** OSS C* - use `agent-c4-<version>-all.jar`
* Pulsar
** DataStax Luna Streaming - use `agent-dse4-<version>-all.jar`
** IBM Elite Support for Apache Pulsar - use `agent-dse4-<version>-all.jar`
* Pulsar C* source connector (CSC)
** Pulsar Cassandra Source NAR - use `pulsar-cassandra-source-<version>.nar`
