1 change: 0 additions & 1 deletion modules/ROOT/nav.adoc
@@ -41,7 +41,6 @@
* {cstar-data-migrator}
** xref:cdm-overview.adoc[]
** xref:cdm-steps.adoc[Migrate data]
** xref:cdm-parameters.adoc[Parameters]

* {dsbulk-loader}
** https://docs.datastax.com/en/dsbulk/overview/dsbulk-about.html[Overview]
55 changes: 4 additions & 51 deletions modules/ROOT/pages/cassandra-data-migrator.adoc
@@ -1,4 +1,5 @@
= {cstar-data-migrator}
:page-aliases: cdm-parameters.adoc

Use {cstar-data-migrator} to migrate and validate tables between origin and target Cassandra clusters, with available logging and reconciliation support.

@@ -42,55 +43,7 @@ include::partial$cdm-partition-ranges.adoc[]

include::partial$cdm-guardrail-checks.adoc[]

[[cdm-next-steps]]
== Next steps

[[cdm-reference]]
== {cstar-data-migrator} references

=== Common connection parameters for Origin and Target

include::partial$common-connection-parameters.adoc[]

=== Origin schema parameters

include::partial$origin-schema-parameters.adoc[]

=== Target schema parameters

include::partial$target-schema-parameters.adoc[]

=== Auto-correction parameters

include::partial$auto-correction-parameters.adoc[]

=== Performance and operations parameters

include::partial$performance-and-operations-parameters.adoc[]

=== Transformation parameters

include::partial$transformation-parameters.adoc[]

=== Cassandra filter parameters

include::partial$cassandra-filter-parameters.adoc[]

=== Java filter parameters

include::partial$java-filter-parameters.adoc[]

=== Constant column feature parameters

include::partial$constant-column-feature-parameters.adoc[]

=== Explode map feature parameters

include::partial$explode-map-feature-parameters.adoc[]

=== Guardrail feature parameter

include::partial$guardrail-feature-parameters.adoc[]

=== TLS (SSL) connection parameters

include::partial$tls-ssl-connection-parameters.adoc[]

For advanced operations, see documentation at https://github.com/datastax/cassandra-data-migrator[the repository].
70 changes: 0 additions & 70 deletions modules/ROOT/pages/cdm-parameters.adoc

This file was deleted.

2 changes: 1 addition & 1 deletion modules/ROOT/partials/cdm-guardrail-checks.adoc
@@ -9,5 +9,5 @@ Example:
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--conf spark.cdm.feature.guardrail.colSizeInKB=10000 \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.GuardrailCheck cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
--class com.datastax.cdm.job.GuardrailCheck cassandra-data-migrator-x.y.z.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----
34 changes: 4 additions & 30 deletions modules/ROOT/partials/cdm-partition-ranges.adoc
@@ -1,35 +1,9 @@
You can also use {cstar-data-migrator} to migrate or validate specific partition ranges. Use a **partition-file** with the name `./<keyspacename>.<tablename>_partitions.csv`.
Use the following format in the CSV file, in the current folder as input.
Example:

[source,csv]
----
-507900353496146534,-107285462027022883
-506781526266485690,1506166634797362039
2637884402540451982,4638499294009575633
798869613692279889,8699484505161403540
----

Each line in the CSV represents a partition-range (`min,max`).

Alternatively, you can also pass the partition-file with a command-line parameter.
Example:
You can also use {cstar-data-migrator} to xref:cdm-steps.adoc#cdm-steps[migrate] or xref:cdm-steps.adoc#cdm-validation-steps[validate] specific partition ranges by passing the following additional parameters.

[source,bash]
----
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--conf spark.cdm.tokenrange.partitionFile.input="/<path-to-file>/<csv-input-filename>" \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.<Migrate|DiffData> cassandra-data-migrator-x.y.z.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
--conf spark.cdm.filter.cassandra.partition.min=<token-range-min>
--conf spark.cdm.filter.cassandra.partition.max=<token-range-max>
----

This mode is specifically useful to process a subset of partition-ranges that may have failed during a previous run.

[NOTE]
====
In the format shown above, the migration and validation jobs autogenerate a file named `./<keyspacename>.<tablename>_partitions.csv`.
The file contains any failed partition ranges.
No file is created if there were no failed partitions.
You can use the CSV as input to process any failed partition in a subsequent run.
====
This mode is specifically useful to process a subset of partition-ranges.
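
As a minimal sketch, the two partition parameters can be combined with the `spark-submit` flags used elsewhere in these partials. The keyspace, table, and token values are hypothetical placeholders (the tokens are borrowed from the first row of the earlier CSV example), and the `RUN_CDM` guard is an assumption added here so that the script only prints the settings unless you opt in.

```shell
# Hypothetical values for illustration only; replace with your own.
KEYSPACE_TABLE="my_keyspace.my_table"
TOKEN_MIN="-507900353496146534"
TOKEN_MAX="-107285462027022883"
CDM_JAR="cassandra-data-migrator-x.y.z.jar"
CDM_JOB="Migrate"   # or DiffData for validation

# Partition-range filters, as named in the parameters above.
CONF_MIN="spark.cdm.filter.cassandra.partition.min=${TOKEN_MIN}"
CONF_MAX="spark.cdm.filter.cassandra.partition.max=${TOKEN_MAX}"

# RUN_CDM is a guard variable invented for this sketch, not a CDM setting.
if [ "${RUN_CDM:-0}" = "1" ]; then
  ./spark-submit --properties-file cdm.properties \
    --conf "spark.cdm.schema.origin.keyspaceTable=${KEYSPACE_TABLE}" \
    --conf "${CONF_MIN}" \
    --conf "${CONF_MAX}" \
    --master "local[*]" --driver-memory 25G --executor-memory 25G \
    --class "com.datastax.cdm.job.${CDM_JOB}" "${CDM_JAR}" \
    > "logfile_name_$(date +%Y%m%d_%H_%M).txt" 2>&1
else
  # Dry run: show the range settings that would be passed.
  echo "--conf ${CONF_MIN}"
  echo "--conf ${CONF_MAX}"
fi
```

Running the script with `RUN_CDM=1` set in the environment submits the job; without it, only the two `--conf` values are printed for review.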
10 changes: 5 additions & 5 deletions modules/ROOT/partials/cdm-prerequisites.adoc
@@ -2,15 +2,15 @@ Read the prerequisites below before using the Cassandra Data Migrator.

* Install or switch to Java 11.
The Spark binaries are compiled with this version of Java.
* Select a single VM to run this job and install https://archive.apache.org/dist/spark/spark-3.5.1/[Spark 3.5.1] there.
No cluster is necessary.
* Optionally, install https://maven.apache.org/download.cgi[Maven] 3.9.x if you want to build the JAR for local development.
* Select a single VM to run this job and install https://archive.apache.org/dist/spark/spark-3.5.3/[Spark 3.5.3] there.
No cluster is necessary for most one-time migrations; however, Spark cluster mode is also supported for complex migrations.
* Optionally, install https://maven.apache.org/download.cgi[Maven] `3.9.x` if you want to build the JAR for local development.

Run the following commands to install Apache Spark:

[source,bash]
----
wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3-scala2.13.tgz
wget https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz

tar -xvzf spark-3.5.1-bin-hadoop3-scala2.13.tgz
tar -xvzf spark-3.5.3-bin-hadoop3-scala2.13.tgz
----
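
The download URL and the archive name both embed the Spark version, so they must change together when the version is bumped. One way to keep them in sync, sketched here under the assumption that the archive naming pattern stays the same across 3.5.x releases, is to derive both from a single variable. `SPARK_VERSION`, `SPARK_PKG`, `SPARK_URL`, and the `RUN_INSTALL` guard are names introduced for this illustration.

```shell
# Single source of truth for the Spark version (illustrative variable names).
SPARK_VERSION="3.5.3"
SPARK_PKG="spark-${SPARK_VERSION}-bin-hadoop3-scala2.13"
SPARK_URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_PKG}.tgz"

# RUN_INSTALL is a guard invented for this sketch; set it to 1 to download.
if [ "${RUN_INSTALL:-0}" = "1" ]; then
  wget "${SPARK_URL}"
  tar -xvzf "${SPARK_PKG}.tgz"
else
  # Dry run: print the commands that would be executed.
  echo "wget ${SPARK_URL}"
  echo "tar -xvzf ${SPARK_PKG}.tgz"
fi
```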
2 changes: 1 addition & 1 deletion modules/ROOT/partials/cdm-validation-steps.adoc
@@ -41,6 +41,6 @@ spark.cdm.autocorrect.mismatch false|true

[IMPORTANT]
====
The {cstar-data-migrator} validation job never deletes records from the target cluster.
The {cstar-data-migrator} validation job never deletes records from the source or target clusters.
The job only adds or updates data on the target cluster.
====
29 changes: 0 additions & 29 deletions modules/ROOT/partials/constant-column-feature-parameters.adoc

This file was deleted.

19 changes: 0 additions & 19 deletions modules/ROOT/partials/explode-map-feature-parameters.adoc

This file was deleted.

16 changes: 0 additions & 16 deletions modules/ROOT/partials/guardrail-feature-parameters.adoc

This file was deleted.

46 changes: 0 additions & 46 deletions modules/ROOT/partials/java-filter-parameters.adoc

This file was deleted.
