2 changes: 1 addition & 1 deletion modules/ROOT/nav.adoc
@@ -41,7 +41,7 @@
* {cstar-data-migrator}
** xref:cdm-overview.adoc[]
** xref:cdm-steps.adoc[Migrate data]
** xref:cdm-parameters.adoc[Parameters]
//** xref:cdm-parameters.adoc[Parameters]

* {dsbulk-loader}
** https://docs.datastax.com/en/dsbulk/overview/dsbulk-about.html[Overview]
59 changes: 3 additions & 56 deletions modules/ROOT/pages/cassandra-data-migrator.adoc
@@ -32,65 +32,12 @@ include::partial$use-cdm-migrator.adoc[]

include::partial$cdm-validation-steps.adoc[]

[[cdm-partition-ranges]]
== Migrate or validate specific partition ranges

include::partial$cdm-partition-ranges.adoc[]

[[cdm-guardrail-checks]]
== Perform large-field guardrail violation checks

include::partial$cdm-guardrail-checks.adoc[]

[[cdm-next-steps]]
== Next steps

[[cdm-reference]]
== {cstar-data-migrator} references

=== Common connection parameters for Origin and Target

include::partial$common-connection-parameters.adoc[]

=== Origin schema parameters

include::partial$origin-schema-parameters.adoc[]

=== Target schema parameters

include::partial$target-schema-parameters.adoc[]

=== Auto-correction parameters

include::partial$auto-correction-parameters.adoc[]

=== Performance and operations parameters

include::partial$performance-and-operations-parameters.adoc[]

=== Transformation parameters

include::partial$transformation-parameters.adoc[]

=== Cassandra filter parameters

include::partial$cassandra-filter-parameters.adoc[]

=== Java filter parameters

include::partial$java-filter-parameters.adoc[]

=== Constant column feature parameters

include::partial$constant-column-feature-parameters.adoc[]

=== Explode map feature parameters

include::partial$explode-map-feature-parameters.adoc[]

=== Guardrail feature parameter

include::partial$guardrail-feature-parameters.adoc[]

=== TLS (SSL) connection parameters

include::partial$tls-ssl-connection-parameters.adoc[]

For advanced operations, see the documentation in https://github.com/datastax/cassandra-data-migrator[the repository].
2 changes: 1 addition & 1 deletion modules/ROOT/partials/cdm-guardrail-checks.adoc
@@ -9,5 +9,5 @@ Example:
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--conf spark.cdm.feature.guardrail.colSizeInKB=10000 \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.GuardrailCheck cassandra-data-migrator-4.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
--class com.datastax.cdm.job.GuardrailCheck cassandra-data-migrator-5.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----
10 changes: 5 additions & 5 deletions modules/ROOT/partials/cdm-prerequisites.adoc
@@ -2,15 +2,15 @@ Read the prerequisites below before using the Cassandra Data Migrator.

* Install or switch to Java 11.
The Spark binaries are compiled with this version of Java.
* Select a single VM to run this job and install https://archive.apache.org/dist/spark/spark-3.5.1/[Spark 3.5.1] there.
No cluster is necessary.
* Optionally, install https://maven.apache.org/download.cgi[Maven] 3.9.x if you want to build the JAR for local development.
* Select a single VM to run this job and install https://archive.apache.org/dist/spark/spark-3.5.3/[Spark 3.5.3] there.
No cluster is necessary: a single VM is recommended for most one-time migrations, although cluster mode is also supported for complex migrations.
* Optionally, install https://maven.apache.org/download.cgi[Maven] `3.9.x` if you want to build the JAR for local development.

Run the following commands to install Apache Spark:

[source,bash]
----
wget https://archive.apache.org/dist/spark/spark-3.5.1/spark-3.5.1-bin-hadoop3-scala2.13.tgz
wget https://archive.apache.org/dist/spark/spark-3.5.3/spark-3.5.3-bin-hadoop3-scala2.13.tgz

tar -xvzf spark-3.5.1-bin-hadoop3-scala2.13.tgz
tar -xvzf spark-3.5.3-bin-hadoop3-scala2.13.tgz
----
4 changes: 2 additions & 2 deletions modules/ROOT/partials/cdm-validation-steps.adoc
@@ -6,7 +6,7 @@ Example:
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-x.y.z.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
--class com.datastax.cdm.job.DiffData cassandra-data-migrator-5.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----

The {cstar-data-migrator} validation job reports differences as `ERROR` entries in the log file.
@@ -41,6 +41,6 @@ spark.cdm.autocorrect.mismatch false|true

[IMPORTANT]
====
The {cstar-data-migrator} validation job never deletes records from the target cluster.
The {cstar-data-migrator} validation job never deletes records from the source or target clusters.
The job only adds or updates data on the target cluster.
====
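
Because the validation job reports differences as `ERROR` entries in the log file, a quick tally of those entries can be sketched in shell. This is a minimal sketch, assuming each reported difference appears on a line containing the literal string `ERROR`; the exact message format is not specified in this change, and `count_errors` is a hypothetical helper name that is not part of {cstar-data-migrator}.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Cassandra Data Migrator):
# print the number of lines in a log file that contain the string ERROR.
# Assumption: every reported difference is logged on a line containing ERROR.
count_errors() {
  local logfile="$1"
  # grep -c prints the count of matching lines. It exits non-zero when the
  # count is 0, so "|| true" keeps the function from failing in that case.
  grep -c 'ERROR' "$logfile" || true
}

# Example call, with the log file name supplied by the caller:
#   count_errors "logfile_name_<timestamp>.txt"
```

A count of `0` only indicates that no line matched the assumed pattern, so it may still be worth reviewing the log directly before treating the validation as clean.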
9 changes: 4 additions & 5 deletions modules/ROOT/partials/use-cdm-migrator.adoc
@@ -3,11 +3,10 @@ The file can have any name.
It does not need to be `cdm.properties` or `cdm-detailed.properties`.
In both versions, the `spark-submit` job processes only the parameters that aren't commented out.
Other parameter values use defaults or are ignored.
+
See the descriptions and defaults in each file.
For more information, see the following:
* The simplified sample properties configuration, https://github.com/datastax/cassandra-data-migrator/blob/main/src/resources/cdm.properties[cdm.properties].
This file contains only those parameters that are commonly configured.
* The complete sample properties configuration, https://github.com/datastax/cassandra-data-migrator/blob/main/src/resources/cdm-detailed.properties[cdm-detailed.properties], for the full set of configurable settings.
For more information about the sample properties configuration, see https://github.com/datastax/cassandra-data-migrator/blob/main/src/resources/cdm-detailed.properties[cdm-detailed.properties].
This file contains the full set of configurable settings.

. Place the properties file that you elected to use and customize where it can be accessed while running the job using `spark-submit`.

@@ -18,7 +17,7 @@ For more information, see the following:
./spark-submit --properties-file cdm.properties \
--conf spark.cdm.schema.origin.keyspaceTable="<keyspacename>.<tablename>" \
--master "local[*]" --driver-memory 25G --executor-memory 25G \
--class com.datastax.cdm.job.Migrate cassandra-data-migrator-x.y.z.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
--class com.datastax.cdm.job.Migrate cassandra-data-migrator-x.x.x.jar &> logfile_name_$(date +%Y%m%d_%H_%M).txt
----

[TIP]