Commit 1d00d88

Merge pull request #130 from datastax/docs_update
Docs update
2 parents 264e5ed + 1104ca9 commit 1d00d88

File tree

4 files changed: +31 −5 lines changed


.github/CODEOWNERS

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+* @datastax/cdm-core

README.md

Lines changed: 8 additions & 0 deletions
@@ -97,6 +97,14 @@ When running in above mode the tool assumes a `partitions.csv` file to be present
 ```
 This mode is specifically useful to process a subset of partition-ranges that may have failed during a previous run.
 
+> **Note:**
+> Here is a quick tip to prepare `partitions.csv` from the log file:
+
+```
+grep "ERROR CopyJobSession: Error with PartitionRange" /path/to/logfile_name.txt | awk '{print $13","$15}' > partitions.csv
+```
+
+
 # Perform large-field Guardrail violation checks
 - The tool can be used to identify large fields from a table that may break your cluster guardrails (e.g. AstraDB has a 10MB limit for a single large field) `--class datastax.astra.migrate.Guardrail` as shown below
 ```
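The grep/awk one-liner above assumes the failed partition-range bounds land in the 13th and 15th whitespace-separated fields of each matching log line. A minimal Python sketch of the same extraction, run against a hypothetical log line (the real log format may differ):

```python
# Hypothetical log line -- constructed so fields 13 and 15 hold the range bounds,
# as the awk program above assumes. The actual CDM log layout may differ.
line = ("22/01/01 10:00:00 ERROR CopyJobSession: Error with PartitionRange "
        "-- Attempt# 1 -- min: -9223372036854775808 max: -4611686018427387904")

fields = line.split()                    # awk's default whitespace splitting
csv_row = f"{fields[12]},{fields[14]}"   # $13 and $15 (awk fields are 1-indexed)
print(csv_row)                           # one min,max row for partitions.csv
```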

src/resources/cdm.properties

Lines changed: 17 additions & 0 deletions
@@ -74,6 +74,23 @@ spark.batchSize 10
 #spark.counterTable.cql
 #spark.counterTable.cql.index 0
 
+############################### EXAMPLE MAPPING USING A DEMO counter column TABLE ###########################
+# CREATE TABLE cycling.cyclist_count (
+#   pk1 uuid,
+#   pk2 date,
+#   cc1 boolean,
+#   c1 counter,
+#   PRIMARY KEY((pk1,pk2),cc1)
+# );
+# then, our counter table mapping would look like below,
+# spark.counterTable true
+# spark.counterTable.cql UPDATE cycling.cyclist_count SET c1 += ? WHERE pk1 = ? AND pk2 = ? AND cc1 = ?
+# spark.counterTable.cql.index 3,0,1,2
+#
+# Remember the above counter index order is based on the below column mapping ordering,
+# spark.query.origin pk1,pk2,cc1,c1
+#############################################################################################################
+
 # ENABLE ONLY IF YOU WANT TO FILTER BASED ON WRITE-TIME (values must be in microseconds)
 #spark.origin.writeTimeStampFilter false
 #spark.origin.minWriteTimeStampFilter 0
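In the counter-table example above, `spark.counterTable.cql.index` maps each bind marker (`?`) in the UPDATE statement, in order, to a position in the `spark.query.origin` column list. A minimal Python sketch of that reordering (the row values are illustrative, not produced by the tool):

```python
origin_columns = ["pk1", "pk2", "cc1", "c1"]  # spark.query.origin order
bind_index = [3, 0, 1, 2]                     # spark.counterTable.cql.index

# A sample origin row keyed by column name (values are made up for illustration).
row = {"pk1": "uuid-1", "pk2": "2023-01-01", "cc1": True, "c1": 42}

# Reorder the row values into the UPDATE's bind order:
#   SET c1 += ?  WHERE pk1 = ? AND pk2 = ? AND cc1 = ?
bind_values = [row[origin_columns[i]] for i in bind_index]
print(bind_values)  # counter delta first, then the primary-key columns
```

Index 3 comes first because the counter column `c1` backs the first `?` (in the SET clause), while 0, 1, 2 supply the WHERE-clause key columns.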

src/resources/runCommands.txt

Lines changed: 5 additions & 5 deletions
@@ -4,16 +4,16 @@ curl -OL https://downloads.datastax.com/enterprise/cqlsh-astra.tar.gz
 wget https://archive.apache.org/dist/spark/spark-3.3.1/spark-3.3.1-bin-hadoop3.tgz
 
 // Migrate
-spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-3.4.*.jar
-spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-3.4.*.jar &> table_out.log
+spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-3.4.*.jar &> log_name_$(date +%Y%m%d_%H_%M).log
+spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-3.4.*.jar &> table_out_$(date +%Y%m%d_%H_%M).log
 
 // If target keyspace and/or table name is different than origin, then add --conf spark.target.keyspaceTable="keyspace2.table2"
 // Add option --verbose for verbose output
 
 // Random Partitioner Run Command
-spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --conf spark.origin.minPartition=-1 --conf spark.origin.maxPartition=170141183460469231731687303715884105728 --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-3.4.*.jar
+spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --conf spark.origin.minPartition=-1 --conf spark.origin.maxPartition=170141183460469231731687303715884105728 --class datastax.astra.migrate.Migrate /<path>/cassandra-data-migrator-3.4.*.jar &> log_name_$(date +%Y%m%d_%H_%M).log
 
 // Validate
-spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.DiffData /<path>/cassandra-data-migrator-3.4.*.jar &> table_out.log
+spark-submit --properties-file /<path>/cdm.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.DiffData /<path>/cassandra-data-migrator-3.4.*.jar &> table_out_$(date +%Y%m%d_%H_%M).log
 
 // Guardrail check (identify large fields)
-spark-submit --properties-file /<path>/cdmGuardrail.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.Guardrail /<path>/cassandra-data-migrator-3.4.*.jar &> table_out.log
+spark-submit --properties-file /<path>/cdmGuardrail.properties --conf spark.origin.keyspaceTable="keyspace.table" --master "local[*]" --driver-memory 25G --executor-memory 25G --class datastax.astra.migrate.Guardrail /<path>/cassandra-data-migrator-3.4.*.jar &> table_out_$(date +%Y%m%d_%H_%M).log
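The changed commands above timestamp their log files with `$(date +%Y%m%d_%H_%M)`, and the Random Partitioner run sets `maxPartition` to 170141183460469231731687303715884105728, which is 2^127 (the top of that partitioner's token range). A small Python sketch of both details (the log-name pattern mirrors the shell commands; nothing here is tool-specific):

```python
from datetime import datetime

# Same suffix the shell's $(date +%Y%m%d_%H_%M) produces, e.g. 20230115_09_30,
# so each run writes to a distinct log file instead of overwriting table_out.log.
suffix = datetime.now().strftime("%Y%m%d_%H_%M")
log_name = f"table_out_{suffix}.log"
print(log_name)

# The maxPartition literal in the Random Partitioner command is exactly 2**127.
max_partition = 2 ** 127
print(max_partition)  # 170141183460469231731687303715884105728
```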
