# Documentation

## Submitting Spark applications with Spark Cassandra Connector

Spark Cassandra Connector (SCC) may be included with a submitted Spark application in three ways.
Other approaches exist, but the following three are the most convenient and the most commonly used.

### Submitting with automatically resolved Spark Cassandra Connector jars

Spark can automatically resolve Spark Cassandra Connector and all of its dependencies (such as the Cassandra
Java Driver) and place the resolved jars on the Spark application classpath. With this approach there is no
need to manually download SCC from a repository or to tinker with a fat (uber) jar assembly process.

The `--packages` option with the full SCC coordinate places the SCC
[main artifact](https://search.maven.org/artifact/com.datastax.spark/spark-cassandra-connector_2.12)
and all of its dependencies on the application's classpath.
```
spark-submit --packages com.datastax.spark:spark-cassandra-connector_<scala_version>:<scc_version> ...
```
See the Spark [documentation](https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management) for details.

Note that the application has to be compiled against the matching version of the connector,
and that the connector classes should not be assembled into the application jar.

Note that this approach works with `spark-shell` as well.
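
For illustration, a `spark-shell` invocation might look like the following. The Scala version, SCC version,
and contact point below are examples only; substitute the pair that matches your Spark build.
```
# Example versions and host shown; substitute your own Scala/SCC versions and contact point.
spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.12:3.5.1 \
            --conf spark.cassandra.connection.host=127.0.0.1
```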

### Submitting with locally available Spark Cassandra Connector jar

Spark places jars provided with `--jars <url>` on the Spark application classpath. The jars are placed
on the classpath without resolving any of their dependencies, as jar files do not contain information about
their dependencies. That is why using the
[main artifact](https://search.maven.org/artifact/com.datastax.spark/spark-cassandra-connector_2.12) with
`--jars` is not sufficient: additional dependencies (such as the Cassandra Java Driver) are crucial for SCC
to function. Using `--jars` with the main artifact alone results in `NoClassDefFoundError`.

Spark Cassandra Connector 2.5 and newer are released with an alternative artifact, the
[assembly](https://search.maven.org/artifact/com.datastax.spark/spark-cassandra-connector-assembly_2.12).
It is a single jar with all of the needed dependency classes included, which makes it suitable for use with
the `--jars` option. Note that `--jars` expects jar paths or URLs rather than Maven coordinates, so the
assembly jar has to be downloaded first and then passed by path:

```
spark-submit --jars spark-cassandra-connector-assembly_<scala_version>-<scc_version>.jar ...
```
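
As a concrete sketch (the versions below are illustrative only), the assembly jar can be fetched from
Maven Central, whose standard repository layout derives the download path from the artifact coordinate,
and then passed to `spark-submit` by path:
```
# Illustrative versions; substitute the Scala/SCC versions that match your Spark build.
curl -O https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector-assembly_2.12/3.5.1/spark-cassandra-connector-assembly_2.12-3.5.1.jar
spark-submit --jars spark-cassandra-connector-assembly_2.12-3.5.1.jar ...
```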

Some of the dependencies included in the assembly are shaded to avoid classpath conflicts in certain
cloud environments.

Note that the application has to be compiled against the matching version of the connector, and that the
connector classes should not be assembled into the application jar.

Note that this approach works with `spark-shell` as well.

### Building and submitting a fat jar containing the connector

Build tools like Apache Maven™ may create a fat (uber) jar that contains all of an application's
dependencies. This functionality may be used to create a Spark application jar that contains the Spark
Cassandra Connector main artifact and all of its dependencies. The resulting Spark application may be
submitted without any extra `spark-submit` options.

Refer to your build tool's documentation for details.
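
For illustration, submitting such a fat jar needs no dependency-related options at all. The application
class and jar names below are hypothetical:
```
# Hypothetical names: my-app-assembly-1.0.jar is a fat jar that bundles the
# connector's main artifact; com.example.MyApp is the application entry point.
spark-submit --class com.example.MyApp my-app-assembly-1.0.jar
```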

Note that this approach isn't well suited for `spark-shell`.