Commit 440f78f

SPARKC-645 doc updates (#1314)
1 parent 675f6c3 commit 440f78f

5 files changed (+116 -10 lines)


README.md

Lines changed: 9 additions & 3 deletions
@@ -1,4 +1,4 @@
-# Spark Cassandra Connector [![Build Status](https://travis-ci.org/datastax/spark-cassandra-connector.svg)](https://travis-ci.org/datastax/spark-cassandra-connector)
+# Spark Cassandra Connector ![CI badge](https://github.com/datastax/spark-cassandra-connector/actions/workflows/main.yml/badge.svg?branch=b2.5)

 ## Quick Links

@@ -41,10 +41,15 @@ named "bX.Y" where X.Y is the major+minor version; for example the "b1.6" branch
 corresponds to the 1.6 release. The "master" branch will normally contain
 development for the next connector release in progress.

+Currently the following branches are actively supported: 3.1.x ([master](https://github.com/datastax/spark-cassandra-connector/tree/master)),
+3.0.x ([b3.0](https://github.com/datastax/spark-cassandra-connector/tree/b3.0)) and 2.5.x
+([b2.5](https://github.com/datastax/spark-cassandra-connector/tree/b2.5)).
+
 | Connector | Spark | Cassandra | Cassandra Java Driver | Minimum Java Version | Supported Scala Versions |
 | --------- | ------------- | --------- | --------------------- | -------------------- | ----------------------- |
-| 3.0 | 3.0 | 2.1.5*, 2.2, 3.x, 4.0 | 4.7 | 8 | 2.12 |
-| 2.5 | 2.4 | 2.1.5*, 2.2, 3.x, 4.0 | 4.7 | 8 | 2.11, 2.12 |
+| 3.1 | 3.1 | 2.1.5*, 2.2, 3.x, 4.0 | 4.10 | 8 | 2.12 |
+| 3.0 | 3.0 | 2.1.5*, 2.2, 3.x, 4.0 | 4.10 | 8 | 2.12 |
+| 2.5 | 2.4 | 2.1.5*, 2.2, 3.x, 4.0 | 4.10 | 8 | 2.11, 2.12 |
 | 2.4.2 | 2.4 | 2.1.5*, 2.2, 3.x | 3.0 | 8 | 2.11, 2.12 |
 | 2.4 | 2.4 | 2.1.5*, 2.2, 3.x | 3.0 | 8 | 2.11 |
 | 2.3 | 2.3 | 2.1.5*, 2.2, 3.x | 3.0 | 8 | 2.11 |
@@ -105,6 +110,7 @@ See [Building And Artifacts](doc/12_building_and_artifacts.md)
 - [DataFrames](doc/14_data_frames.md)
 - [Python](doc/15_python.md)
 - [Partitioner](doc/16_partitioning.md)
+- [Submitting applications](doc/17_submitting.md)
 - [Frequently Asked Questions](doc/FAQ.md)
 - [Configuration Parameter Reference Table](doc/reference.md)
 - [Tips for Developing the Spark Cassandra Connector](doc/developers.md)

doc/0_quick_start.md

Lines changed: 3 additions & 3 deletions
@@ -15,14 +15,14 @@ Configure a new Scala project with the Apache Spark and dependency.

 The dependencies are easily retrieved via Maven Central

-libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.5.0"
+libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.11" % "2.5.2"

 The spark-packages libraries can also be used with spark-submit and spark shell, these
 commands will place the connector and all of its dependencies on the path of the
 Spark Driver and all Spark Executors.

-$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.0
-$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.0
+$SPARK_HOME/bin/spark-shell --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.2
+$SPARK_HOME/bin/spark-submit --packages com.datastax.spark:spark-cassandra-connector_2.11:2.5.2

 For the list of available versions, see:
 - https://spark-packages.org/package/datastax/spark-cassandra-connector

doc/10_embedded.md

Lines changed: 4 additions & 1 deletion
@@ -2,6 +2,9 @@

 ## The `spark-cassandra-connector-embedded` Artifact

+_Note that this artifact is not available in 2.5 and newer. Spark Cassandra Connector integration
+tests rely on [ccm](https://github.com/riptano/ccm) instead._
+
 The `spark-cassandra-connector-embedded` artifact can be used as a test
 or prototype dependency to spin up embedded servers for testing ideas,
 quickly learning, integration, etc.
@@ -15,7 +18,7 @@ Pulling this dependency in allows you to:
 - And of course Cassandra but you currently need to spin up a local instance: [Download Cassandra latest](https://cassandra.apache.org/download/), open the tar, and run `sudo ./apache-cassandra-*/bin/cassandra`

 ## The Code
-See: [https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector-embedded/src/main/scala/com/datastax/spark/connector/embedded](https://github.com/datastax/spark-cassandra-connector/tree/master/spark-cassandra-connector-embedded/src/main/scala/com/datastax/spark/connector/embedded)
+See: [https://github.com/datastax/spark-cassandra-connector/tree/b2.4/spark-cassandra-connector-embedded/src/main/scala/com/datastax/spark/connector/embedded](https://github.com/datastax/spark-cassandra-connector/tree/b2.4/spark-cassandra-connector-embedded/src/main/scala/com/datastax/spark/connector/embedded)

 ## How To Add The Dependency

doc/17_submitting.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+# Documentation
+
+## Submitting Spark applications with Spark Cassandra Connector
+
+Spark Cassandra Connector (SCC) may be included with a submitted Spark application in three ways.
+There are other ways too, but the following approaches are the most convenient and the most commonly used.
+
+### Submitting with automatically resolved Spark Cassandra Connector jars
+
+Spark may automatically resolve Spark Cassandra Connector and all of its dependencies (like Cassandra
+Java Driver). The resolved jars are then placed on the Spark application classpath. With this approach
+there is no need to manually download SCC from a repository or to tinker with the fat (uber) jar assembly process.
+
+The `--packages` option with the full SCC coordinate places the SCC
+[main artifact](https://search.maven.org/artifact/com.datastax.spark/spark-cassandra-connector_2.12)
+and all of its dependencies on the application's classpath.
+```
+spark-submit --packages com.datastax.spark:spark-cassandra-connector_<scala_version>:<scc_version> ...
+```
+See Spark [documentation](https://spark.apache.org/docs/latest/submitting-applications.html#advanced-dependency-management) for details.
+
+Note that the application has to be compiled against the matching version of the connector,
+and that the connector classes should not be assembled into the application jar.
+
+Note that this approach works with `spark-shell` as well.
+
+### Submitting with locally available Spark Cassandra Connector jar
+
+Spark places jars provided with `--jars <url>` on the Spark application classpath. The jars are placed
+on the classpath without resolving any of their dependencies, because jar files do not contain information about
+their dependencies. That is why using the
+[main artifact](https://search.maven.org/artifact/com.datastax.spark/spark-cassandra-connector_2.12) with
+`--jars` is not effective: additional dependencies (like Cassandra Java Driver) are crucial for SCC to
+function. Using `--jars` with the main artifact results in `NoClassDefFoundError`.
+
+Spark Cassandra Connector 2.5 and newer are also released as an alternative artifact, the
+[assembly](https://search.maven.org/artifact/com.datastax.spark/spark-cassandra-connector-assembly_2.12).
+It's a single jar with all the needed dependency classes included, which makes it suitable for use with the `--jars`
+option.
+
+```
+spark-submit --jars <path>/spark-cassandra-connector-assembly_<scala_version>-<scc_version>.jar ...
+```
+
+Some of the dependencies included in the assembly are shaded to avoid classpath conflicts in
+some cloud environments.
+
+Note that the application has to be compiled against the matching version of the connector, and that the
+connector classes should not be assembled into the application jar.
+
+Note that this approach works with `spark-shell` as well.
+
+### Building and submitting a fat jar containing the connector
+
+Build tools like Apache Maven™ can create a fat (uber) jar that contains all of the dependencies.
+This functionality may be used to create a Spark application that contains the Spark Cassandra Connector main
+artifact and all of its dependencies. The resulting Spark application may be submitted without any
+extra `spark-submit` options.
+
+Refer to your build tool's documentation for details.
+
+Note that this approach isn't well suited for `spark-shell`.
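The `--packages` and `--jars` approaches in `doc/17_submitting.md` differ mainly in the string handed to `spark-submit`: a Maven coordinate for the main artifact versus the location of the assembly jar. The sketch below builds both strings for a given Scala and connector version. The helper names are hypothetical, the Maven Central URL assumes the standard Maven repository layout, and passing an `https://` URL to `--jars` relies on Spark fetching remote jars as described in its advanced dependency management documentation; a downloaded local path should work the same way. The versions in the comments are example values only.

```shell
# Hypothetical helpers, not part of Spark or Spark Cassandra Connector.
# They only build the strings that the documented spark-submit options expect.

# Maven coordinate for --packages, in groupId:artifactId:version form.
scc_package_coordinate() {
  local scala_version="$1" scc_version="$2"
  printf 'com.datastax.spark:spark-cassandra-connector_%s:%s\n' \
    "$scala_version" "$scc_version"
}

# Maven Central URL of the assembly jar for --jars, assuming the standard
# Maven layout: <group path>/<artifactId>/<version>/<artifactId>-<version>.jar
scc_assembly_url() {
  local scala_version="$1" scc_version="$2"
  local artifact="spark-cassandra-connector-assembly_${scala_version}"
  printf 'https://repo1.maven.org/maven2/com/datastax/spark/%s/%s/%s-%s.jar\n' \
    "$artifact" "$scc_version" "$artifact" "$scc_version"
}

# Example invocations (com.example.MyApp and my-app.jar are placeholders):
#   spark-submit --packages "$(scc_package_coordinate 2.12 3.0.0)" --class com.example.MyApp my-app.jar
#   spark-submit --jars "$(scc_assembly_url 2.12 3.0.0)" --class com.example.MyApp my-app.jar
```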

doc/developers.md

Lines changed: 38 additions & 3 deletions
@@ -31,10 +31,45 @@ Cassandra and Spark nodes and are the core of our test coverage.

 ### Merge Path

-b2.5 => Master
+b2.5 => b3.0 => master

-New features can be considered for 2.5 as long as they do not break apis
-In general 3.0 should be the target for new features
+New features can be considered for 2.5 as long as they do not break APIs.
+Once a feature is ready for b2.5, create a feature branch for b3.0 and merge the
+b2.5 feature branch into the b3.0 feature branch. Repeat for master.
+
+Example for an imaginary SPARKC-9999 ticket.
+
+Let's assume that `datastax` is the git@github.com:datastax/spark-cassandra-connector.git remote
+and `origin` is your personal fork.
+```shell
+$ git remote -v
+datastax git@github.com:datastax/spark-cassandra-connector.git (fetch)
+datastax git@github.com:datastax/spark-cassandra-connector.git (push)
+...
+```
+
+Here is what the work should look like.
+
+```shell
+git fetch datastax
+git checkout -b SPARKC-9999-b2.5 datastax/b2.5
+# do the work, commit
+git push origin SPARKC-9999-b2.5
+
+# Forward merge on the next version:
+git checkout -b SPARKC-9999-b3.0 datastax/b3.0
+git merge SPARKC-9999-b2.5
+# Resolve conflicts, if any, and commit the merge
+# Push the new feature branch:
+git push origin SPARKC-9999-b3.0
+
+# Forward merge on the next version:
+git checkout -b SPARKC-9999-master datastax/master
+git merge SPARKC-9999-b3.0
+# Resolve conflicts, if any, and commit the merge
+# Push the new feature branch:
+git push origin SPARKC-9999-master
+```

 ### Sub-Projects

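The forward-merge sequence in `doc/developers.md` repeats the same three steps (branch from upstream, merge the previous feature branch, push) for each newer version branch, so it can be scripted. The function below is a minimal sketch under the same assumptions as that example: `datastax` is the upstream remote, `origin` is your fork, and `<ticket>-b2.5` already holds your committed work. `forward_merge` is a hypothetical name, not a script shipped with the repository, and it stops at the first failing command, such as a merge conflict.

```shell
# Hypothetical helper, not part of the repository: scripts the forward merge
# b2.5 => b3.0 => master for one ticket. Assumes the `datastax` remote is the
# upstream repository, `origin` is your fork, and <ticket>-b2.5 already holds
# your committed work.
forward_merge() {
  local ticket="$1"
  local previous="${ticket}-b2.5"
  local base branch
  git fetch datastax || return 1
  for base in b3.0 master; do
    branch="${ticket}-${base}"
    # Start the next feature branch from the upstream version branch.
    git checkout -b "$branch" "datastax/${base}" || return 1
    # git merge fails on a conflict; stop so it can be resolved by hand.
    git merge --no-edit "$previous" || return 1
    git push origin "$branch" || return 1
    previous="$branch"
  done
}
```

For example, `forward_merge SPARKC-9999` would create, merge and push `SPARKC-9999-b3.0` and then `SPARKC-9999-master`. If it stops on a conflict, resolve it, commit the merge, and finish the remaining steps by hand as in the listing.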