Commit 33b88d5

Feature/spark3 module (#196)
* Add scala 2.11 / scala 2.12 cross build
* Fix foreachBatch ambiguity with Scala 2.12
* Force version of paranamer
* Rename artefacts
* Remove directory maven plugin as it assumes a project dir based on the artefact id
* Fix archetype readme
* Try matrix build script
* Fix build name
* Fix duplicate dependency declaration
* wip
* wip offset
* Prevent failed by times error
* add yet another module to prevent circular dependency
* renamings
* add compatibility for spark sql mongo
* Only build one of the compatibility modules
* Update github workflow
* Fix pom files
* Use mongo db driver compatible with spark 3
* Fix
* Add module for spark 3 release package
* Add deploy script
* Update readme
* Fix build script
* Simplify deployment
* Update readme, build script
* Reintroduce separate build profile for spark 2
* Implement trait
1 parent a5f2442 commit 33b88d5

27 files changed, +638 -44 lines changed

.github/workflows/build.yml

Lines changed: 9 additions & 5 deletions
@@ -11,7 +11,11 @@ jobs:
       fail-fast: false
       matrix:
         scala: [ 2.11, 2.12 ]
-    name: Scala ${{ matrix.scala }}, Spark 2.4
+        spark: [ 2, 3 ]
+        exclude:
+          - scala: 2.11
+            spark: 3
+    name: Scala ${{ matrix.scala }}, Spark ${{ matrix.spark }}
     steps:
       - uses: actions/checkout@v2
       - name: Set up JDK 1.8
@@ -21,10 +25,10 @@ jobs:
       - uses: actions/cache@v2
         with:
           path: ~/.m2/repository
-          key: ${{ runner.os }}-${{ matrix.scala }}-${{ hashFiles('**/pom.xml') }}
+          key: ${{ runner.os }}-${{ matrix.scala }}-${{ matrix.spark }}-${{ hashFiles('**/pom.xml') }}
           restore-keys: |
-            ${{ runner.os }}-${{ matrix.scala }}-
+            ${{ runner.os }}-${{ matrix.scala }}-${{ matrix.spark }}-
       - name: Switch scala version
-        run: mvn scala-cross-build:change-version -Pscala-${{ matrix.scala }}
+        run: mvn scala-cross-build:change-version -Pscala-${{ matrix.scala }},spark-${{ matrix.spark }}
       - name: Build and run tests
-        run: mvn clean test -Pscala-${{ matrix.scala }},all-tests
+        run: mvn clean test -Pscala-${{ matrix.scala }},spark-${{ matrix.spark }},all-tests
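Note: the matrix in the hunk above is a cross product of the `scala` and `spark` values minus the `exclude` entries, so the workflow should run three jobs rather than four. The following is a minimal Scala sketch of that expansion, not part of the commit; `MatrixDemo`, `Job` and `expand` are hypothetical names used only for illustration.

```scala
object MatrixDemo {
  // One CI job per (Scala version, Spark major version) pair. Hypothetical type.
  final case class Job(scalaVersion: String, sparkVersion: Int)

  // Cross product of both axes, dropping every combination listed in `exclude`.
  def expand(scalaVersions: Seq[String], sparkVersions: Seq[Int], exclude: Set[Job]): Seq[Job] =
    for {
      sc <- scalaVersions
      sp <- sparkVersions
      job = Job(sc, sp)
      if !exclude.contains(job)
    } yield job

  def main(args: Array[String]): Unit =
    // Values copied from the workflow: scala [2.11, 2.12], spark [2, 3], exclude (2.11, 3).
    expand(Seq("2.11", "2.12"), Seq(2, 3), Set(Job("2.11", 3)))
      .foreach(j => println(s"Scala ${j.scalaVersion}, Spark ${j.sparkVersion}"))
}
```

Running `MatrixDemo.main` lists Scala 2.11 with Spark 2, Scala 2.12 with Spark 2, and Scala 2.12 with Spark 3.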

README.md

Lines changed: 9 additions & 4 deletions
@@ -359,14 +359,19 @@ A key feature of the Workflow Manager are triggers, which define when an ingesti
 
 
 ## How to build
-- Scala 2.12 (default)
+- Scala 2.12, Spark 2.4 (default)
 ```
 mvn clean install
 ```
-- Scala 2.11
+- Scala 2.12, Spark 3.0
 ```
-mvn scala-cross-build:change-version -Pscala-2.11
-mvn clean install -Pscala-2.11
+mvn clean install -Pscala-2.12,spark-3
+```
+- Scala 2.11, Spark 2.4
+```
+mvn scala-cross-build:change-version -Pscala-2.11,spark-2
+mvn clean install -Pscala-2.11,spark-2
+mvn scala-cross-build:restore-version
 ```
 
 ### E2E tests with Docker

compatibility-api/pom.xml

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Copyright 2018 ABSA Group Limited
+  ~
+  ~ Licensed under the Apache License, Version 2.0 (the "License");
+  ~ you may not use this file except in compliance with the License.
+  ~ You may obtain a copy of the License at
+  ~     http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>parent-conf_2.12</artifactId>
+        <groupId>za.co.absa.hyperdrive</groupId>
+        <version>4.1.1-SNAPSHOT</version>
+        <relativePath>../parent-conf/pom.xml</relativePath>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+    <artifactId>compatibility-api_2.12</artifactId>
+    <packaging>jar</packaging>
+</project>
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+/*
+ * Copyright 2018 ABSA Group Limited
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package za.co.absa.hyperdrive.compatibility.api
+
+trait CompatibleOffset {
+  type Type
+}
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+/*
+ * Copyright 2018 ABSA Group Limited
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package za.co.absa.hyperdrive.compatibility.api
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.MetadataLogFileIndex
+
+trait CompatibleSparkUtil {
+  def createMetadataLogFileIndex(spark: SparkSession, destination: String): MetadataLogFileIndex
+  def hasMetadata(spark: SparkSession, destination: String): Boolean
+}

compatibility-provider/pom.xml

Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Copyright 2018 ABSA Group Limited
+  ~
+  ~ Licensed under the Apache License, Version 2.0 (the "License");
+  ~ you may not use this file except in compliance with the License.
+  ~ You may obtain a copy of the License at
+  ~     http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>parent-conf_2.12</artifactId>
+        <groupId>za.co.absa.hyperdrive</groupId>
+        <version>4.1.1-SNAPSHOT</version>
+        <relativePath>../parent-conf/pom.xml</relativePath>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+    <artifactId>compatibility-provider_2.12</artifactId>
+    <packaging>jar</packaging>
+
+    <dependencies>
+        <dependency>
+            <groupId>za.co.absa.hyperdrive</groupId>
+            <artifactId>compatibility-api_${scala.compat.version}</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>za.co.absa.hyperdrive</groupId>
+            <artifactId>compatibility_${spark.compat.version}_${scala.compat.version}</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+    </dependencies>
+
+</project>
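Note: the second dependency in this POM selects the version-specific implementation module through property interpolation in the `artifactId`. The sketch below mimics that substitution in plain Scala to show which artifact the template could resolve to. It is not Maven's own resolver; `ArtifactIdDemo` and `interpolate` are hypothetical names, and the property values `spark-2` and `2.12` are assumptions inferred from the artifact name `compatibility_spark-2_2.12` elsewhere in this commit.

```scala
import scala.util.matching.Regex

object ArtifactIdDemo {
  // Matches Maven-style placeholders such as ${scala.compat.version}.
  private val Placeholder: Regex = """\$\{([^}]+)\}""".r

  // Replaces each known placeholder with its value; unknown placeholders are left untouched.
  def interpolate(template: String, props: Map[String, String]): String =
    Placeholder.replaceAllIn(
      template,
      m => Regex.quoteReplacement(props.getOrElse(m.group(1), m.matched))
    )

  def main(args: Array[String]): Unit = {
    // Assumed property values, as a spark-2 / Scala 2.12 build might set them.
    val props = Map("spark.compat.version" -> "spark-2", "scala.compat.version" -> "2.12")
    println(interpolate("compatibility_${spark.compat.version}_${scala.compat.version}", props))
  }
}
```

With those assumed values the template resolves to `compatibility_spark-2_2.12`, which matches the `artifactId` of the `compatibility_spark-2` module.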
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+/*
+ * Copyright 2018 ABSA Group Limited
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package za.co.absa.hyperdrive.compatibility.provider
+
+import za.co.absa.hyperdrive.compatibility.api.CompatibleOffset
+import za.co.absa.hyperdrive.compatibility.impl.Offset
+
+object CompatibleOffsetProvider extends CompatibleOffset {
+  override type Type = Offset.Type
+}
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+/*
+ * Copyright 2018 ABSA Group Limited
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package za.co.absa.hyperdrive.compatibility.provider
+
+import org.apache.spark.sql.SparkSession
+import org.apache.spark.sql.execution.streaming.MetadataLogFileIndex
+import za.co.absa.hyperdrive.compatibility.api.CompatibleSparkUtil
+import za.co.absa.hyperdrive.compatibility.impl.SparkUtil
+
+object CompatibleSparkUtilProvider extends CompatibleSparkUtil {
+  def createMetadataLogFileIndex(spark: SparkSession, destination: String): MetadataLogFileIndex =
+    SparkUtil.createMetadataLogFileIndex(spark, destination)
+
+  def hasMetadata(spark: SparkSession, destination: String): Boolean =
+    SparkUtil.hasMetadata(spark, destination)
+}
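Note: `CompatibleSparkUtilProvider` only forwards to `impl.SparkUtil`, so callers compile against the provider while the build decides which implementation module supplies `SparkUtil`. The sketch below shows the same delegation shape without a Spark dependency. Everything in it is a hypothetical stand-in: `FakeSession` replaces `SparkSession`, `MetadataChecker` replaces the API trait, and `FakeSparkUtil` with its in-memory set replaces a version-specific implementation whose real logic is not shown in this diff.

```scala
// Stand-in for SparkSession; hypothetical, carries no behaviour.
final case class FakeSession(appName: String)

// Stand-in for the API trait, reduced to one method.
trait MetadataChecker {
  def hasMetadata(session: FakeSession, destination: String): Boolean
}

// Stand-in for a version-specific implementation module.
// The in-memory set is an assumption made purely so the example can run.
object FakeSparkUtil extends MetadataChecker {
  private val destinationsWithMetadata = Set("/data/out")

  def hasMetadata(session: FakeSession, destination: String): Boolean =
    destinationsWithMetadata.contains(destination)
}

// The provider is the only object that callers reference; it forwards every call.
object CheckerProvider extends MetadataChecker {
  def hasMetadata(session: FakeSession, destination: String): Boolean =
    FakeSparkUtil.hasMetadata(session, destination)
}

object ProviderDemo {
  def main(args: Array[String]): Unit = {
    val session = FakeSession("demo")
    println(CheckerProvider.hasMetadata(session, "/data/out"))
    println(CheckerProvider.hasMetadata(session, "/data/other"))
  }
}
```

Swapping the implementation then means changing which module is on the classpath, not changing any caller.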

compatibility_spark-2/pom.xml

Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Copyright 2018 ABSA Group Limited
+  ~
+  ~ Licensed under the Apache License, Version 2.0 (the "License");
+  ~ you may not use this file except in compliance with the License.
+  ~ You may obtain a copy of the License at
+  ~     http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+    <parent>
+        <artifactId>parent-conf_2.12</artifactId>
+        <groupId>za.co.absa.hyperdrive</groupId>
+        <version>4.1.1-SNAPSHOT</version>
+        <relativePath>../parent-conf/pom.xml</relativePath>
+    </parent>
+    <modelVersion>4.0.0</modelVersion>
+    <artifactId>compatibility_spark-2_2.12</artifactId>
+    <packaging>jar</packaging>
+
+    <properties>
+        <spark.version>${spark_2.version}</spark.version>
+    </properties>
+
+    <dependencies>
+        <dependency>
+            <groupId>za.co.absa.hyperdrive</groupId>
+            <artifactId>compatibility-api_${scala.compat.version}</artifactId>
+            <version>${project.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.spark</groupId>
+            <artifactId>spark-sql_${scala.compat.version}</artifactId>
+            <version>${spark.version}</version>
+        </dependency>
+        <dependency>
+            <groupId>org.apache.spark</groupId>
+            <artifactId>spark-core_${scala.compat.version}</artifactId>
+            <version>${spark.version}</version>
+        </dependency>
+    </dependencies>
+</project>
Lines changed: 22 additions & 0 deletions
@@ -0,0 +1,22 @@
+/*
+ * Copyright 2018 ABSA Group Limited
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package za.co.absa.hyperdrive.compatibility.impl
+import org.apache.spark.sql.execution.streaming.{Offset => OffsetV1}
+import za.co.absa.hyperdrive.compatibility.api.CompatibleOffset
+
+object Offset extends CompatibleOffset {
+  type Type = OffsetV1
+}
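Note: `CompatibleOffset` uses an abstract type member so that each implementation module can bind `Type` to the offset class of its Spark version, while callers refer only to `CompatibleOffsetProvider.Type`. The sketch below reproduces that pattern without Spark. The `sparkv1` object and its `Offset` trait are hypothetical stand-ins for Spark's streaming `Offset`, and `OffsetDemo` is an invented caller.

```scala
// Hypothetical stand-in for org.apache.spark.sql.execution.streaming.Offset.
object sparkv1 {
  trait Offset { def json: String }
}

// Same shape as the API trait in this commit: one abstract type member.
trait CompatibleOffset {
  type Type
}

// What a Spark 2 implementation module binds the type to.
object Offset extends CompatibleOffset {
  type Type = sparkv1.Offset
}

// The provider re-exports the binding of whichever implementation is on the classpath.
object CompatibleOffsetProvider extends CompatibleOffset {
  override type Type = Offset.Type
}

object OffsetDemo {
  // Caller code mentions only the provider's alias, never the concrete Spark class.
  def describe(offset: CompatibleOffsetProvider.Type): String =
    s"offset=${offset.json}"

  def main(args: Array[String]): Unit = {
    val offset = new sparkv1.Offset { def json: String = "{\"p0\":42}" }
    println(describe(offset))
  }
}
```

Because `Type` is a plain alias, the compiler treats `CompatibleOffsetProvider.Type` and the underlying offset class as the same type, so no wrapping or casting is needed at call sites.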
