Commit 0fd4c7f

Merge remote-tracking branch 'origin/main' into fix/hive-reduce-image-size

2 parents: b32979f + a317716

12 files changed: +342 -123 lines

CHANGELOG.md

Lines changed: 21 additions & 0 deletions

@@ -4,11 +4,32 @@ All notable changes to this project will be documented in this file.
 
 ## [Unreleased]
 
+### Added
+
+- hive: check for correct permissions and ownerships in /stackable folder via
+  `check-permissions-ownership.sh` provided in stackable-base image ([#1040]).
+- spark-connect-client: A new image for Spark connect tests and demos ([#1034])
+- nifi: check for correct permissions and ownerships in /stackable folder via
+  `check-permissions-ownership.sh` provided in stackable-base image ([#1027]).
+
+### Changed
+
+- ubi-rust-builder: Bump Rust toolchain to 1.85.0, cargo-cyclonedx to 0.5.7, and cargo-auditable to 0.6.6 ([#1050]).
+- spark-k8s: Include spark-connect jars. Replace OpenJDK with Temurin JDK. Cleanup. ([#1034])
+
 ### Fixed
 
 - hive: reduce docker image size by removing the recursive chown/chmods in the final image ([#1040]).
+- nifi: reduce docker image size by removing the recursive chown/chmods in the final image ([#1027]).
+- spark-k8s: reduce docker image size by removing the recursive chown/chmods in the final image ([#1042]).
+- Add `--locked` flag to `cargo install` commands for reproducible builds ([#1044]).
 
+[#1027]: https://github.com/stackabletech/docker-images/pull/1027
+[#1034]: https://github.com/stackabletech/docker-images/pull/1034
 [#1040]: https://github.com/stackabletech/docker-images/pull/1040
+[#1042]: https://github.com/stackabletech/docker-images/pull/1042
+[#1044]: https://github.com/stackabletech/docker-images/pull/1044
+[#1050]: https://github.com/stackabletech/docker-images/pull/1050
 
 ## [25.3.0] - 2025-03-21
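
A note on the `--locked` entry ([#1044]): by default `cargo install` re-resolves a tool's dependencies to the newest compatible versions at build time, whereas `--locked` pins them to the versions recorded in the crate's published Cargo.lock. A minimal sketch of the difference, using a crate and version taken from the changelog entries above (the exact invocations in the images may differ):

# Without --locked: dependency versions are re-resolved on every build.
cargo install cargo-auditable --version 0.6.6

# With --locked: versions come from the crate's Cargo.lock, so repeated
# builds produce identical dependency trees.
cargo install --locked cargo-auditable --version 0.6.6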

conf.py

Lines changed: 2 additions & 0 deletions

@@ -36,6 +36,7 @@
 zookeeper = importlib.import_module("zookeeper.versions")
 tools = importlib.import_module("tools.versions")
 statsd_exporter = importlib.import_module("statsd_exporter.versions")
+spark_connect_client = importlib.import_module("spark-connect-client.versions")
 
 products = [
     {"name": "airflow", "versions": airflow.versions},
@@ -64,6 +65,7 @@
     {"name": "zookeeper", "versions": zookeeper.versions},
     {"name": "tools", "versions": tools.versions},
     {"name": "statsd_exporter", "versions": statsd_exporter.versions},
+    {"name": "spark-connect-client", "versions": spark_connect_client.versions},
 ]
 
 open_shift_projects = {
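
conf.py loads each image's versions module by its string name. For the new entry this matters: `spark-connect-client` contains a hyphen, so it could never appear in a plain `import` statement, but `importlib.import_module` accepts the name as a string. A quick sanity check, assuming it is run from the root of the docker-images repository:

# A hyphenated package name is invalid in an `import` statement,
# but importlib resolves it from the string form.
python -c 'import importlib; print(importlib.import_module("spark-connect-client.versions").versions)'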

nifi/Dockerfile

Lines changed: 81 additions & 50 deletions

@@ -7,59 +7,78 @@ ARG PRODUCT
 ARG MAVEN_VERSION="3.9.8"
 ARG STACKABLE_USER_UID
 
-RUN microdnf update && \
-    microdnf clean all && \
-    rm -rf /var/cache/yum
+RUN <<EOF
+microdnf update
+microdnf clean all
+rm -rf /var/cache/yum
+EOF
 
 # NOTE: From NiFi 2.0.0 upwards Apache Maven 3.9.6+ is required. As of 2024-07-04 the java-devel image
 # ships 3.6.3. This will update Maven accordingly depending on the version. The error is due to the maven-enforcer-plugin.
 #
 # [ERROR] Rule 2: org.apache.maven.enforcer.rules.version.RequireMavenVersion failed with message:
 # [ERROR] Detected Maven Version: 3.6.3 is not in the allowed range [3.9.6,).
 #
-WORKDIR /tmp
-RUN if [[ "${PRODUCT}" != 1.* ]] ; then \
-    curl "https://repo.stackable.tech/repository/packages/maven/apache-maven-${MAVEN_VERSION}-bin.tar.gz" | tar -xzC . && \
-    ln -sf /tmp/apache-maven-${MAVEN_VERSION}/bin/mvn /usr/bin/mvn ; \
-    fi
+RUN <<EOF
+if [[ "${PRODUCT}" != 1.* ]] ; then
+  cd /tmp
+  curl "https://repo.stackable.tech/repository/packages/maven/apache-maven-${MAVEN_VERSION}-bin.tar.gz" | tar -xzC .
+  ln -sf /tmp/apache-maven-${MAVEN_VERSION}/bin/mvn /usr/bin/mvn
+fi
+EOF
 
 USER ${STACKABLE_USER_UID}
 WORKDIR /stackable
 
 COPY --chown=${STACKABLE_USER_UID}:0 nifi/stackable/patches /stackable/patches
 
-RUN curl 'https://repo.stackable.tech/repository/m2/tech/stackable/nifi/stackable-bcrypt/1.0-SNAPSHOT/stackable-bcrypt-1.0-20240508.153334-1-jar-with-dependencies.jar' \
-    # This used to be located in /bin/stackable-bcrypt.jar. We create a softlink for /bin/stackable-bcrypt.jar in the main container for backwards compatibility.
-    -o /stackable/stackable-bcrypt.jar && \
-    # Get the source release from nexus
-    curl "https://repo.stackable.tech/repository/packages/nifi/nifi-${PRODUCT}-source-release.zip" -o "/stackable/nifi-${PRODUCT}-source-release.zip" && \
-    unzip "nifi-${PRODUCT}-source-release.zip" && \
-    # Clean up downloaded source after unzipping
-    rm -rf "nifi-${PRODUCT}-source-release.zip" && \
-    # The NiFi "binary" ends up in a folder named "nifi-${PRODUCT}" which should be copied to /stackable
-    # from /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} (see later steps)
-    # Therefore we add the suffix "-src" to be able to copy the binary and remove the unzipped sources afterwards.
-    mv nifi-${PRODUCT} nifi-${PRODUCT}-src && \
-    # Apply patches
-    chmod +x patches/apply_patches.sh && \
-    patches/apply_patches.sh ${PRODUCT} && \
-    # Build NiFi
-    cd /stackable/nifi-${PRODUCT}-src/ && \
-    # NOTE: Since NiFi 2.0.0, the PutIceberg processor and services have been removed, so including the `include-iceberg` profile does nothing.
-    # Additionally some modules were moved to optional build profiles, so we need to add `include-hadoop` to get `nifi-parquet-nar` for example.
-    if [[ "${PRODUCT}" != 1.* ]] ; then \
-      mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-hadoop,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp ; \
-    else \
-      mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-iceberg,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp ; \
-    fi && \
-    # Copy the binaries to the /stackable folder
-    mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} /stackable/nifi-${PRODUCT} && \
-    # Copy the SBOM as well
-    mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/bom.json /stackable/nifi-${PRODUCT}/nifi-${PRODUCT}.cdx.json && \
-    # Remove the unzipped sources
-    rm -rf /stackable/nifi-${PRODUCT}-src && \
-    # Remove generated docs in binary
-    rm -rf /stackable/nifi-${PRODUCT}/docs
+RUN <<EOF
+# This used to be located in /bin/stackable-bcrypt.jar. We create a softlink for /bin/stackable-bcrypt.jar in the main container for backwards compatibility.
+curl 'https://repo.stackable.tech/repository/m2/tech/stackable/nifi/stackable-bcrypt/1.0-SNAPSHOT/stackable-bcrypt-1.0-20240508.153334-1-jar-with-dependencies.jar' \
+  -o /stackable/stackable-bcrypt.jar
+
+# Get the source release from nexus
+curl "https://repo.stackable.tech/repository/packages/nifi/nifi-${PRODUCT}-source-release.zip" -o "/stackable/nifi-${PRODUCT}-source-release.zip"
+unzip "nifi-${PRODUCT}-source-release.zip"
+
+# Clean up downloaded source after unzipping
+rm -rf "nifi-${PRODUCT}-source-release.zip"
+
+# The NiFi "binary" ends up in a folder named "nifi-${PRODUCT}" which should be copied to /stackable
+# from /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} (see later steps)
+# Therefore we add the suffix "-src" to be able to copy the binary and remove the unzipped sources afterwards.
+mv nifi-${PRODUCT} nifi-${PRODUCT}-src
+
+# Apply patches
+chmod +x patches/apply_patches.sh
+patches/apply_patches.sh ${PRODUCT}
+
+# Build NiFi
+cd /stackable/nifi-${PRODUCT}-src/
+
+# NOTE: Since NiFi 2.0.0, the PutIceberg processor and services have been removed, so including the `include-iceberg` profile does nothing.
+# Additionally some modules were moved to optional build profiles, so we need to add `include-hadoop` to get `nifi-parquet-nar` for example.
+if [[ "${PRODUCT}" != 1.* ]] ; then
+  mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-hadoop,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp
+else
+  mvn --batch-mode --no-transfer-progress clean install -Dmaven.javadoc.skip=true -DskipTests --activate-profiles include-iceberg,include-hadoop-aws,include-hadoop-azure,include-hadoop-gcp
+fi
+
+# Copy the binaries to the /stackable folder
+mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/nifi-${PRODUCT}-bin/nifi-${PRODUCT} /stackable/nifi-${PRODUCT}
+
+# Copy the SBOM as well
+mv /stackable/nifi-${PRODUCT}-src/nifi-assembly/target/bom.json /stackable/nifi-${PRODUCT}/nifi-${PRODUCT}.cdx.json
+
+# Remove the unzipped sources
+rm -rf /stackable/nifi-${PRODUCT}-src
+
+# Remove generated docs in binary
+rm -rf /stackable/nifi-${PRODUCT}/docs
+
+# Set correct permissions
+chmod -R g=u /stackable
+EOF
 
 FROM stackable/image/java-base AS final
 
@@ -83,8 +102,6 @@ COPY --chown=${STACKABLE_USER_UID}:0 nifi/licenses /licenses
 COPY --chown=${STACKABLE_USER_UID}:0 nifi/python /stackable/python
 
 RUN <<EOF
-ln -s /stackable/nifi-${PRODUCT} /stackable/nifi
-
 microdnf update
 
 # python-pip: Required to install Python packages
@@ -96,24 +113,38 @@ microdnf clean all
 rm -rf /var/cache/yum
 
 # The nipyapi is required until NiFi 2.0.x for the ReportingTaskJob
+# This can be removed once the 1.x.x line is removed
 pip install --no-cache-dir \
   nipyapi==0.19.1
 
 # For backwards compatibility we create a softlink in /bin where the jar used to be as long as we are root
 # This can be removed once older versions / operators using this are no longer supported
 ln -s /stackable/stackable-bcrypt.jar /bin/stackable-bcrypt.jar
 
-# All files and folders owned by root group to support running as arbitrary users.
-# This is best practice as all container users will belong to the root group (0).
-chown -R ${STACKABLE_USER_UID}:0 /stackable
-chmod -R g=u /stackable
+ln -s /stackable/nifi-${PRODUCT} /stackable/nifi
+
+# Fix missing permissions / ownership
+chown --no-dereference ${STACKABLE_USER_UID}:0 /stackable/nifi
+chmod --recursive g=u /stackable/python
+chmod --recursive g=u /stackable/bin
+chmod g=u /stackable/nifi-${PRODUCT}
+EOF
+
+# ----------------------------------------
+# Checks
+# This section runs final checks to ensure that the created final images
+# adhere to several minimal requirements, such as:
+# - correct file permissions and ownerships
+# ----------------------------------------
+
+# Check that permissions and ownership in /stackable are set correctly.
+# This will fail and stop the build if any mismatches are found.
+RUN <<EOF
+/bin/check-permissions-ownership.sh /stackable ${STACKABLE_USER_UID} 0
 EOF
 
 # ----------------------------------------
-# Attention: We are changing the group of all files in /stackable directly above
-# If you do any file based actions (copying / creating etc.) below this comment you
-# absolutely need to make sure that the correct permissions are applied!
-# chown ${STACKABLE_USER_UID}:0
+# Attention: Do not perform any file-based actions (copying/creating etc.) below this comment, because their permissions would not be checked.
 # ----------------------------------------
 
 USER ${STACKABLE_USER_UID}
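
Context for the size reduction ([#1027]): every RUN instruction produces a new image layer, and a recursive `chown`/`chmod` over /stackable in the final stage rewrites every file it touches into that layer, roughly doubling the space the directory occupies in the image. The build above therefore fixes up only the handful of paths that actually need it and delegates verification to `check-permissions-ownership.sh`. One way to observe the effect (the image tag here is illustrative):

# Show the size each instruction contributed as a layer; before this change,
# the recursive chown/chmod appeared as a layer about the size of /stackable.
docker history docker.stackable.tech/stackable/nifi:2.0.0-stackable0.0.0-dev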

check-permissions-ownership.sh

Lines changed: 59 additions & 0 deletions

@@ -0,0 +1,59 @@
+#!/bin/bash
+#
+# Purpose
+#
+# Checks that permissions and ownership in the provided directory are set according to:
+#
+# chown -R ${STACKABLE_USER_UID}:0 /stackable
+# chmod -R g=u /stackable
+#
+# Will error out and print directories / files that do not match the required permissions or ownership.
+#
+# Usage
+#
+# ./check-permissions-ownership.sh <directory> <uid> <gid>
+# ./check-permissions-ownership.sh /stackable ${STACKABLE_USER_UID} 0
+#
+
+if [[ $# -ne 3 ]]; then
+  echo "Wrong number of parameters supplied. Usage:"
+  echo "$0 <directory> <uid> <gid>"
+  echo "$0 /stackable 1000 0"
+  exit 1
+fi
+
+DIRECTORY=$1
+EXPECTED_UID=$2
+EXPECTED_GID=$3
+
+error_flag=0
+
+# Check ownership
+while IFS= read -r -d '' file; do
+  uid=$(stat -c "%u" "$file")
+  gid=$(stat -c "%g" "$file")
+
+  if [[ "$uid" -ne "$EXPECTED_UID" || "$gid" -ne "$EXPECTED_GID" ]]; then
+    echo "Ownership mismatch: $file (Expected: $EXPECTED_UID:$EXPECTED_GID, Found: $uid:$gid)"
+    error_flag=1
+  fi
+done < <(find "$DIRECTORY" -print0)
+
+# Check permissions
+while IFS= read -r -d '' file; do
+  perms=$(stat -c "%A" "$file")
+  owner_perms="${perms:1:3}"
+  group_perms="${perms:4:3}"
+
+  if [[ "$owner_perms" != "$group_perms" ]]; then
+    echo "Permission mismatch: $file (Owner: $owner_perms, Group: $group_perms)"
+    error_flag=1
+  fi
+done < <(find "$DIRECTORY" -print0)
+
+if [[ $error_flag -ne 0 ]]; then
+  echo "Permission and Ownership checks failed for $DIRECTORY!"
+  exit 1
+fi
+
+echo "Permission and Ownership checks succeeded for $DIRECTORY!"

spark-connect-client/Dockerfile

Lines changed: 59 additions & 0 deletions

@@ -0,0 +1,59 @@
+# syntax=docker/dockerfile:1.10.0@sha256:865e5dd094beca432e8c0a1d5e1c465db5f998dca4e439981029b3b81fb39ed5
+
+# spark-builder: provides client libs for spark-connect
+FROM stackable/image/spark-k8s AS spark-builder
+
+FROM stackable/image/java-base
+
+ARG PRODUCT
+ARG PYTHON
+ARG RELEASE
+ARG STACKABLE_USER_UID
+
+LABEL name="Stackable Spark Connect Examples" \
+      maintainer="[email protected]" \
+      vendor="Stackable GmbH" \
+      version="${PRODUCT}" \
+      release="${RELEASE}" \
+      summary="Spark Connect Examples" \
+      description="Spark Connect client libraries for Python and the JVM, including some examples."
+
+ENV HOME=/stackable
+
+COPY spark-connect-client/stackable/spark-connect-examples /stackable/spark-connect-examples
+COPY --chown=${STACKABLE_USER_UID}:0 --from=spark-builder /stackable/spark/connect /stackable/spark/connect
+
+RUN <<EOF
+microdnf update
+# python{version}-setuptools: needed to build the pyspark[connect] package
+microdnf install --nodocs \
+  "python${PYTHON}" \
+  "python${PYTHON}-pip" \
+  "python${PYTHON}-setuptools"
+microdnf clean all
+rm -rf /var/cache/yum
+
+ln -s /usr/bin/python${PYTHON} /usr/bin/python
+ln -s /usr/bin/pip-${PYTHON} /usr/bin/pip
+
+# Install python libraries for the spark connect client
+# shellcheck disable=SC2102
+pip install --no-cache-dir pyspark[connect]==${PRODUCT}
+
+# All files and folders owned by root group to support running as arbitrary users.
+# This is best practice as all container users will belong to the root group (0).
+chown -R ${STACKABLE_USER_UID}:0 /stackable
+chmod -R g=u /stackable
+EOF
+
+# ----------------------------------------
+# Attention: We are changing the group of all files in /stackable directly above.
+# If you do any file-based actions (copying / creating etc.) below this comment you
+# absolutely need to make sure that the correct permissions are applied!
+# chown ${STACKABLE_USER_UID}:0
+# ----------------------------------------
+
+USER ${STACKABLE_USER_UID}
+
+WORKDIR /stackable/spark-connect-examples/python

Lines changed: 24 additions & 0 deletions

@@ -0,0 +1,24 @@
+import sys
+
+from pyspark.sql import SparkSession
+
+if __name__ == "__main__":
+    remote: str = sys.argv[1]
+    spark = (
+        SparkSession.builder.appName("SimpleSparkConnectApp")
+        .remote(remote)
+        .getOrCreate()
+    )
+
+    # See https://issues.apache.org/jira/browse/SPARK-46032
+    spark.addArtifacts("/stackable/spark/connect/spark-connect_2.12-3.5.5.jar")
+
+    logFile = "/stackable/spark/README.md"
+    logData = spark.read.text(logFile).cache()
+
+    numAs = logData.filter(logData.value.contains("a")).count()
+    numBs = logData.filter(logData.value.contains("b")).count()
+
+    print("Lines with a: %i, lines with b: %i" % (numAs, numBs))
+
+    spark.stop()
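
The example takes the Spark Connect URL as its only argument; `sc://host:port` is the standard Spark Connect URL scheme, and 15002 is the server's default port. A hypothetical invocation from the image's working directory (the script's file name is assumed here, since this page does not show it):

# Connect to a Spark Connect server and count lines in the bundled README.
python simple-connect-app.py sc://spark-connect-server:15002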

spark-connect-client/versions.py

Lines changed: 8 additions & 0 deletions

@@ -0,0 +1,8 @@
+versions = [
+    {
+        "product": "3.5.5",
+        "spark-k8s": "3.5.5",
+        "java-base": "17",
+        "python": "3.11",
+    },
+]
