
Commit e35bb73

Fix: hive reduce image size (#1040)
* reduce image size
* adapted changelog
* add permissions check
* consolidation
1 parent fe6b07a commit e35bb73


2 files changed, 53 additions, 31 deletions


CHANGELOG.md

Lines changed: 7 additions & 3 deletions
@@ -8,12 +8,14 @@ All notable changes to this project will be documented in this file.
 
 - airflow: check for correct permissions and ownerships in /stackable folder via
 `check-permissions-ownership.sh` provided in stackable-base image ([#1054]).
+- druid: check for correct permissions and ownerships in /stackable folder via
+`check-permissions-ownership.sh` provided in stackable-base image ([#1039]).
 - hadoop: check for correct permissions and ownerships in /stackable folder via
 `check-permissions-ownership.sh` provided in stackable-base image ([#1029]).
 - hbase: check for correct permissions and ownerships in /stackable folder via
 `check-permissions-ownership.sh` provided in stackable-base image ([#1028]).
-- druid: check for correct permissions and ownerships in /stackable folder via
-`check-permissions-ownership.sh` provided in stackable-base image ([#1039]).
+- hive: check for correct permissions and ownerships in /stackable folder via
+`check-permissions-ownership.sh` provided in stackable-base image ([#1040]).
 - spark-connect-client: A new image for Spark connect tests and demos ([#1034])
 - nifi: check for correct permissions and ownerships in /stackable folder via
 `check-permissions-ownership.sh` provided in stackable-base image ([#1027]).
@@ -31,9 +33,10 @@ All notable changes to this project will be documented in this file.
 
 ### Fixed
 
+- druid: reduce docker image size by removing the recursive chown/chmods in the final image ([#1039]).
 - hadoop: reduce docker image size by removing the recursive chown/chmods in the final image ([#1029]).
 - hbase: reduce docker image size by removing the recursive chown/chmods in the final image ([#1028]).
-- druid: reduce docker image size by removing the recursive chown/chmods in the final image ([#1039]).
+- hive: reduce docker image size by removing the recursive chown/chmods in the final image ([#1040]).
 - nifi: reduce docker image size by removing the recursive chown/chmods in the final image ([#1027]).
 - opa: reduce docker image size by removing the recursive chown/chmods in the final image ([#1038]).
 - spark-k8s: reduce docker image size by removing the recursive chown/chmods in the final image ([#1042]).
@@ -47,6 +50,7 @@ All notable changes to this project will be documented in this file.
 [#1034]: https://github.com/stackabletech/docker-images/pull/1034
 [#1038]: https://github.com/stackabletech/docker-images/pull/1038
 [#1039]: https://github.com/stackabletech/docker-images/pull/1039
+[#1040]: https://github.com/stackabletech/docker-images/pull/1040
 [#1042]: https://github.com/stackabletech/docker-images/pull/1042
 [#1044]: https://github.com/stackabletech/docker-images/pull/1044
 [#1050]: https://github.com/stackabletech/docker-images/pull/1050

hive/Dockerfile

Lines changed: 46 additions & 28 deletions
@@ -13,12 +13,20 @@ FROM stackable/image/java-devel AS hive-builder
 ARG PRODUCT
 ARG HADOOP
 ARG JMX_EXPORTER
+ARG AWS_JAVA_SDK_BUNDLE
+ARG AZURE_STORAGE
+ARG AZURE_KEYVAULT_CORE
 ARG STACKABLE_USER_UID
 
 # Setting this to anything other than "true" will keep the cache folders around (e.g. for Maven, NPM etc.)
 # This can be used to speed up builds when disk space is of no concern.
 ARG DELETE_CACHES="true"
 
+# It is useful to see which version of Hadoop is used at a glance
+# Therefore the use of the full name here
+# TODO: Do we really need all of Hadoop in here?
+COPY --chown=${STACKABLE_USER_UID}:0 --from=hadoop-builder /stackable/hadoop /stackable/hadoop-${HADOOP}
+
 COPY --chown=${STACKABLE_USER_UID}:0 hive/stackable /stackable
 
 USER ${STACKABLE_USER_UID}
@@ -58,6 +66,18 @@ rm -rf /stackable/apache-hive-${PRODUCT}-src
 curl "https://repo.stackable.tech/repository/packages/jmx-exporter/jmx_prometheus_javaagent-${JMX_EXPORTER}.jar" -o "/stackable/jmx/jmx_prometheus_javaagent-${JMX_EXPORTER}.jar"
 ln -s "/stackable/jmx/jmx_prometheus_javaagent-${JMX_EXPORTER}.jar" /stackable/jmx/jmx_prometheus_javaagent.jar
 
+# The next two sections for S3 and Azure use hardcoded version numbers on purpose instead of wildcards
+# This way the build will fail should one of the files not be available anymore in a later Hadoop version!
+
+# Add S3 Support for Hive (support for s3a://)
+cp /stackable/hadoop-${HADOOP}/share/hadoop/tools/lib/hadoop-aws-${HADOOP}.jar /stackable/apache-hive-metastore-${PRODUCT}-bin/lib/
+cp /stackable/hadoop-${HADOOP}/share/hadoop/tools/lib/aws-java-sdk-bundle-${AWS_JAVA_SDK_BUNDLE}.jar /stackable/apache-hive-metastore-${PRODUCT}-bin/lib/
+
+# Add Azure ABFS support (support for abfs://)
+cp /stackable/hadoop-${HADOOP}/share/hadoop/tools/lib/hadoop-azure-${HADOOP}.jar /stackable/apache-hive-metastore-${PRODUCT}-bin/lib/
+cp /stackable/hadoop-${HADOOP}/share/hadoop/tools/lib/azure-storage-${AZURE_STORAGE}.jar /stackable/apache-hive-metastore-${PRODUCT}-bin/lib/
+cp /stackable/hadoop-${HADOOP}/share/hadoop/tools/lib/azure-keyvault-core-${AZURE_KEYVAULT_CORE}.jar /stackable/apache-hive-metastore-${PRODUCT}-bin/lib/
+
 # We're removing these to make the intermediate layer smaller
 # This can be necessary even though it's only a builder image because the GitHub Action Runners only have very limited space available
 # and we are sometimes running into errors because we're out of space.
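The builder-stage comment in this hunk says the S3 and Azure jars are copied with hardcoded version numbers so that the build fails if a file is no longer shipped with a later Hadoop version. A minimal sketch of that difference in plain POSIX shell, using throwaway directories and made-up version numbers (none of them come from the real Hadoop distribution): a pinned file name makes `cp` exit non-zero when the expected version is missing, while a wildcard quietly copies whichever version happens to be present.

```shell
# Hypothetical demo: pinned jar name vs. wildcard.
# Directory names and version numbers are invented for illustration only.

lib_dir=$(mktemp -d)
dest_dir=$(mktemp -d)

# Pretend a newer Hadoop ships a different SDK bundle version than expected.
touch "$lib_dir/aws-java-sdk-bundle-1.12.700.jar"
expected_version=1.12.600

# Pinned name: cp fails because the expected file does not exist.
if cp "$lib_dir/aws-java-sdk-bundle-${expected_version}.jar" "$dest_dir/" 2>/dev/null; then
  pinned_status=ok
else
  pinned_status=failed
fi

# Wildcard: succeeds and silently copies the unexpected version.
if cp "$lib_dir"/aws-java-sdk-bundle-*.jar "$dest_dir/" 2>/dev/null; then
  wildcard_status=ok
else
  wildcard_status=failed
fi

echo "pinned: $pinned_status"      # pinned: failed
echo "wildcard: $wildcard_status"  # wildcard: ok
ls "$dest_dir"                     # aws-java-sdk-bundle-1.12.700.jar

rm -rf "$lib_dir" "$dest_dir"
```

The wildcard variant "works", which is exactly what the comment wants to avoid: a changed dependency version would end up in the image without anyone noticing.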
@@ -67,6 +87,9 @@ if [ "${DELETE_CACHES}" = "true" ] ; then
 rm -rf /stackable/.npm/*
 rm -rf /stackable/.cache/*
 fi
+
+# change groups
+chmod --recursive g=u /stackable
 EOF
 
 
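The builder stage now ends with `chmod --recursive g=u /stackable`, and the final stage applies `chmod g=u` to the two symlinks it creates. A small sketch of what `g=u` does, assuming GNU coreutils (`stat -c`) and using hypothetical throwaway paths under `mktemp -d` instead of `/stackable`: the group permission bits become a copy of the owner bits. When the argument is a symlink, `chmod` changes the link's target, because Linux does not use a symlink's own mode bits. `chown -h`, which the final stage uses on the symlinks, changes the ownership of the link itself rather than its target.

```shell
# Demo of `chmod g=u` on a file and through a symlink.
# Paths (hive-1.0, app.jar) are hypothetical; requires GNU stat.

work=$(mktemp -d)
mkdir "$work/hive-1.0"
touch "$work/hive-1.0/app.jar"
ln -s "$work/hive-1.0" "$work/hive"

# Start from typical modes: owner rw / rwx, group read-only.
chmod 0644 "$work/hive-1.0/app.jar"
chmod 0755 "$work/hive-1.0"

# Group bits become a copy of the owner bits: 644 -> 664.
chmod g=u "$work/hive-1.0/app.jar"
jar_mode=$(stat -c %a "$work/hive-1.0/app.jar")

# chmod on a symlink named on the command line acts on the target: 755 -> 775.
chmod g=u "$work/hive"
dir_mode=$(stat -c %a "$work/hive-1.0")

echo "jar: $jar_mode, dir: $dir_mode"   # jar: 664, dir: 775
rm -rf "$work"
```

Because every container user belongs to group 0, mirroring the owner bits into the group bits gives an arbitrary UID the same access as `${STACKABLE_USER_UID}`.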
@@ -75,9 +98,6 @@ FROM stackable/image/java-base AS final
 ARG PRODUCT
 ARG HADOOP
 ARG RELEASE
-ARG AWS_JAVA_SDK_BUNDLE
-ARG AZURE_STORAGE
-ARG AZURE_KEYVAULT_CORE
 ARG STACKABLE_USER_UID
 
 
@@ -106,47 +126,45 @@ LABEL io.k8s.display-name="${NAME}"
 WORKDIR /stackable
 
 COPY --chown=${STACKABLE_USER_UID}:0 --from=hive-builder /stackable/apache-hive-metastore-${PRODUCT}-bin /stackable/apache-hive-metastore-${PRODUCT}-bin
+COPY --chown=${STACKABLE_USER_UID}:0 --from=hive-builder /stackable/hadoop-${HADOOP} /stackable/hadoop-${HADOOP}
+COPY --chown=${STACKABLE_USER_UID}:0 --from=hive-builder /stackable/jmx /stackable/jmx
 
-# It is useful to see which version of Hadoop is used at a glance
-# Therefore the use of the full name here
-# TODO: Do we really need all of Hadoop in here?
-COPY --chown=${STACKABLE_USER_UID}:0 --from=hadoop-builder /stackable/hadoop /stackable/hadoop-${HADOOP}
+COPY hive/licenses /licenses
 
 RUN <<EOF
 microdnf update
 microdnf clean all
 rpm -qa --qf "%{NAME}-%{VERSION}-%{RELEASE}\n" | sort > /stackable/package_manifest.txt
+chown ${STACKABLE_USER_UID}:0 /stackable/package_manifest.txt
+chmod g=u /stackable/package_manifest.txt
 rm -rf /var/cache/yum
 
 ln -s /stackable/apache-hive-metastore-${PRODUCT}-bin /stackable/hive-metastore
+chown -h ${STACKABLE_USER_UID}:0 /stackable/hive-metastore
+chmod g=u /stackable/hive-metastore
 ln -s /stackable/hadoop-${HADOOP} /stackable/hadoop
+chown -h ${STACKABLE_USER_UID}:0 /stackable/hadoop
+chmod g=u /stackable/hadoop
 
-# The next two sections for S3 and Azure use hardcoded version numbers on purpose instead of wildcards
-# This way the build will fail should one of the files not be available anymore in a later Hadoop version!
+# fix missing permissions
+chmod --recursive g=u /stackable/jmx
+EOF
 
-# Add S3 Support for Hive (support for s3a://)
-cp /stackable/hadoop/share/hadoop/tools/lib/hadoop-aws-${HADOOP}.jar /stackable/hive-metastore/lib/
-cp /stackable/hadoop/share/hadoop/tools/lib/aws-java-sdk-bundle-${AWS_JAVA_SDK_BUNDLE}.jar /stackable/hive-metastore/lib/
+# ----------------------------------------
+# Checks
+# This section is to run final checks to ensure the created final images
+# adhere to several minimal requirements like:
+# - check file permissions and ownerships
+# ----------------------------------------
 
-# Add Azure ABFS support (support for abfs://)
-cp /stackable/hadoop/share/hadoop/tools/lib/hadoop-azure-${HADOOP}.jar /stackable/hive-metastore/lib/
-cp /stackable/hadoop/share/hadoop/tools/lib/azure-storage-${AZURE_STORAGE}.jar /stackable/hive-metastore/lib/
-cp /stackable/hadoop/share/hadoop/tools/lib/azure-keyvault-core-${AZURE_KEYVAULT_CORE}.jar /stackable/hive-metastore/lib/
-
-# All files and folders owned by root group to support running as arbitrary users.
-# This is best practice as all container users will belong to the root group (0).
-chown -R ${STACKABLE_USER_UID}:0 /stackable
-chmod -R g=u /stackable
+# Check that permissions and ownership in /stackable are set correctly
+# This will fail and stop the build if any mismatches are found.
+RUN <<EOF
+/bin/check-permissions-ownership.sh /stackable ${STACKABLE_USER_UID} 0
 EOF
 
-COPY --chown=${STACKABLE_USER_UID}:0 --from=hive-builder /stackable/jmx /stackable/jmx
-COPY hive/licenses /licenses
-
 # ----------------------------------------
-# Attention: We are changing the group of all files in /stackable directly above
-# If you do any file based actions (copying / creating etc.) below this comment you
-# absolutely need to make sure that the correct permissions are applied!
-# chown ${STACKABLE_USER_UID}:0
+# Attention: Do not perform any file based actions (copying/creating etc.) below this comment because the permissions would not be checked.
 # ----------------------------------------
 
 USER ${STACKABLE_USER_UID}
USER ${STACKABLE_USER_UID}
