Skip to content
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -18,58 +18,63 @@

FROM eclipse-temurin:21-jre-alpine AS build-jsa

USER root

# Get Kafka from https://archive.apache.org/dist/kafka, url passed as env var, for version 3.7.0
ENV kafka_url https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz
# Get Kafka from https://archive.apache.org/dist/kafka, url passed as env var, for version 3.8.0
ENV kafka_url https://archive.apache.org/dist/kafka/3.8.0/kafka_2.13-3.8.0.tgz
ENV GPG_KEY CF9500821E9557AEB04E026C05EEA67F87749E61

COPY jsa_launch /etc/kafka/docker/jsa_launch

RUN set -eux ; \
apk update ; \
apk upgrade ; \
apk add --no-cache wget gcompat gpg gpg-agent procps bash; \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gcompat is a strange inclusion here -- what in this process needs glibc compatibility?

mkdir opt/kafka; \
wget -nv -O kafka.tgz "$kafka_url"; \
wget -nv -O kafka.tgz.asc "$kafka_url.asc"; \
tar xfz kafka.tgz -C /opt/kafka --strip-components 1; \
wget -nv -O KEYS https://downloads.apache.org/kafka/KEYS; \
gpg --import KEYS; \
for server in ha.pool.sks-keyservers.net $(shuf -e \
hkp://p80.pool.sks-keyservers.net:80 \
keyserver.ubuntu.com \
hkp://keyserver.ubuntu.com:80 \
pgp.mit.edu \
hkp://keys.openpgp.org) ; do \
gpg --batch --keyserver "$server" --recv-keys "$GPG_KEY" && break || : ; \
done && \

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

@KrishVora2912 KrishVora2912 Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review @whalelines !

I went through the example, and made relevant changes to the Dockerfile:

  1. used GNUPGHOME
  2. Used only 2 keyservers - hkp://keys.openpgp.org and keyserver.ubuntu.com and removed the rest outdated keyservers
  3. Hardcoded the GPG_KEY inside the command itself
  4. Like flink-docker, used the practice of adding gpgconf --kill all as part of verification commands.
  5. wget uses kafka_url env variable, which downloads from a https source

Please let us know if these changes are okay, and if any more are needed.
Thank you again!

gpg --batch --verify kafka.tgz.asc kafka.tgz

# Generate jsa files using dynamic CDS for kafka server start command and kafka storage format command
RUN /etc/kafka/docker/jsa_launch
RUN mkdir opt/kafka; \
tar xfz kafka.tgz -C /opt/kafka --strip-components 1; \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's odd to mix "traditional" flags and Unix/GNU-style flags, and I'd recommend avoiding mixing them (https://manpages.debian.org/bookworm/tar/tar.1.en.html)

# Generate jsa files using dynamic CDS for kafka server start command and kafka storage format command
/etc/kafka/docker/jsa_launch
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about this script; it appears to start a server in the background, does not track the PID of that started server, waits for a file to exist, and then exits, and hopes that the file existing means the server has shut down successfully and cleanly, so there could absolutely be a race here where the file gets created, gets filled up partway and thus exists, the script exits, and the server is killed before it finishes writing the file.

Is this actually necessary? Does it dramatically improve something like performance, startup time, etc? Is it an artifact that could be shipped with the Kafka releases instead?

(This script seems to be the primary justification for the multi-stage build, and I'm sorry but I'm not seeing it. 🙈)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.



FROM eclipse-temurin:21-jre-alpine

# exposed ports
EXPOSE 9092

USER root

# Get Kafka from https://archive.apache.org/dist/kafka, url passed as env var, for version 3.7.0
ENV kafka_url https://archive.apache.org/dist/kafka/3.7.0/kafka_2.13-3.7.0.tgz
ENV build_date 2024-06-11
# Get Kafka from https://archive.apache.org/dist/kafka, url passed as env var, for version 3.8.0
ENV kafka_url https://archive.apache.org/dist/kafka/3.8.0/kafka_2.13-3.8.0.tgz
ENV build_date 2024-08-01
ENV GPG_KEY CF9500821E9557AEB04E026C05EEA67F87749E61


LABEL org.label-schema.name="kafka" \
org.label-schema.description="Apache Kafka" \
org.label-schema.build-date="${build_date}" \
org.label-schema.vcs-url="https://github.com/apache/kafka" \
LABEL org.opencontainers.image.title="kafka" \
org.opencontainers.image.description="Apache Kafka" \
org.opencontainers.image.created="${build_date}" \
org.opencontainers.image.source="https://github.com/apache/kafka" \
maintainer="Apache Kafka"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See docker-library/official-images#3540, especially docker-library/official-images#3540 (comment):

We don't actively recommend using labels. If an image maintainer wants to have labels, that is fine, but label names should adhere to the image spec: https://github.com/opencontainers/image-spec/blob/v1.0.1/annotations.md

(ie, maintainer is an unacceptable label)


RUN set -eux ; \
apk update ; \
apk upgrade ; \
apk add --no-cache wget gcompat gpg gpg-agent procps bash; \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, why gcompat? Also, I'd suggest splitting this up so that you can use --virtual and have an easier and more accurate time uninstalling "download-only" dependencies.

mkdir opt/kafka; \
wget -nv -O kafka.tgz "$kafka_url"; \
wget -nv -O kafka.tgz.asc "$kafka_url.asc"; \
tar xfz kafka.tgz -C /opt/kafka --strip-components 1; \
wget -nv -O KEYS https://downloads.apache.org/kafka/KEYS; \
gpg --import KEYS; \
for server in ha.pool.sks-keyservers.net $(shuf -e \
hkp://p80.pool.sks-keyservers.net:80 \
keyserver.ubuntu.com \
hkp://keyserver.ubuntu.com:80 \
pgp.mit.edu \
hkp://keys.openpgp.org) ; do \
gpg --batch --keyserver "$server" --recv-keys "$GPG_KEY" && break || : ; \
done && \
gpg --batch --verify kafka.tgz.asc kafka.tgz; \
mkdir opt/kafka; \
tar xfz kafka.tgz -C /opt/kafka --strip-components 1; \
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a single stage would pretty dramatically decrease the amount of duplication here.

mkdir -p /var/lib/kafka/data /etc/kafka/secrets; \
mkdir -p /etc/kafka/docker /usr/logs /mnt/shared/config; \
adduser -h /home/appuser -D --shell /bin/bash appuser; \
Expand All @@ -79,9 +84,8 @@ RUN set -eux ; \
cp /opt/kafka/config/log4j.properties /etc/kafka/docker/log4j.properties; \
cp /opt/kafka/config/tools-log4j.properties /etc/kafka/docker/tools-log4j.properties; \
cp /opt/kafka/config/kraft/server.properties /etc/kafka/docker/server.properties; \
rm kafka.tgz kafka.tgz.asc KEYS; \
apk del wget gpg gpg-agent; \
apk cache clean;
rm kafka.tgz kafka.tgz.asc; \
apk del wget gpg gpg-agent;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a non-blocking suggestion, is there some simple command that could be run here to verify the installation? One we often use is some-command --version, but I'm not sure if anything in Kafka has something like that?


COPY --from=build-jsa kafka.jsa /opt/kafka/kafka.jsa
COPY --from=build-jsa storage.jsa /opt/kafka/storage.jsa
Expand Down