Commit 9b48107

ramaddepally authored and mccheah committed
[SPARK-25957][K8S] Make building alternate language binding docker images optional
## What changes were proposed in this pull request? bin/docker-image-tool.sh tries to build all docker images (JVM, PySpark and SparkR) by default. But not all spark distributions are built with SparkR and hence this script will fail on such distros. With this change, we make building alternate language binding docker images (PySpark and SparkR) optional. User has to specify dockerfile for those language bindings using -p and -R flags accordingly, to build the binding docker images. ## How was this patch tested? Tested following scenarios. *bin/docker-image-tool.sh -r <repo> -t <tag> build* --> Builds only JVM docker image (default behavior) *bin/docker-image-tool.sh -r <repo> -t <tag> -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build* --> Builds both JVM and PySpark docker images *bin/docker-image-tool.sh -r <repo> -t <tag> -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile -R kubernetes/dockerfiles/spark/bindings/R/Dockerfile build* --> Builds JVM, PySpark and SparkR docker images. Author: Nagaram Prasad Addepally <[email protected]> Closes apache#23053 from ramaddepally/SPARK-25957.
1 parent 4aa9ccb commit 9b48107
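The opt-in contract described above can be illustrated with a small, self-contained sketch (the `select_images` helper and its argument names are hypothetical, not part of the patch): with neither flag only the JVM image is selected, and each of `-p`/`-R` adds its binding image.

```shell
#!/usr/bin/env bash
# Hypothetical miniature of the new CLI contract: -p and -R are opt-in;
# with neither flag, only the JVM image is selected.
select_images() {
  local OPTIND opt images="spark"
  while getopts "p:R:" opt "$@"; do
    case $opt in
      p) images="$images spark-py" ;;   # -p <file> opts in to PySpark
      R) images="$images spark-r" ;;    # -R <file> opts in to SparkR
    esac
  done
  echo "$images"
}

select_images                                    # -> spark
select_images -p python/Dockerfile               # -> spark spark-py
select_images -p py/Dockerfile -R R/Dockerfile   # -> spark spark-py spark-r
```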

File tree

3 files changed (+59, -28 lines)

bin/docker-image-tool.sh

Lines changed: 38 additions & 25 deletions
```diff
@@ -41,6 +41,18 @@ function image_ref {
   echo "$image"
 }
 
+function docker_push {
+  local image_name="$1"
+  if [ ! -z $(docker images -q "$(image_ref ${image_name})") ]; then
+    docker push "$(image_ref ${image_name})"
+    if [ $? -ne 0 ]; then
+      error "Failed to push $image_name Docker image."
+    fi
+  else
+    echo "$(image_ref ${image_name}) image not found. Skipping push for this image."
+  fi
+}
+
 function build {
   local BUILD_ARGS
   local IMG_PATH
@@ -92,8 +104,8 @@ function build {
     base_img=$(image_ref spark)
   )
   local BASEDOCKERFILE=${BASEDOCKERFILE:-"$IMG_PATH/spark/Dockerfile"}
-  local PYDOCKERFILE=${PYDOCKERFILE:-"$IMG_PATH/spark/bindings/python/Dockerfile"}
-  local RDOCKERFILE=${RDOCKERFILE:-"$IMG_PATH/spark/bindings/R/Dockerfile"}
+  local PYDOCKERFILE=${PYDOCKERFILE:-false}
+  local RDOCKERFILE=${RDOCKERFILE:-false}
 
   docker build $NOCACHEARG "${BUILD_ARGS[@]}" \
     -t $(image_ref spark) \
@@ -102,33 +114,29 @@ function build {
     error "Failed to build Spark JVM Docker image, please refer to Docker build output for details."
   fi
 
-  docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
-    -t $(image_ref spark-py) \
-    -f "$PYDOCKERFILE" .
+  if [ "${PYDOCKERFILE}" != "false" ]; then
+    docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
+      -t $(image_ref spark-py) \
+      -f "$PYDOCKERFILE" .
+    if [ $? -ne 0 ]; then
+      error "Failed to build PySpark Docker image, please refer to Docker build output for details."
+    fi
+  fi
+
+  if [ "${RDOCKERFILE}" != "false" ]; then
+    docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
+      -t $(image_ref spark-r) \
+      -f "$RDOCKERFILE" .
     if [ $? -ne 0 ]; then
-      error "Failed to build PySpark Docker image, please refer to Docker build output for details."
+      error "Failed to build SparkR Docker image, please refer to Docker build output for details."
     fi
-  docker build $NOCACHEARG "${BINDING_BUILD_ARGS[@]}" \
-    -t $(image_ref spark-r) \
-    -f "$RDOCKERFILE" .
-  if [ $? -ne 0 ]; then
-    error "Failed to build SparkR Docker image, please refer to Docker build output for details."
-  fi
+  fi
 }
 
 function push {
-  docker push "$(image_ref spark)"
-  if [ $? -ne 0 ]; then
-    error "Failed to push Spark JVM Docker image."
-  fi
-  docker push "$(image_ref spark-py)"
-  if [ $? -ne 0 ]; then
-    error "Failed to push PySpark Docker image."
-  fi
-  docker push "$(image_ref spark-r)"
-  if [ $? -ne 0 ]; then
-    error "Failed to push SparkR Docker image."
-  fi
+  docker_push "spark"
+  docker_push "spark-py"
+  docker_push "spark-r"
 }
 
 function usage {
@@ -143,8 +151,10 @@ Commands:
 
 Options:
   -f file               Dockerfile to build for JVM based Jobs. By default builds the Dockerfile shipped with Spark.
-  -p file               Dockerfile to build for PySpark Jobs. Builds Python dependencies and ships with Spark.
-  -R file               Dockerfile to build for SparkR Jobs. Builds R dependencies and ships with Spark.
+  -p file               (Optional) Dockerfile to build for PySpark Jobs. Builds Python dependencies and ships with Spark.
+                        Skips building PySpark docker image if not specified.
+  -R file               (Optional) Dockerfile to build for SparkR Jobs. Builds R dependencies and ships with Spark.
+                        Skips building SparkR docker image if not specified.
   -r repo               Repository address.
   -t tag                Tag to apply to the built image, or to identify the image to be pushed.
   -m                    Use minikube's Docker daemon.
@@ -164,6 +174,9 @@ Examples:
   - Build image in minikube with tag "testing"
     $0 -m -t testing build
 
+  - Build PySpark docker image
+    $0 -r docker.io/myrepo -t v2.3.0 -p kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
+
   - Build and push image with tag "v2.3.0" to docker.io/myrepo
     $0 -r docker.io/myrepo -t v2.3.0 build
     $0 -r docker.io/myrepo -t v2.3.0 push
```
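The guard introduced in the `build` function boils down to a sentinel-default pattern: the optional Dockerfile path defaults to the string `false`, and the build step runs only when the caller actually supplied a file. A minimal stand-alone sketch (the `build_optional` helper and its echo messages are hypothetical, not from the script):

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the sentinel-default pattern used in the diff above:
# an unset optional argument defaults to "false", and the work is skipped
# unless the caller provided a real value.
build_optional() {
  local dockerfile="${1:-false}"   # sentinel default, like PYDOCKERFILE/RDOCKERFILE
  local name="$2"
  if [ "${dockerfile}" != "false" ]; then
    echo "building ${name} image from ${dockerfile}"
  else
    echo "skipping ${name} image"
  fi
}

build_optional "" spark-py                       # no -p given: skipped
build_optional bindings/R/Dockerfile spark-r     # -R given: built
```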

docs/running-on-kubernetes.md

Lines changed: 12 additions & 0 deletions
````diff
@@ -88,6 +88,18 @@ $ ./bin/docker-image-tool.sh -r <repo> -t my-tag build
 $ ./bin/docker-image-tool.sh -r <repo> -t my-tag push
 ```
 
+By default `bin/docker-image-tool.sh` builds docker image for running JVM jobs. You need to opt-in to build additional
+language binding docker images.
+
+Example usage is
+```bash
+# To build additional PySpark docker image
+$ ./bin/docker-image-tool.sh -r <repo> -t my-tag -p ./kubernetes/dockerfiles/spark/bindings/python/Dockerfile build
+
+# To build additional SparkR docker image
+$ ./bin/docker-image-tool.sh -r <repo> -t my-tag -R ./kubernetes/dockerfiles/spark/bindings/R/Dockerfile build
+```
+
 ## Cluster Mode
 
 To launch Spark Pi in cluster mode,
````

resource-managers/kubernetes/integration-tests/scripts/setup-integration-test-env.sh

Lines changed: 9 additions & 3 deletions
```diff
@@ -72,10 +72,16 @@ then
   IMAGE_TAG=$(uuidgen);
   cd $UNPACKED_SPARK_TGZ
 
+  # Build PySpark image
+  LANGUAGE_BINDING_BUILD_ARGS="-p $UNPACKED_SPARK_TGZ/kubernetes/dockerfiles/spark/bindings/python/Dockerfile"
+
+  # Build SparkR image
+  LANGUAGE_BINDING_BUILD_ARGS="$LANGUAGE_BINDING_BUILD_ARGS -R $UNPACKED_SPARK_TGZ/kubernetes/dockerfiles/spark/bindings/R/Dockerfile"
+
   case $DEPLOY_MODE in
     cloud)
       # Build images
-      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build
+      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
 
       # Push images appropriately
       if [[ $IMAGE_REPO == gcr.io* ]] ;
@@ -89,13 +95,13 @@ then
     docker-for-desktop)
       # Only need to build as this will place it in our local Docker repo which is all
       # we need for Docker for Desktop to work so no need to also push
-      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG build
+      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
       ;;
 
     minikube)
       # Only need to build and if we do this with the -m option for minikube we will
       # build the images directly using the minikube Docker daemon so no need to push
-      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG build
+      $UNPACKED_SPARK_TGZ/bin/docker-image-tool.sh -m -r $IMAGE_REPO -t $IMAGE_TAG $LANGUAGE_BINDING_BUILD_ARGS build
       ;;
     *)
       echo "Unrecognized deploy mode $DEPLOY_MODE" && exit 1
```
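One design note on the hunk above: it accumulates the `-p`/`-R` flags in a plain string and relies on unquoted word splitting when `$LANGUAGE_BINDING_BUILD_ARGS` is passed to `docker-image-tool.sh`, which works here because the paths contain no whitespace. A bash array (sketched below with made-up paths, not part of the patch) is the more robust way to build such an argument list, since each element survives as a single argument even if a path contains spaces.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical paths): building an argument list as a bash array
# instead of a whitespace-joined string, so paths with spaces stay intact.
binding_args=(-p "/tmp/unpacked spark/bindings/python/Dockerfile")
binding_args+=(-R "/tmp/unpacked spark/bindings/R/Dockerfile")

# Each element expands as exactly one argument when quoted:
printf '%s\n' "${binding_args[@]}"
```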

0 commit comments