Commit 7a4421b
Add the example about how to use Flex Template with RunInference (#18)
* add the flex template
* updated the dockerfile
* updated the parameters
* update the dockerfile
* added PYTHONPATH
* remove the pubsub option
* remove time
1 parent b64d0ef commit 7a4421b

File tree

6 files changed

+199 -6 lines changed

Makefile

Lines changed: 46 additions & 0 deletions
@@ -207,3 +207,49 @@ check-torch-gpu: ## Check whether PyTorch works on GPU using VM with Custom Container
 check-pipeline: ## Check whether the Beam pipeline can run on GPU using VM with Custom Container and DirectRunner
 	@./scripts/check-pipeline.sh
+
+create-flex-template: ## Create a Flex Template file using a Flex Template custom container
+	gcloud dataflow flex-template build $(TEMPLATE_FILE_GCS_PATH) \
+	--image $(CUSTOM_CONTAINER_IMAGE) \
+	--metadata-file ./flex/metadata.json \
+	--sdk-language "PYTHON" \
+	--staging-location $(STAGING_LOCATION) \
+	--temp-location $(TEMP_LOCATION) \
+	--project $(PROJECT_ID) \
+	--worker-region $(REGION) \
+	--worker-machine-type $(MACHINE_TYPE)
+
+run-df-gpu-flex: ## Run a Dataflow job using the Flex Template
+	$(eval JOB_NAME := beam-ml-starter-gpu-flex-$(shell date +%s)-$(shell echo $$$$))
+ifeq ($(MODEL_ENV), "TORCH")
+	gcloud dataflow flex-template run $(JOB_NAME) \
+	--template-file-gcs-location $(TEMPLATE_FILE_GCS_PATH) \
+	--project $(PROJECT_ID) \
+	--region $(REGION) \
+	--worker-machine-type $(MACHINE_TYPE) \
+	--additional-experiments disable_worker_container_image_prepull \
+	--parameters number_of_worker_harness_threads=1 \
+	--parameters sdk_location=container \
+	--parameters sdk_container_image=$(CUSTOM_CONTAINER_IMAGE) \
+	--parameters dataflow_service_option=$(SERVICE_OPTIONS) \
+	--parameters input=$(INPUT_DATA) \
+	--parameters output=$(OUTPUT_DATA) \
+	--parameters device=GPU \
+	--parameters model_state_dict_path=$(MODEL_STATE_DICT_PATH) \
+	--parameters model_name=$(MODEL_NAME)
+else
+	gcloud dataflow flex-template run $(JOB_NAME) \
+	--template-file-gcs-location $(TEMPLATE_FILE_GCS_PATH) \
+	--project $(PROJECT_ID) \
+	--region $(REGION) \
+	--worker-machine-type $(MACHINE_TYPE) \
+	--additional-experiments disable_worker_container_image_prepull \
+	--parameters number_of_worker_harness_threads=1 \
+	--parameters sdk_location=container \
+	--parameters sdk_container_image=$(CUSTOM_CONTAINER_IMAGE) \
+	--parameters dataflow_service_option=$(SERVICE_OPTIONS) \
+	--parameters input=$(INPUT_DATA) \
+	--parameters output=$(OUTPUT_DATA) \
+	--parameters device=GPU \
+	--parameters tf_model_uri=$(TF_MODEL_URI)
+endif
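As a sketch of what the `run-df-gpu-flex` recipe expands to, the snippet below assembles the same `gcloud dataflow flex-template run` argument list in Python. The helper function and all example values are hypothetical stand-ins for the `.env` variables the Makefile reads; the flag names are those used by the recipe above.

```python
import time

def flex_run_command(job_name, template_path, project, region, machine_type, parameters):
    """Assemble the gcloud invocation the way the Makefile recipe does (sketch)."""
    cmd = [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        "--template-file-gcs-location", template_path,
        "--project", project,
        "--region", region,
        "--worker-machine-type", machine_type,
        "--additional-experiments", "disable_worker_container_image_prepull",
    ]
    # Each pipeline option becomes a repeated `--parameters key=value` flag.
    for key, value in parameters.items():
        cmd += ["--parameters", f"{key}={value}"]
    return cmd

# Hypothetical example values; in the repo these come from .env.
cmd = flex_run_command(
    job_name=f"beam-ml-starter-gpu-flex-{int(time.time())}",
    template_path="gs://my-bucket/templates/spec.json",
    project="my-project",
    region="us-central1",
    machine_type="n1-standard-4",
    parameters={
        "number_of_worker_harness_threads": 1,
        "sdk_location": "container",
        "device": "GPU",
        "tf_model_uri": "gs://my-bucket/models/resnet101",
    },
)
print(" ".join(cmd))
```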

README.md

Lines changed: 20 additions & 3 deletions
@@ -58,7 +58,7 @@ newgrp docker
 ### Step 1: Clone this repo and edit .env
 
 ```bash
-git clone https://github.com/liferoad/df-ml-starter.git
+git clone https://github.com/google/dataflow-ml-starter.git
 cd df-ml-starter
 cp .env.template .env
 ```
@@ -271,9 +271,23 @@ Note the cost and time depends on your job settings and the regions.
 ### Run the Beam pipeline with the Pub/Sub source
 When `INPUT_DATA` from the `.env` file defines a valid Pub/Sub topic (e.g., `projects/apache-beam-testing/topics/Imagenet_openimage_50k_benchmark`),
 the Beam pipeline is created using the Pub/Sub source with `FixedWindows` and switches to `beam.io.fileio.WriteToFiles` that supports the streaming pipeline.
-We use `shards=0` here since 0 shards is the recommended approach and Dataflow would decide how many files it should write.
+Note that for this toy example, writing the predictions to a GCS bucket is inefficient because each output file is only a few bytes.
+In practice, you might tune [the autoscaling options](https://cloud.google.com/dataflow/docs/guides/troubleshoot-autoscaling) to optimize the streaming pipeline performance.
 Note that the streaming job will run forever until it is canceled or drained.
 
+### Run the Beam pipeline with Dataflow Flex Templates
+If you prefer to package all your code into a custom container so that users can easily run your Beam pipeline,
+a Dataflow Flex Template lets you create and run a template job with the Google Cloud CLI or the Google Cloud console. (More benefits of templates are listed [here](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates#benefits).)
+
+Since the custom container is already created, it is straightforward to adopt Dataflow Flex Templates:
+1. Create a `metadata.json` file that contains the parameters required by your Beam pipeline. In this example, we add `input`, `output`, `device`, `model_name`, `model_state_dict_path`, and `tf_model_uri` as the parameters that users can pass in. [Here](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#example-metadata-file) is another example metadata file.
+2. Convert the custom container to a template container by following [this guide](https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_custom_container_images). `tensorflow_gpu.flex.Dockerfile` is one example converted from `tensorflow_gpu.Dockerfile`. Only two changes are needed: switch to the Dataflow Template launcher entrypoint and package `src` into the container. Change `CUSTOM_CONTAINER_IMAGE` in `.env` and run `make docker` to build the custom container for Flex Templates.
+3. `make create-flex-template` creates a template spec file in the Cloud Storage bucket defined by the env variable `TEMPLATE_FILE_GCS_PATH`; the spec contains all of the information necessary to run the job, such as the SDK information and metadata. This calls the CLI `gcloud dataflow flex-template build`.
+4. `make run-df-gpu-flex` runs a Flex Template pipeline using the spec file from `TEMPLATE_FILE_GCS_PATH`. This calls the CLI `gcloud dataflow flex-template run`.
+
+More information about Flex Templates can be found [here](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates).
+
 ## FAQ
 
 ### Permission error when using any GCP command
@@ -328,4 +342,7 @@ exec /opt/apache/beam/boot: no such file or directory
 * https://cloud.google.com/dataflow/docs/gpu/use-gpus#custom-container
 * https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/
 * https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_InferenceBenchmarkTests_Python.groovy
-* https://cloud.google.com/dataflow/docs/gpu/troubleshoot-gpus#debug-vm
+* https://cloud.google.com/dataflow/docs/gpu/troubleshoot-gpus#debug-vm
+* https://github.com/GoogleCloudPlatform/python-docs-samples/tree/main/dataflow/flex-templates/streaming_beam
+* https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates
+* https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_custom_container_images
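The `--parameters key=value` pairs passed in step 4 above reach the pipeline as ordinary `--key=value` command-line flags. A minimal sketch of splitting them from the remaining Beam options, modeled on (but not copied from) `parse_known_args` in `src/run.py`; the flag names follow `flex/metadata.json`, and the example argv values are hypothetical:

```python
import argparse

def parse_known_args(argv):
    """Split the template's flags from the remaining Beam pipeline options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True, help="GCS path or Pub/Sub topic")
    parser.add_argument("--output", required=True, help="GCS path for the predictions")
    parser.add_argument("--device", default="CPU", choices=["CPU", "GPU"])
    parser.add_argument("--model_name", default=None)
    parser.add_argument("--model_state_dict_path", default=None)
    parser.add_argument("--tf_model_uri", default=None)
    # Unrecognized flags (e.g. --runner) are returned for PipelineOptions.
    return parser.parse_known_args(argv)

# The launcher turns `--parameters device=GPU` into the argv flag `--device=GPU`.
known, beam_args = parse_known_args([
    "--input=gs://bucket/images.txt",
    "--output=gs://bucket/predictions",
    "--device=GPU",
    "--tf_model_uri=gs://bucket/models/mobilenet_v2",
    "--runner=DataflowRunner",
])
print(known.device, beam_args)
```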

flex/metadata.json

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+{
+    "name": "Beam RunInference Python flex template",
+    "description": "Beam RunInference example for a Python flex template.",
+    "parameters": [
+        {
+            "name": "input",
+            "label": "Input data",
+            "helpText": "Input image URI data from a GCS bucket or a Pub/Sub topic"
+        },
+        {
+            "name": "output",
+            "label": "Output GCS bucket path",
+            "helpText": "A GCS bucket that stores the model predictions"
+        },
+        {
+            "name": "tf_model_uri",
+            "label": "TensorFlow model URI",
+            "helpText": "A valid TensorFlow model URI",
+            "isOptional": true
+        },
+        {
+            "name": "model_name",
+            "label": "A PyTorch model name",
+            "helpText": "A model name, e.g. resnet101",
+            "isOptional": true
+        },
+        {
+            "name": "model_state_dict_path",
+            "label": "A PyTorch model state path",
+            "helpText": "Path to the model's state_dict",
+            "isOptional": true
+        },
+        {
+            "name": "device",
+            "label": "Device to run models",
+            "helpText": "The device could be either CPU or GPU",
+            "isOptional": true
+        },
+        {
+            "name": "disk_size_gb",
+            "label": "disk_size_gb",
+            "helpText": "The disk size in GB for each worker",
+            "isOptional": true
+        },
+        {
+            "name": "dataflow_service_option",
+            "label": "dataflow_service_option",
+            "helpText": "The dataflow_service_option for workers",
+            "isOptional": true
+        }
+    ]
+}
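When a template job is launched, the supplied parameters are checked against this metadata: entries without `"isOptional": true` are required. A small sketch of that check, using a hypothetical helper and an inline subset of the metadata above:

```python
import json

# Inline subset of flex/metadata.json; `input` and `output` omit isOptional
# and are therefore required, while `device` is optional.
METADATA = json.loads("""
{
    "name": "Beam RunInference Python flex template",
    "parameters": [
        {"name": "input", "label": "Input data", "helpText": "GCS path or Pub/Sub topic"},
        {"name": "output", "label": "Output GCS bucket path", "helpText": "GCS path for predictions"},
        {"name": "device", "label": "Device to run models", "helpText": "CPU or GPU", "isOptional": true}
    ]
}
""")

def missing_required(metadata, supplied):
    """Return the required parameter names absent from the supplied dict."""
    return [
        p["name"]
        for p in metadata["parameters"]
        if not p.get("isOptional", False) and p["name"] not in supplied
    ]

print(missing_required(METADATA, {"input": "gs://bucket/in"}))
print(missing_required(METADATA, {"input": "gs://bucket/in", "output": "gs://bucket/out"}))
```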

src/pipeline.py

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -31,7 +31,10 @@
 from PIL import Image
 from torchvision import models, transforms
 
-from .config import ModelConfig, ModelName, SinkConfig, SourceConfig
+try:
+    from .config import ModelConfig, ModelName, SinkConfig, SourceConfig
+except ImportError:
+    from config import ModelConfig, ModelName, SinkConfig, SourceConfig
 
 import tensorflow as tf  # isort:skip

src/run.py

Lines changed: 6 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,12 @@
 from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions
 from apache_beam.runners.runner import PipelineResult
 
-from .config import ModelConfig, SinkConfig, SourceConfig
-from .pipeline import build_pipeline
+try:
+    from .config import ModelConfig, SinkConfig, SourceConfig
+    from .pipeline import build_pipeline
+except ImportError:
+    from config import ModelConfig, SinkConfig, SourceConfig
+    from pipeline import build_pipeline
 
 
 def parse_known_args(argv):
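The try/except import fallback above exists because the Flex Template launcher puts `/workspace/src/` on `PYTHONPATH` and runs `run.py` as a top-level script, while local development imports it as part of the `src` package. A self-contained demo of the pattern; the `config.py` module written here is a hypothetical stand-in for `src/config.py`:

```python
import pathlib
import sys
import tempfile

# Hypothetical stand-in for src/config.py so the fallback can be exercised.
workdir = pathlib.Path(tempfile.mkdtemp())
(workdir / "config.py").write_text("MODEL_NAME = 'resnet101'\n")

# Emulate the launcher putting /workspace/src/ on PYTHONPATH.
sys.path.insert(0, str(workdir))

try:
    # Inside the src package (e.g. `python -m src.run`) this relative
    # import succeeds...
    from .config import MODEL_NAME
except ImportError:
    # ...but in a top-level script it raises ImportError, so fall back to
    # the plain import that resolves through PYTHONPATH.
    from config import MODEL_NAME

print(MODEL_NAME)
```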

tensorflow_gpu.flex.Dockerfile

Lines changed: 71 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,71 @@
+# Copyright 2023 Google LLC
+
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+
+#     https://www.apache.org/licenses/LICENSE-2.0
+
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+# This needs Python 3.8 for your local runtime environment.
+
+FROM gcr.io/dataflow-templates-base/flex-template-launcher-image:latest as template_launcher
+
+# Select an NVIDIA base image with the desired GPU stack from https://ngc.nvidia.com/catalog/containers/nvidia:cuda
+FROM nvidia/cuda:11.8.0-cudnn8-runtime-ubuntu20.04
+
+WORKDIR /workspace
+
+COPY requirements.txt requirements.txt
+
+RUN \
+    # Add the Deadsnakes repository that has a variety of Python packages for Ubuntu.
+    # See: https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa
+    apt-key adv --keyserver keyserver.ubuntu.com --recv-keys F23C5A6CF475977595C89F51BA6932366A755776 \
+    && echo "deb http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal main" >> /etc/apt/sources.list.d/custom.list \
+    && echo "deb-src http://ppa.launchpad.net/deadsnakes/ppa/ubuntu focal main" >> /etc/apt/sources.list.d/custom.list \
+    && apt-get update \
+    && apt-get install -y curl \
+        python3.8 \
+        python3.8-venv \
+        python3-venv \
+        # With the python3.8 package, distutils needs to be installed separately.
+        python3-distutils \
+    && rm -rf /var/lib/apt/lists/* \
+    && update-alternatives --install /usr/bin/python python /usr/bin/python3.8 10 \
+    && curl https://bootstrap.pypa.io/get-pip.py | python \
+    && pip install --upgrade pip \
+    && pip install --no-cache-dir -r requirements.txt \
+    && pip install --no-cache-dir tensorflow==2.12.1 \
+    && pip install --no-cache-dir torch==2.0.0+cu118 torchvision==0.15.1+cu118 torchaudio==2.0.1 --index-url https://download.pytorch.org/whl/cu118
+
+# Copy the run module
+COPY src/ /workspace/src
+RUN rm -fr /workspace/src/__pycache__
+
+# Specifies which Python file to run to launch the Flex Template.
+ENV FLEX_TEMPLATE_PYTHON_PY_FILE="src/run.py"
+
+# Since we already downloaded all the dependencies, there is no need to rebuild everything.
+ENV PIP_NO_DEPS=True
+
+ENV PYTHONPATH "${PYTHONPATH}:/workspace/src/"
+
+# Copy the Dataflow Template launcher
+COPY --from=template_launcher /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher
+
+# Copy files from the official SDK image, including scripts/dependencies.
+# Note Python 3.8 is used due to the NVIDIA base image.
+# BEAM_VERSION must be supplied via --build-arg; an ARG declaration is needed
+# before the variable can be expanded in the COPY --from stage reference.
+ARG BEAM_VERSION
+COPY --from=apache/beam_python3.8_sdk:${BEAM_VERSION} /opt/apache/beam /opt/apache/beam
+
+# Set the entrypoint to the Dataflow Template launcher.
+# Use this if the launcher image is different from the custom container image:
+# ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]
+
+# Set the entrypoint to the Apache Beam SDK launcher.
+ENTRYPOINT ["/opt/apache/beam/boot"]
