Commit aedddc0

Update the readme (#19)
* update the readme
* update the notes
* updated the words
1 parent 7a4421b commit aedddc0

2 files changed: +10 -5 lines changed

README.md

Lines changed: 9 additions & 4 deletions
@@ -76,14 +76,15 @@ All the useful actions can be triggered using `make`:
 ```console
 $ make
 
-make targets:
+make targets:
 
 check-beam            Check whether Beam is installed on GPU using VM with Custom Container
 check-pipeline        Check whether the Beam pipeline can run on GPU using VM with Custom Container and DirectRunner
 check-tf-gpu          Check whether Tensorflow works on GPU using VM with Custom Container
 check-torch-gpu       Check whether PyTorch works on GPU using VM with Custom Container
 clean                 Remove virtual environment, downloaded models, etc
 clean-lite            Remove pycache files, pytest files, etc
+create-flex-template  Create a Flex Template file using a Flex Template custom container
 create-vm             Create a VM with GPU to test the docker image
 delete-vm             Delete a VM
 docker                Build a custom docker image and push it to Artifact Registry
@@ -94,8 +95,10 @@ $ make
 lint                  Run linter on source code
 run-df-cpu            Run a Dataflow job with CPUs and without Custom Container
 run-df-gpu            Run a Dataflow job using the custom container with GPUs
+run-df-gpu-flex       Run a Dataflow job using the Flex Template
 run-direct            Run a local test with DirectRunner
 test                  Run tests
+test-latest-env       Replace the Beam version with the latest version (including release candidates)
 ```
 
 ### Pipeline Details
@@ -277,11 +280,13 @@ Note that the streaming job will run forever until it is canceled or drained.
 
 ### Run the Beam pipeline with Dataflow Flex Templates
 If you prefer to package all your code into a custom container and allow users to easily access your Beam pipeline,
-Dataflow Flex Template could be handy to create and run a Flex Template job using Google Cloud CLI or Google Cloud console. (More benefits about templates are [here](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates#benefits).)
+Dataflow Flex Template could be handy to create and run a Flex Template job using Google Cloud CLI or Google Cloud console.
+More importantly, building the flex templates container from the custom SDK container image can produce a reproducible launch environment that is [compatible with the runtime environment](https://beam.apache.org/documentation/sdks/python-pipeline-dependencies/#make-the-launch-environment-compatible-with-the-runtime-environment).
+(More benefits about templates are [here](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates#benefits).)
 
 Since the custom container is already created, it is straightforward to adapt Dataflow Flex Templates:
-1. create a `metadata.json` file that contains the parameters required by your Beam pipeline. In this example, we can add `input`, `output`, `device`, `model_name`, `model_state_dict_path`, and `tf_model_uri` as the parameters that can be passed in by users. [Here](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#example-metadata-file) is another example metadata file.
-2. convert the custom container to your template container following [this](https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_custom_container_images). `tensorflow_gpu.flex.Dockerfile` is one example converted from `tensorflow_gpu.Dockerfile`. Only two parts are needed: switch to the Dataflow Template launcher entrypoint and package `src` into this container. Change `CUSTOM_CONTAINER_IMAGE` in `.env` and run `make docker` to create the custom container for Flex Templates.
+1. create a [`metadata.json`](https://github.com/google/dataflow-ml-starter/blob/main/flex/metadata.json) file that contains the parameters required by your Beam pipeline. In this example, we can add `input`, `output`, `device`, `model_name`, `model_state_dict_path`, and `tf_model_uri` as the parameters that can be passed in by users. [Here](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#example-metadata-file) is another example metadata file.
+2. convert the custom container to your template container following [this](https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_custom_container_images). [`tensorflow_gpu.flex.Dockerfile`](https://github.com/google/dataflow-ml-starter/blob/main/tensorflow_gpu.flex.Dockerfile) is one example converted from `tensorflow_gpu.Dockerfile`. Only two parts are needed: switch to the Dataflow Template launcher entrypoint and package `src` into this container. Change `CUSTOM_CONTAINER_IMAGE` in `.env` and run `make docker` to create the custom container for Flex Templates.
 3. `make create-flex-template` creates a template spec file in a Cloud Storage bucket defined by the env `TEMPLATE_FILE_GCS_PATH` that contains all of the necessary information to run the job, such as the SDK information and metadata. This calls the CLI `gcloud dataflow flex-template build`.
 4. `make run-df-gpu-flex` runs a Flex Template pipeline using the spec file from `TEMPLATE_FILE_GCS_PATH`. This calls the CLI `gcloud dataflow flex-template run`.
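
As a rough illustration of steps 1, 3, and 4 above, the following Python sketch builds a `metadata.json` payload for the six parameters named in the README and assembles the two `gcloud dataflow flex-template` command lines without executing them. The helper names, labels, help text, bucket path, image name, job name, and region are all hypothetical placeholders, and the repository's actual `make` targets may pass additional flags (for example GPU worker options) that are not shown here.

```python
import json
import shlex

# Parameter names taken from the README; everything else below is illustrative.
PARAMETER_NAMES = [
    "input",
    "output",
    "device",
    "model_name",
    "model_state_dict_path",
    "tf_model_uri",
]


def build_metadata(name, description, optional=()):
    """Return a dict in the Flex Template metadata file layout."""
    return {
        "name": name,
        "description": description,
        "parameters": [
            {
                "name": param,
                "label": param.replace("_", " "),  # hypothetical label text
                "helpText": f"Value for the pipeline option --{param}.",
                "isOptional": param in optional,
            }
            for param in PARAMETER_NAMES
        ],
    }


def build_command(template_path, image, metadata_file="metadata.json"):
    """Argument list for `gcloud dataflow flex-template build`."""
    return [
        "gcloud", "dataflow", "flex-template", "build", template_path,
        "--image", image,
        "--sdk-language", "PYTHON",
        "--metadata-file", metadata_file,
    ]


def run_command(job_name, template_path, region, parameters):
    """Argument list for `gcloud dataflow flex-template run`."""
    cmd = [
        "gcloud", "dataflow", "flex-template", "run", job_name,
        "--template-file-gcs-location", template_path,
        "--region", region,
    ]
    if parameters:
        # Simple key=value pairs only; values containing commas need gcloud's
        # escaping syntax, which this sketch does not handle.
        pairs = ",".join(f"{key}={value}" for key, value in parameters.items())
        cmd += ["--parameters", pairs]
    return cmd


if __name__ == "__main__":
    # Hypothetical values standing in for TEMPLATE_FILE_GCS_PATH and
    # CUSTOM_CONTAINER_IMAGE from `.env`.
    template_path = "gs://your-bucket/templates/flex-spec.json"
    image = "your-registry/your-flex-image:latest"

    metadata = build_metadata(
        "example-flex-template",
        "Illustrative metadata for the Beam pipeline",
        optional=("model_state_dict_path", "tf_model_uri"),
    )
    print(json.dumps(metadata, indent=2))
    print(shlex.join(build_command(template_path, image)))
    print(shlex.join(run_command(
        "example-job", template_path, "us-central1",
        {"input": "gs://your-bucket/in.txt", "device": "GPU"},
    )))
```

Printing the commands instead of running them keeps the sketch free of side effects; the printed lines can be compared against what `make create-flex-template` and `make run-df-gpu-flex` actually invoke.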

tensorflow_gpu.flex.Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -60,7 +60,7 @@ ENV PYTHONPATH "${PYTHONPATH}:/workspace/src/"
 COPY --from=template_launcher /opt/google/dataflow/python_template_launcher /opt/google/dataflow/python_template_launcher
 
 # Copy files from official SDK image, including script/dependencies.
-# Note Python 3.8 is used due to the base image from nvidia
+# Note Python 3.8 is used since the above setup uses Python 3.8.
 COPY --from=apache/beam_python3.8_sdk:${BEAM_VERSION} /opt/apache/beam /opt/apache/beam
 
 # Set the entrypoint to the Dataflow Template launcher
