Add the example about how to use Flex Template with RunInference (#18)
* add the flex template
* updated the dockerfile
* updated the parameters
* update the dockerfile
* added PYTHONPATH
* remove the pubsub option
* remove time
@@ -271,9 +271,23 @@ Note the cost and time depends on your job settings and the regions.
### Run the Beam pipeline with the Pub/Sub source
When `INPUT_DATA` from the `.env` file defines a valid Pub/Sub topic (e.g., `projects/apache-beam-testing/topics/Imagenet_openimage_50k_benchmark`),
the Beam pipeline is created using the Pub/Sub source with `FixedWindows` and switches to `beam.io.fileio.WriteToFiles`, which supports streaming pipelines.
We use `shards=0` here since zero shards is the recommended setting; Dataflow then decides how many files it should write.
Note that for this toy example, writing the predictions to a GCS bucket is not efficient, since each output file is quite small (only a few bytes).
In practice, you might tune [the autoscaling options](https://cloud.google.com/dataflow/docs/guides/troubleshoot-autoscaling) to optimize the streaming pipeline performance.
Note that the streaming job runs until it is canceled or drained.
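
For orientation, here is a minimal sketch of what the streaming branch of the pipeline can look like. It is an illustration rather than the exact pipeline code in this repository; the topic, the output path, the window size, and the `Predict` step (which stands in for `RunInference`) are placeholders.

```python
import apache_beam as beam
from apache_beam.io import fileio
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# Streaming must be enabled when reading from Pub/Sub.
options = PipelineOptions(streaming=True)  # plus your usual Dataflow options

with beam.Pipeline(options=options) as pipeline:
    _ = (
        pipeline
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/<project>/topics/<topic>")
        | "Decode" >> beam.Map(lambda msg: msg.decode("utf-8"))
        | "Window" >> beam.WindowInto(FixedWindows(10))  # 10-second fixed windows
        | "Predict" >> beam.Map(lambda name: f"{name}: <prediction>")  # stand-in for RunInference
        | "WriteToFiles" >> fileio.WriteToFiles(path="gs://<bucket>/predictions/", shards=0)
    )
```
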
### Run the Beam pipeline with Dataflow Flex Templates
If you prefer to package all your code into a custom container and let users easily run your Beam pipeline,
a Dataflow Flex Template can be handy: it lets you create and run a Flex Template job using the Google Cloud CLI or the Google Cloud console. (More benefits of templates are described [here](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates#benefits).)

Since the custom container is already created, it is straightforward to adapt it for Dataflow Flex Templates:
1. Create a `metadata.json` file that contains the parameters required by your Beam pipeline. In this example, we add `input`, `output`, `device`, `model_name`, `model_state_dict_path`, and `tf_model_uri` as the parameters that users can pass in (a minimal sketch of such a file is shown after this list). [Here](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates#example-metadata-file) is another example metadata file.
2. Convert the custom container into your template container following [this guide](https://cloud.google.com/dataflow/docs/guides/templates/configuring-flex-templates#use_custom_container_images). `tensorflow_gpu.flex.Dockerfile` is one example converted from `tensorflow_gpu.Dockerfile`. Only two changes are needed: switch to the Dataflow Template launcher entrypoint and package `src` into the container. Change `CUSTOM_CONTAINER_IMAGE` in `.env` and run `make docker` to build the custom container for Flex Templates.
3. `make create-flex-template` creates a template spec file in the Cloud Storage bucket defined by the `TEMPLATE_FILE_GCS_PATH` environment variable; the spec file contains all of the information necessary to run the job, such as the SDK information and metadata. This target calls the `gcloud dataflow flex-template build` CLI command.
4. `make run-df-gpu-flex` runs a Flex Template pipeline using the spec file from `TEMPLATE_FILE_GCS_PATH`. This target calls the `gcloud dataflow flex-template run` CLI command (see the command sketch after this list).
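
As an illustration of step 1, a `metadata.json` for this pipeline could look roughly like the following. The names, descriptions, and optional flags are assumptions for this sketch (only three of the parameters are shown) and may differ from the actual file in this repository.

```json
{
  "name": "RunInference benchmark Flex Template",
  "description": "Runs the RunInference image-classification benchmark pipeline.",
  "parameters": [
    {
      "name": "input",
      "label": "Input data",
      "helpText": "A file of image names or a Pub/Sub topic to read them from.",
      "isOptional": false
    },
    {
      "name": "output",
      "label": "Output path",
      "helpText": "GCS path prefix for the prediction results.",
      "isOptional": false
    },
    {
      "name": "device",
      "label": "Device",
      "helpText": "Device used for inference, e.g. CPU or GPU.",
      "isOptional": true
    }
  ]
}
```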
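
Steps 3 and 4 wrap the `gcloud` CLI; roughly, the equivalent direct commands look like the sketch below. The bucket, image, region, job name, and parameter values are placeholders, and the exact flags used by the Makefile may differ.

```sh
# Step 3: build the template spec file in GCS from the custom launcher image.
gcloud dataflow flex-template build "gs://<bucket>/templates/tensorflow_gpu.json" \
  --image "<region>-docker.pkg.dev/<project>/<repo>/<image>:latest" \
  --sdk-language "PYTHON" \
  --metadata-file "metadata.json"

# Step 4: run a job from the template spec file.
gcloud dataflow flex-template run "runinference-flex-$(date +%Y%m%d-%H%M%S)" \
  --template-file-gcs-location "gs://<bucket>/templates/tensorflow_gpu.json" \
  --region "<region>" \
  --parameters input="gs://<bucket>/images.txt",output="gs://<bucket>/predictions/"
```
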
More information about Flex Templates can be found [here](https://cloud.google.com/dataflow/docs/guides/templates/using-flex-templates).
## FAQ
### Permission error when using any GCP command
@@ -328,4 +342,7 @@ exec /opt/apache/beam/boot: no such file or directory