Add PyTorch DistilBERT Sentiment Analysis Batch/Streaming pipelines for ML Benchmarks #34577
Assigning reviewers. If you would like to opt out of this review, comment R: @liferoad for label python. Available commands:
The PR bot will only process comments in the main thread (not review comments).
pipeline = test_pipeline or beam.Pipeline(options=pipeline_options)
# 1. Load data pipeline: read lines from GCS file and send to Pub/Sub
I think we need to add a few things to this:
- These should run as separate pipelines and we should only monitor the results of the streaming portion
- We need a way to apply a consistent rate to the input elements in Pub/Sub. A better approach might be to either (a) do this independently via a script, or (b) within Dataflow, use periodic impulse or state/timers to control the rate at which we emit. This will likely work better as a helper that we can use across pipelines
- For streaming pipelines, we should enable autoscaling
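As a rough illustration of option (a), a standalone pacing helper could look like the sketch below. It is not code from this PR: the function name `publish_at_fixed_rate` and its `publish`, `clock`, and `sleep` parameters are hypothetical, and the `publish` callable stands in for whatever actually sends a message to Pub/Sub.

```python
import time
from typing import Callable, Iterable


def publish_at_fixed_rate(
    lines: Iterable[str],
    publish: Callable[[bytes], None],
    rate_per_sec: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Send each line through `publish`, pacing sends to about `rate_per_sec`.

    `publish` is a hypothetical callable (an assumption of this sketch);
    `clock` and `sleep` are injectable so the pacing can be tested.
    Returns the number of messages sent.
    """
    if rate_per_sec <= 0:
        raise ValueError("rate_per_sec must be positive")
    interval = 1.0 / rate_per_sec
    start = clock()
    sent = 0
    for line in lines:
        # Schedule each send relative to the start time, so slow publishes
        # or imprecise sleeps do not accumulate into long-term drift.
        target = start + sent * interval
        delay = target - clock()
        if delay > 0:
            sleep(delay)
        publish(line.encode("utf-8"))
        sent += 1
    return sent
```

In a real run, `publish` would presumably wrap a Pub/Sub client call for the benchmark topic. Keeping it as a plain callable is a design choice for the sketch: the same helper could then be reused across pipelines, which is the property asked for above.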
Added autoscaling.
Can we merge this PR with the current approach? It is the easiest way to run two pipelines simultaneously.
Then we can implement a reusable, independent script that controls the input rate.
I think we should at least handle (1) and run these as two pipelines. Otherwise, I do not think we are measuring a meaningful dataset.
@jrmccluskey please review this PR. Thanks.
As best I can tell, @damccorm's comments haven't been addressed yet, so I'll defer here.
waiting on author
Changes:
- Added `PyTorch DistilBERT Sentiment Analysis streaming` to the `beam_Inference_Python_Benchmarks_Dataflow` workflow.
- Added `PyTorch DistilBERT Sentiment Analysis batch` to the `beam_Inference_Python_Benchmarks_Dataflow` workflow.

Successful run: https://github.com/Amar3tto/beam/actions/runs/14516082090/job/40725350012