Correcting the pipeline object definition#34899
Conversation
|
Assigning reviewers: R: @shunping for label python. Note: If you would like to opt out of this review, comment Available commands:
The PR bot will only process comments in the main thread (not review comments). |
|
Thanks for trying to clarify the concept. However, I actually prefer the comments prior to this change, i.e. a pipeline is a DAG of PTransforms. One reason is that when we construct a pipeline in any Beam SDKs, we are explicitly putting the PTransforms together. PCollections, on the other hand, only serve as the output of one PTransform and the input of another. |
|
Thanks for your review. Isn't this contradicting here, then ? |
|
If every PTransform has only one input PCollection and only one output PCollection, then it sounds also ok to say a Pipeline is a DAG of PCollections. However, some PTransform takes multiple inputs, like Flatten. Say the input PCollections are A and B and the output of Flatten is C. It will be weird to say A,B,C are nodes, because now we have two edges A-C and B-C, and both edges represent the same Flatten Transform. |
Looks like there is inconsistency here. I checked our pipeline proto, and it also says a pipeline is a graph of PTransforms. |
|
My appologies. I closed the PR by mistake. Could you help modify the comment here instead? Thanks! beam/sdks/python/apache_beam/pipeline.py Line 122 in 5b862dd |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #34899 +/- ##
============================================
+ Coverage 54.52% 54.54% +0.01%
Complexity 1479 1479
============================================
Files 1010 1011 +1
Lines 160461 160513 +52
Branches 1079 1079
============================================
+ Hits 87499 87544 +45
- Misses 70864 70871 +7
Partials 2098 2098
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Run Prism_Python PreCommit 3.12 |
|
Just noticed two small typos in the change. Could you correct them so we can merge the PR? Thanks! |
shunping
left a comment
There was a problem hiding this comment.
LGTM. Thanks for clarifying the concepts.
* Correcting the pipeline object definition * Corrected the definition * lint correction * Corrected formatting
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.