Refactor: separate SplittableTruncateSizedRestrictions#35021
Refactor: separate SplittableTruncateSizedRestrictions#35021kennknowles merged 1 commit intoapache:masterfrom
Conversation
b5b739b to
5178d01
Compare
|
R: @scwhittle |
|
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment |
|
I was spending my time on #34902 but honestly these refactors make that a lot more manageable. So I put an hour into this one today to fix the last remaining harness test failure. CC @stankiewicz @talatuyarer |
300dc36 to
bc12475
Compare
bc12475 to
89fbea4
Compare
|
Please do review. All the failing tests pass locally and I believe are flaking. I will keep kicking them and trying to repro locally but so far non are actually broken. |
|
Green! Huzzah! |
sdks/java/harness/src/main/java/org/apache/beam/fn/harness/FnApiDoFnRunner.java
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/fn/harness/SplittableTruncateSizedRestrictionsDoFnRunner.java
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/fn/harness/SplittableTruncateSizedRestrictionsDoFnRunner.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/fn/harness/SplittableTruncateSizedRestrictionsDoFnRunner.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/fn/harness/SplittableTruncateSizedRestrictionsDoFnRunner.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/fn/harness/SplittableTruncateSizedRestrictionsDoFnRunner.java
Outdated
Show resolved
Hide resolved
| // Note that the assumption here is the fullInputCoder of the Truncate transform should be the | ||
| // the same as the SDF/Process transform. | ||
| Coder<WindowedValue<?>> fullInputCoder = | ||
| (Coder<WindowedValue<?>>) (Object) WindowedValues.getFullCoder(inputCoder, windowCoder); |
There was a problem hiding this comment.
can this be enforced earlier during constuction?
There was a problem hiding this comment.
Hmm. I don't really understand this comment TBH.
I thought having the coders line up is an invariant on the ProcessBundleDescriptor. I had actually thought that it would be up to the descriptor to say what coder to use, and that in the Fn API the fact that the data place is using a WindowedValueCoder is explicit in the instruction graph. But I do see that we pull the coder from the proto and it is not a WindowedValueCoder, so I guess we have left that implicit? That actually seems like a problem we may have to fix...
For now I actually think just removing this comment is fair, no? This is just the same process that is used in all the variants of the FnApiDoFnRunner. I think the TODO would be "just use the coder the runner tells you to"? Like... we want to be able to tell the harness to use coders that may or may not be windowed value coders? As in, the harness would be a UDF server that just does as it is told with as few assumptions as possible.
There was a problem hiding this comment.
I was wondering if this cast could ever fail, and if it could should we be creating the fullInputCoder during the construction of this class so we fail earlier during setup, since I believe we have inputCoder and windowCoder available then.
There was a problem hiding this comment.
The cast can't fail because it is actually just forgetting some actual type parameters for type compatibility with other code. It converts FullWindowedValueCoder<KV<KV<InputT, KV<RestrictionT, WatermarkEstimatorStateT>>, Double>> to Coder<WindowedValue<?>>.
- Because the erased type is
Coderin all cases it literally cannot fail. - It can be incorrect, but we don't have a way to catch it until it goes weirdly wrong later (as with all such "unchecked" casts)
- The wacky cast "through"
Objectavoids raw types, which would require suppression. This used to be a cast through the raw typeCoder.
But it can, indeed, be constructed earlier. I've moved it to the constructor.
.../src/main/java/org/apache/beam/fn/harness/SplittableTruncateSizedRestrictionsDoFnRunner.java
Outdated
Show resolved
Hide resolved
.../src/main/java/org/apache/beam/fn/harness/SplittableTruncateSizedRestrictionsDoFnRunner.java
Outdated
Show resolved
Hide resolved
89fbea4 to
f577796
Compare
scwhittle
left a comment
There was a problem hiding this comment.
Just some minor comments
| * check values for null to determine presence/absence. Instead you can store a {@code @Nullable | ||
| * Holder<T>}. | ||
| */ | ||
| public class Holder<T> { |
There was a problem hiding this comment.
Technically, all of sdk/util is "internal". Not sure whether that or an annotation will stop a user from complaining if we break it though :-). I added the annotation.
| import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; | ||
|
|
||
| /** Miscellaneous methods for working with progress. */ | ||
| public abstract class ProgressUtils { |
There was a problem hiding this comment.
Technically, users should never have a compile-time dep on org.apache.beam.fn.harness. The SDK harness is a runtime-only non-user-facing library. I added the annotation anyhow, though I worry that suggests other classes in it are not internal..
| // Note that the assumption here is the fullInputCoder of the Truncate transform should be the | ||
| // the same as the SDF/Process transform. | ||
| Coder<WindowedValue<?>> fullInputCoder = | ||
| (Coder<WindowedValue<?>>) (Object) WindowedValues.getFullCoder(inputCoder, windowCoder); |
There was a problem hiding this comment.
I was wondering if this cast could ever fail, and if it could should we be creating the fullInputCoder during the construction of this class so we fail earlier during setup, since I believe we have inputCoder and windowCoder available then.
f577796 to
5b4c05e
Compare
kennknowles
left a comment
There was a problem hiding this comment.
Thanks for the thorough review! I will get the tests green, run our internal tests, then merge.
| // Note that the assumption here is the fullInputCoder of the Truncate transform should be the | ||
| // the same as the SDF/Process transform. | ||
| Coder<WindowedValue<?>> fullInputCoder = | ||
| (Coder<WindowedValue<?>>) (Object) WindowedValues.getFullCoder(inputCoder, windowCoder); |
There was a problem hiding this comment.
The cast can't fail because it is actually just forgetting some actual type parameters for type compatibility with other code. It converts FullWindowedValueCoder<KV<KV<InputT, KV<RestrictionT, WatermarkEstimatorStateT>>, Double>> to Coder<WindowedValue<?>>.
- Because the erased type is
Coderin all cases it literally cannot fail. - It can be incorrect, but we don't have a way to catch it until it goes weirdly wrong later (as with all such "unchecked" casts)
- The wacky cast "through"
Objectavoids raw types, which would require suppression. This used to be a cast through the raw typeCoder.
But it can, indeed, be constructed earlier. I've moved it to the constructor.
| * check values for null to determine presence/absence. Instead you can store a {@code @Nullable | ||
| * Holder<T>}. | ||
| */ | ||
| public class Holder<T> { |
There was a problem hiding this comment.
Technically, all of sdk/util is "internal". Not sure whether that or an annotation will stop a user from complaining if we break it though :-). I added the annotation.
| import org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker; | ||
|
|
||
| /** Miscellaneous methods for working with progress. */ | ||
| public abstract class ProgressUtils { |
There was a problem hiding this comment.
Technically, users should never have a compile-time dep on org.apache.beam.fn.harness. The SDK harness is a runtime-only non-user-facing library. I added the annotation anyhow, though I worry that suggests other classes in it are not internal..
|
This PR breaks PubsubIO Load Test, specifically a use case of UnboundedSyntheticSource stopped producing data, see #35194 (comment) for details. |
This follows #34919, splis out calls to
@TruncateRestriction. This is the first Fn API transform that deals with responding to splits so it is a bit more complex than the prior refactors that only touched@GetSizeand@GetInitialRestrictionand@SplitRestrictionthat do not respond to split requests.As I understand it, this responds to split requests by delegating them downstream, as it is expected to be fused with final element+restriction processing.
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, commentfixes #<ISSUE NUMBER>instead.CHANGES.mdwith noteworthy changes.See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.