[GOBBLIN-2188] Define Initializer.AfterInitializeMemento for GoT to tunnel state from GenerateWorkUnits to CommitActivity #4091
Merged
phet merged 2 commits into apache:master on Jan 7, 2025
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
Stateful writer/converter
`Initializer`s, such as `JdbcWriterInitializer`, work fine with Gobblin-on-MR (GoMR), but get disrupted by Gobblin-on-Temporal (GoT). While GoMR does also launch an MR application, the remainder of the `MRJobLauncher` execution is within the same process. An `Initializer` must execute at the end of Work Discovery, before `WorkUnit` processing may begin, but is `.close()`d only after Job Commit completes. Crucially, with GoMR, the same `Initializer` instance remains in memory all throughout. With GoT, in contrast, Work Discovery and Commit execute completely independently, creating new objects/instances, perhaps even on a new host/container.

Some `Initializer`s, such as the `JdbcWriter`'s `JdbcWriterInitializer`, are stateful. (In its case, to maintain the temp/staging table's name, so that it may be dropped upon successful Commit.) Specific state originates during Work Discovery (the `GenerateWorkUnitsImpl` activity in GoT) yet must be available during Commit (the `CommitActivityImpl` in GoT).

Accordingly we define the `Initializer.AfterInitializeMemento` interface so any concrete `Initializer` impl has the opportunity to optionally define an opaque snapshot of its internal state that may be (de)serialized and "revived" in a new `Initializer` instance of the same concrete type, holding equivalent internal state.

The evolved `Initializer` interface/API provides `default` impls for the new methods `commemorate` and `recall`, making it source-compatible with existing implementations. Any existing `Initializer` that does NOT maintain internal state for its `close()` to use may be recompiled unchanged, and succeed with GoT.

This GoT enhancement leverages `JobState` to tunnel `AfterInitializeMemento`s, since it is serialized at the end of Work Discovery (in `GenerateWorkUnitsImpl`), and is later loaded when the Commit activity begins (in `CommitStepActivityImpl`).

Review Tip: start first with how `Initializer.AfterInitializeMemento`s are created in `GenerateWorkUnitsImpl` and used by `CommitStepActivityImpl` before diving into their impl (with test).

Tests
includes new `MultiInitializerTest`

Commits
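
For readers new to the memento idea in the Description, here is a minimal, self-contained sketch of the flow: initialize in one instance, snapshot its state, serialize it, and revive it in a fresh instance before `close()`. Everything in it is an assumption made for illustration: `SketchInitializer`, `Memento`, `StagingTableInitializer`, `StatelessInitializer`, the `roundTrip` helper, and the method signatures are hypothetical stand-ins and not Gobblin's actual API. Java serialization to a byte array stands in for the `JobState` tunnel, which may well use a different format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Optional;

public class MementoSketch {

  /** Hypothetical stand-in for Initializer; the signatures are assumptions. */
  interface SketchInitializer {
    /** Opaque, serializable snapshot of the state held after initialize(). */
    interface Memento extends Serializable {}

    void initialize();

    void close();

    /** Default for stateless initializers: nothing to tunnel. */
    default Optional<Memento> commemorate() {
      return Optional.empty();
    }

    /** Default for stateless initializers: nothing to restore. */
    default void recall(Memento memento) {}
  }

  /** Stateless case: inherits both defaults, so it needs no source changes. */
  static class StatelessInitializer implements SketchInitializer {
    @Override public void initialize() {}
    @Override public void close() {}
  }

  /** Stateful case: close() needs the staging table name chosen by initialize(). */
  static class StagingTableInitializer implements SketchInitializer {
    static class TableMemento implements Memento {
      private static final long serialVersionUID = 1L;
      final String stagingTable;
      TableMemento(String stagingTable) { this.stagingTable = stagingTable; }
    }

    private String stagingTable; // null until initialize() or recall()
    String dropped;              // records what close() would have dropped

    @Override public void initialize() { stagingTable = "staging_tmp_001"; }

    @Override public void close() { dropped = stagingTable; } // stand-in for DROP TABLE

    @Override public Optional<Memento> commemorate() {
      if (stagingTable == null) {
        return Optional.empty();
      }
      return Optional.of(new TableMemento(stagingTable));
    }

    @Override public void recall(Memento memento) {
      stagingTable = ((TableMemento) memento).stagingTable;
    }
  }

  /** Simulates the tunnel: serialize to bytes, then deserialize a new copy. */
  static SketchInitializer.Memento roundTrip(SketchInitializer.Memento memento) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(memento);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (SketchInitializer.Memento) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("memento round trip failed", e);
    }
  }

  public static void main(String[] args) {
    // "Work Discovery" side: initialize, then snapshot the state.
    StagingTableInitializer discovery = new StagingTableInitializer();
    discovery.initialize();
    SketchInitializer.Memento tunneled = roundTrip(discovery.commemorate().get());

    // "Commit" side: a brand-new instance revives the state, then closes.
    StagingTableInitializer commit = new StagingTableInitializer();
    commit.recall(tunneled);
    commit.close();
    System.out.println(commit.dropped); // prints "staging_tmp_001"
  }
}
```

The `default` methods are what keep the stateless case unchanged: `StatelessInitializer` compiles without mentioning `commemorate` or `recall`, while the stateful class opts in by overriding both.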