Skip to content

[GOBBLIN-2186] Emit GoT GTEs to time WorkUnit prep and to record volume of Work Discovery#4089

Merged
phet merged 2 commits intoapache:masterfrom
phet:got-gen-wu-prep-events
Jan 2, 2025
Merged

[GOBBLIN-2186] Emit GoT GTEs to time WorkUnit prep and to record volume of Work Discovery#4089
phet merged 2 commits intoapache:masterfrom
phet:got-gen-wu-prep-events

Conversation

@phet
Copy link
Contributor

@phet phet commented Jan 1, 2025

Dear Gobblin maintainers,

Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!

JIRA

Description

  • Here are some details about my PR, including screenshots (if applicable):

GaaSJobObservabilityEvents for Gobblin-on-Temporal jobs presently have no values set for the fields jobPlanningStartTimestamp and jobPlanningEndTimestamp. This is because, in contrast with Gobblin-on-MR, GenerateWorkUnitsImpl emits no TimingEvent.LauncherTimings.WORK_UNITS_CREATION GTE to record such values.

The historical impediment was that Temporal does not permit Activities to launch other Activities (only Workflows may do that). TemporalEventTimer, however, emitted its GTE on a separate Activity for reliability. This change reworks TemporalEventTimer.Factory to be useable within an Activity, by refactoring into complementary concrete realizations - TemporalEventTimer.WithinWorkflowFactory and TemporalEventTimer.WithinActivityFactory. The latter sets up direct GTE emission within the same (current) Activity.

The alternative of reworking GenerateWorkUnitsImpl from an Activity into a Workflow would be challenging to accomplish, because "Work Discovery" may be quite long running - 15-30 mins is not uncommon for large full-initial-copy Iceberg-Distcp jobs - yet Temporal allows a workflow-task to run for a maximum of 2 minutes. (Since we wish to emit this timing GTE in the midst of Work Discovery, it would not be easy to have one sub-activity first performing all of Work Discovery and another subsequently doing GTE emission once the first concludes.)

In addition, now that this refactoring happily unblocks GTE-emission from GenerateWorkUnitsImpl we were primed to emit new metadata (on another GTE), which is actually even more critical to emit in the midst of Work Discovery: one to record the volume of WorkUnits discovered. To be useful, this must be done prior to WU serialization, which is memory-intensive and may OOM. If that should happen this GTE's durable measurement would inform re-configuration for any subsequent re-attempt.

Accordingly, we preserve total size and counts, means, and medians of both original and bin packed WUs. This was inspired by a pair of log messages produced within CopySource::getWorkUnits:

Statistics for ConcurrentBoundedPriorityIterable: {ResourcePool: {softBound: [ ... ], hardBound: [ ...]},totalResourcesUsed: [ ... ], maxRequirementPerDimension: [entities: 231943.0, bytesCopied: 1.22419622769628E14], ... }

and

org.apache.gobblin.data.management.copy.CopySource - Bin packed work units. Initial work units: 27252, packed work units: 13175, max weight per bin: 500000000, max work units per bin: 100.

We often seek out these messages to diagnose and resolve failures during WorkUnit Discovery, but rather than merely logging, this durable emission sets up fully-automatic (machine-to-machine) right-sizing to orchestrate a potential re-attempt (in case of WU serialization OOM error).

(Actual handling of this new GTE TimingEvent#WORKUNITS_GENERATED_SUMMARY metadata by GaaS will require a separate change, e.g. to KafkaAvroJobStatusMonitor#parseJobStatus.)

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:

manual execution

Commits

  • My commits all reference JIRA issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

phet added 2 commits December 31, 2024 09:19
…discovered, after reworking `TemporalEventTimer.Factory` for use within an `Activity`
@codecov-commenter
Copy link

codecov-commenter commented Jan 1, 2025

Codecov Report

❌ Patch coverage is 2.04082% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 42.95%. Comparing base (003590e) to head (8ce206d).
⚠️ Report is 45 commits behind head on master.

Files with missing lines Patch % Lines
...temporal/workflows/metrics/TemporalEventTimer.java 0.00% 20 Missing ⚠️
...obblin/temporal/ddm/work/WorkUnitsSizeSummary.java 0.00% 13 Missing ⚠️
...poral/ddm/activity/impl/GenerateWorkUnitsImpl.java 0.00% 7 Missing ⚠️
...oral/ddm/workflow/impl/CommitStepWorkflowImpl.java 0.00% 2 Missing ⚠️
.../ddm/workflow/impl/ExecuteGobblinWorkflowImpl.java 33.33% 2 Missing ⚠️
...gobblin/temporal/workflows/metrics/EventTimer.java 0.00% 2 Missing ⚠️
...dm/workflow/impl/ProcessWorkUnitsWorkflowImpl.java 0.00% 1 Missing ⚠️
...ral/workflows/helloworld/GreetingWorkflowImpl.java 0.00% 1 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (003590e) and HEAD (8ce206d). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (003590e) HEAD (8ce206d)
2 1
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #4089      +/-   ##
============================================
- Coverage     47.98%   42.95%   -5.03%     
+ Complexity     8373     2436    -5937     
============================================
  Files          1582      507    -1075     
  Lines         62712    21403   -41309     
  Branches       7105     2456    -4649     
============================================
- Hits          30091     9194   -20897     
+ Misses        29899    11272   -18627     
+ Partials       2722      937    -1785     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Set<String> pathsToCleanUp)
throws ReflectiveOperationException {
// report (timer) metrics for "Work Discovery", *planning only* - NOT including WU prep, like serialization, `DestinationDatasetHandlerService`ing, etc.
// IMPORTANT: for accurate timing, SEPARATELY emit `.createWorkPreparationTimer`, to record time prior to measuring the WU size required for that one
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean here one GTE event for serialization step or GTE event for everything discovery serialization and all ?
and also would that GTE help us in anyway if we had , wdyt ?

Copy link
Contributor Author

@phet phet Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

originally, in AbstractJobLauncher the "WU creation timer" measured only the planning -

workUnitsCreationTimer.stop(this.multiEventMetadataGenerator.getMetadata(this.jobContext,

that is what's included in the GaaSJobObservabilityEvent.

the timer for WU prep happens a bit later -

so in this comment:

"Work Discovery", planning only - NOT including WU prep, like serialization, ...

I just meant that we're timing only planning/creation, not the preparation such as serialization.

as for WU serialization, there is no existing, historical event strictly for that. typically that only takes a long time when memory-constrained and GC-bound. although we could consider adding a new event to time that, for purposes of right-sizing, GC stats are more interesting than the duration it happens to take. if anything, the former is what I'd prioritize.

Comment on lines +133 to +135
// IMPORTANT: send prior to `writeWorkUnits`, so the volume of work discovered (and bin packed) gets durably measured. even if serialization were to
// exceed available memory and this activity execution were to fail, a subsequent re-attempt would know the amount of work, to guide re-config/attempt
createWorkPreparedSizeDistillationTimer(wuSizeSummary, eventSubmitterContext).stop();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, if serialization takes time and if it accounts for more OOM issue then should we consider these both in separate activity and launched through one parent workflow as IIUC activity retry will do everything from beginning and discovered workunits will be generated again which can also lead to duplication of GTE, whats your opinion on this ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having WU planning separate from WU serialization is worth considering. that would allow for a re-attempt of only serialization w/o needing to rerun the planning. there's no concern in sending the GTE again on the re-attempt - that's not the motivation. instead the benefit would be to expedite the re-attempt by subsequently skipping a repeat of successful WU planning.

to separate into two activities we'd need to succeed in persisting some intermediate form of WU planning, so the WU planning activity could "pass input" to the WU serialization activity, as the two won't execute together - or possibly even on the same host. that intermediate form clearly ought to be more likely to succeed than serializing all the WUs themselves - the failure we're trying to address.

this major design choice awaits if we decide to pursue such larger rework.

@phet phet merged commit 3269764 into apache:master Jan 2, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants