[GOBBLIN-2186] Emit GoT GTEs to time WorkUnit prep and to record volume of Work Discovery#4089
Conversation
…discovered, after reworking `TemporalEventTimer.Factory` for use within an `Activity`
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #4089 +/- ##
============================================
- Coverage 47.98% 42.95% -5.03%
+ Complexity 8373 2436 -5937
============================================
Files 1582 507 -1075
Lines 62712 21403 -41309
Branches 7105 2456 -4649
============================================
- Hits 30091 9194 -20897
+ Misses 29899 11272 -18627
+ Partials 2722 937 -1785 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| Set<String> pathsToCleanUp) | ||
| throws ReflectiveOperationException { | ||
| // report (timer) metrics for "Work Discovery", *planning only* - NOT including WU prep, like serialization, `DestinationDatasetHandlerService`ing, etc. | ||
| // IMPORTANT: for accurate timing, SEPARATELY emit `.createWorkPreparationTimer`, to record time prior to measuring the WU size required for that one |
There was a problem hiding this comment.
Do you mean here one GTE event for serialization step or GTE event for everything discovery serialization and all ?
and also would that GTE help us in anyway if we had , wdyt ?
There was a problem hiding this comment.
originally, in AbstractJobLauncher the "WU creation timer" measured only the planning -
that is what's included in the GaaSJobObservabilityEvent.
the timer for WU prep happens a bit later -
so in this comment:
"Work Discovery", planning only - NOT including WU prep, like serialization, ...
I just meant that we're timing only planning/creation, not the preparation such as serialization.
as for WU serialization, there is no existing, historical event strictly for that. typically that only takes a long time when memory-constrained and GC-bound. although we could consider adding a new event to time that, for purposes of right-sizing, GC stats are more interesting than the duration it happens to take. if anything, the former is what I'd prioritize.
| // IMPORTANT: send prior to `writeWorkUnits`, so the volume of work discovered (and bin packed) gets durably measured. even if serialization were to | ||
| // exceed available memory and this activity execution were to fail, a subsequent re-attempt would know the amount of work, to guide re-config/attempt | ||
| createWorkPreparedSizeDistillationTimer(wuSizeSummary, eventSubmitterContext).stop(); |
There was a problem hiding this comment.
I think, if serialization takes time and if it accounts for more OOM issue then should we consider these both in separate activity and launched through one parent workflow as IIUC activity retry will do everything from beginning and discovered workunits will be generated again which can also lead to duplication of GTE, whats your opinion on this ?
There was a problem hiding this comment.
having WU planning separate from WU serialization is worth considering. that would allow for a re-attempt of only serialization w/o needing to rerun the planning. there's no concern in sending the GTE again on the re-attempt - that's not the motivation. instead the benefit would be to expedite the re-attempt by subsequently skipping a repeat of successful WU planning.
to separate into two activities we'd need to succeed in persisting some intermediate form of WU planning, so the WU planning activity could "pass input" to the WU serialization activity, as the two won't execute together - or possibly even on the same host. that intermediate form clearly ought to be more likely to succeed than serializing all the WUs themselves - the failure we're trying to address.
this major design choice awaits if we decide to pursue such larger rework.
Dear Gobblin maintainers,
Please accept this PR. I understand that it will not be reviewed until I have checked off all the steps below!
JIRA
Description
GaaSJobObservabilityEvents for Gobblin-on-Temporal jobs presently have no values set for the fieldsjobPlanningStartTimestampandjobPlanningEndTimestamp. This is because, in contrast with Gobblin-on-MR,GenerateWorkUnitsImplemits noTimingEvent.LauncherTimings.WORK_UNITS_CREATIONGTE to record such values.The historical impediment was that Temporal does not permit Activities to launch other Activities (only Workflows may do that).
TemporalEventTimer, however, emitted its GTE on a separate Activity for reliability. This change reworksTemporalEventTimer.Factoryto be useable within anActivity, by refactoring into complementary concrete realizations -TemporalEventTimer.WithinWorkflowFactoryandTemporalEventTimer.WithinActivityFactory. The latter sets up direct GTE emission within the same (current)Activity.The alternative of reworking
GenerateWorkUnitsImplfrom anActivityinto aWorkflowwould be challenging to accomplish, because "Work Discovery" may be quite long running - 15-30 mins is not uncommon for large full-initial-copy Iceberg-Distcp jobs - yet Temporal allows a workflow-task to run for a maximum of 2 minutes. (Since we wish to emit this timing GTE in the midst of Work Discovery, it would not be easy to have one sub-activity first performing all of Work Discovery and another subsequently doing GTE emission once the first concludes.)In addition, now that this refactoring happily unblocks GTE-emission from
GenerateWorkUnitsImplwe were primed to emit new metadata (on another GTE), which is actually even more critical to emit in the midst of Work Discovery: one to record the volume ofWorkUnits discovered. To be useful, this must be done prior to WU serialization, which is memory-intensive and may OOM. If that should happen this GTE's durable measurement would inform re-configuration for any subsequent re-attempt.Accordingly, we preserve total size and counts, means, and medians of both original and bin packed WUs. This was inspired by a pair of log messages produced within
CopySource::getWorkUnits:and
We often seek out these messages to diagnose and resolve failures during
WorkUnitDiscovery, but rather than merely logging, this durable emission sets up fully-automatic (machine-to-machine) right-sizing to orchestrate a potential re-attempt (in case of WU serialization OOM error).(Actual handling of this new GTE
TimingEvent#WORKUNITS_GENERATED_SUMMARYmetadata by GaaS will require a separate change, e.g. toKafkaAvroJobStatusMonitor#parseJobStatus.)Tests
manual execution
Commits