@neilbest-db
Contributor

This is a reboot of #1174 to resolve #480. After extensive refactoring in #1223, #1224, and #1253 it was necessary to abandon #1174 and bring its new logic in manually.

gueniai and others added 15 commits May 31, 2024 15:03
commit a6a13fe
Author: Neil Best <[email protected]>
Date:   Thu May 23 16:39:58 2024 -0500

    improve TransformationDescriberTest

commit 1f145aa
Author: Neil Best <[email protected]>
Date:   Thu May 23 15:25:29 2024 -0500

    Add descriptive job group IDs and named transformations

    This makes the Spark UI more developer-friendly when analyzing
    Overwatch runs.

    Job group IDs have the form <workspace name>:<OW module name>

    Any use of `.transform( df => df)` may be replaced with
    `.transformWithDescription( nt)` after instantiating a `val nt =
    NamedTransformation( df => df)` as its argument.

    This commit contains one such application of the new extension method.
    (See `val jobRunsAppendClusterName` in `WorkflowsTransforms.scala`.)

    Some logic in `GoldTransforms` falls through to elements of the
    special job-run-action form of Job Group IDs emitted by the platform,
    but the impact is minimal relative to the benefit to Overwatch
    development and troubleshooting.  Even so, this form of Job Group ID
    is still present in initial Spark events before OW ETL modules begin
    to execute.
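
As a rough illustration of the pattern described in commit 1f145aa, the sketch below shows how an extension method can accept a named transformation and record its name as a label before applying it. It is not the Overwatch implementation: the explicit `name` field, the generic type `A` standing in for a Spark `DataFrame`, the `labels` buffer standing in for the Spark UI, the `JobGroup.id` helper, and the workspace and module names are all assumptions made so the example runs without Spark.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical: the explicit `name` field is an assumption of this sketch.
final case class NamedTransformation[A](name: String, f: A => A)

object JobGroup {
  // Builds an ID of the form <workspace name>:<OW module name>.
  def id(workspaceName: String, moduleName: String): String =
    s"$workspaceName:$moduleName"
}

object TransformationDescriber {
  // Stand-in for the Spark UI: collects labels in the order they are applied.
  val labels: ListBuffer[String] = ListBuffer.empty[String]

  // Extension method; `A` stands in for a Spark DataFrame in this sketch.
  implicit class DescribedOps[A](private val value: A) extends AnyVal {
    def transformWithDescription(nt: NamedTransformation[A]): A = {
      labels += nt.name // record the label before running the function
      nt.f(value)
    }
  }
}

object Demo {
  import TransformationDescriber._

  def main(args: Array[String]): Unit = {
    // Replaces a plain `.transform(s => s + "_x")` with a named equivalent.
    val appendSuffix = NamedTransformation[String]("appendSuffix", _ + "_x")
    println("row".transformWithDescription(appendSuffix))
    println(labels.mkString(", "))
    // Hypothetical workspace and module names.
    println(JobGroup.id("my-workspace", "some-module"))
  }
}
```

In a real Spark job the label would presumably be set through a call such as `SparkContext.setJobGroup` or `SparkContext.setJobDescription`; the commit message does not say which one Overwatch uses.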

commit da0c55a
Author: Guenia <[email protected]>
Date:   Wed May 8 19:43:29 2024 -0400

    Initial commit

Removed a level of indirection and unnecessary conditional branching
in definition of chained `lookupWhen` transformations.

Moved definitions to have references to `PipelineTable` objects in
scope rather than passing them by argument.

(cherry picked from commit efdd63f)

- enable auto-optimized shuffle for module 2011

- move caching action to previous `NamedTransformation` for more
  meaningful Spark UI labels

for greater visibility in Spark UI. `NamedTransformation` type name
now appears in labels' second position.

prevent certain regressions when the Job Group labels set by the platform are no longer available for parsing. Labels set by the platform contain tokens that are necessary to preserve referential integrity under certain conditions. (Which conditions?)

from branch `480_workflows_support_cancelAllRuns-FUBAR` that was the
original dev branch, renamed when automated merge screwed everything up.

@neilbest-db added the "feature" and "data quality" labels Aug 27, 2024
@neilbest-db added this to the 0.9.0.0 milestone Aug 27, 2024
@neilbest-db linked an issue Aug 27, 2024 that may be closed by this pull request
@neilbest-db self-assigned this Oct 2, 2024
@neilbest-db changed the title from "correctly integrate cancelAllRuns audit events with job runs" to "Integrate cancelAllRuns audit events with job runs" Oct 2, 2024
@gueniai deleted the branch 0900_placeholder_old October 9, 2024 18:12
@gueniai closed this Oct 9, 2024

Labels

data quality: There is a data quality issue here
feature: New feature or extension of existing feature (was "enhancement")

Development

Successfully merging this pull request may close these issues.

Add support for cancelAllRuns of any/all workflows

5 participants