
feat(workflow-engine): Track tainted workflow evaluations#107311

Open
kcons wants to merge 5 commits into master from kcons/fastwf
Conversation

Member

@kcons kcons commented Jan 30, 2026

Report metrics for workflow evaluations that may have produced incorrect results due to errors during condition evaluation ("tainted" results).

This helps monitor evaluation reliability by emitting a single metric, `process_workflows.workflows_evaluated`, with a `tainted` tag, allowing us to track the ratio and number of tainted workflow evaluations.
This doesn't yet propagate taintedness to delayed evaluation; that's a planned follow-up.

Updates ISWF-1960.

@kcons kcons requested a review from a team as a code owner January 30, 2026 00:31
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 30, 2026
Contributor

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.



```python
def report_metrics(self, metric_name: str) -> None:
    metrics_incr(metric_name, self.tainted, tags={"tainted": True})
    metrics_incr(metric_name, self.untainted, tags={"tainted": False})
```
Contributor


Duplicated stats class for tainted evaluation tracking

Low Severity

The new `EvaluationStats` class duplicates the existing `_ConditionEvaluationStats` class in `delayed_workflow.py`. Both have identical fields (`tainted: int`, `untainted: int`) and serve the same purpose of tracking tainted vs. untainted evaluation counts. The new class adds useful methods (`from_results`, `__add__`, `report_metrics`) that could benefit the delayed workflow code as well. These should be unified into a single class to avoid maintenance burden and to ensure consistent taint tracking across both the immediate and delayed evaluation paths.
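For illustration, a minimal sketch of what such a unified stats class might look like. Only the fields and method names come from the comment above; the `metrics_incr` stub and the shape of the evaluation objects (anything with an `is_tainted()` method) are assumptions:

```python
from dataclasses import dataclass
from typing import Iterable

def metrics_incr(name: str, amount: int, tags=None) -> None:
    """Stub standing in for the project's metrics helper (assumed interface)."""

@dataclass
class EvaluationStats:
    tainted: int = 0
    untainted: int = 0

    @classmethod
    def from_results(cls, results: Iterable) -> "EvaluationStats":
        # Count results by taintedness; each result exposes is_tainted().
        stats = cls()
        for result in results:
            if result.is_tainted():
                stats.tainted += 1
            else:
                stats.untainted += 1
        return stats

    def __add__(self, other: "EvaluationStats") -> "EvaluationStats":
        # Merge counts from two evaluation passes (e.g. immediate + delayed).
        return EvaluationStats(
            tainted=self.tainted + other.tainted,
            untainted=self.untainted + other.untainted,
        )

    def report_metrics(self, metric_name: str) -> None:
        # Emit one metric with a tainted tag, per the PR description.
        metrics_incr(metric_name, self.tainted, tags={"tainted": True})
        metrics_incr(metric_name, self.untainted, tags={"tainted": False})
```

Because `from_results` accepts any iterable of objects with `is_tainted()`, both evaluation paths could feed it without sharing a concrete result type.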


Member Author


Well aware, but need to make that code workflow-based first.

Contributor


🤔 Should these be workflow-based methods?

One thing I've been thinking about is whether we could compose these condition group / condition evaluation methods more, to then reuse them in delayed processing as well. If we go down that approach, I'd think of these as `DataCondition`-based.

Contributor

@saponifi3d saponifi3d left a comment


generally lgtm, mostly just nitpicks / thoughts.


```diff
 def test_workflow_trigger(self) -> None:
-    triggered_workflows, _ = evaluate_workflow_triggers(
+    triggered_workflows, _, _ = evaluate_workflow_triggers(
```
Contributor


🤔 Since the tuple is growing, should we return a `TypedDict` instead? That way it's a little easier to reason through the returned result.
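As a sketch of the suggestion: a `NamedTuple` is one way to name the fields while keeping existing positional unpacking working (a `TypedDict` would require call sites to change). The field names and types here are hypothetical, not from the PR:

```python
from typing import NamedTuple, Optional

class TriggeredWorkflowsResult(NamedTuple):
    triggered_workflows: set
    queued_workflows: set      # hypothetical second element of the tuple
    stats: Optional[object]    # hypothetical third element, e.g. evaluation stats

# Existing callers can keep unpacking positionally...
triggered, queued, stats = TriggeredWorkflowsResult({1}, set(), None)
# ...while new code can read fields by name:
result = TriggeredWorkflowsResult({1}, set(), None)
names_match = result.triggered_workflows == triggered
```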

```python
        return TriggerResult(triggered=self.triggered, error=error)

    @staticmethod
    def choose_tainted(a: "TriggerResult", b: "TriggerResult") -> "TriggerResult":
```
Contributor


Should we just make this take a list of `TriggerResult`s and return the first tainted one? Might be a little more reusable that way.
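The variadic version might look like this sketch. The `TriggerResult` here is a minimal stand-in (the real class in the PR presumably has more fields); treating "has an error" as tainted is an assumption drawn from the PR description:

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass
class TriggerResult:
    triggered: bool
    error: Optional[Exception] = None  # assumption: an error marks the result tainted

    def is_tainted(self) -> bool:
        return self.error is not None

def first_tainted(results: Sequence[TriggerResult]) -> TriggerResult:
    """Return the first tainted result, or the last result if none are tainted."""
    for result in results:
        if result.is_tainted():
            return result
    return results[-1]
```

Taking a sequence means the two-argument `choose_tainted(a, b)` becomes `first_tainted([a, b])`, and the same helper works when more evaluation paths are merged later.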

Comment on lines +247 to +250
```python
if evaluation.is_tainted():
    tainted_untriggered += 1
else:
    untainted_untriggered += 1
```
Contributor


It kinda feels like the tainted / untainted tracking could be encapsulated a little more. Could we just add each evaluation result to a list and have it derive this information? That way we don't need to independently track these counts and then rebuild them for the results.


```python
@sentry_sdk.trace
@scopedstats.timer()
def evaluate_workflows_action_filters(
```
Contributor


Unrelated: we might want to look at decomposing this method and the trigger condition methods. It seems like we could compose these two a bit more and reduce code duplication.



