feat(workflow-engine): Track tainted workflow evaluations #107311
Conversation
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
def report_metrics(self, metric_name: str) -> None:
    metrics_incr(metric_name, self.tainted, tags={"tainted": True})
    metrics_incr(metric_name, self.untainted, tags={"tainted": False})
```
**Duplicated stats class for tainted evaluation tracking** (Low Severity)

The new `EvaluationStats` class duplicates the existing `_ConditionEvaluationStats` class in `delayed_workflow.py`. Both have identical fields (`tainted: int`, `untainted: int`) and serve the same purpose of tracking tainted vs. untainted evaluation counts. The new class adds useful methods (`from_results`, `__add__`, `report_metrics`) that could benefit the delayed workflow code as well. These should be unified into a single class to avoid maintenance burden and ensure consistent taint tracking across both the immediate and delayed evaluation paths.
well aware, but need to make that code workflow based first.
🤔 should these be workflow-based methods?
One thing I've been thinking about is whether we could compose these condition group / condition evaluation methods more, so we can reuse them in delayed processing as well. If we go down that approach, I'd think of these as DataCondition-based.
**saponifi3d** left a comment
generally lgtm, mostly just nitpicks / thoughts.
```diff
 def test_workflow_trigger(self) -> None:
-    triggered_workflows, _ = evaluate_workflow_triggers(
+    triggered_workflows, _, _ = evaluate_workflow_triggers(
```
🤔 since the tuple is growing, should we return a typed dict instead? that way it's a little easier to reason through the returned result.
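One way to get named access without breaking existing call sites is a `NamedTuple` rather than a dict, since positional unpacking keeps working. A sketch under assumed names (`TriggerEvaluation` and its fields are hypothetical; the dummy body just stands in for the real condition logic):

```python
from typing import NamedTuple, Set


class TriggerEvaluation(NamedTuple):
    # Hypothetical field names; the real values are whatever
    # evaluate_workflow_triggers returns in the PR.
    triggered_workflows: Set[str]
    tainted: int
    untainted: int


def evaluate_workflow_triggers(workflows: Set[str]) -> TriggerEvaluation:
    # Dummy body: treat every workflow as triggered and untainted.
    return TriggerEvaluation(
        triggered_workflows=set(workflows),
        tainted=0,
        untainted=len(workflows),
    )


result = evaluate_workflow_triggers({"w1", "w2"})
# Existing positional unpacking still works...
triggered_workflows, _, _ = result
# ...while new call sites can read fields by name, e.g. result.untainted.
```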
```python
        return TriggerResult(triggered=self.triggered, error=error)

    @staticmethod
    def choose_tainted(a: "TriggerResult", b: "TriggerResult") -> "TriggerResult":
```
should we just make this a list of TriggerResults and return the first tainted? might be a little more reusable that way.
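The list-based variant could be as small as the helper below. This is a sketch, not the PR's code: `TriggerResult` here is a minimal stand-in with only the fields the helper needs, and `first_tainted` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TriggerResult:
    # Minimal stand-in for the PR's TriggerResult.
    triggered: bool
    error: bool = False

    def is_tainted(self) -> bool:
        return self.error


def first_tainted(results: Iterable[TriggerResult]) -> Optional[TriggerResult]:
    """Return the first tainted result, or None when every result is clean."""
    return next((r for r in results if r.is_tainted()), None)
```

Accepting any iterable makes this reusable wherever multiple results need to be reduced, instead of being limited to exactly two.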
```python
if evaluation.is_tainted():
    tainted_untriggered += 1
else:
    untainted_untriggered += 1
```
it kinda feels like the tainted / untainted stuff could be encapsulated a little more. Could we just add the evaluation result to a list and have it derive this information? That way we don't need to independently track the counts and then rebuild them for the results.
```python
@sentry_sdk.trace
@scopedstats.timer()
def evaluate_workflows_action_filters(
```
unrelated: we might want to look at decomposing this method and the trigger condition methods. it seems like we could probably compose these two a bit more and reduce code replication


Report metrics for workflow evaluations that may have produced incorrect results due to errors during condition evaluation ("tainted" results).

This helps monitor evaluation reliability by emitting a single metric, `process_workflows.workflows_evaluated`, with a `tainted` tag, allowing us to track the ratio and number of tainted workflows. This doesn't yet propagate taintedness to delayed evaluation; that's a planned follow-up.
Updates ISWF-1960.