
Hide Dashboard metric percentages when a state count is capped #67664

Open
wilmerdooley wants to merge 1 commit into apache:main from wilmerdooley:fix-67336-dashboard-capped-percentage

@wilmerdooley

What

The Dashboard "Historical Metrics" percentages are computed in the frontend as count / total, where total is the sum of the per-state counts. The /ui/dashboard/historical_metrics_data endpoint caps each state count at STATE_COUNT_CAP (1000) for performance, so when any state exceeds the cap the summed total is only a lower bound and every per-state percentage is wrong. Example from the issue: success has 2500 runs but the API returns 1000, so failure (7) shows 7/1007 instead of 7/2507.
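A minimal sketch of the arithmetic, assuming a plain map from state name to count (the helper percentOf is hypothetical, not the Airflow frontend code, and the state names follow the example above): the same failure count yields a different share depending on whether the denominator is the capped sum or the true sum.

```typescript
// Hypothetical helper: share of `count` in the sum of all state counts, as a percentage.
function percentOf(count: number, counts: Record<string, number>): number {
  const total = Object.values(counts).reduce((sum, c) => sum + c, 0);
  return total === 0 ? 0 : (count / total) * 100;
}

// Counts as the endpoint returns them: success is capped at 1000.
const apiCounts: Record<string, number> = { success: 1000, failure: 7 };
// True counts from the example in the issue.
const trueCounts: Record<string, number> = { success: 2500, failure: 7 };

const shown = percentOf(apiCounts.failure, apiCounts); // 7 / 1007
const actual = percentOf(trueCounts.failure, trueCounts); // 7 / 2507

console.log(shown.toFixed(2), actual.toFixed(2)); // 0.70 0.28
```

The displayed failure share (about 0.70%) is roughly two and a half times the real one (about 0.28%), and nothing in the capped response lets the frontend recover the true denominator.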

Fix

This takes option 2 from the issue: hide the percentages for a metric group when any of its states is capped, rather than display a wrong number. MetricSection already hid the percentage for a state that is itself capped; this adds a group-level totalCapped flag (true when any state is at or above the limit) so the percentage is hidden for all states in the group when the total is unreliable. The per-state "N+" label and the API cap stay as they are.
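The group-level rule can be sketched as a pure function. This assumes the counts arrive as a state-to-count map and the limit is passed in; formatGroupPercentages and its return shape are hypothetical, and the actual MetricSection props may be structured differently.

```typescript
type StateCounts = Record<string, number>;

// Hypothetical: a percentage label per state, or undefined when it must be hidden.
function formatGroupPercentages(
  counts: StateCounts,
  limit: number,
): Record<string, string | undefined> {
  // Group-level flag: if any state is at or above the cap, the summed total is only a lower bound.
  const totalCapped = Object.values(counts).some((c) => c >= limit);
  const total = Object.values(counts).reduce((sum, c) => sum + c, 0);

  const result: Record<string, string | undefined> = {};
  for (const [state, count] of Object.entries(counts)) {
    // Hide every percentage in the group when the total is unreliable (or empty).
    result[state] =
      totalCapped || total === 0 ? undefined : `${((count / total) * 100).toFixed(1)}%`;
  }
  return result;
}
```

For example, with { success: 1000, failure: 7 } and a limit of 1000 every entry is undefined, while { success: 993, failure: 7 } yields "99.3%" and "0.7%". The per-state "N+" label is a separate concern and is not touched here.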

I kept the cap and fixed this in the frontend rather than returning real counts (option 1), because the cap on this endpoint is a deliberate performance bound that has been optimized for large installations (e.g. #62152, #63166); returning unbounded counts would reintroduce that cost.

closes: #67336


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Code

@boring-cyborg boring-cyborg Bot added the area:UI Related to UI/UX. For Frontend Developers. label May 28, 2026
@boring-cyborg

boring-cyborg Bot commented May 28, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example Dag that shows how users should use it.
  • Consider using Breeze environment for testing locally, it's a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.
Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts, contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

@wilmerdooley
Author

Quick note on the CI status: the only failing check is the Firefox UI e2e job, and the failure is in tests/e2e/specs/dag-calendar-tab.spec.ts (a tooltip visibility timeout on the Dag Calendar tab). That spec is unrelated to this change, which only affects the Dashboard metric percentage rendering. The same spec passed on the Chromium and WebKit e2e runs, so this looks like Firefox-specific flakiness rather than a regression from this PR. Happy to rebase onto latest main to retrigger CI if that would be helpful.

Contributor

@bbovenzi bbovenzi left a comment

lgtm, let's just fix the variable name

@bbovenzi bbovenzi added this to the Airflow 3.2.3 milestone Jun 1, 2026
@bbovenzi bbovenzi added the backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch label Jun 1, 2026
@wilmerdooley wilmerdooley force-pushed the fix-67336-dashboard-capped-percentage branch 2 times, most recently from de6e476 to 4280afe on June 1, 2026 22:43
Member

@pierrejeambrun pierrejeambrun left a comment

Thanks for the PR.

Can you add screenshots too, please?

Comment on lines +75 to +77
capped={taskInstanceStates[state] >= stateCountLimit}
endDate={endDate}
isCapped={isCapped}
Member

Why do we have both 'capped' and 'isCapped'? They seem duplicated. Same above.

Author

This one started out as totalCapped, and I changed it to isCapped following @bbovenzi's earlier suggestion (#67664 (comment)). Happy to switch it back to totalCapped, which keeps it visibly distinct from the per-state capped and matches the wording in the description.

They are two different values: capped is per-state (this state's own count is at the limit, which drives the N+ badge and the full-width bar), while the group flag is true when any state is at the limit, so the summed total is unreliable and the percentages are hidden for the whole group.
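To make the distinction concrete, here is a small sketch that derives both values for a single state. The helper capFlags and the field name groupCapped are hypothetical, and it assumes a state-to-count map with a limit of 1000. With success at the limit, failure is not itself capped, yet the group flag is still true, so its percentage is hidden even though its own count is exact.

```typescript
interface CapFlags {
  capped: boolean; // this state's own count is at the limit (drives the "N+" badge)
  groupCapped: boolean; // any state in the group is at the limit (hides percentages)
}

// Hypothetical helper deriving both flags for one state.
function capFlags(state: string, counts: Record<string, number>, limit: number): CapFlags {
  return {
    capped: (counts[state] ?? 0) >= limit,
    groupCapped: Object.values(counts).some((c) => c >= limit),
  };
}

const counts = { success: 1000, failure: 7 };
console.log(capFlags("success", counts, 1000)); // { capped: true, groupCapped: true }
console.log(capFlags("failure", counts, 1000)); // { capped: false, groupCapped: true }
```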

Let me know which name you both prefer and I will update.

Contributor

Oh I see. This is my misunderstanding.

How about isTotalTruncated?

Member

Rename it to something that conveys that this is a group-level (global) cap.

const hidePercent = capped || isCapped; // capped || isCapped === isCapped

Remove the capped check (if a state is capped, then isCapped is true, because the group is capped).
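The simplification rests on an implication: the group flag is an "any state at the limit" check over the same per-state comparison, so a state's own capped flag can only be true when the group flag is true, and capped || isCapped always equals isCapped. A brute-force check over a few hypothetical count sets (the data and the helper simplificationHolds are illustrative, not from the PR) makes that explicit.

```typescript
const LIMIT = 1000;

// Illustrative count sets (hypothetical data): nothing capped, one state capped,
// every state capped, and one state exactly at the limit.
const samples: Array<Record<string, number>> = [
  { success: 10, failure: 2 },
  { success: 1000, failure: 7 },
  { success: 1000, failure: 1000 },
  { success: 999, failure: 1000 },
];

// True when, for every state, `capped || isCapped` equals `isCapped`.
function simplificationHolds(counts: Record<string, number>, limit: number): boolean {
  const isCapped = Object.values(counts).some((c) => c >= limit); // group-level flag
  return Object.values(counts).every((count) => {
    const capped = count >= limit; // per-state flag
    return (capped || isCapped) === isCapped;
  });
}

console.log(samples.every((s) => simplificationHolds(s, LIMIT))); // true
```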

Author

Done. Renamed the group-level flag to isTotalTruncated (the name @bbovenzi suggested) across MetricSection, TaskInstanceMetrics, and DagRunMetrics, and simplified the suppression to const hidePercent = isTotalTruncated, dropping the redundant capped || since a capped state already implies the group total is truncated. The per-state capped prop is unchanged.

@wilmerdooley
Author

@pierrejeambrun

Thanks! Here are screenshots of the Historical Metrics section, taken from a local run with the metrics data set so that one state sits at the API cap (state_count_limit, 1000).

A state is capped (success at the limit)

[Screenshot: dagrun-capped]

The Success badge renders as 1000+, and the percentages are hidden for every state in the group. This is the fix: once any count is capped, the summed total is only a lower bound, so the per-state percentages computed from it would be wrong.

Nothing capped (for comparison)

[Screenshot: dagrun-normal]

When no state reaches the limit, the percentages display as before. The suppression only kicks in for the capped case.

Full dashboard, capped case

[Screenshot: dashboard-capped]

The same behavior applies to the Task Instances group.

Signed-off-by: wilmerdooley <wilmerdooley1@gmail.com>
@wilmerdooley wilmerdooley force-pushed the fix-67336-dashboard-capped-percentage branch from 4280afe to 24d61a9 on June 4, 2026 04:34

Labels

area:UI Related to UI/UX. For Frontend Developers. backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch

Development

Successfully merging this pull request may close these issues.

Dashboard summary page shows wrong percentages when a state count exceeds the API cap (1000)

3 participants