
Fix total_entries inflation in GET /api/v2/importErrors when file has multiple DAGs #67640

Closed
GayathriSrividya wants to merge 4 commits into apache:main from GayathriSrividya:fix/import-error-total-entries-67525

Conversation

@GayathriSrividya
Contributor

@GayathriSrividya GayathriSrividya commented May 28, 2026

closes: #67525

When a single import-error file mapped to N DAGs, `GET /api/v2/importErrors` returned an inflated `total_entries` (N× the real count) and incorrect pagination behaviour.

Root cause: The query JOINed `ParseImportError` with `file_dags_cte` (one row per DAG per file), producing N rows per import error. `paginated_select` counted those N rows and applied `LIMIT`/`OFFSET` against joined rows rather than distinct import-error objects.
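The fan-out described above can be reproduced in isolation. The following is a minimal sketch using the standard-library `sqlite3` module; the `import_error` and `dag` tables and the join condition are hypothetical stand-ins for illustration, not Airflow's actual models or the real `file_dags_cte`.

```python
import sqlite3

# Hypothetical stand-in tables; these are NOT Airflow's real models or schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, fileloc TEXT);
    INSERT INTO import_error VALUES (1, 'dags/broken.py');
    INSERT INTO dag VALUES
        ('dag_a', 'dags/broken.py'),
        ('dag_b', 'dags/broken.py'),
        ('dag_c', 'dags/broken.py');
    """
)

# One joined row per (import error, DAG) pair: one error x three DAGs.
JOIN = "FROM import_error e JOIN dag d ON d.fileloc = e.filename"

# Counting joined rows counts DAG associations, not import errors.
joined_count = conn.execute(f"SELECT COUNT(*) {JOIN}").fetchone()[0]

# Counting distinct ids gives the number of import errors.
distinct_count = conn.execute(f"SELECT COUNT(DISTINCT e.id) {JOIN}").fetchone()[0]

# LIMIT applied to joined rows can fill a page with copies of one error.
page = [
    row[0]
    for row in conn.execute(f"SELECT e.id {JOIN} ORDER BY e.id LIMIT 2 OFFSET 0")
]

print(joined_count, distinct_count, page)
```

With one import error and three DAGs, the joined count is 3 while the distinct count is 1, and a page of size 2 contains the same import error twice.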

Fix: Two-query approach:

  1. `dedup_stmt` with `.distinct()` — one row per import error — fed to `paginated_select` for correct `total_entries` and pagination.
  2. `import_errors_stmt` — full JOIN restricted to the paginated IDs only — used to gather `dag_id` associations for auth checks and stacktrace redaction.
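The two steps above can be sketched outside Airflow as follows. This is an illustration under stated assumptions, not the PR's code: it uses the standard-library `sqlite3` module instead of SQLAlchemy, and the `import_error` and `dag` tables and the `list_import_errors` helper are hypothetical names invented for the example.

```python
import sqlite3
from collections import defaultdict

# Hypothetical stand-in tables; these are NOT Airflow's real models or schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, fileloc TEXT);
    INSERT INTO import_error VALUES (1, 'dags/broken.py'), (2, 'dags/other.py');
    INSERT INTO dag VALUES
        ('dag_a', 'dags/broken.py'),
        ('dag_b', 'dags/broken.py'),
        ('dag_c', 'dags/broken.py'),
        ('dag_d', 'dags/other.py');
    """
)

JOIN = "FROM import_error e JOIN dag d ON d.fileloc = e.filename"


def list_import_errors(conn, limit, offset):
    """Hypothetical helper: return (total_entries, [(error_id, [dag_ids])])."""
    # Query 1: deduplicated ids drive both the total and the page boundaries.
    total = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT e.id {JOIN})"
    ).fetchone()[0]
    page_ids = [
        row[0]
        for row in conn.execute(
            f"SELECT DISTINCT e.id {JOIN} ORDER BY e.id LIMIT ? OFFSET ?",
            (limit, offset),
        )
    ]

    # Query 2: full join, restricted to the ids on this page, to collect
    # the dag_id associations (used for auth checks in the real endpoint).
    dag_ids = defaultdict(list)
    if page_ids:
        placeholders = ",".join("?" * len(page_ids))
        rows = conn.execute(
            f"SELECT e.id, d.dag_id {JOIN} "
            f"WHERE e.id IN ({placeholders}) ORDER BY e.id, d.dag_id",
            page_ids,
        )
        for error_id, dag_id in rows:
            dag_ids[error_id].append(dag_id)

    return total, [(error_id, dag_ids[error_id]) for error_id in page_ids]


first_page = list_import_errors(conn, limit=1, offset=0)
second_page = list_import_errors(conn, limit=1, offset=1)
print(first_page)
print(second_page)
```

Because `LIMIT`/`OFFSET` run against the deduplicated ids, each page holds distinct import errors, and the second query touches only the rows needed for that page.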

Tests: Added regression test `test_total_entries_counts_distinct_import_errors_when_file_has_multiple_dags` that creates one `ParseImportError` with three associated `DagModel` rows and asserts `total_entries == 1` and the list endpoint returns exactly one entry with `limit=1`.

@boring-cyborg

boring-cyborg Bot commented May 28, 2026

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about anything, please check our Contributors' Guide.
Here are some useful points:

  • Pay attention to the quality of your code (ruff, mypy and type annotations). Our prek-hooks will help you with that.
  • In case of a new feature, add useful documentation (in docstrings or in the docs/ directory). Adding a new operator? Check this short guide. Consider adding an example Dag that shows how users should use it.
  • Consider using the Breeze environment for testing locally. It's a heavy Docker setup, but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style.
  • Always keep your Pull Requests rebased, otherwise your build might fail due to changes not related to your commits.

Apache Airflow is a community-driven project and together we are making it better 🚀.
In case of doubts contact the developers at:
Mailing List: dev@airflow.apache.org
Slack: https://s.apache.org/airflow-slack

Comment thread airflow-core/newsfragments/67525.bugfix.rst Outdated
@GayathriSrividya GayathriSrividya force-pushed the fix/import-error-total-entries-67525 branch 5 times, most recently from c3cefd8 to 34bd56c Compare June 2, 2026 08:02
GayathriSrividya added a commit to GayathriSrividya/airflow that referenced this pull request Jun 2, 2026
@GayathriSrividya GayathriSrividya force-pushed the fix/import-error-total-entries-67525 branch from 8f9d6e6 to 667792a Compare June 2, 2026 15:32
GayathriSrividya added a commit to GayathriSrividya/airflow that referenced this pull request Jun 2, 2026
Gayathri Srividya Rajavarapu and others added 4 commits June 2, 2026 21:47
… multiple DAGs

When a single import-error file mapped to N DAGs, the previous query
JOINed ParseImportError with file_dags_cte producing N rows per error.
paginated_select then counted those N rows, inflating total_entries and
applying LIMIT/OFFSET against joined rows rather than distinct errors.

Fix uses a two-query approach:
1. dedup_stmt with DISTINCT - one row per import error for correct count
   and pagination via paginated_select
2. import_errors_stmt - full join only for the paginated IDs to gather
   dag_id associations for authorization/stacktrace redaction

Closes apache#67525
@GayathriSrividya GayathriSrividya force-pushed the fix/import-error-total-entries-67525 branch from 667792a to 253261d Compare June 2, 2026 16:17
@GayathriSrividya
Contributor Author

Closing in favour of #67550, which addresses the same root cause (#67525) and has already received a maintainer approval from @pierrejeambrun and been assigned to the Airflow 3.2.3 milestone. Thanks to @Codingaditya17 for the parallel work — the two-query pagination approach we both converged on was the right call.


Drafted-by: GitHub Copilot (Claude Sonnet 4.6); reviewed by @GayathriSrividya before posting

Labels

area:API Airflow's REST/HTTP API

Development

Successfully merging this pull request may close these issues.

/api/v2/importErrors inflates total_entries when one import-error file maps to multiple DAGs

2 participants