Skip to content

Fix import errors total_entries count with multiple DAGs per file#67550

Open
Codingaditya17 wants to merge 5 commits into
apache:mainfrom
Codingaditya17:fix-import-errors-total-entries
Open

Fix import errors total_entries count with multiple DAGs per file#67550
Codingaditya17 wants to merge 5 commits into
apache:mainfrom
Codingaditya17:fix-import-errors-total-entries

Conversation

@Codingaditya17
Copy link
Copy Markdown
Contributor

Why

/api/v2/importErrors can inflate total_entries when one import-error file maps to multiple DAGs.

The route joins ParseImportError with DAG IDs so it can apply per-file authorization and stacktrace redaction. However, paginated_select() counts the joined rows before the route groups them back into one import-error object per file.

As a result, one ParseImportError row can be counted multiple times when the same file contains multiple DAGs.

What changed

This updates the import errors list endpoint to count distinct ParseImportError.id values after applying the filename filters, instead of using the raw joined row count.

The route still uses the joined statement for fetching rows and authorization/redaction behavior, but total_entries now reflects the number of distinct import-error objects returned by the API.

Tests

Added a regression test where one import-error file maps to three DAGs. The API now returns one import-error object and total_entries is 1.

Ran:

uv run pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py::TestGetImportErrors::test_total_entries_counts_distinct_import_errors_when_file_has_multiple_dags -q

Result: 1 passed, 1 warning

Ran:

uv run pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py -q

Result: 32 passed, 1 warning

Ran:

uv run prek run --files airflow-core/src/airflow/api_fastapi/core_api/routes/public/import_error.py airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py

Result: mypy and formatting passed locally. generate-openapi-spec could not run locally because Docker is not running on my machine.

@Codingaditya17
Copy link
Copy Markdown
Contributor Author

It looks like the failing CodeQL jobs are failing during actions/checkout, before any project code or tests run.

The failure is happening at:

Checkout repository -> Fetching the repository -> Error: The process '/usr/bin/git' failed with exit code 128

I do not see a more specific fatal: message in the visible logs, and I do not seem to have permission to rerun the failed jobs from the UI. Could a maintainer please rerun the failed checks when possible?

Local validation for the code change passed:

uv run pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py -q

Result: 32 passed, 1 warning

uv run prek run --files airflow-core/src/airflow/api_fastapi/core_api/routes/public/import_error.py airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py

Result: mypy and formatting passed locally. The only local hook I could not run was generate-openapi-spec because Docker is not running on my machine.

Copy link
Copy Markdown
Contributor

@henry3260 henry3260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

This fixes total_entries, but the data query still applies limit to the joined rows before grouping. limit=1 should mean one import error object. Could we paginate distinct ParseImportError.id values first, then fetch all joined Dag rows for those IDs before grouping? wdyt?

@Codingaditya17
Copy link
Copy Markdown
Contributor Author

Good point, you’re right.

The current change fixes total_entries, but pagination is still applied to the joined rows before grouping, so limit can still behave incorrectly when one import-error file maps to multiple DAG rows.

I’ll update the PR so pagination is applied to distinct ParseImportError.id values first, then fetch all joined DAG rows for those paginated IDs before grouping. I’ll also extend the regression test to cover limit=1 for a file mapped to multiple DAGs.

@Codingaditya17
Copy link
Copy Markdown
Contributor Author

Good point, you were right.

I updated the PR so pagination is now applied to distinct ParseImportError.id values first. After that, the route fetches all joined Dag rows for those paginated import-error IDs before grouping, so limit and offset apply to import-error objects instead of joined rows.

I also extended the regression test to call the endpoint with limit=1 and verify that the auth check still receives all 3 Dag rows for the file.

Updated in ce208ad6bf.

Local validation:

uv run pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py::TestGetImportErrors::test_total_entries_counts_distinct_import_errors_when_file_has_multiple_dags -q

Result: 1 passed, 1 warning

uv run pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py -q

Result: 32 passed, 1 warning

uv run prek run --files airflow-core/src/airflow/api_fastapi/core_api/routes/public/import_error.py airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py

Result: mypy and formatting passed locally. The generate-openapi-spec hook could not complete in my local setup because my checkout path is also my AIRFLOW_HOME.

Copy link
Copy Markdown
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI isn't happy and need fixing. Tests are failing

@eladkal eladkal added this to the Airflow 3.2.3 milestone May 27, 2026
@eladkal eladkal added type:bug-fix Changelog: Bug Fixes backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch labels May 27, 2026
@Codingaditya17
Copy link
Copy Markdown
Contributor Author

The CI failure is fixed now.

The issue was the distinct pagination subquery selecting only ParseImportError.id while ORDER BY could use fields like filename or timestamp, which Postgres/MySQL require to be present in the selected distinct columns.

I updated the query in 7f6cc207a8, and all checks are now passing.

@Codingaditya17
Copy link
Copy Markdown
Contributor Author

Hi @Henry260, I updated the query in 7f6cc207a8 and all checks are passing now. Could you please take another look when you get a chance?

@potiuk
Copy link
Copy Markdown
Member

potiuk commented Jun 1, 2026

@Codingaditya17 — This PR has new commits since the last review requesting changes from henry3260. Could you address the outstanding review comments and either push a fix or reply in each thread explaining why the feedback doesn't apply? When you believe the threads are resolved, please mark them as resolved and ping the reviewer (henry3260) — they'll either re-review or hand the PR back to the queue. Thanks!


Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you.

@Codingaditya17
Copy link
Copy Markdown
Contributor Author

Thanks, this should be addressed now.

I updated the route so pagination is applied on distinct import-error rows first, and then the joined Dag rows are fetched for those paginated import-error IDs before grouping.

I also fixed the Postgres/MySQL SELECT DISTINCT + ORDER BY issue by including the sortable ParseImportError columns in the distinct pagination subquery.

All checks are passing now after 7f6cc207a8.

@henry3260 could you please re-review when you get a chance?

Copy link
Copy Markdown
Contributor

@henry3260 henry3260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! We are almost there

Copy link
Copy Markdown
Member

@pierrejeambrun pierrejeambrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. We might want to add that test suggested by @henry3260.

Tested and working as expected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:API Airflow's REST/HTTP API backport-to-v3-2-test Mark PR with this label to backport to v3-2-test branch type:bug-fix Changelog: Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

/api/v2/importErrors inflates total_entries when one import-error file maps to multiple DAGs

5 participants