Fix import errors total_entries count with multiple DAGs per file #67550
Codingaditya17 wants to merge 5 commits into
|
It looks like the failing CodeQL jobs are failing during The failure is happening at:
I do not see a more specific Local validation for the code change passed:
Result:
Result: mypy and formatting passed locally. The only local hook I could not run was |
henry3260
left a comment
Nice catch!
This fixes total_entries, but the data query still applies limit to the joined rows before grouping. limit=1 should mean one import error object. Could we paginate distinct ParseImportError.id values first, then fetch all joined Dag rows for those IDs before grouping? wdyt?
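A minimal sketch of the two-step approach suggested above, using the standard-library sqlite3 module instead of the Airflow models. The table names, columns, sample data, and the list_import_errors helper are hypothetical stand-ins for illustration, not the route's actual code.

```python
import sqlite3
from collections import defaultdict

# Hypothetical schema: one import-error row per file, several DAG rows per file.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, fileloc TEXT);
    INSERT INTO import_error VALUES (1, 'dags/a.py'), (2, 'dags/b.py');
    INSERT INTO dag VALUES
        ('a1', 'dags/a.py'), ('a2', 'dags/a.py'), ('a3', 'dags/a.py'),
        ('b1', 'dags/b.py');
    """
)


def list_import_errors(limit, offset=0):
    """Return {import_error_id: [dag_id, ...]} for one page of import errors."""
    # Step 1: paginate over import-error IDs only, so `limit` counts
    # import-error objects rather than joined rows.
    page_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM import_error ORDER BY id LIMIT ? OFFSET ?",
            (limit, offset),
        )
    ]
    if not page_ids:
        return {}

    # Step 2: fetch every joined DAG row for just those IDs, then group.
    placeholders = ",".join("?" for _ in page_ids)
    rows = conn.execute(
        "SELECT ie.id, d.dag_id "
        "FROM import_error ie LEFT JOIN dag d ON d.fileloc = ie.filename "
        f"WHERE ie.id IN ({placeholders}) "
        "ORDER BY ie.id, d.dag_id",
        page_ids,
    )
    grouped = defaultdict(list)
    for error_id, dag_id in rows:
        grouped[error_id].append(dag_id)
    return dict(grouped)


first_page = list_import_errors(limit=1)
second_page = list_import_errors(limit=1, offset=1)
```

With this sample data, limit=1 yields a single import error that still carries all three of its DAG IDs, whereas applying the limit to the joined rows would have returned only one of the three DAG rows.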
|
Good point, you’re right. The current change fixes I’ll update the PR so pagination is applied to distinct |
|
Good point, you were right. I updated the PR so pagination is now applied to distinct I also extended the regression test to call the endpoint with Updated in Local validation:
Result:
Result:
Result: mypy and formatting passed locally. The |
pierrejeambrun
left a comment
CI isn't happy and needs fixing. Tests are failing.
|
The CI failure is fixed now. The issue was the distinct pagination subquery selecting only I updated the query in |
|
Hi @henry3260, I updated the query in |
|
@Codingaditya17 — This PR has new commits since the last review requesting changes from Note: This comment was drafted by an AI-assisted triage tool and may contain mistakes. Once you have addressed the points above, an Apache Airflow maintainer — a real person — will take the next look at your PR. We use this two-stage triage process so that our maintainers' limited time is spent where it matters most: the conversation with you. |
|
Thanks, this should be addressed now. I updated the route so pagination is applied on distinct import-error rows first, and then the joined Dag rows are fetched for those paginated import-error IDs before grouping. I also fixed the Postgres/MySQL All checks are passing now after @henry3260 could you please re-review when you get a chance? |
henry3260
left a comment
Nice! We are almost there
pierrejeambrun
left a comment
LGTM overall. We might want to add that test suggested by @henry3260.
Tested and working as expected.
Why
/api/v2/importErrors can inflate total_entries when one import-error file maps to multiple DAGs.
The route joins ParseImportError with DAG IDs so it can apply per-file authorization and stacktrace redaction. However, paginated_select() counts the joined rows before the route groups them back into one import-error object per file.
As a result, one ParseImportError row can be counted multiple times when the same file contains multiple DAGs.
What changed
This updates the import errors list endpoint to count distinct ParseImportError.id values after applying the filename filters, instead of using the raw joined row count.
The route still uses the joined statement for fetching rows and authorization/redaction behavior, but total_entries now reflects the number of distinct import-error objects returned by the API.
Tests
Added a regression test where one import-error file maps to three DAGs. The API now returns one import-error object and total_entries is 1.
Ran:
uv run pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py::TestGetImportErrors::test_total_entries_counts_distinct_import_errors_when_file_has_multiple_dags -q
Result:
1 passed, 1 warning
Ran:
uv run pytest airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py -q
Result:
32 passed, 1 warning
Ran:
uv run prek run --files airflow-core/src/airflow/api_fastapi/core_api/routes/public/import_error.py airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_import_error.py
Result: mypy and formatting passed locally.
generate-openapi-spec could not run locally because Docker is not running on my machine.
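The counting problem described under "Why" can be reproduced outside Airflow. The sketch below uses the standard-library sqlite3 module with a hypothetical two-table schema (import_error and dag are illustrative names, not the real Airflow tables) to show how a raw count over the joined rows differs from a count of distinct import-error IDs when one file maps to three DAGs.

```python
import sqlite3

# Hypothetical schema: one import-error file that maps to three DAGs.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE import_error (id INTEGER PRIMARY KEY, filename TEXT);
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY, fileloc TEXT);
    INSERT INTO import_error VALUES (1, 'dags/broken.py');
    INSERT INTO dag VALUES
        ('dag_a', 'dags/broken.py'),
        ('dag_b', 'dags/broken.py'),
        ('dag_c', 'dags/broken.py');
    """
)

# The join produces one row per (import error, DAG) pair.
JOINED = "FROM import_error ie LEFT JOIN dag d ON d.fileloc = ie.filename"

# Counting joined rows counts the same import error once per DAG.
joined_row_count = conn.execute(f"SELECT COUNT(*) {JOINED}").fetchone()[0]

# Counting distinct import-error IDs counts each import error once.
distinct_error_count = conn.execute(
    f"SELECT COUNT(DISTINCT ie.id) {JOINED}"
).fetchone()[0]

print(joined_row_count, distinct_error_count)
```

Here the joined count is 3 while the distinct count is 1, which matches the regression test's expectation that total_entries is 1 for a single import-error file with three DAGs.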