-
Notifications
You must be signed in to change notification settings - Fork 204
Ele 5089 use row number instead of rank #2015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
👋 @NoyaArie |
WalkthroughReplaced the window function rank() with row_number() for computing invocations_rank_index across three macros: get_models_runs, get_source_freshness_results, and get_test_results. The ordering criteria remain by generated_at descending within partitions, and downstream filtering/order logic is unchanged apart from the tie-handling semantics. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Q as Query/Macro
participant DB as Warehouse
participant R as Results
Q->>DB: Select rows partitioned by unique_id<br/>order by generated_at desc
Note over Q,DB: Replace rank() with row_number() for invocations_rank_index
DB-->>Q: Rows with invocations_rank_index = row_number()
Q->>Q: Filter where invocations_rank_index <= invocations_per_test
Q-->>R: Return ordered results
Note over R: Deterministic, unique indices per partition
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Poem
Pre-merge checks and finishing touches✅ Passed checks (3 passed)
✨ Finishing touches🧪 Generate unit tests
Tip 👮 Agentic pre-merge checks are now available in preview!Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.
Please see the documentation for more information. Example: reviews:
pre_merge_checks:
custom_checks:
- name: "Undocumented Breaking Changes"
mode: "warning"
instructions: |
Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).Please share your feedback with us on this Discord post. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
elementary/monitor/dbt_project/macros/get_models_runs.sql (1)
4-7: Make row selection deterministic; also cast timestamp in the window ORDER BY.Switching to row_number removes ties, but with only generated_at in ORDER BY, the “winner” for invocations_rank_index = 1 is nondeterministic on equal timestamps and across engines. Add a stable tiebreaker (e.g., invocation_id, id) and use the same timestamp cast you already use elsewhere.
Apply:
- row_number() over (partition by unique_id order by generated_at desc) as invocations_rank_index + row_number() over ( + partition by unique_id + order by {{ elementary.edr_cast_as_timestamp('generated_at') }} desc, invocation_id desc + ) as invocations_rank_indexelementary/monitor/dbt_project/macros/get_test_results.sql (1)
196-197: ClickHouse path: add tiebreakers to ROW_NUMBER for determinism.Same determinism concern; add stable keys available in the SELECT.
Apply:
- ROW_NUMBER() OVER (PARTITION BY elementary_unique_id ORDER BY etr.detected_at DESC) AS invocations_rank_index, + ROW_NUMBER() OVER ( + PARTITION BY elementary_unique_id + ORDER BY etr.detected_at DESC, etr.invocation_id DESC, etr.test_execution_id DESC, etr.id DESC + ) AS invocations_rank_index,elementary/monitor/dbt_project/macros/get_source_freshness_results.sql (1)
14-15: Deterministic row_number and consistent timestamp casting.As with the other macros, use a casted timestamp and stable tiebreakers to avoid nondeterministic “latest” selection when generated_at ties.
Apply:
- row_number() over (partition by unique_id order by generated_at desc) as invocations_rank_index + row_number() over ( + partition by unique_id + order by {{ elementary.edr_cast_as_timestamp('generated_at') }} desc, invocation_id desc, source_freshness_execution_id desc + ) as invocations_rank_indexNote: This may reduce rows passing invocations_rank_index <= invocations_per_test versus previous rank() behavior when many rows share the same generated_at.
Run a quick tie audit:
with s as ( select unique_id, {{ elementary.edr_cast_as_timestamp('generated_at') }} as ts, count(*) as c from {{ ref('elementary', 'dbt_source_freshness_results') }} where {{ elementary.edr_datediff(elementary.edr_cast_as_timestamp('generated_at'), elementary.edr_current_timestamp(), 'day') }} < {{ days_back }} group by unique_id, ts ) select * from s where c > 1 order by c desc limit 50;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (3)
elementary/monitor/dbt_project/macros/get_models_runs.sql(1 hunks)elementary/monitor/dbt_project/macros/get_source_freshness_results.sql(1 hunks)elementary/monitor/dbt_project/macros/get_test_results.sql(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: test / test
- GitHub Check: code-quality
🔇 Additional comments (2)
elementary/monitor/dbt_project/macros/get_models_runs.sql (1)
11-11: LGTM.No concerns with the selected columns.
elementary/monitor/dbt_project/macros/get_test_results.sql (1)
17-18: Update stale comment and add deterministic tiebreakers.Comment still says “so we use rank,” but code uses row_number. Also, add stable tiebreakers to avoid nondeterministic picks when detected_at ties happen (common with coarse timestamp precision).
Apply:
- {# When we split test into multiple test results, we want to have the same invocation order for the test results from the same run so we use rank. #} - row_number() over (partition by elementary_unique_id order by {{elementary.edr_cast_as_timestamp('detected_at')}} desc) as invocations_rank_index + {# Deterministic per-run ordering: use row_number with stable tiebreakers. #} + row_number() over ( + partition by elementary_unique_id + order by {{ elementary.edr_cast_as_timestamp('detected_at') }} desc, invocation_id desc, test_execution_id desc, id desc + ) as invocations_rank_indexAlso verify downstream assumptions where invocations_rank_index <= invocations_per_test might now include fewer rows under heavy tie scenarios.
Provide a quick check in your warehouse:
-- How often do ties occur today? with t as ( {{ elementary_cli.current_tests_run_results_query(days_back=1) }} ) select elementary_unique_id, count(*) as cnt from t group by elementary_unique_id, {{ elementary.edr_cast_as_timestamp('detected_at') }} having count(*) > 1 order by cnt desc limit 50;
null
Summary by CodeRabbit