Ele 5089 use row number instead of rank #2015

NoyaArie · 2025-09-22T09:09:19Z

Summary by CodeRabbit

Bug Fixes
- Makes recent-invocation selection deterministic across tests, models, and sources, eliminating tie-related inconsistencies.
- Ensures unique, sequential invocation indices per resource, preventing gaps and duplicate rankings.
- Improves the reliability of filters based on invocations-per-test and stabilizes result ordering across environments.
- Reduces flakiness in dashboards, comparisons, and alerts caused by non-deterministic ordering of rows with tied timestamps.

linear · 2025-09-22T09:09:21Z

github-actions · 2025-09-22T09:09:31Z

👋 @NoyaArie
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

coderabbitai · 2025-09-22T09:09:39Z

Walkthrough

Replaced the window function rank() with row_number() for computing invocations_rank_index in three macros: get_models_runs, get_source_freshness_results, and get_test_results. The ordering is still by timestamp descending within each partition (generated_at, or detected_at in get_test_results). Downstream filtering and ordering logic is unchanged; only the handling of ties differs, since tied rows now receive distinct consecutive indices instead of sharing one.
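
To make the tie-handling difference concrete, here is a minimal sketch that runs both window functions over the same rows. It uses Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added) and a hypothetical `runs` table with made-up values; it is not the actual macro code.

```python
import sqlite3

# Hypothetical toy data: two invocations of the same model share a generated_at.
rows = [
    ("model.a", "2025-09-22 09:00:00", "inv_1"),
    ("model.a", "2025-09-22 09:00:00", "inv_2"),
    ("model.a", "2025-09-21 09:00:00", "inv_3"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table runs (unique_id text, generated_at text, invocation_id text)"
)
conn.executemany("insert into runs values (?, ?, ?)", rows)

# Same partition and ordering for both functions, as in the macros.
query = """
    select
        rank() over (
            partition by unique_id order by generated_at desc
        ) as rank_index,
        row_number() over (
            partition by unique_id order by generated_at desc
        ) as row_number_index
    from runs
    order by row_number_index
"""
result = conn.execute(query).fetchall()

rank_indices = [r[0] for r in result]
row_number_indices = [r[1] for r in result]

print(rank_indices)        # tied rows share index 1, then a gap to 3
print(row_number_indices)  # every row gets its own consecutive index
```

With rank(), the two tied rows both get index 1 and the next row jumps to 3; with row_number(), the indices are 1, 2, 3. Which of the tied rows receives 1 is not fixed by this ordering alone.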

Changes

Cohort / File(s)	Summary
Window function update: rank() ➜ row_number() `elementary/monitor/dbt_project/macros/get_models_runs.sql`, `elementary/monitor/dbt_project/macros/get_source_freshness_results.sql`, `elementary/monitor/dbt_project/macros/get_test_results.sql`	Compute invocations_rank_index with row_number() over partitions (by unique identifier) ordered by generated_at desc; replaces rank() to enforce strictly increasing, tie-free indices; associated filters/order clauses remain structurally the same.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Q as Query/Macro
    participant DB as Warehouse
    participant R as Results

    Q->>DB: Select rows partitioned by unique_id<br/>order by generated_at desc
    Note over Q,DB: Replace rank() with row_number() for invocations_rank_index
    DB-->>Q: Rows with invocations_rank_index = row_number()
    Q->>Q: Filter where invocations_rank_index <= invocations_per_test
    Q-->>R: Return ordered results
    Note over R: Deterministic, unique indices per partition

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I hop through rows, a tidy run,
From ranks with ties to numbers one-by-one.
No gaps, no fuss—just clean ascent,
A carrot-straight, deterministic intent.
Query fields gleam; my whiskers twirl—done!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title "Ele 5089 use row number instead of rank" directly and concisely summarizes the primary change in the diff — replacing rank() with row_number() across multiple SQL macros — and is specific enough for a reviewer to understand the main intent while including the issue reference.
Docstring Coverage	✅ Passed	No functions found in the changes. Docstring coverage check skipped.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

elementary/monitor/dbt_project/macros/get_models_runs.sql (1)
4-7: Make row selection deterministic; also cast timestamp in the window ORDER BY.

Switching to row_number removes ties, but with only generated_at in ORDER BY, the “winner” for invocations_rank_index = 1 is nondeterministic on equal timestamps and across engines. Add a stable tiebreaker (e.g., invocation_id, id) and use the same timestamp cast you already use elsewhere.

Apply:
-                row_number() over (partition by unique_id order by generated_at desc) as invocations_rank_index
+                row_number() over (
+                  partition by unique_id
+                  order by {{ elementary.edr_cast_as_timestamp('generated_at') }} desc, invocation_id desc
+                ) as invocations_rank_index
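
The effect of the suggested tiebreaker can be sketched outside the warehouse. The snippet below is an illustration only: it uses Python's built-in sqlite3 module (assuming SQLite 3.25 or newer), a hypothetical `runs` table, and a hypothetical helper `latest_invocation`. It loads the same tied rows in two different physical orders and checks that the row given invocations_rank_index = 1 is the same both times.

```python
import sqlite3


def latest_invocation(rows):
    """Return the invocation_id that receives invocations_rank_index = 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table runs (unique_id text, generated_at text, invocation_id text)"
    )
    conn.executemany("insert into runs values (?, ?, ?)", rows)
    # invocation_id is the tiebreaker: it fully orders rows that share a timestamp.
    query = """
        select invocation_id from (
            select
                invocation_id,
                row_number() over (
                    partition by unique_id
                    order by generated_at desc, invocation_id desc
                ) as invocations_rank_index
            from runs
        )
        where invocations_rank_index = 1
    """
    (winner,) = conn.execute(query).fetchone()
    conn.close()
    return winner


# Hypothetical rows with identical generated_at values.
tied = [
    ("model.a", "2025-09-22 09:00:00", "inv_1"),
    ("model.a", "2025-09-22 09:00:00", "inv_2"),
]

first = latest_invocation(tied)
second = latest_invocation(list(reversed(tied)))
print(first, second)
```

Because the ORDER BY now distinguishes every row in the partition, the result does not depend on insertion order or on how the engine happens to scan the table. Without the second sort key, the engine is free to pick either tied row.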
elementary/monitor/dbt_project/macros/get_test_results.sql (1)
196-197: ClickHouse path: add tiebreakers to ROW_NUMBER for determinism.

Same determinism concern; add stable keys available in the SELECT.

Apply:
-        ROW_NUMBER() OVER (PARTITION BY elementary_unique_id ORDER BY etr.detected_at DESC) AS invocations_rank_index,
+        ROW_NUMBER() OVER (
+          PARTITION BY elementary_unique_id
+          ORDER BY etr.detected_at DESC, etr.invocation_id DESC, etr.test_execution_id DESC, etr.id DESC
+        ) AS invocations_rank_index,
elementary/monitor/dbt_project/macros/get_source_freshness_results.sql (1)
14-15: Deterministic row_number and consistent timestamp casting.

As with the other macros, use a casted timestamp and stable tiebreakers to avoid nondeterministic “latest” selection when generated_at ties.

Apply:
-                row_number() over (partition by unique_id order by generated_at desc) as invocations_rank_index
+                row_number() over (
+                  partition by unique_id
+                  order by {{ elementary.edr_cast_as_timestamp('generated_at') }} desc, invocation_id desc, source_freshness_execution_id desc
+                ) as invocations_rank_index
Note: This may reduce rows passing invocations_rank_index <= invocations_per_test versus previous rank() behavior when many rows share the same generated_at.

Run a quick tie audit:
with s as (
  select unique_id, {{ elementary.edr_cast_as_timestamp('generated_at') }} as ts, count(*) as c
  from {{ ref('elementary', 'dbt_source_freshness_results') }}
  where {{ elementary.edr_datediff(elementary.edr_cast_as_timestamp('generated_at'), elementary.edr_current_timestamp(), 'day') }} < {{ days_back }}
  group by unique_id, ts
)
select * from s where c > 1 order by c desc limit 50;
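
As a rough illustration of the note above, the following sketch counts how many rows pass the `invocations_rank_index <= invocations_per_test` filter under each window function, then runs the same kind of tie audit. It relies on Python's built-in sqlite3 module (assuming SQLite 3.25 or newer); the `results` table, its values, and the filter value of 2 are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table results (unique_id text, generated_at text)")
conn.executemany(
    "insert into results values (?, ?)",
    [
        # Three results share the newest timestamp; one is older.
        ("source.a", "2025-09-22 09:00:00"),
        ("source.a", "2025-09-22 09:00:00"),
        ("source.a", "2025-09-22 09:00:00"),
        ("source.a", "2025-09-21 09:00:00"),
    ],
)

invocations_per_test = 2  # hypothetical filter value


def rows_passing(window_function):
    """Count rows kept by the filter; window_function is 'rank' or 'row_number'."""
    query = f"""
        select count(*) from (
            select {window_function}() over (
                partition by unique_id order by generated_at desc
            ) as invocations_rank_index
            from results
        )
        where invocations_rank_index <= ?
    """
    return conn.execute(query, (invocations_per_test,)).fetchone()[0]


rank_count = rows_passing("rank")
row_number_count = rows_passing("row_number")

# Tie audit: timestamps that occur more than once per unique_id.
ties = conn.execute(
    """
    select unique_id, generated_at, count(*) as c
    from results
    group by unique_id, generated_at
    having count(*) > 1
    """
).fetchall()

print(rank_count, row_number_count, ties)
```

In this toy data rank() lets all three tied rows through the filter, while row_number() keeps exactly two, which is the reduction the note warns about.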

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between acdac83 and 49e34e3.

📒 Files selected for processing (3)

elementary/monitor/dbt_project/macros/get_models_runs.sql (1 hunks)
elementary/monitor/dbt_project/macros/get_source_freshness_results.sql (1 hunks)
elementary/monitor/dbt_project/macros/get_test_results.sql (2 hunks)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: test / test
GitHub Check: code-quality

🔇 Additional comments (2)

elementary/monitor/dbt_project/macros/get_models_runs.sql (1)

11-11: LGTM.

No concerns with the selected columns.
elementary/monitor/dbt_project/macros/get_test_results.sql (1)
17-18: Update stale comment and add deterministic tiebreakers.

Comment still says “so we use rank,” but code uses row_number. Also, add stable tiebreakers to avoid nondeterministic picks when detected_at ties happen (common with coarse timestamp precision).

Apply:
-                {# When we split test into multiple test results, we want to have the same invocation order for the test results from the same run so we use rank. #}
-                row_number() over (partition by elementary_unique_id order by {{elementary.edr_cast_as_timestamp('detected_at')}} desc) as invocations_rank_index
+                {# Deterministic per-run ordering: use row_number with stable tiebreakers. #}
+                row_number() over (
+                  partition by elementary_unique_id
+                  order by {{ elementary.edr_cast_as_timestamp('detected_at') }} desc, invocation_id desc, test_execution_id desc, id desc
+                ) as invocations_rank_index
Also verify any downstream logic that filters on invocations_rank_index <= invocations_per_test, since it may now return fewer rows when many results share the same timestamp.

Run a quick check in your warehouse:
-- How often do ties occur today?
with t as (
  {{ elementary_cli.current_tests_run_results_query(days_back=1) }}
)
select elementary_unique_id, count(*) as cnt
from t
group by elementary_unique_id, {{ elementary.edr_cast_as_timestamp('detected_at') }}
having count(*) > 1
order by cnt desc
limit 50;

NoyaArie added 3 commits September 22, 2025 12:07

change for model runs

828ff45

change for source freshness results

b2f6705

change for test results

49e34e3

coderabbitai bot reviewed Sep 22, 2025

ofek1weiss approved these changes Sep 22, 2025

NoyaArie merged commit d7f7d9b into master Sep 22, 2025
7 checks passed

NoyaArie deleted the ele-5089-use-row-number-instead-of-rank branch September 22, 2025 09:26
