
@NoyaArie (Contributor) commented Sep 22, 2025

Summary by CodeRabbit

  • Bug Fixes
    • Made recent-invocation indexing tie-free across tests, models, and sources: rows that share a timestamp no longer share an index.
    • Ensures unique, sequential invocation indices per resource, preventing gaps and duplicate rankings.
    • The invocations-per-test filter now returns at most invocations_per_test rows per resource, instead of admitting extra rows when timestamps tie.
    • Reduces flakiness in dashboards, comparisons, and alerts caused by tied timestamps inflating result counts. Which of the tied rows is kept still depends on the warehouse unless a tiebreaker is added to the ORDER BY.

linear bot commented Sep 22, 2025

@github-actions (Contributor)

👋 @NoyaArie
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in this pull request.

coderabbitai bot commented Sep 22, 2025

Walkthrough

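Replaced the window function rank() with row_number() for computing invocations_rank_index across three macros: get_models_runs, get_source_freshness_results, and get_test_results. The ordering criteria are unchanged (generated_at descending in get_models_runs and get_source_freshness_results, detected_at descending in get_test_results), and the downstream filtering and ordering logic is unchanged; only the tie-handling semantics differ. rank() gives rows with equal timestamps the same index and skips the following values, whereas row_number() gives every row in a partition a distinct, consecutive index.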
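To make the tie-handling difference concrete, here is a small sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or later). The `runs` table, its rows, and the invocation IDs are hypothetical stand-ins for the Elementary run-results models; only the column names `unique_id` and `generated_at` come from the macros.

```python
import sqlite3


def rank_vs_row_number():
    """Return (invocation_id, rank, row_number) for a model whose two latest runs share generated_at."""
    conn = sqlite3.connect(":memory:")  # needs SQLite 3.25+ for window functions
    conn.execute("create table runs (unique_id text, invocation_id text, generated_at text)")
    conn.executemany(
        "insert into runs values (?, ?, ?)",
        [
            ("model.a", "inv_1", "2025-09-22 09:00:00"),
            ("model.a", "inv_2", "2025-09-22 10:00:00"),  # same timestamp as inv_3
            ("model.a", "inv_3", "2025-09-22 10:00:00"),
        ],
    )
    rows = conn.execute(
        """
        select invocation_id,
               rank() over (partition by unique_id order by generated_at desc) as rnk,
               row_number() over (partition by unique_id order by generated_at desc) as rn
        from runs
        order by rn
        """
    ).fetchall()
    conn.close()
    return rows


for invocation_id, rnk, rn in rank_vs_row_number():
    print(invocation_id, rnk, rn)
```

Both tied runs get rank 1 and no row gets rank 2, while row_number() yields 1, 2, 3. Which of inv_2 and inv_3 receives 1 is not fixed by the ORDER BY, which is the determinism concern raised in the nitpicks.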

Changes

Cohort: Window function update: rank() ➜ row_number()
File(s): elementary/monitor/dbt_project/macros/get_models_runs.sql, elementary/monitor/dbt_project/macros/get_source_freshness_results.sql, elementary/monitor/dbt_project/macros/get_test_results.sql
Summary: Compute invocations_rank_index with row_number() over partitions keyed by unique identifier (unique_id, or elementary_unique_id for test results), ordered by timestamp descending (generated_at, or detected_at for test results). This replaces rank() to produce strictly increasing, tie-free indices; the associated filter and ORDER BY clauses are structurally unchanged.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Q as Query/Macro
    participant DB as Warehouse
    participant R as Results

    Q->>DB: Select rows partitioned by unique_id<br/>order by generated_at (detected_at for tests) desc
    Note over Q,DB: Replace rank() with row_number() for invocations_rank_index
    DB-->>Q: Rows with invocations_rank_index = row_number()
    Q->>Q: Filter where invocations_rank_index <= invocations_per_test
    Q-->>R: Return ordered results
    Note over R: Deterministic, unique indices per partition

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I hop through rows, a tidy run,
From ranks with ties to numbers one-by-one.
No gaps, no fuss—just clean ascent,
A carrot-straight, deterministic intent.
Query fields gleam; my whiskers twirl—done!

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title "Ele 5089 use row number instead of rank" directly and concisely summarizes the primary change in the diff (replacing rank() with row_number() across multiple SQL macros) and is specific enough for a reviewer to understand the main intent while including the issue reference.
  • Docstring Coverage: ✅ Passed. No functions found in the changes; docstring coverage check skipped.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
elementary/monitor/dbt_project/macros/get_models_runs.sql (1)

4-7: Make row selection deterministic; also cast timestamp in the window ORDER BY.

Switching to row_number removes ties, but with only generated_at in ORDER BY, the “winner” for invocations_rank_index = 1 is nondeterministic on equal timestamps and across engines. Add a stable tiebreaker (e.g., invocation_id, id) and use the same timestamp cast you already use elsewhere.

Apply:

-                row_number() over (partition by unique_id order by generated_at desc) as invocations_rank_index
+                row_number() over (
+                  partition by unique_id
+                  order by {{ elementary.edr_cast_as_timestamp('generated_at') }} desc, invocation_id desc
+                ) as invocations_rank_index
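A minimal sketch of why the tiebreaker matters, again with sqlite3 and hypothetical rows: the same tied data is loaded in two different insertion orders, and with `invocation_id desc` added to the window ORDER BY the row ranked 1 is the same either way. SQLite has no elementary.edr_cast_as_timestamp, so the sketch stores ISO-8601 text timestamps, which sort chronologically as strings.

```python
import sqlite3

ROWS = [
    ("model.a", "inv_2", "2025-09-22 10:00:00"),
    ("model.a", "inv_3", "2025-09-22 10:00:00"),  # ties with inv_2 on generated_at
    ("model.a", "inv_1", "2025-09-22 09:00:00"),
]

QUERY = """
select invocation_id
from (
    select invocation_id,
           row_number() over (
               partition by unique_id
               order by generated_at desc, invocation_id desc  -- stable tiebreaker
           ) as invocations_rank_index
    from runs
)
where invocations_rank_index = 1
"""


def latest_invocation(rows):
    """Load rows in the given order and return the invocation that gets index 1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table runs (unique_id text, invocation_id text, generated_at text)")
    conn.executemany("insert into runs values (?, ?, ?)", rows)
    (winner,) = conn.execute(QUERY).fetchone()
    conn.close()
    return winner


print(latest_invocation(ROWS), latest_invocation(list(reversed(ROWS))))
```

The tiebreaker only needs to be stable, not meaningful: `invocation_id desc` picks the lexicographically largest ID among tied rows, which is not necessarily the run that finished last.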
elementary/monitor/dbt_project/macros/get_test_results.sql (1)

196-197: ClickHouse path: add tiebreakers to ROW_NUMBER for determinism.

Same determinism concern; add stable keys available in the SELECT.

Apply:

-        ROW_NUMBER() OVER (PARTITION BY elementary_unique_id ORDER BY etr.detected_at DESC) AS invocations_rank_index,
+        ROW_NUMBER() OVER (
+          PARTITION BY elementary_unique_id
+          ORDER BY etr.detected_at DESC, etr.invocation_id DESC, etr.test_execution_id DESC, etr.id DESC
+        ) AS invocations_rank_index,
elementary/monitor/dbt_project/macros/get_source_freshness_results.sql (1)

14-15: Deterministic row_number and consistent timestamp casting.

As with the other macros, cast the timestamp and add stable tiebreakers to avoid a nondeterministic “latest” selection when generated_at values tie.

Apply:

-                row_number() over (partition by unique_id order by generated_at desc) as invocations_rank_index
+                row_number() over (
+                  partition by unique_id
+                  order by {{ elementary.edr_cast_as_timestamp('generated_at') }} desc, invocation_id desc, source_freshness_execution_id desc
+                ) as invocations_rank_index

Note: Compared with the previous rank() behavior, this may reduce the number of rows passing invocations_rank_index <= invocations_per_test when many rows share the same generated_at, because tied rows no longer share an index.
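The note above can be checked with a small sqlite3 sketch. The `freshness` table and its rows are hypothetical, and `invocations_per_test` is passed as a query parameter rather than through Jinja. With three results of which the two latest tie, a limit of 1 admits two rows under rank() but only one under row_number().

```python
import sqlite3


def rows_passing(window_fn, invocations_per_test):
    """Count rows kept by `invocations_rank_index <= invocations_per_test` for one window function."""
    if window_fn not in ("rank", "row_number"):  # only interpolate known function names
        raise ValueError(f"unsupported window function: {window_fn}")
    conn = sqlite3.connect(":memory:")
    conn.execute("create table freshness (unique_id text, generated_at text)")
    conn.executemany(
        "insert into freshness values (?, ?)",
        [
            ("source.a", "2025-09-22 08:00:00"),
            ("source.a", "2025-09-22 09:00:00"),  # two results share the latest timestamp
            ("source.a", "2025-09-22 09:00:00"),
        ],
    )
    query = f"""
        select count(*) from (
            select {window_fn}() over (
                partition by unique_id order by generated_at desc
            ) as invocations_rank_index
            from freshness
        )
        where invocations_rank_index <= ?
    """
    (count,) = conn.execute(query, (invocations_per_test,)).fetchone()
    conn.close()
    return count


for n in (1, 2, 3):
    print(n, rows_passing("rank", n), rows_passing("row_number", n))
```

rank() assigns 1, 1, 3 here, so it returns 2 rows for a limit of 1 or 2; row_number() assigns 1, 2, 3 and never returns more rows than the limit.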

Run a quick tie audit:

with s as (
  select unique_id, {{ elementary.edr_cast_as_timestamp('generated_at') }} as ts, count(*) as c
  from {{ ref('elementary', 'dbt_source_freshness_results') }}
  where {{ elementary.edr_datediff(elementary.edr_cast_as_timestamp('generated_at'), elementary.edr_current_timestamp(), 'day') }} < {{ days_back }}
  group by unique_id, ts
)
select * from s where c > 1 order by c desc limit 50;
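For trying the audit outside dbt, here is a sketch of the same query in SQLite run from Python. It is an approximation under stated assumptions: the table holds only the two columns the audit reads, timestamps are ISO-8601 UTC text, and `julianday('now') - julianday(generated_at) < days_back` stands in for the elementary.edr_datediff(..., 'day') filter, which counts day boundaries rather than elapsed time.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def tie_audit(rows, days_back):
    """Return (unique_id, generated_at, count) for timestamps shared by more than one result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table dbt_source_freshness_results (unique_id text, generated_at text)")
    conn.executemany("insert into dbt_source_freshness_results values (?, ?)", rows)
    result = conn.execute(
        """
        with s as (
            select unique_id, generated_at as ts, count(*) as c
            from dbt_source_freshness_results
            -- approximates edr_datediff(generated_at, now, 'day') < days_back
            where julianday('now') - julianday(generated_at) < ?
            group by unique_id, generated_at
        )
        select unique_id, ts, c from s where c > 1 order by c desc limit 50
        """,
        (days_back,),
    ).fetchall()
    conn.close()
    return result


def utc_ago(**delta):
    """Format a past UTC timestamp the way SQLite's datetime() does."""
    return (datetime.now(timezone.utc) - timedelta(**delta)).strftime("%Y-%m-%d %H:%M:%S")


recent, old = utc_ago(hours=1), utc_ago(days=30)
sample = [("source.a", recent), ("source.a", recent), ("source.b", recent), ("source.a", old), ("source.a", old)]
print(tie_audit(sample, days_back=7))
```

With days_back=7 only the recent tie on source.a is reported; the older tie falls outside the window and source.b has no tie.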
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between acdac83 and 49e34e3.

📒 Files selected for processing (3)
  • elementary/monitor/dbt_project/macros/get_models_runs.sql (1 hunks)
  • elementary/monitor/dbt_project/macros/get_source_freshness_results.sql (1 hunks)
  • elementary/monitor/dbt_project/macros/get_test_results.sql (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: test / test
  • GitHub Check: code-quality
🔇 Additional comments (2)
elementary/monitor/dbt_project/macros/get_models_runs.sql (1)

11-11: LGTM.

No concerns with the selected columns.

elementary/monitor/dbt_project/macros/get_test_results.sql (1)

17-18: Update stale comment and add deterministic tiebreakers.

Comment still says “so we use rank,” but code uses row_number. Also, add stable tiebreakers to avoid nondeterministic picks when detected_at ties happen (common with coarse timestamp precision).

Apply:

-                {# When we split test into multiple test results, we want to have the same invocation order for the test results from the same run so we use rank. #}
-                row_number() over (partition by elementary_unique_id order by {{elementary.edr_cast_as_timestamp('detected_at')}} desc) as invocations_rank_index
+                {# Deterministic per-run ordering: use row_number with stable tiebreakers. #}
+                row_number() over (
+                  partition by elementary_unique_id
+                  order by {{ elementary.edr_cast_as_timestamp('detected_at') }} desc, invocation_id desc, test_execution_id desc, id desc
+                ) as invocations_rank_index
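The stale comment records why rank() was used: when one test is split into several results from the same run, rank() gives them all the same index, so they pass or fail the invocations_per_test filter together. A minimal sqlite3 sketch, assuming the split results share one detected_at value (the rows, IDs, and simplified schema are hypothetical), compares the original rank() window with the suggested row_number() plus tiebreakers.

```python
import sqlite3

# Original window vs. the suggested row_number() with tiebreakers (SQLite syntax, no Jinja casts).
WINDOWS = {
    "rank": "rank() over (partition by elementary_unique_id order by detected_at desc)",
    "row_number": (
        "row_number() over (partition by elementary_unique_id "
        "order by detected_at desc, invocation_id desc, test_execution_id desc)"
    ),
}


def kept_results(window, invocations_per_test):
    """Return sorted (invocation_id, test_execution_id) pairs that pass the invocations_per_test filter."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table test_results "
        "(elementary_unique_id text, invocation_id text, test_execution_id text, detected_at text)"
    )
    rows = [
        ("test.a", invocation_id, f"{invocation_id}_part{part}", detected_at)
        for invocation_id, detected_at in (("inv_1", "2025-09-21 10:00:00"), ("inv_2", "2025-09-22 10:00:00"))
        for part in range(3)  # one test split into three results that share detected_at
    ]
    conn.executemany("insert into test_results values (?, ?, ?, ?)", rows)
    kept = conn.execute(
        f"""
        select invocation_id, test_execution_id from (
            select invocation_id, test_execution_id, {WINDOWS[window]} as invocations_rank_index
            from test_results
        )
        where invocations_rank_index <= ?
        order by invocation_id, test_execution_id
        """,
        (invocations_per_test,),
    ).fetchall()
    conn.close()
    return kept


print(kept_results("rank", 1))
print(kept_results("row_number", 1))
```

With invocations_per_test = 1, rank() keeps all three results of the latest run while row_number() keeps only inv_2_part2. rank() also jumps to 4 for the earlier run, so a limit of 2 still returns only the latest run under rank(), whereas row_number() returns two results from that same run. Whether counting individual results rather than runs is intended for split tests is worth confirming when the comment is rewritten.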

Also verify downstream assumptions about the invocations_rank_index <= invocations_per_test filter, which may now return fewer rows when detected_at ties are frequent.

A quick check to run in your warehouse:

-- How often do ties occur today?
with t as (
  {{ elementary_cli.current_tests_run_results_query(days_back=1) }}
)
select elementary_unique_id, count(*) as cnt
from t
group by elementary_unique_id, {{ elementary.edr_cast_as_timestamp('detected_at') }}
having count(*) > 1
order by cnt desc
limit 50;

@NoyaArie NoyaArie merged commit d7f7d9b into master Sep 22, 2025
7 checks passed
@NoyaArie NoyaArie deleted the ele-5089-use-row-number-instead-of-rank branch September 22, 2025 09:26