Skip to content

Conversation

@nchristo1213
Copy link

fix(schema-compiler): Preserve Full Time Buckets in Period-over-Period Queries

This PR replaces INNER JOIN with LEFT JOIN when stitching sub-queries in outerMeasuresJoinFullKeyQueryAggregate, ensuring all expected time buckets appear when calculating period-over-period (PoP) metrics.

Problem

When PoP queries used INNER JOIN, buckets missing from auxiliary sub-queries (q_1, q_2, ...) were dropped causing incomplete results or incorrect ratios.

Solution

Use LEFT JOIN instead, preserving all keys from the anchor query (q_0). Missing values in auxiliary queries yield NULL rather than removing the row entirely.

LEFT JOIN ${this.wrapInParenthesis((q))} as q_${i + 1} ON ${this.dimensionsJoinCondition(`q_${i}`, `q_${i + 1}`)}

Benefits

  • Retains continuity for PoP metrics
  • Allows ratio calculations to appear accurate when prior period data is missing
  • Ensures accuracy without introducing gaps or dropped time buckets

Tests

  • New fixture: sales cube with date, category, region, and PoP measures
  • Adjusted:
    • CAGR: Added NULL rows
    • multi-stage graph tests: Added secondary ordering
  • New tests:
    • Time-only PoP
    • One-dimension PoP
    • Two-dimension PoP

These tests ensure the correct number of periods appear and that PoP measures return NULL instead of dropping rows when previous-period values are unavailable.

Check List

  • Tests have been run in packages where changes made if available
  • Linter has been run for changed code
  • Tests for the changes have been added if not covered yet
  • Docs have been added / updated if required

resolves #9353

@nchristo1213 nchristo1213 requested a review from a team as a code owner April 22, 2025 21:42
@github-actions github-actions bot added the pr:community Contribution from Cube.js community members. label Apr 22, 2025
@codecov
Copy link

codecov bot commented Apr 23, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 59.02%. Comparing base (1d15182) to head (ea13aaa).
Report is 63 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (1d15182) and HEAD (ea13aaa). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (1d15182) HEAD (ea13aaa)
cubesql 1 0
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #9502       +/-   ##
===========================================
- Coverage   83.89%   59.02%   -24.87%     
===========================================
  Files         229      153       -76     
  Lines       83569    13067    -70502     
  Branches        0     2223     +2223     
===========================================
- Hits        70111     7713    -62398     
+ Misses      13458     5039     -8419     
- Partials        0      315      +315     
Flag Coverage Δ
cube-backend 59.02% <ø> (?)
cubesql ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@nchristo1213 nchristo1213 force-pushed the pop-missing-time-buckets branch from ab02440 to ea13aaa Compare April 24, 2025 01:04
@nchristo1213
Copy link
Author

Hi folks, just following up on this PR. I saw that a review was requested from the schema-compiler team. If there’s anything I can clarify or update, I’d be happy to help. Thanks in advance!

Copy link
Member

@KSDaemon KSDaemon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 LGTM!

…d queries

Switches from INNER JOIN to LEFT JOIN in outerMeasuresJoinFullKeyQueryAggregate to prevent dropped rows when previous-period data is missing.

Adds fixture and tests for period-over-period measures.
.map(
(q, i) => (this.dimensionAliasNames().length ?
`INNER JOIN ${this.wrapInParenthesis((q))} as q_${i + 1} ON ${this.dimensionsJoinCondition(`q_${i}`, `q_${i + 1}`)}` :
`LEFT JOIN ${this.wrapInParenthesis((q))} as q_${i + 1} ON ${this.dimensionsJoinCondition(`q_${i}`, `q_${i + 1}`)}` :
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if there are few subjoins? And it's probably possible that the result set with gaps may occur on the right side too... So maybe FULL OUTER JOIN should be used here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. I was thinking the root q_0 would always have the full time set, but FULL OUTER JOIN will work too. I can make the code and description updates.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After more analysis, FULL OUTER JOIN won't work here because it seems PostgreSQL requires merge-joinable or hash-joinable join conditions and dimensionsJoinCondition includes other null-handling logic that prevents that (found by re-running tests after just changing to FULL OUTER JOIN). LEFT JOIN still seems appropriate because we can assume the first query q_0 defines the full time range and that is what is preserved when data is missing from later subqueries. The same logic would apply in cases with few subjoins, q_0 is treated as the time source subquery and LEFT JOIN ensures we retain those rows when there is missing data on the right.

@nchristo1213 nchristo1213 force-pushed the pop-missing-time-buckets branch from ea13aaa to 715370b Compare May 21, 2025 13:56
@KSDaemon
Copy link
Member

It seems that LEFT JOIN is bad because the result of the query becomes dependent on the order of measures in the query, and this dependence is not obvious - it depends on the order of measure and the order of multistage cte assembly. Therefore, the current fix can break existing queries or at least change the result of their work.
And making FULL JOIN is really quite difficult: it should work at the level of baseQuery and all dialects. But different databases have their own restrictions and specifics regarding joins, like FULL JOIN is only supported with merge-joinable or hash-joinable join conditions.

We won't solve this problem at the baseQuery level right now. We are solving this problem in Tesseract.

@nchristo1213
Copy link
Author

Thanks for the discussion and feedback. I’ll go ahead and close this PR and take a look at other ways I can contribute.

@KSDaemon
Copy link
Member

@nchristo1213 thank you for your contribution (even though we haven't merged it) and discussion!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr:community Contribution from Cube.js community members.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rolling Window Calculations with Multiple Dimensions

4 participants