Skip to content

Unexpected GROUP BY when using multi_stage and referencing dimensions with {}Β #9241

@victormorenoi

Description

@victormorenoi

Describe the bug
When using {} to reference a dimension in Cube with multi_stage: true, the generated query introduces an unnecessary GROUP BY, which alters the final aggregation results.

To Reproduce
I have the following dimension definitions:

- name: count  
  type: sum  
  sql: "{multibrand_weight}"  

- name: count_total  
  type: sum  
  multi_stage: true  
  sql: "{count}"  
  group_by:  
    - retailer_name  
    - category_name

If I define count as sql: multibrand_weight (without {}), referencing the table column directly, the query works correctly, applying the aggregation directly at the table level.
However, when using sql: "{multibrand_weight}", referencing the Cube dimensions, Cube first executes a SELECT on the dimension and introduces an unnecessary GROUP BY that changes the final result.

Expected behavior
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.

Example: beginning of the Queries:
βœ… Correct (without {}):

WITH cte_0 AS (  
  SELECT  
    "test_cube"."BRAND_NAME" "test_cube__brand_name",  
    sum("test_cube".multibrand_weight) "test_cube__count"  
  FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"  
  WHERE ...  
  GROUP BY 1  
  ORDER BY 2 DESC  
)

❌ Incorrect (with {}):

WITH cte_0 AS (  
  SELECT  
    "test_cube"."BRAND_NAME" "test_cube__brand_name",  
    "test_cube"."MULTIBRAND_WEIGHT" "test_cube__multibrand_weight"  
  FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"  
  WHERE ...  
  GROUP BY 1, 2  
  ORDER BY 1 ASC  
),  
cte_1 AS (  
  SELECT  
    "test_cube__brand_name",  
    sum("test_cube__multibrand_weight") "test_cube__count"  
  FROM cte_0  
  GROUP BY 1  
  ORDER BY 2 DESC  
)

The GROUP BY performed in the initial SELECT causes the data to be grouped as if applying a DISTINCT, leading to incorrect final results.
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.
I get the following questions

  • Is this expected behavior when referencing dimensions using {}?
  • Is there a way to avoid the extra GROUP BY while still referencing the dimension?
  • What would be the best practice to define a dimension that depends on another without altering the query logic?

This question is crucial for us, as our modeling heavily relies on multi_stage, both for total calculations and for time_shift usage. The same issue occurs with any measure type, including count, avg, sum, etc.
Thank you so much! πŸ™Œ

Version:
[Latest 1.2.5]

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions