-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
When using {} to reference a dimension in Cube with multi_stage: true, the generated query introduces an unnecessary GROUP BY, which alters the final aggregation results.
To Reproduce
I have the following dimension definitions:
- name: count
type: sum
sql: "{multibrand_weight}"
- name: count_total
type: sum
multi_stage: true
sql: "{count}"
group_by:
- retailer_name
- category_name
If I define count as sql: multibrand_weight (without {}), referencing the table column directly, the query works correctly, applying the aggregation directly at the table level.
However, when using sql: "{multibrand_weight}", referencing the Cube dimensions, Cube first executes a SELECT on the dimension and introduces an unnecessary GROUP BY that changes the final result.
Expected behavior
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.
Example: beginning of the Queries:
β
Correct (without {}):
WITH cte_0 AS (
SELECT
"test_cube"."BRAND_NAME" "test_cube__brand_name",
sum("test_cube".multibrand_weight) "test_cube__count"
FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"
WHERE ...
GROUP BY 1
ORDER BY 2 DESC
)
β Incorrect (with {}):
WITH cte_0 AS (
SELECT
"test_cube"."BRAND_NAME" "test_cube__brand_name",
"test_cube"."MULTIBRAND_WEIGHT" "test_cube__multibrand_weight"
FROM "SH_CMI"."MART_MEDIA_HASHTAG" AS "test_cube"
WHERE ...
GROUP BY 1, 2
ORDER BY 1 ASC
),
cte_1 AS (
SELECT
"test_cube__brand_name",
sum("test_cube__multibrand_weight") "test_cube__count"
FROM cte_0
GROUP BY 1
ORDER BY 2 DESC
)
The GROUP BY performed in the initial SELECT causes the data to be grouped as if applying a DISTINCT, leading to incorrect final results.
I would expect the same behavior when calling the Cube dimension, {}, as when directly referencing the table column.
I get the following questions
- Is this expected behavior when referencing dimensions using
{}? - Is there a way to avoid the extra GROUP BY while still referencing the dimension?
- What would be the best practice to define a dimension that depends on another without altering the query logic?
This question is crucial for us, as our modeling heavily relies on multi_stage, both for total calculations and for time_shift usage. The same issue occurs with any measure type, including count, avg, sum, etc.
Thank you so much! π
Version:
[Latest 1.2.5]