-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
CubeJS generates invalid SQL aliases when querying measures that use joined cubes and perform calculations using other aggregated measures.
To Reproduce
- Go to Cube Playground.
- Query the "Calc" and "FinalCalc" measures from "cube_b".
- The query displays results with no errors.
- Query only the "FinalCalc" measure from "cube_b".
- See error:
Query failed: Error: Unknown expression or function identifier
cube_b.int_column1in scope SELECT (multiIf((sum(cube_b.int_column1) - sum(cube_b.int_column2)) > 0, 4 / 2, 0) * cube_a___my_number) / 4 AS cube_b___final_calc FROM (SELECT sum(cube_a_key__cube_a.int_column) AS cube_a___my_number FROM (SELECT DISTINCT cube_a_key__cube_a.pk_column AS cube_a___key FROM second_table AS cube_a_key__cube_b LEFT JOIN first_table AS cube_a_key__cube_a ON cube_a_key__cube_a.common_column = cube_a_key__cube_b.common_column) AS keys LEFT JOIN first_table AS cube_a_key__cube_a ON keys.cube_a___key = cube_a_key__cube_a.pk_column) AS q_0 LIMIT 10000. ; query id: 6c415a29-ed13-4735-b2e3-d733fd801035
Expected behavior
When querying both the "Calc" and "FinalCalc" measures, CubeJS generates the right aliases and correct ClickHouse SQL:
SELECT
`cube_b___calc` `cube_b___calc`,
`cube_b___calc` * `cube_a___my_number` / 4 `cube_b___final_calc`
FROM
(
SELECT
CASE
WHEN sum(`main__cube_b`.int_column1) - sum(`main__cube_b`.int_column2) > 0 THEN 4 / 2
ELSE 0
END `cube_b___calc`
FROM
second_table AS `main__cube_b`
LEFT JOIN first_table AS `main__cube_a` ON `main__cube_a`.common_column = `main__cube_b`.common_column
) as q_0,
(
SELECT
sum(`cube_a_key__cube_a`.int_column) `cube_a___my_number`
FROM
(
SELECT
DISTINCT `cube_a_key__cube_a`.pk_column `cube_a___key`
FROM
second_table AS `cube_a_key__cube_b`
LEFT JOIN first_table AS `cube_a_key__cube_a` ON `cube_a_key__cube_a`.common_column = `cube_a_key__cube_b`.common_column
) AS `keys`
LEFT JOIN first_table AS `cube_a_key__cube_a` ON `keys`.`cube_a___key` = `cube_a_key__cube_a`.pk_column
) as q_1
LIMIT
10000
However, when querying just the "FinalCalc" measure, the generated SQL is invalid, as CubeJS tries to use a cube_b alias that is not created anywhere:
SELECT
CASE
WHEN sum(`cube_b`.int_column1) - sum(`cube_b`.int_column2) > 0 THEN 4 / 2
ELSE 0
END * `cube_a___my_number` / 4 `cube_b___final_calc`
FROM
(
SELECT
sum(`cube_a_key__cube_a`.int_column) `cube_a___my_number`
FROM
(
SELECT
DISTINCT `cube_a_key__cube_a`.pk_column `cube_a___key`
FROM
second_table AS `cube_a_key__cube_b`
LEFT JOIN first_table AS `cube_a_key__cube_a` ON `cube_a_key__cube_a`.common_column = `cube_a_key__cube_b`.common_column
) AS `keys`
LEFT JOIN first_table AS `cube_a_key__cube_a` ON `keys`.`cube_a___key` = `cube_a_key__cube_a`.pk_column
) as q_0
LIMIT
10000
Minimally reproducible Cube Schema
cube_a.yml:
cubes:
- name: cube_a
sql_table: first_table
joins: []
dimensions:
- name: Key
sql: "pk_column"
type: string
primary_key: true
- name: Common
sql: "common_column"
type: string
measures:
- name: MyNumber
sql: "int_column"
type: sumcube_b.yml:
cubes:
- name: cube_b
sql_table: second_table
joins:
- name: cube_a
relationship: many_to_one
sql: "{cube_a.Common} = {CUBE}.common_column"
dimensions:
- name: Common
sql: "common_column"
type: string
primary_key: true
measures:
- name: FirstNumber
sql: "int_column1"
type: sum
- name: SecondNumber
sql: "int_column2"
type: sum
- name: Diff
sql: "{FirstNumber} - {SecondNumber}"
type: number
- name: Calc
sql: >
CASE
WHEN {Diff} > 0
THEN 4/2
ELSE 0
END
type: number
- name: AnotherCalc
sql: "{cube_a.MyNumber}/4"
type: number
- name: FinalCalc
sql: "{Calc} * {AnotherCalc}"
type: numberVersion:
1.3.44
Additional context
Tested against ClickHouse 25.3.3.42, using the following tables/data:
CREATE TABLE first_table (
pk_column String,
common_column String,
int_column Int64,
PRIMARY KEY (pk_column)
) ENGINE = MergeTree()
ORDER BY pk_column;
CREATE TABLE second_table (
common_column String,
int_column1 Int64,
int_column2 Int64,
PRIMARY KEY (common_column)
) ENGINE = MergeTree()
ORDER BY common_column;
INSERT INTO first_table (pk_column, common_column, int_column) VALUES
('pk1', 'common1', 100),
('pk2', 'common2', 200);
INSERT INTO second_table (common_column, int_column1, int_column2) VALUES
('common1', 300, 400),
('common2', 500, 600);
