Skip to content

Invalid SQL Generation for ClickHouse datasource when joining cubesΒ #9830

@bernardolopes8

Description

@bernardolopes8

Describe the bug
CubeJS generates invalid SQL aliases when querying measures that use joined cubes and perform calculations using other aggregated measures.

To Reproduce

  1. Go to Cube Playground.
  2. Query the "Calc" and "FinalCalc" measures from "cube_b".
  3. The query displays results with no errors.
  4. Query only the "FinalCalc" measure from "cube_b".
  5. See error:

Query failed: Error: Unknown expression or function identifier cube_b.int_column1 in scope SELECT (multiIf((sum(cube_b.int_column1) - sum(cube_b.int_column2)) > 0, 4 / 2, 0) * cube_a___my_number) / 4 AS cube_b___final_calc FROM (SELECT sum(cube_a_key__cube_a.int_column) AS cube_a___my_number FROM (SELECT DISTINCT cube_a_key__cube_a.pk_column AS cube_a___key FROM second_table AS cube_a_key__cube_b LEFT JOIN first_table AS cube_a_key__cube_a ON cube_a_key__cube_a.common_column = cube_a_key__cube_b.common_column) AS keys LEFT JOIN first_table AS cube_a_key__cube_a ON keys.cube_a___key = cube_a_key__cube_a.pk_column) AS q_0 LIMIT 10000. ; query id: 6c415a29-ed13-4735-b2e3-d733fd801035

Expected behavior
When querying both the "Calc" and "FinalCalc" measures, CubeJS generates the right aliases and correct ClickHouse SQL:

SELECT
  `cube_b___calc` `cube_b___calc`,
  `cube_b___calc` * `cube_a___my_number` / 4 `cube_b___final_calc`
FROM
  (
    SELECT
      CASE
        WHEN sum(`main__cube_b`.int_column1) - sum(`main__cube_b`.int_column2) > 0 THEN 4 / 2
        ELSE 0
      END `cube_b___calc`
    FROM
      second_table AS `main__cube_b`
      LEFT JOIN first_table AS `main__cube_a` ON `main__cube_a`.common_column = `main__cube_b`.common_column
  ) as q_0,
  (
    SELECT
      sum(`cube_a_key__cube_a`.int_column) `cube_a___my_number`
    FROM
      (
        SELECT
          DISTINCT `cube_a_key__cube_a`.pk_column `cube_a___key`
        FROM
          second_table AS `cube_a_key__cube_b`
          LEFT JOIN first_table AS `cube_a_key__cube_a` ON `cube_a_key__cube_a`.common_column = `cube_a_key__cube_b`.common_column
      ) AS `keys`
      LEFT JOIN first_table AS `cube_a_key__cube_a` ON `keys`.`cube_a___key` = `cube_a_key__cube_a`.pk_column
  ) as q_1
LIMIT
  10000

However, when querying just the "FinalCalc" measure, the generated SQL is invalid, as CubeJS tries to use a cube_b alias that is not created anywhere:

SELECT
  CASE
    WHEN sum(`cube_b`.int_column1) - sum(`cube_b`.int_column2) > 0 THEN 4 / 2
    ELSE 0
  END * `cube_a___my_number` / 4 `cube_b___final_calc`
FROM
  (
    SELECT
      sum(`cube_a_key__cube_a`.int_column) `cube_a___my_number`
    FROM
      (
        SELECT
          DISTINCT `cube_a_key__cube_a`.pk_column `cube_a___key`
        FROM
          second_table AS `cube_a_key__cube_b`
          LEFT JOIN first_table AS `cube_a_key__cube_a` ON `cube_a_key__cube_a`.common_column = `cube_a_key__cube_b`.common_column
      ) AS `keys`
      LEFT JOIN first_table AS `cube_a_key__cube_a` ON `keys`.`cube_a___key` = `cube_a_key__cube_a`.pk_column
  ) as q_0
LIMIT
  10000

Screenshots
Image

Minimally reproducible Cube Schema

cube_a.yml:

cubes:
  - name: cube_a
    sql_table: first_table

    joins: []

    dimensions:

      - name: Key
        sql: "pk_column"
        type: string
        primary_key: true

      - name: Common
        sql: "common_column"
        type: string

    measures:
      - name: MyNumber
        sql: "int_column"
        type: sum

cube_b.yml:

cubes:
  - name: cube_b
    sql_table: second_table

    joins:
      - name: cube_a
        relationship: many_to_one
        sql: "{cube_a.Common} = {CUBE}.common_column"

    dimensions:
      - name: Common
        sql: "common_column"
        type: string
        primary_key: true

    measures:
      - name: FirstNumber
        sql: "int_column1"
        type: sum

      - name: SecondNumber
        sql: "int_column2"
        type: sum

      - name: Diff
        sql: "{FirstNumber} - {SecondNumber}"
        type: number

      - name: Calc
        sql: >
          CASE
            WHEN {Diff} > 0 
            THEN 4/2
            ELSE 0 
          END
        type: number

      - name: AnotherCalc
        sql: "{cube_a.MyNumber}/4"
        type: number

      - name: FinalCalc
        sql: "{Calc} * {AnotherCalc}"
        type: number

Version:
1.3.44

Additional context
Tested against ClickHouse 25.3.3.42, using the following tables/data:

CREATE TABLE first_table (
    pk_column String,
    common_column String,
    int_column Int64,
    PRIMARY KEY (pk_column)
) ENGINE = MergeTree()
ORDER BY pk_column;

CREATE TABLE second_table (
    common_column String,
    int_column1 Int64,
    int_column2 Int64,
    PRIMARY KEY (common_column)
) ENGINE = MergeTree()
ORDER BY common_column;

INSERT INTO first_table (pk_column, common_column, int_column) VALUES
('pk1', 'common1', 100),
('pk2', 'common2', 200);

INSERT INTO second_table (common_column, int_column1, int_column2) VALUES
('common1', 300, 400),
('common2', 500, 600);

Metadata

Metadata

Assignees

No one assigned

    Labels

    driver:clickhouseIssues related to the ClickHouse driver

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions