Commit 171ea35

feat(cubesql): Penalize zero members in wrapper (#8927)
This allows extracting a fully assembled `CubeScan` under the wrapper instead of `CubeScan(allMembers, ungrouped=true)`.

Before this change, the cost had two related components, `non_detected_cube_scans` and `cube_members`:

* `non_detected_cube_scans` penalizes a `CubeScan` without members specifically outside the wrapper. This is a pretty hard penalty, because queries like that are Not Good.
* `cube_members` prefers queries with fewer members, which seems fine. But on its own it would prefer a query with zero members, which actually means all the members.

New cost component added: `zero_members_wrapper`. It sits right before `cube_members` and penalizes the no-members representation before `cube_members` starts to affect extraction.

The new `CubeScan` extractions surfaced a couple of bugs related to aliasing in generated SQL, hence all the supporting changes:

* Support member alias for TD (time dimension) with granularity

  Before this, the schema compiler didn't use aliases for `cube.timeDimension.granularity` members.

* Extract `ColumnRemapping` and `Remapper` structs

* Implement column remapping and literal member handling for `CubeScan` in wrapper

  Column names introduced by DataFusion now get renamed, which avoids sending overly long or incorrect aliases to Cube for SQL generation, and later to the data source. DF can generate names like `datetrunc(Utf8("day"),Orders.createdAt)`, and aliases like that are not expected by the JS side.

  A single `CubeScan` can represent a join of multiple `TableScan`s. They can have different table aliases, and column expressions on top of the `CubeScan` in the original plan can have different qualifiers. But the generated SQL can have only one table alias, so all column expressions on top need to be remapped to that single alias as well.

* Support literal members in `CubeScan` under wrapper

  SQL generated for a `CubeScan` no longer skips literal members from the `CubeScan`; it generates a SELECT wrapper with the literal members as literal columns.
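The effect of placing `zero_members_wrapper` before `cube_members` can be sketched as a lexicographic comparison. This is only an illustration under assumptions: the `PlanCost` shape, the `ORDER` array, and the `compareCost` and `cheapest` helpers are hypothetical, and the real cost in cubesql has more components than the three named in this message.

```typescript
// Hypothetical cost record; only the three component names come from the commit message.
interface PlanCost {
  nonDetectedCubeScans: number; // CubeScan without members outside the wrapper
  zeroMembersWrapper: number;   // CubeScan without members inside the wrapper
  cubeMembers: number;          // number of members in the CubeScan
}

// Assumed comparison order: earlier components dominate later ones.
const ORDER: (keyof PlanCost)[] = ['nonDetectedCubeScans', 'zeroMembersWrapper', 'cubeMembers'];

// Returns a negative number when `a` is cheaper than `b`.
function compareCost(a: PlanCost, b: PlanCost): number {
  for (const key of ORDER) {
    if (a[key] !== b[key]) {
      return a[key] - b[key];
    }
  }
  return 0;
}

function cheapest(candidates: PlanCost[]): PlanCost {
  return candidates.reduce((best, c) => (compareCost(c, best) < 0 ? c : best));
}

// CubeScan(allMembers, ungrouped=true): zero listed members, but penalized inside the wrapper.
const zeroMembers: PlanCost = { nonDetectedCubeScans: 0, zeroMembersWrapper: 1, cubeMembers: 0 };
// Fully assembled CubeScan with four members.
const assembled: PlanCost = { nonDetectedCubeScans: 0, zeroMembersWrapper: 0, cubeMembers: 4 };

const picked = cheapest([zeroMembers, assembled]);
```

With only `cube_members` in play, `zeroMembers` would win on member count. Because the new component is compared first in this sketch, `assembled` is picked instead.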
1 parent e661d2a commit 171ea35

7 files changed, +662 -155 lines changed


packages/cubejs-schema-compiler/src/adapter/BaseTimeDimension.ts

Lines changed: 6 additions & 2 deletions
@@ -70,10 +70,14 @@ export class BaseTimeDimension extends BaseFilter {
     return super.aliasName();
   }

-  // @ts-ignore
-  public unescapedAliasName(granularity: string) {
+  public unescapedAliasName(granularity?: string) {
     const actualGranularity = granularity || this.granularityObj?.granularity || 'day';

+    const fullName = `${this.dimension}.${actualGranularity}`;
+    if (this.query.options.memberToAlias && this.query.options.memberToAlias[fullName]) {
+      return this.query.options.memberToAlias[fullName];
+    }
+
     return `${this.query.aliasName(this.dimension)}_${actualGranularity}`; // TODO date here for rollups
   }
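The lookup added in this hunk can be tried in isolation with a minimal sketch. Assumptions: `timeDimensionAlias` is a hypothetical standalone function, not part of the schema compiler, and `snake` is only a stand-in for `query.aliasName()`, whose real output format is not shown in this diff.

```typescript
type MemberToAlias = Record<string, string>;

// Hypothetical standalone version of the lookup in unescapedAliasName().
function timeDimensionAlias(
  dimension: string,
  granularity: string | undefined,
  memberToAlias: MemberToAlias | undefined,
  defaultAlias: (member: string) => string,
): string {
  const actualGranularity = granularity || 'day';

  // An explicit alias for `cube.timeDimension.granularity` takes priority.
  const fullName = `${dimension}.${actualGranularity}`;
  if (memberToAlias && memberToAlias[fullName]) {
    return memberToAlias[fullName];
  }

  // Otherwise fall back to the generated `<alias>_<granularity>` name.
  return `${defaultAlias(dimension)}_${actualGranularity}`;
}

// Stand-in for query.aliasName(); an assumption made for this example only.
const snake = (member: string): string => member.replace(/\./g, '__').toLowerCase();

const withAlias = timeDimensionAlias(
  'Orders.createdAt',
  'month',
  { 'Orders.createdAt.month': 'my_created_at' },
  snake,
);
const withoutAlias = timeDimensionAlias('Orders.createdAt', 'month', undefined, snake);
```

Here `withAlias` takes the value from the map, while `withoutAlias` falls back to the generated name, matching the two branches in the diff.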

packages/cubejs-testing/test/__snapshots__/smoke-cubesql.test.ts.snap

Lines changed: 78 additions & 0 deletions
@@ -232,6 +232,84 @@ Array [
 ]
 `;

+exports[`SQL API Postgres (Data) select __user and literal grouped under wrapper: select __user and literal in wrapper 1`] = `
+Array [
+  Object {
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "new",
+    "my_user": null,
+  },
+  Object {
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "processed",
+    "my_user": null,
+  },
+  Object {
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "shipped",
+    "my_user": null,
+  },
+]
+`;
+
+exports[`SQL API Postgres (Data) select __user and literal grouped: select __user and literal 1`] = `
+Array [
+  Object {
+    "Int64(2)": "2",
+    "__cubeJoinField": null,
+    "datetrunc(Utf8(\\"day\\"),Orders.createdAt)": 2024-01-01T00:00:00.000Z,
+    "id": 1,
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "new",
+    "my_user": null,
+  },
+  Object {
+    "Int64(2)": "2",
+    "__cubeJoinField": null,
+    "datetrunc(Utf8(\\"day\\"),Orders.createdAt)": 2024-01-02T00:00:00.000Z,
+    "id": 2,
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "new",
+    "my_user": null,
+  },
+  Object {
+    "Int64(2)": "2",
+    "__cubeJoinField": null,
+    "datetrunc(Utf8(\\"day\\"),Orders.createdAt)": 2024-01-03T00:00:00.000Z,
+    "id": 3,
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "processed",
+    "my_user": null,
+  },
+  Object {
+    "Int64(2)": "2",
+    "__cubeJoinField": null,
+    "datetrunc(Utf8(\\"day\\"),Orders.createdAt)": 2024-01-04T00:00:00.000Z,
+    "id": 4,
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "processed",
+    "my_user": null,
+  },
+  Object {
+    "Int64(2)": "2",
+    "__cubeJoinField": null,
+    "datetrunc(Utf8(\\"day\\"),Orders.createdAt)": 2024-01-05T00:00:00.000Z,
+    "id": 5,
+    "my_created_at": 2024-01-01T00:00:00.000Z,
+    "my_literal": "1",
+    "my_status": "shipped",
+    "my_user": null,
+  },
+]
+`;
+
 exports[`SQL API Postgres (Data) select null in subquery with streaming 1`] = `
 Array [
   Object {
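Keys such as `datetrunc(Utf8(\"day\"),Orders.createdAt)` and `Int64(2)` in the snapshot above are the kind of DataFusion-generated names that the commit message says get renamed for `CubeScan` under the wrapper. The sketch below shows one way such a remapping could work. It is an assumption-laden illustration: this `Remapper` class, its `remap` method, and the `expr`/`expr_1` alias scheme are hypothetical and do not mirror the Rust `ColumnRemapping` and `Remapper` structs, which are not shown on this page.

```typescript
interface RemappedColumn {
  table: string;
  column: string;
}

// Hypothetical remapper: maps (qualifier, column) pairs from the original plan
// to short, unique aliases under a single table alias.
class Remapper {
  private readonly targetTable: string;
  private readonly mapping = new Map<string, string>();
  private readonly used = new Set<string>();

  constructor(targetTable: string) {
    this.targetTable = targetTable;
  }

  remap(qualifier: string | undefined, column: string): RemappedColumn {
    // The same original column always resolves to the same alias.
    const key = `${qualifier ?? ''}\u0000${column}`;
    let alias = this.mapping.get(key);
    if (alias === undefined) {
      alias = this.freshAlias(column);
      this.mapping.set(key, alias);
    }
    return { table: this.targetTable, column: alias };
  }

  private freshAlias(column: string): string {
    // Keep plain identifiers; replace anything else with a generated base name.
    const base = /^[A-Za-z_][A-Za-z0-9_]*$/.test(column) ? column : 'expr';
    let candidate = base;
    let suffix = 0;
    while (this.used.has(candidate)) {
      suffix += 1;
      candidate = `${base}_${suffix}`;
    }
    this.used.add(candidate);
    return candidate;
  }
}

const remapper = new Remapper('cube_scan');
const status = remapper.remap('Orders', 'status');
const truncated = remapper.remap(undefined, 'datetrunc(Utf8("day"),Orders.createdAt)');
const literal = remapper.remap(undefined, 'Int64(2)');
const statusAgain = remapper.remap('Orders', 'status');
```

Every result carries the single `cube_scan` table alias regardless of the original qualifier, and the two expression-like names end up as distinct short aliases.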

packages/cubejs-testing/test/smoke-cubesql.test.ts

Lines changed: 72 additions & 0 deletions
@@ -404,6 +404,78 @@ describe('SQL API', () => {
       expect(res.rows).toEqual([{ max: null }]);
     });

+    test('select __user and literal grouped', async () => {
+      const query = `
+        SELECT
+          status AS my_status,
+          date_trunc('month', createdAt) AS my_created_at,
+          __user AS my_user,
+          1 AS my_literal,
+          -- Columns without aliases should also work
+          id,
+          date_trunc('day', createdAt),
+          __cubeJoinField,
+          2
+        FROM
+          Orders
+        GROUP BY 1,2,3,4,5,6,7,8
+        ORDER BY 1,2,3,4,5,6,7,8
+      `;
+
+      const res = await connection.query(query);
+      expect(res.rows).toMatchSnapshot('select __user and literal');
+    });
+
+    test('select __user and literal grouped under wrapper', async () => {
+      const query = `
+        WITH
+        -- This subquery should be represented as CubeScan(ungrouped=false) inside CubeScanWrapper
+        cube_scan_subq AS (
+          SELECT
+            status AS my_status,
+            date_trunc('month', createdAt) AS my_created_at,
+            __user AS my_user,
+            1 AS my_literal,
+            -- Columns without aliases should also work
+            id,
+            date_trunc('day', createdAt),
+            __cubeJoinField,
+            2
+          FROM Orders
+          GROUP BY 1,2,3,4,5,6,7,8
+        ),
+        filter_subq AS (
+          SELECT
+            status status_filter
+          FROM Orders
+          GROUP BY
+            status_filter
+        )
+        SELECT
+          -- Should use SELECT * here to reference columns without aliases.
+          -- But it's broken ATM in DF, initial plan contains \`Projection: ... #__subquery-0.logs_content_filter\` on top, but it should not be there
+          -- TODO fix it
+          my_created_at,
+          my_status,
+          my_user,
+          my_literal
+        FROM cube_scan_subq
+        WHERE
+          -- This subquery filter should trigger wrapping of whole query
+          my_status IN (
+            SELECT
+              status_filter
+            FROM filter_subq
+          )
+        GROUP BY 1,2,3,4
+        ORDER BY 1,2,3,4
+        ;
+      `;
+
+      const res = await connection.query(query);
+      expect(res.rows).toMatchSnapshot('select __user and literal in wrapper');
+    });
+
     test('where segment is false', async () => {
       const query =
         'SELECT value AS val, * FROM "SegmentTest" WHERE segment_eq_1 IS FALSE ORDER BY value;';
