Commit 37d8d17

Merge branch 'master' into feat/scheduledRefreshTimeZones
2 parents d2ca1d1 + 95021f2 commit 37d8d17

249 files changed: +7251 −2306 lines changed

Lines changed: 10 additions & 0 deletions
```diff
@@ -0,0 +1,10 @@
+#!/bin/bash
+set -eo pipefail
+
+# Debug log for test containers
+export DEBUG=testcontainers
+
+echo "::group::Dremio [cloud]"
+yarn lerna run --concurrency 1 --stream --no-prefix integration:dremio
+
+echo "::endgroup::"
```

.github/workflows/push.yml

Lines changed: 16 additions & 1 deletion
```diff
@@ -327,6 +327,7 @@ jobs:
       env:
         CLOUD_DATABASES: >
           firebolt
+          dremio
         # Athena (just to check for secrets availability)
         DRIVERS_TESTS_ATHENA_CUBEJS_AWS_KEY: ${{ secrets.DRIVERS_TESTS_ATHENA_CUBEJS_AWS_KEY }}
@@ -335,7 +336,7 @@ jobs:
         node-version: [20.x]
         db: [
           'clickhouse', 'druid', 'elasticsearch', 'mssql', 'mysql', 'postgres', 'prestodb',
-          'mysql-aurora-serverless', 'crate', 'mongobi', 'firebolt'
+          'mysql-aurora-serverless', 'crate', 'mongobi', 'firebolt', 'dremio'
         ]
         fail-fast: false
@@ -397,6 +398,10 @@ jobs:
         DRIVERS_TESTS_FIREBOLT_CUBEJS_FIREBOLT_ACCOUNT: ${{ secrets.DRIVERS_TESTS_FIREBOLT_CUBEJS_FIREBOLT_ACCOUNT }}
         DRIVERS_TESTS_FIREBOLT_CUBEJS_DB_USER: ${{ secrets.DRIVERS_TESTS_FIREBOLT_CUBEJS_DB_USER }}
         DRIVERS_TESTS_FIREBOLT_CUBEJS_DB_PASS: ${{ secrets.DRIVERS_TESTS_FIREBOLT_CUBEJS_DB_PASS }}
+        # Dremio Integration
+        DRIVERS_TESTS_DREMIO_CUBEJS_DB_URL: ${{ secrets.DRIVERS_TESTS_DREMIO_CUBEJS_DB_URL }}
+        DRIVERS_TESTS_DREMIO_CUBEJS_DB_NAME: ${{ secrets.DRIVERS_TESTS_DREMIO_CUBEJS_DB_NAME }}
+        DRIVERS_TESTS_DREMIO_CUBEJS_DB_DREMIO_AUTH_TOKEN: ${{ secrets.DRIVERS_TESTS_DREMIO_CUBEJS_DB_DREMIO_AUTH_TOKEN }}

   integration-smoke:
     needs: [ latest-tag-sha, build-cubestore ]
@@ -407,6 +412,7 @@ jobs:
     strategy:
       matrix:
         node-version: [ 20.x ]
+        python-version: [ 3.11 ]
       fail-fast: false

     steps:
@@ -432,6 +438,10 @@ jobs:
         uses: actions/setup-node@v4
         with:
           node-version: ${{ matrix.node-version }}
+      - name: Install Python
+        uses: actions/setup-python@v5
+        with:
+          python-version: ${{ matrix.python-version }}
       - name: Get yarn cache directory path
         id: yarn-cache-dir-path
         run: echo "dir=$(yarn cache dir)" >> "$GITHUB_OUTPUT"
@@ -459,6 +469,11 @@ jobs:
         uses: GoodManWEN/oracle-client-action@main
       - name: Build client
         run: yarn build
+      - name: Build cubejs-backend-native (with Python)
+        run: yarn run native:build-release-python
+        working-directory: ./packages/cubejs-backend-native
+        env:
+          PYO3_PYTHON: python${{ matrix.python-version }}
       - name: Lerna tsc
         run: yarn tsc
       - name: Download cubestored-x86_64-unknown-linux-gnu-release artifact
```

CHANGELOG.md

Lines changed: 33 additions & 0 deletions
```diff
@@ -3,6 +3,39 @@
 All notable changes to this project will be documented in this file.
 See [Conventional Commits](https://conventionalcommits.org) for commit guidelines.

+## [1.1.11](https://github.com/cube-js/cube/compare/v1.1.10...v1.1.11) (2024-12-16)
+
+### Bug Fixes
+
+* TypeError: Cannot read properties of undefined (reading 'joins') ([14adaeb](https://github.com/cube-js/cube/commit/14adaebdd1c3d398bcd2997012da070999e47d9d))
+
+## [1.1.10](https://github.com/cube-js/cube/compare/v1.1.9...v1.1.10) (2024-12-16)
+
+### Bug Fixes
+
+* **api-gateway:** allow switch sql user when the new user is the same ([#9037](https://github.com/cube-js/cube/issues/9037)) ([a69c28f](https://github.com/cube-js/cube/commit/a69c28f524fa0625b825b98a38e7f5a211a98f74))
+* **api-gateway:** make sure DAP works sql pushdown ([#9021](https://github.com/cube-js/cube/issues/9021)) ([23695b2](https://github.com/cube-js/cube/commit/23695b2b5e886b5b7daf8b3f74003bb04e5b2e0b))
+* **cubestore:** Allow create an index from expressions ([#9006](https://github.com/cube-js/cube/issues/9006)) ([222cab8](https://github.com/cube-js/cube/commit/222cab897c289bfc929f217483e4905204bac12f))
+* **schema-compiler:** fix DAP with query_rewrite and python config ([#9033](https://github.com/cube-js/cube/issues/9033)) ([849790f](https://github.com/cube-js/cube/commit/849790f965dd0d9fddba11e3d8d124b84397ca9b))
+* **schema-compiler:** join relationship aliases ([ad4e8e3](https://github.com/cube-js/cube/commit/ad4e8e3872307ab77e035709e5208b0191f87f5b))
+
+### Features
+
+* **cubesql:** Basic VALUES support in rewrite engine ([#9041](https://github.com/cube-js/cube/issues/9041)) ([368671f](https://github.com/cube-js/cube/commit/368671fd1b53b2ed5ad8df6af113492982f23c0c))
+* **dremio-driver:** Add Dremio Cloud Support ([#8956](https://github.com/cube-js/cube/issues/8956)) ([d2c2fcd](https://github.com/cube-js/cube/commit/d2c2fcdaf8944ea7dd27e73b63c0b151c317022e))
+* **tesseract:** Support multiple join paths within single query ([#9047](https://github.com/cube-js/cube/issues/9047)) ([b62446e](https://github.com/cube-js/cube/commit/b62446e3c3893068f8dd8aa32d7204ea06a16f98))
+
 ## [1.1.9](https://github.com/cube-js/cube/compare/v1.1.8...v1.1.9) (2024-12-08)
```

docs/pages/product/apis-integrations/ai-api.mdx

Lines changed: 38 additions & 0 deletions
````diff
@@ -181,6 +181,44 @@ to give the AI context on possible values in a categorical dimension:
   - completed
 ```

+### Value search
+
+By default, the AI API has no ability to see the contents of your data (for privacy reasons).
+However, this makes it difficult for the AI API to generate correct filters for some queries.
+
+Imagine you have a categorical `order_status` dimension with the possible values "shipped",
+"processing", and "completed". Without value search, asking "how many complete orders did
+we have today" might get you a query filtering on `order_status = 'Complete'` instead of
+the correct `order_status = 'completed'`.
+
+To solve this, the AI API can perform "value searches" where it introspects the values in
+selected categorical dimensions before running a query. Value search is opt-in and dimensions
+must be enabled for it individually. Currently, the AI API performs value search by running
+Cube queries using the `contains` filter operator against one or more chosen dimensions.
+The LLM selects dimensions from among those you have enabled, based on the question asked,
+and generates possible values dynamically.
+
+<InfoBox>
+When running value search queries, the AI API passes through the security context used
+for the AI API request, so security is maintained and only dimensions the end user has
+access to can be searched.
+</InfoBox>
+
+To enable value search on a dimension, set the `searchable` field to true under the `ai`
+meta tag, as shown below:
+```yaml
+- name: order_status
+  sql: order_status
+  type: string
+  meta:
+    ai:
+      searchable: true
+```
+
+Note that enabling value search may lead to slightly longer AI API response times when it
+is used, but should result in significantly more accurate queries in many situations. Value
+search can only be used on string dimensions.
 ### Other LLM providers

 <InfoBox>
````

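As a concrete illustration of the mechanism described in the value-search section above, such an introspection query would take the shape of an ordinary Cube REST query using the `contains` filter operator. The dimension name, search term, and limit below are hypothetical, not taken from the docs:

```python
# Hypothetical value-search query: introspect a categorical dimension by
# running a Cube query with the `contains` filter operator.
value_search_query = {
    "dimensions": ["orders.order_status"],
    "filters": [
        {
            "member": "orders.order_status",
            "operator": "contains",
            # The LLM generates candidate search terms from the question,
            # e.g. "complete" for "how many complete orders did we have today".
            "values": ["complete"],
        }
    ],
    "limit": 100,
}

# The matching rows reveal the actual stored values (e.g. "completed"),
# which can then be used to build an exact equality filter.
print(value_search_query["filters"][0]["operator"])  # contains
```

Because the security context is passed through, this query returns only values the requesting end user could see anyway.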
docs/pages/product/auth/context.mdx

Lines changed: 1 addition & 1 deletion
````diff
@@ -220,7 +220,7 @@ def masked(sql, security_context):
     if is_trusted_team:
         return sql
     else:
-        return "\"'--- masked ---'\""
+        return "'--- masked ---'"
 ```
````

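The fix above removes the superfluous escaped double quotes so the masked value is returned as a plain SQL string literal. A minimal self-contained sketch of such a masking function, assuming a hypothetical `team` claim in the security context (the real docs derive `is_trusted_team` from their own context shape):

```python
def masked(sql, security_context):
    # Hypothetical check: treat members of a "trusted" team as allowed to
    # see the raw column SQL; everyone else gets a masked literal.
    is_trusted_team = (security_context or {}).get("team") == "trusted"
    if is_trusted_team:
        return sql
    # Return a plain SQL string literal so the masked value is selectable
    # as-is (no extra escaped double quotes around it).
    return "'--- masked ---'"

print(masked("email", {"team": "trusted"}))  # email
print(masked("email", {"team": "other"}))    # '--- masked ---'
```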
docs/pages/product/caching.mdx

Lines changed: 17 additions & 12 deletions
```diff
@@ -257,21 +257,26 @@ versions.

 Any query that is fulfilled by Cube will use one of the following cache types:

-- **[Pre-aggregations](#pre-aggregations) in Cube Store.** This is the most
-  advantageous and performant option.
+- **[Pre-aggregations](#pre-aggregations) in Cube Store.** This cache type
+  indicates that the query utilized existing pre-aggregations in Cube Store,
+  so it did not need to go to the database for processing.
 - **Pre-aggregations in Cube Store with a suboptimal query plan.** This cache
-  type indicates that queries still benefit from pre-aggregations in Cube Store
-  but it's possible to get a performance boost by [using indexes][ref-indexes].
+  type indicates that the query utilized pre-aggregations in Cube Store,
+  but that it's possible to get a performance boost by [using indexes][ref-indexes].
 - **Pre-aggregations in the data source.** This cache type indicates that
-  queries don't benefit from pre-aggregations in Cube Store and it's possible
-  to get a massive performance boost by using Cube Store as [pre-aggregation
+  the query utilized pre-aggregations in the upstream data source.
+  These queries could gain a performance boost by using Cube Store as [pre-aggregation
   storage][ref-storage].
-- **[In-memory cache.](#in-memory-cache)** This cache type indicates that
-  queries don't benefit from pre-aggregations at all. Queries directly hit the
-  upstream data source and in-memory cache is used to speed up the execution of
-  identical queries that arrive within a short period of time.
-- **No cache.** This cache type indicates queries that directly hit the
-  upstream data source and have the worst performance possible.
+- **[In-memory cache.](#in-memory-cache)** This cache type indicates that the
+  results were retrieved from Cube's in-memory cache. All query results
+  are stored in Cube's in-memory cache, and if the same query is
+  run within a certain time frame, the results will be retrieved from in-memory
+  cache instead of being processed in the database or in Cube Store. This is the
+  fastest query retrieval method, but it requires that the exact same query was
+  run very recently.
+- **No cache.** This cache type indicates that the query was processed in the upstream
+  data source and was not accelerated using pre-aggregations. These queries could have
+  a significant performance boost if pre-aggregations and Cube Store were utilized.

 In [Query History][ref-query-history] and throughout Cube Cloud, colored bolt
 icons are used to indicate the cache type. Also, [Performance
```

docs/pages/product/workspace/ai-assistant.mdx

Lines changed: 6 additions & 0 deletions
````diff
@@ -97,6 +97,11 @@ to give the AI context on possible values in a categorical dimension:
   - completed
 ```

+### Value search
+
+Value search can be enabled for AI Assistant in the same way as for the AI API. See the
+[AI API's documentation][ref-ai-api-value-search] for details and instructions.
+
 ### Other LLM providers

 See the [AI API's documentation][ref-ai-api-providers] for information on how to "bring your own" LLM.
@@ -127,3 +132,4 @@
 [ref-playground]: /product/workspace/playground
 [ref-catalog-downstream]: /product/workspace/semantic-catalog#connecting-downstream-tools
 [ref-ai-api-providers]: /product/apis-integrations/ai-api#other-llm-providers
+[ref-ai-api-value-search]: /product/apis-integrations/ai-api#value-search
````

docs/pages/reference/configuration/environment-variables.mdx

Lines changed: 2 additions & 2 deletions
```diff
@@ -572,8 +572,8 @@ The timeout value for any queries made to the database by Cube.
 <InfoBox>

 There's a hard limit of 20 minutes for queries that ingest data into Cube Store
-when pre-aggregations are built. If you bump into this limit, consider using an
-export bucket and splitting pre-aggregations into partitions.
+when pre-aggregations are built. If you bump into this limit, consider using
+an export bucket and splitting pre-aggregations into partitions.

 </InfoBox>
```

docs/pages/reference/data-model/joins.mdx

Lines changed: 113 additions & 0 deletions
````diff
@@ -404,6 +404,119 @@ cubes:

 </CodeTabs>

+## Chasm and fan traps
+
+Cube automatically detects chasm and fan traps based on the `many_to_one` and `one_to_many` relationships defined in joins.
+When detected, Cube generates a deduplication query that evaluates all distinct primary keys within the multiplied measure's cube and then joins those distinct primary keys back to the cube to calculate the aggregation result.
+If there's more than one multiplied measure in a query, such a query is generated for every multiplied measure, and the results are joined.
+Cube resolves chasm and fan traps at query time.
+If there's a pre-aggregation that fits the measure multiplication requirements, it will be leveraged to serve such a query.
+Such pre-aggregations and queries are always considered non-additive for the purpose of pre-aggregation matching.
+
+Let's consider an example data model:
+
+<CodeTabs>
+
+```javascript
+cube(`orders`, {
+  sql_table: `orders`,
+
+  dimensions: {
+    id: {
+      sql: `id`,
+      type: `number`,
+      primary_key: true
+    },
+    city: {
+      sql: `city`,
+      type: `string`
+    }
+  },
+
+  joins: {
+    customers: {
+      relationship: `many_to_one`,
+      sql: `${CUBE}.customer_id = ${customers.id}`,
+    },
+  },
+});
+
+cube(`customers`, {
+  sql_table: `customers`,
+
+  measures: {
+    average_age: {
+      sql: `age`,
+      type: `avg`,
+    }
+  },
+
+  dimensions: {
+    id: {
+      sql: `id`,
+      type: `number`,
+      primary_key: true
+    }
+  }
+});
+```
+
+```yaml
+cubes:
+  - name: orders
+    sql_table: orders
+
+    dimensions:
+      - name: id
+        sql: id
+        type: number
+        primary_key: true
+      - name: city
+        sql: city
+        type: string
+
+    joins:
+      - name: customers
+        relationship: many_to_one
+        sql: "{orders}.customer_id = {customers.id}"
+
+  - name: customers
+    sql_table: customers
+
+    dimensions:
+      - name: id
+        sql: id
+        type: number
+        primary_key: true
+
+    measures:
+      - name: average_age
+        sql: age
+        type: avg
+```
+
+</CodeTabs>
+
+If we try to query `customers.average_age` by `orders.city`, Cube detects that the `average_age` measure in the `customers` cube would be multiplied by the `orders`-to-`customers` join and generates SQL similar to:
+
+```sql
+SELECT
+  "keys"."orders__city",
+  avg("customers_key__customers".age) "customers__average_age"
+FROM
+  (
+    SELECT
+      DISTINCT "customers_key__orders".city "orders__city",
+      "customers_key__customers".id "customers__id"
+    FROM
+      orders AS "customers_key__orders"
+      LEFT JOIN customers AS "customers_key__customers" ON "customers_key__orders".customer_id = "customers_key__customers".id
+  ) AS "keys"
+  LEFT JOIN customers AS "customers_key__customers" ON "keys"."customers__id" = "customers_key__customers".id
+GROUP BY
+  1
+```
 ## CUBE reference

 When you have several joined cubes, you should accurately use columns’ names to
````

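The deduplication strategy described in the new "Chasm and fan traps" section above can be demonstrated end-to-end with a tiny in-memory SQLite database (the table contents here are illustrative, not taken from the commit): a naive join counts each customer once per order, skewing `avg(age)`, while the distinct-primary-key rewrite counts each customer exactly once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, age INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, city TEXT);
    INSERT INTO customers VALUES (1, 20), (2, 40);
    -- Customer 1 has three orders, customer 2 has one: a classic fan trap.
    INSERT INTO orders VALUES (1, 1, 'NYC'), (2, 1, 'NYC'), (3, 1, 'NYC'), (4, 2, 'NYC');
""")

# Naive join: customer 1's age is counted three times, skewing the average.
naive = conn.execute("""
    SELECT avg(c.age)
    FROM orders o LEFT JOIN customers c ON o.customer_id = c.id
""").fetchone()[0]

# Deduplicated: collect distinct (city, customer id) pairs first, then join
# back to customers so each customer contributes exactly once per group.
dedup = conn.execute("""
    SELECT avg(c2.age)
    FROM (
        SELECT DISTINCT o.city AS city, c.id AS customer_id
        FROM orders o LEFT JOIN customers c ON o.customer_id = c.id
    ) AS keys
    LEFT JOIN customers c2 ON keys.customer_id = c2.id
""").fetchone()[0]

print(naive)  # 25.0  -> (20*3 + 40) / 4, multiplied by the fan trap
print(dedup)  # 30.0  -> (20 + 40) / 2, the correct average age
```

This mirrors the generated SQL shown in the section: the inner `SELECT DISTINCT` plays the role of the `"keys"` subquery.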
docs/pages/reference/data-model/pre-aggregations.mdx

Lines changed: 1 addition & 1 deletion
```diff
@@ -921,7 +921,7 @@ cubes:

 </CodeTabs>

-For possible `every` parameter values please refer to
+To have a pre-aggregation rebuild at a specific time of day, you can use a CRON string with some limitations. For more details about values that can be used with the `every` parameter, please refer to the
 [`refreshKey`][ref-cube-refreshkey] documentation.

 You can also use `every` with `sql`:
```

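To make the CRON form mentioned in the updated paragraph concrete, a hedged sketch of a pre-aggregation whose `every` uses a CRON string to rebuild daily at 08:00 (the cube member names are illustrative, not from the commit):

```yaml
pre_aggregations:
  - name: orders_by_day
    measures:
      - CUBE.count
    time_dimension: CUBE.created_at
    granularity: day
    refresh_key:
      # CRON fields: minute hour day-of-month month day-of-week
      every: "0 8 * * *"
```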