Merged
2 changes: 1 addition & 1 deletion pages/dashboards/saving.mdx
Original file line number Diff line number Diff line change
@@ -43,7 +43,7 @@ The version you save can be:
- promoted to Development/Staging/Production via the [Publish flow](/dashboards/publishing)
- targeted via the [Tokens API](/deployment/tokens-api) (`savedVersion`),
- accessed in the [Embeddable API](/deployment/embeddables-api) metadata, and
- used by the [Caching API](/data-modeling/caching/caching-api) when refreshing [pre‑aggregations](/data-modeling/caching/pre-aggregations) for each security context, based on the **refresh_key** you’ve set.
- used by the [Caching API](/data-modeling/caching/level-2-cache/caching-api) when refreshing [pre‑aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) for each security context, based on the **refresh_key** you’ve set.

## Version Picker

5 changes: 2 additions & 3 deletions pages/data-modeling/caching/_meta.json
@@ -1,6 +1,5 @@
{
"in-memory": "In-memory cache",
"pre-aggregations": "Pre-aggregations",
"caching-api": "Caching API"
"in-memory": "Level 1 cache: in-memory",
"level-2-cache": "Level 2 cache: pre-aggregations"
}

2 changes: 1 addition & 1 deletion pages/data-modeling/caching/in-memory.mdx
@@ -8,4 +8,4 @@ To deliver fast and responsive analytics, Embeddable leverages Cube's **Level 1

You can tell Cube to evaluate and invalidate the Level 1 Cache using **Refresh Keys**. Learn more [here](https://cube.dev/docs/product/caching#refresh-keys).

Cube does not recommend changing the default in-memory caching configuration unless necessary. Instead, to speed up query performance, you should use [pre-aggregations](/data-modeling/caching/pre-aggregations).
To further improve query performance, we recommend using [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations).
7 changes: 7 additions & 0 deletions pages/data-modeling/caching/level-2-cache/_meta.json
@@ -0,0 +1,7 @@
{
"prerequisites": "Prerequisites",
"pre-aggregations": "Pre-aggregations 101",
"advanced-pre-aggregations": "Advanced Pre-aggregations",
"caching-api": "Caching API"
}

Original file line number Diff line number Diff line change
@@ -0,0 +1,309 @@
# Advanced Pre-aggregations

This guide covers a set of advanced pre-aggregation topics that help you optimise performance and handle more complex data scenarios.

## Handling incremental data loads

Sometimes your source data is updated incrementally: for example, only the last few days are reloaded or updated while older data remains unchanged. In these cases, it’s more efficient to build your pre-aggregations incrementally instead of rebuilding the entire dataset.

Using the `customers` cube example:

```yaml
pre_aggregations:
  - name: daily_count_by_countries
    measures:
      - CUBE.count
    dimensions:
      - CUBE.country
    time_dimension: CUBE.signed_up_at
    granularity: day
    partition_granularity: day
    build_range_start:
      sql: SELECT NOW() - INTERVAL '365 day'
    build_range_end:
      sql: SELECT NOW()
    refresh_key:
      every: 1 day
      incremental: true
      update_window: 3 day
```

**Things to notice:**

- Most queries focus on the past year, so we limit the build range to 365 days using `build_range_start` and `build_range_end`. Learn more [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#build_range_start-and-build_range_end).
- `partition_granularity: day` splits the pre-aggregation into daily partitions, making it possible to refresh only the days that change instead of rebuilding the whole year.
- Partitioned pre-aggregations require both a `time_dimension` and a `granularity`. See the Cube docs on [supported values](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#partition_granularity).
- With `incremental: true` and `update_window: 3 day`, Cube refreshes only the last three partitions each day. Learn more about [`update_window`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#update_window) and [`incremental`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#incremental).

<Callout>
Without `update_window`, Cube refreshes partitions strictly according to `partition_granularity` (in this case, just the last day).
</Callout>
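
To illustrate how `partition_granularity` and `update_window` interact, here is a hedged variation on the example above. It assumes that monthly partitions are acceptable for your data volume and that late changes never reach back more than a week. The pre-aggregation name is hypothetical.

```yaml
pre_aggregations:
  # Hypothetical name, used for illustration only
  - name: monthly_partitioned_count_by_countries
    measures:
      - CUBE.count
    dimensions:
      - CUBE.country
    time_dimension: CUBE.signed_up_at
    granularity: day
    # One partition per month instead of one per day
    partition_granularity: month
    refresh_key:
      every: 1 day
      incremental: true
      # Assumption: source rows older than a week no longer change
      update_window: 1 week
```

This leaves far fewer partitions to manage, but each refresh is expected to rebuild a whole month of data rather than a single day, so the trade-off is worth testing against your own workload.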

## Indexes

Indexes make data retrieval faster. Think of an index as a shortcut that points directly to the relevant rows instead of searching through all the data. This speeds up queries that filter, group, or join on specific fields.

In the context of pre-aggregations, indexes help [Cube Store](https://cube.dev/docs/product/deployment#cube-store) quickly locate and read only the data needed for a query, improving performance, especially on large datasets.

Indexes are particularly useful when:

- A pre-aggregation is large. Indexes are often required to achieve optimal performance, especially when a query doesn’t use all dimensions from the pre-aggregation.
- Queries frequently filter on **high-cardinality dimensions**, such as `product_id` or `date`. Indexes help Cube Store find matching rows faster in these cases.
- You plan to join one pre-aggregation with another, such as in a [`rollup_join`](/data-modeling/caching/level-2-cache/advanced-pre-aggregations#rollup_join).

<Callout emoji="💡">
Adding indexes doesn’t change your data; it simply makes Cube Store more efficient at finding it.
</Callout>

### Using indexes in pre-aggregations

Let’s start with a simple `products` model and define a `products_preagg` pre-aggregation.

Here we add an index on `size` within our pre-aggregation, which Cube Store uses to quickly resolve joins and filters involving that indexed column.

```yaml
cubes:
  - name: products
    sql_table: my_db.main.products
    data_source: default

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true
        public: true

      - name: name
        sql: name
        type: string

      - name: size
        sql: size
        type: string

    measures:
      - name: count
        type: count
        title: "# of products"

      - name: price
        type: sum
        title: Total USD
        sql: price

    joins:
      - name: orders
        sql: "{CUBE.id} = {orders.product_id}"
        relationship: one_to_many

    pre_aggregations:
      - name: products_preagg
        type: rollup
        dimensions:
          - size
        measures:
          - count
          - price
        indexes:
          - name: product_index
            columns:
              - size
```

In this example:

- The `products_preagg` pre-aggregation stores product data aggregated by the `size` dimension.
- The index `product_index` on `size` speeds up queries using that dimension.
- Make sure the column you’re indexing is also included in the pre-aggregation dimensions; otherwise, Cube will return an error like:

> Error during create table: Column 'products__id' in index 'products_products_preagg_product_index' is not found in table 'products_products_preagg'
>

<Callout emoji="💡">
Each index adds to the pre-aggregation build time, since all indexes are created during ingestion. Add only the ones you need.
</Callout>

Learn more about indexes [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#indexes).
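
As a further illustration of the case where a query doesn’t use every dimension in the pre-aggregation, here is a minimal sketch based on the `products` cube above. The pre-aggregation and index names are hypothetical, and whether the index pays off depends on your data and query patterns.

```yaml
pre_aggregations:
  # Hypothetical name, used for illustration only
  - name: products_by_name_and_size
    type: rollup
    dimensions:
      - name
      - size
    measures:
      - count
      - price
    indexes:
      # Intended for queries that filter or group by size alone,
      # without touching the name dimension
      - name: size_index
        columns:
          - size
```

Both `name` and `size` are listed under `dimensions`, so the indexed column exists in the pre-aggregation table and the "Column ... is not found" error shown above should not occur.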

## Rollup_join

- Cube can run SQL joins across different data sources. For example, you might have products in [PostgreSQL](/data/credentials#postgres) and orders in [MotherDuck](/data/credentials#motherduck).

- All pre-aggregations so far have been of type rollup (which is the default pre-aggregation type). Cube also supports `rollup_join`, which combines data from two or more rollups coming from different data sources.

- `rollup_join` joins pre-aggregated data inside [Cube Store](https://cube.dev/docs/product/deployment#cube-store), so you can query it together efficiently.

<Callout>
You don’t need a `rollup_join` to join cubes from the same data source. Just include the other cube’s dimensions and measures directly in your rollup definition, as described [here](/data-modeling/caching/level-2-cache/pre-aggregations#performing-joins-across-cubes-in-your-pre-aggregations).
</Callout>
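
For contrast, here is a minimal sketch of the same-data-source case. It assumes a hypothetical setup in which an `orders` cube and a `products` cube live in the same data source and `orders` already defines a join to `products`. The pre-aggregation name is also hypothetical.

```yaml
# Inside the orders cube.
# Assumption: orders and products share one data source.
pre_aggregations:
  # Hypothetical name, used for illustration only
  - name: orders_by_product_name
    type: rollup
    measures:
      - CUBE.count
    dimensions:
      # Dimension from the joined cube, referenced directly
      - products.name
    time_dimension: CUBE.created_at
    granularity: day
```

Here a plain `rollup` is enough because the source database performs the join when the pre-aggregation is built.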

Let’s extend the example from the [indexes](/data-modeling/caching/level-2-cache/advanced-pre-aggregations#indexes) section. We’ll keep the `products` model from the PostgreSQL (default) data source. Since it joins to the `orders` model on the `id` column, we need to update the pre-aggregation to include `id` and `name`, and change the index to cover `id`.

```yaml

pre_aggregations:
  - name: products_preagg
    type: rollup
    dimensions:
      - id
      - name
      - size
    measures:
      - count
      - price
    indexes:
      - name: product_index
        columns:
          - id
    refresh_key:
      every: 1 hour
```

Next, we add a new `orders` model from the MotherDuck data source to show how to run analytics across databases.


```yaml
cubes:
  - name: orders
    sql_table: public.orders
    data_source: motherduck

    dimensions:
      - name: id
        sql: id
        type: number
        primary_key: true

      - name: created_at
        sql: created_at
        type: time

      - name: product_id
        sql: product_id
        type: number
        public: false

    measures:
      - name: count
        type: count
        title: "# of orders"

    joins:
      - name: products
        sql: "{CUBE.product_id} = {products.id}"
        relationship: many_to_one

    pre_aggregations:
      - name: orders_preagg
        type: rollup
        dimensions:
          - product_id
          - created_at
        measures:
          - count
        time_dimension: CUBE.created_at
        granularity: day
        indexes:
          - name: orders_index
            columns:
              - product_id
        refresh_key:
          every: 1 hour

      - name: orders_with_products_rollup
        type: rollup_join
        dimensions:
          - products.name
          - orders.created_at
        measures:
          - orders.count
        time_dimension: orders.created_at
        granularity: day
        rollups:
          - products.products_preagg
          - orders_preagg
```

**Things to notice:**

- `orders` uses the **MotherDuck** data source.
- `products` uses the **default** data source (for example, PostgreSQL). Learn more about connecting to multiple data sources [here](/data/credentials).
- Always reference dimensions explicitly in your joins between models, especially when using a `rollup_join`:

```yaml
joins:
  - name: products
    sql: "{CUBE.product_id} = {products.id}"
    relationship: many_to_one
```

If you use `{CUBE}.product_id` or `{products}.id`, Cube will not recognise them as dimension references and will return an error like:

```
From members are not found in [] for join ...
Please make sure join fields are referencing dimensions instead of columns.
```

- Indexes are required when using `rollup_join` pre-aggregations so Cube Store can join multiple pre-aggregations efficiently.

Without the right index, Cube may fail to plan the join and return an error like:

```
Error during planning: Can't find index to join table ...
Consider creating index ... ON ... (orders__product_id)
```

Therefore, notice that we have indexed the **join keys on both sides**:

  - `products.products_preagg` → index on `id`
  - `orders.orders_preagg` → index on `product_id`

- `orders_with_products_rollup` combines both pre-aggregations inside **Cube Store** using the type `rollup_join`.

The `rollups:` property lists which pre-aggregations to join together:

```yaml
rollups:
  - products.products_preagg
  - orders_preagg
```

- We also added a `time_dimension` with **day-level granularity** in `orders_with_products_rollup`.

We expect users to ask questions at a daily level, such as “How many orders were placed per product each day?”. Setting the `granularity` of the `time_dimension` to `day` ensures Cube builds and queries this data efficiently.

<Callout emoji="💡">
`rollup_join` is an ephemeral pre-aggregation. It uses the referenced pre-aggregations at query time, so freshness is controlled by them, not by the `rollup_join` itself.
</Callout>

- Notice that we’ve set the `refresh_key` to **1 hour** on both referenced pre-aggregations (`products_preagg` and `orders_preagg`) to keep the data up to date. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations).

### How `rollup_join` works in Embeddable

In this example, we’ll find the total **number of orders** for each **product**. The **product name** comes from the `products` model, while the **orders count** comes from the `orders` model.

<VideoComponent
src="/video/rollup_join_example.mp4"
width="1250"
height="854"
/>

**Things to notice:**
- The query’s FROM clause references both pre-aggregations. This is how Cube joins pre-aggregated datasets from different data sources inside Cube Store.

### Benefits of using `rollup_join`

- Enables **cross-database joins** inside Cube Store
- Leverages **indexed pre-aggregations** for efficient distributed joins
- Avoids the need for ETL or database federation
- Provides consistent, scalable analytics across data sources

Learn more about `rollup_join` [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#rollup_join).

## Next Steps

The next step is to set up Embeddable’s [Caching API](/data-modeling/caching/level-2-cache/caching-api) to refresh pre-aggregations for each of your security contexts. Without it, pre-aggregations will only refresh on demand.
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Caching API

Use Embeddable’s Caching API to tell Embeddable which security contexts need refreshing. The refresh frequency comes from the [refresh_key](/data-modeling/caching/pre-aggregations#refreshing-pre-aggregations) you set in your [pre-aggregations](/data-modeling/caching/pre-aggregations) within the data model.
Use Embeddable’s Caching API to tell Embeddable which security contexts need refreshing. The refresh frequency comes from the [refresh_key](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations) you set in your [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) within the data model.

<Bruno/>
