diff --git a/pages/dashboards/saving.mdx b/pages/dashboards/saving.mdx
index fc6ede8..669f4f5 100644
--- a/pages/dashboards/saving.mdx
+++ b/pages/dashboards/saving.mdx
@@ -43,7 +43,7 @@ The version you save can be:
- promoted to Development/Staging/Production via the [Publish flow](/dashboards/publishing)
- targeted via the [Tokens API](/deployment/tokens-api) (`savedVersion`),
- accessed in the [Embeddable API](/deployment/embeddables-api) metadata, and
- - used by the [Caching API](/data-modeling/caching/caching-api) when refreshing [pre‑aggregations](/data-modeling/caching/pre-aggregations) for each security context, based on the **refresh_key** you’ve set.
+ - used by the [Caching API](/data-modeling/caching/level-2-cache/caching-api) when refreshing [pre‑aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) for each security context, based on the **refresh_key** you’ve set.
## Version Picker
diff --git a/pages/data-modeling/caching/_meta.json b/pages/data-modeling/caching/_meta.json
index 0304a1e..2b48949 100644
--- a/pages/data-modeling/caching/_meta.json
+++ b/pages/data-modeling/caching/_meta.json
@@ -1,6 +1,5 @@
{
- "in-memory": "In-memory cache",
- "pre-aggregations": "Pre-aggregations",
- "caching-api": "Caching API"
+ "in-memory": "Level 1 cache: in-memory",
+ "level-2-cache": "Level 2 cache: pre-aggregations"
}
\ No newline at end of file
diff --git a/pages/data-modeling/caching/in-memory.mdx b/pages/data-modeling/caching/in-memory.mdx
index d2119cf..815eb76 100644
--- a/pages/data-modeling/caching/in-memory.mdx
+++ b/pages/data-modeling/caching/in-memory.mdx
@@ -8,4 +8,4 @@ To deliver fast and responsive analytics, Embeddable leverages Cube's **Level 1
You can tell Cube to evaluate and invalidate the Level 1 Cache using **Refresh Keys**. Learn more [here](https://cube.dev/docs/product/caching#refresh-keys).
-Cube does not recommend changing the default in-memory caching configuration unless necessary. Instead, to speed up query performance, you should use [pre-aggregations](/data-modeling/caching/pre-aggregations).
\ No newline at end of file
+To further improve query performance, we recommend using [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations).
\ No newline at end of file
diff --git a/pages/data-modeling/caching/level-2-cache/_meta.json b/pages/data-modeling/caching/level-2-cache/_meta.json
new file mode 100644
index 0000000..678e793
--- /dev/null
+++ b/pages/data-modeling/caching/level-2-cache/_meta.json
@@ -0,0 +1,7 @@
+{
+ "prerequisites": "Prerequisites",
+ "pre-aggregations": "Pre-aggregations 101",
+ "advanced-pre-aggregations": "Advanced Pre-aggregations",
+ "caching-api": "Caching API"
+}
+
\ No newline at end of file
diff --git a/pages/data-modeling/caching/level-2-cache/advanced-pre-aggregations.mdx b/pages/data-modeling/caching/level-2-cache/advanced-pre-aggregations.mdx
new file mode 100644
index 0000000..635ec0a
--- /dev/null
+++ b/pages/data-modeling/caching/level-2-cache/advanced-pre-aggregations.mdx
@@ -0,0 +1,309 @@
+# Advanced Pre-aggregations
+
+This guide covers a set of advanced pre-aggregation topics that help you optimise performance and handle more complex data scenarios.
+
+## Handling incremental data loads
+
+Sometimes your source data is updated incrementally: for example, only the last few days are reloaded or updated while older data remains unchanged. In these cases, it’s more efficient to build your pre-aggregations incrementally instead of rebuilding the entire dataset.
+
+Using the `customers` cube example:
+
+```yaml
+pre_aggregations:
+ - name: daily_count_by_countries
+ measures:
+ - CUBE.count
+ dimensions:
+ - CUBE.country
+ time_dimension: CUBE.signed_up_at
+ granularity: day
+ partition_granularity: day
+ build_range_start:
+ sql: SELECT NOW() - INTERVAL '365 day'
+ build_range_end:
+ sql: SELECT NOW()
+ refresh_key:
+ every: 1 day
+ incremental: true
+ update_window: 3 day
+```
+
+**Things to notice:**
+
+- Most queries focus on the past year, so we limit the build range to 365 days using `build_range_start` and `build_range_end`. Learn more [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#build_range_start-and-build_range_end).
+- `partition_granularity: day` splits the pre-aggregation into daily partitions, making it possible to refresh only the days that change instead of rebuilding the whole year.
+- Partitioned pre-aggregations require both a `time_dimension` and a `granularity`. See the Cube docs on [supported values](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#partition_granularity).
+- With `incremental: true` and `update_window: 3 day`, Cube refreshes only the last three daily partitions each day. Learn more about [`update_window`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#update_window) and [`incremental`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#incremental).
+
+
+Without `update_window`, Cube refreshes only the most recent partition (in this case, just the last day).
+
+
+## Indexes
+
+Indexes make data retrieval faster. Think of an index as a shortcut that points directly to the relevant rows instead of searching through all the data. This speeds up queries that filter, group, or join on specific fields.
+
+In the context of pre-aggregations, indexes help [Cube Store](https://cube.dev/docs/product/deployment#cube-store) quickly locate and read only the data needed for a query, which improves performance, especially on large datasets.
+
+Indexes are particularly useful when:
+
+- **Pre-aggregations are large.** Indexes are often required to achieve optimal performance, especially when a query doesn’t use all dimensions from the pre-aggregation.
+- **Queries frequently filter on high-cardinality dimensions**, such as `product_id` or `date`. Indexes help Cube Store find matching rows faster in these cases.
+- **You join one pre-aggregation with another**, such as in a [`rollup_join`](/data-modeling/caching/level-2-cache/advanced-pre-aggregations#rollup_join).
+
+
+Adding indexes doesn’t change your data; it simply makes Cube Store more efficient at finding it.
+
+
+### Using indexes in pre-aggregations
+
+Let’s start with a simple `products` model and define a `products_preagg` pre-aggregation.
+
+Here we add an index on `size` within our pre-aggregation, which Cube Store uses to quickly resolve joins and filters involving that indexed column.
+
+```yaml
+cubes:
+ - name: products
+ sql_table: my_db.main.products
+ data_source: default
+
+ dimensions:
+ - name: id
+ sql: id
+ type: number
+ primary_key: true
+ public: true
+
+ - name: name
+ sql: name
+ type: string
+
+ - name: size
+ sql: size
+ type: string
+
+
+ measures:
+ - name: count
+ type: count
+ title: "# of products"
+
+ - name: price
+ type: sum
+ title: Total USD
+ sql: price
+
+ joins:
+ - name: orders
+ sql: "{CUBE.id} = {orders.product_id}"
+ relationship: one_to_many
+
+ pre_aggregations:
+ - name: products_preagg
+ type: rollup
+ dimensions:
+ - size
+ measures:
+ - count
+ - price
+ indexes:
+ - name: product_index
+ columns:
+ - size
+```
+
+In this example:
+
+- The `products_preagg` pre-aggregation stores product data aggregated by the `size` dimension.
+- The index `product_index` on `size` speeds up queries using that dimension.
+- Make sure the column you’re indexing is also included in the pre-aggregation dimensions; otherwise, Cube will return an error like:
+
+ > Error during create table: Column 'products__id' in index 'products_products_preagg_product_index' is not found in table 'products_products_preagg'
+ >
+
+
+Each index adds to the pre-aggregation build time, since all indexes are created during ingestion. Add only the ones you need.
+
+
+Learn more about indexes [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#indexes).
+
+## Rollup_join
+
+Cube can join data across different data sources. For example, you might have products in [PostgreSQL](/data/credentials#postgres) and orders in [MotherDuck](/data/credentials#motherduck).
+
+All pre-aggregations so far have been of type `rollup` (the default pre-aggregation type). Cube also supports `rollup_join`, which combines data from two or more rollups built from different data sources.
+
+`rollup_join` joins pre-aggregated data inside [Cube Store](https://cube.dev/docs/product/deployment#cube-store), so you can query it together efficiently.
+
+
+You don’t need a `rollup_join` to join cubes from the same data source: just include the other cube’s dimensions and measures directly in your rollup definition, as described [here](/data-modeling/caching/level-2-cache/pre-aggregations#performing-joins-across-cubes-in-your-pre-aggregations).
+
+
+Let’s extend the example from the [indexes](/data-modeling/caching/level-2-cache/advanced-pre-aggregations#indexes) section. We’ll keep the `products` model from the PostgreSQL (default) data source. Since it joins to the `orders` model on the `id` column, we’ll update the pre-aggregation to include `id` and `name` and add an index on `id`.
+
+```yaml
+ pre_aggregations:
+ - name: products_preagg
+ type: rollup
+ dimensions:
+ - id
+ - name
+ - size
+ measures:
+ - count
+ - price
+ indexes:
+ - name: product_index
+ columns:
+ - id
+ refresh_key:
+ every: 1 hour
+```
+
+Next, we’ll add a new `orders` model from the MotherDuck data source to show how to run analytics across databases.
+
+
+```yaml
+cubes:
+ - name: orders
+ sql_table: public.orders
+ data_source: motherduck
+
+ dimensions:
+ - name: id
+ sql: id
+ type: number
+ primary_key: true
+
+ - name: created_at
+ sql: created_at
+ type: time
+
+ - name: product_id
+ sql: product_id
+ type: number
+ public: false
+
+ measures:
+ - name: count
+ type: count
+ title: "# of orders"
+
+ joins:
+ - name: products
+ sql: "{CUBE.product_id} = {products.id}"
+ relationship: many_to_one
+
+ pre_aggregations:
+ - name: orders_preagg
+ type: rollup
+ dimensions:
+ - product_id
+ - created_at
+ measures:
+ - count
+ time_dimension: CUBE.created_at
+ granularity: day
+ indexes:
+ - name: orders_index
+ columns:
+ - product_id
+ refresh_key:
+ every: 1 hour
+
+ - name: orders_with_products_rollup
+ type: rollup_join
+ dimensions:
+ - products.name
+ - orders.created_at
+ measures:
+ - orders.count
+ time_dimension: orders.created_at
+ granularity: day
+ rollups:
+ - products.products_preagg
+ - orders_preagg
+```
+
+**Things to notice:**
+
+- `orders` uses the **MotherDuck** data source.
+- `products` uses the **default** data source (for example, PostgreSQL). Learn more about connecting to multiple data sources [here](/data/credentials).
+- Always reference dimensions explicitly in your joins between models, especially when using a `rollup_join`:
+
+ ```yaml
+ joins:
+ - name: products
+ sql: "{CUBE.product_id} = {products.id}"
+ relationship: many_to_one
+ ```
+
+ If you use `{CUBE}.product_id` or `{products}.id`, Cube will not recognise them as dimension references and will return an error like:
+
+ ```
+ From members are not found in [] for join ...
+ Please make sure join fields are referencing dimensions instead of columns.
+ ```
+
+- Indexes are required when using `rollup_join` pre-aggregations so Cube Store can join multiple pre-aggregations efficiently.
+
+ Without the right index, Cube may fail to plan the join and return an error like:
+
+ ```
+ Error during planning: Can't find index to join table ...
+ Consider creating index ... ON ... (orders__product_id)
+ ```
+
+ Note that we have indexed the **join keys on both sides**:
+
+ - `products.products_preagg` → index on `id`
+ - `orders.orders_preagg` → index on `product_id`
+
+- `orders_with_products_rollup` combines both pre-aggregations inside **Cube Store** using the type `rollup_join`.
+
+ The `rollups:` property lists which pre-aggregations to join together:
+
+ ```yaml
+ rollups:
+ - products.products_preagg
+ - orders_preagg
+ ```
+
+- We also added a `time_dimension` with **day-level granularity** in `orders_with_products_rollup`.
+
+ We expect users to ask questions at a daily level, such as “How many orders were placed per product each day?”. Setting the `time_dimension` to **day** ensures Cube builds and queries this data efficiently.
+
+
+ `rollup_join` is an ephemeral pre-aggregation: it uses the referenced pre-aggregations at query time, so freshness is controlled by them, not by the `rollup_join` itself.
+
+
+- Notice that we’ve set the `refresh_key` to **1 hour** on both referenced pre-aggregations (`products_preagg` and `orders_preagg`) to keep the data up to date. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations).
+
+### How `rollup_join` works in Embeddable
+
+In this example, we’ll find the total **number of orders** for each **product**. The **product name** comes from the `products` model, while the **orders count** comes from the `orders` model.
+
+
+
+**Things to notice:**
+- The query’s `FROM` clause references both pre-aggregations; this is how Cube joins pre-aggregated datasets from different data sources inside Cube Store (see the sketch below).
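+
+As a rough sketch (the real pre-aggregation table names are generated by Cube Store and include hashed suffixes, so the names below are purely illustrative), the joined query looks conceptually like this:
+
+```sql
+-- Illustrative only: table and column names are generated by Cube Store.
+SELECT
+  p.products__name,
+  o.orders__created_at_day,
+  SUM(o.orders__count) AS orders__count
+FROM orders_orders_preagg AS o
+LEFT JOIN products_products_preagg AS p
+  ON o.orders__product_id = p.products__id
+GROUP BY 1, 2;
+```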
+
+### Benefits of using `rollup_join`
+
+- Enables **cross-database joins** inside Cube Store
+- Leverages **indexed pre-aggregations** for efficient distributed joins
+- Avoids the need for ETL or database federation
+- Provides consistent, scalable analytics across data sources
+
+Learn more about `rollup_join` [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#rollup_join).
+
+## Next Steps
+
+The next step is to set up Embeddable’s [Caching API](/data-modeling/caching/level-2-cache/caching-api) to refresh pre-aggregations for each of your security contexts. Without it, pre-aggregations will only refresh on demand.
\ No newline at end of file
diff --git a/pages/data-modeling/caching/caching-api.mdx b/pages/data-modeling/caching/level-2-cache/caching-api.mdx
similarity index 94%
rename from pages/data-modeling/caching/caching-api.mdx
rename to pages/data-modeling/caching/level-2-cache/caching-api.mdx
index b5b427f..5cf09f3 100644
--- a/pages/data-modeling/caching/caching-api.mdx
+++ b/pages/data-modeling/caching/level-2-cache/caching-api.mdx
@@ -1,6 +1,6 @@
# Caching API
-Use Embeddable’s Caching API to tell Embeddable which security contexts need refreshing. The refresh frequency comes from the [refresh_key](/data-modeling/caching/pre-aggregations#refreshing-pre-aggregations) you set in your [pre-aggregations](/data-modeling/caching/pre-aggregations) within the data model.
+Use Embeddable’s Caching API to tell Embeddable which security contexts need refreshing. The refresh frequency comes from the [refresh_key](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations) you set in your [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) within the data model.
diff --git a/pages/data-modeling/caching/pre-aggregations.mdx b/pages/data-modeling/caching/level-2-cache/pre-aggregations.mdx
similarity index 68%
rename from pages/data-modeling/caching/pre-aggregations.mdx
rename to pages/data-modeling/caching/level-2-cache/pre-aggregations.mdx
index 281212a..e837efe 100644
--- a/pages/data-modeling/caching/pre-aggregations.mdx
+++ b/pages/data-modeling/caching/level-2-cache/pre-aggregations.mdx
@@ -1,4 +1,4 @@
-# Level 2 cache: pre-aggregations
+# Pre-aggregations 101
In addition to the Level 1 [in-memory cache](/data-modeling/caching/in-memory), Embeddable leverages Cube's Level 2 cache through **pre-aggregations** to enhance query performance and scalability.
@@ -6,7 +6,7 @@ In addition to the Level 1 [in-memory cache](/data-modeling/caching/in-memory),
Pre-aggregations allow you to compute and store aggregations (like sums, averages, and counts, but also a set of dimensions) in advance rather than calculating them on the fly for each query. When a query can be fully answered from a pre-aggregation, results are retrieved directly from the pre-aggregated table instead of scanning raw data, significantly speeding up response times (and reducing cost and load on your databases). This also means that multiple components + charts on your dashboard can retrieve data from just one or a few pre-aggregations (meaning fewer queries).
-> **Pre-aggregations**: You can think of a pre-aggregation as a simple **temporary table** (or **materialized view**) but managed and stored automatically by Embeddable, rather than in your database. You control how often each pre-aggregation is refreshed using the **refresh_key**. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/pre-aggregations#refreshing-pre-aggregations)
+> **Pre-aggregations**: You can think of a pre-aggregation as a simple **temporary table** (or **materialized view**), but managed and stored automatically by Embeddable rather than in your database. You control how often each pre-aggregation is refreshed using the **refresh_key**. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations).
>
## Why use pre-aggregations?
@@ -33,9 +33,9 @@ To set up a pre-aggregation you need to do three simple steps:
1. define any pre-aggegregations directly in your model files (see example below)
-2. define a refresh schedule using the **refresh_key** within your pre-aggregation. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/pre-aggregations#refreshing-pre-aggregations).
+2. define a refresh schedule using the **refresh_key** within your pre-aggregation. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations).
-3. setup embeddable’s [caching api](/data-modeling/caching/caching-api) to refresh pre-aggregations for each of your security contexts.
+3. set up Embeddable’s [Caching API](/data-modeling/caching/level-2-cache/caching-api) to refresh pre-aggregations for each of your security contexts.
Each pre-aggregation is a list of dimensions and measures that you want to pre-aggregate. E.g. see the `pre_aggregation` defined at the bottom of this model file:
@@ -619,6 +619,12 @@ In this example the query every hour. If the result has changed since the last r
You can’t combine a CRON string with `sql` in the same refresh key. Doing so will cause a compilation error.
+### Required setup
+
+Setting up Embeddable’s [Caching API](/data-modeling/caching/level-2-cache/caching-api) is required to ensure your pre-aggregations are refreshed on a schedule for each of your security contexts. Without it, pre-aggregations will only refresh on demand.
+
+
+
## Performing joins across cubes in your pre-aggregations
All the examples so far have shown pre-aggregations using just the dimensions and measures from the cube in which it is defined. This is, however, not a requirement at all. You can easily define a pre-aggregation that uses dimensions and measures from multiple cubes, as long as the appropriate `joins` have been defined.
@@ -648,307 +654,8 @@ cubes:
granularity: day
```
-## Handling incremental data loads
-
-Sometimes your source data is updated incrementally for example: only the last few days are reloaded or updated while older data remains unchanged. In these cases, it’s more efficient to build your pre-aggregations incrementally instead of rebuilding the entire dataset.
-
-Using the `customers` cube example:
-
-```yaml
-pre_aggregations:
- - name: daily_count_by_countries
- measures:
- - CUBE.count
- dimensions:
- - CUBE.country
- time_dimension: CUBE.signed_up_at
- granularity: day
- partition_granularity: day
- build_range_start:
- sql: SELECT NOW() - INTERVAL '365 day'
- build_range_end:
- sql: SELECT NOW()
- refresh_key:
- every: 1 day
- incremental: true
- update_window: 3 day
-```
-
-**Things to notice:**
-
-- Most queries focus on the past year, so we limit the build range to 365 days using `build_range_start` and `build_range_end`. Learn more [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#build_range_start-and-build_range_end).
-- `partition_granularity: day` splits the pre-aggregation into daily partitions, making it possible to refresh only the days that change instead of rebuilding the whole year.
-- Partitioned pre-aggregations require both a `time_dimension` and a `granularity`. See the Cube docs on [supported values](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#partition_granularity).
-- With `incremental: true` and `update_window: 3 day`, Cube refreshes only the last three partitions each day. Learn more about [`update_window`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#update_window) and [`incremental`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#incremental) .
-
-Without `update_window`, Cube refreshes partitions strictly according to `partition_granularity` (in this case, just the last day).
-
-
-## Indexes
-
-Indexes make data retrieval faster. Think of an index as a shortcut that points directly to the relevant rows instead of searching through all the data. This speeds up queries that filter, group, or join on specific fields.
-
-In the context of pre-aggregations, indexes help [Cube Store](https://cube.dev/docs/product/deployment#cube-store) quickly locate and read only the data needed for a query improving performance, especially on large datasets.
-
-Indexes are particularly useful when:
-
-- For larger pre-aggregations, indexes are often required to achieve optimal performance, especially when a query doesn’t use all dimensions from the pre-aggregation.
-- Queries frequently filter on **high-cardinality dimensions**, such as `product_id` or `date`. Indexes help Cube Store find matching rows faster in these cases.
-- You plan to join one pre-aggregation with another, such as in a [`rollup_join`](/data-modeling/caching/pre-aggregations#rollup_join).
-
-
-Adding indexes doesn’t change your data, it simply makes Cube Store more efficient at finding it.
-
-
-### Using indexes in pre-aggregations
-
-Let’s start with a simple `products` model and define a `products_preagg` pre-aggregation.
-
-Here we add an index on `size` within our pre-aggregation, which Cube Store uses to quickly resolve joins and filters involving that indexed column.
-
-```yaml
-cubes:
- - name: products
- sql_table: my_db.main.products
- data_source: default
-
- dimensions:
- - name: id
- sql: id
- type: number
- primary_key: true
- public: true
-
- - name: name
- sql: name
- type: string
-
- - name: size
- sql: size
- type: string
-
-
- measures:
- - name: count
- type: count
- title: "# of products"
-
- - name: price
- type: sum
- title: Total USD
- sql: price
-
- joins:
- - name: orders
- sql: "{CUBE.id} = {orders.product_id}"
- relationship: one_to_many
-
- pre_aggregations:
- - name: products_preagg
- type: rollup
- dimensions:
- - size
- measures:
- - count
- - price
- indexes:
- - name: product_index
- columns:
- - size
-```
-
-In this example:
-
-- The `products_preagg` pre-aggregation stores aggregated products data by size dimension.
-- The index `product_index` on `size` speeds up queries using that dimension.
-- Make sure the column you’re indexing is also included in the pre-aggregation dimensions; otherwise, Cube will return an error like:
-
- > Error during create table: Column 'products__id' in index 'products_products_preagg_product_index' is not found in table 'products_products_preagg'
- >
-
-
-Each index adds to the pre-aggregation build time, since all indexes are created during ingestion. Add only the ones you need.
-
-
-Learn more about indexes [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#indexes).
-
-## Rollup_join
-
-- Cube can run SQL joins across different data sources. For example, you might have products in [PostgreSQL](/data/credentials#postgres) and orders in [MotherDuck](/data/credentials#motherduck).
-
-- All pre-aggregations so far have been of type rollup (which is the default pre-aggregation type). Cube also supports `rollup_join`, which combines data from two or more rollups coming from different data sources.
-
-- `rollup_join` joins pre-aggregated data inside [cube store](https://cube.dev/docs/product/deployment#cube-store), so you can query it together efficiently.
-
-
-You don’t need a rollup_join to join cubes from the same data source. Just include the other cube’s dimensions and measures directly in your rollup definition as mentioned [here](/data-modeling/caching/pre-aggregations#performing-joins-across-cubes-in-your-pre-aggregations)
-
-
-Let’s extend the example from the [indexes](/data-modeling/caching/pre-aggregations#indexes) section. We’ll keep the products model from the PostgreSQL (default) data source. Since it joins to the orders model on the id column, we’ll need to update the pre-aggregation to include id and name and add an index on it.
-
-```yaml
-
- pre_aggregations:
- - name: products_preagg
- type: rollup
- dimensions:
- - id
- - name
- - size
- measures:
- - count
- - price
- indexes:
- - name: product_index
- columns:
- - id
- refresh_key:
- every: 1 hour
-```
-
-The new orders model from MotherDuck data source will be added to show how to run analytics across databases.
-
-
-```yaml
-cubes:
- - name: orders
- sql_table: public.orders
- data_source: motherduck
-
- dimensions:
- - name: id
- sql: id
- type: number
- primary_key: true
-
- - name: created_at
- sql: created_at
- type: time
-
- - name: product_id
- sql: product_id
- type: number
- public: false
-
- measures:
- - name: count
- type: count
- title: "# of orders"
-
- joins:
- - name: products
- sql: "{CUBE.product_id} = {products.id}"
- relationship: many_to_one
-
- pre_aggregations:
- - name: orders_preagg
- type: rollup
- dimensions:
- - product_id
- - created_at
- measures:
- - count
- time_dimension: CUBE.created_at
- granularity: day
- indexes:
- - name: orders_index
- columns:
- - product_id
- refresh_key:
- every: 1 hour
-
- - name: orders_with_products_rollup
- type: rollup_join
- dimensions:
- - products.name
- - orders.created_at
- measures:
- - orders.count
- time_dimension: orders.created_at
- granularity: day
- rollups:
- - products.products_preagg
- - orders_preagg
-```
-
-**Things to notice:**
-
-- `orders` uses the **MotherDuck** data source.
-- `products` uses **default** data source (for example, PostgreSQL). Learn more about connecting to multiple datasources [here](/data/credentials).
-- Always reference dimensions explicitly in your joins between models, especially when using a `rollup_join`:
-
- ```yaml
- joins:
- - name: products
- sql: "{CUBE.product_id} = {products.id}"
- relationship: many_to_one
- ```
-
- If you use `{CUBE}.product_id` or `{products}.id`, Cube will not recognise them as dimension references and will return an error like:
-
- ```
- From members are not found in [] for join ...
- Please make sure join fields are referencing dimensions instead of columns.
- ```
-
-- Indexes are required when using `rollup_join` pre-aggregations so Cube Store can join multiple pre-aggregations efficiently.
-
- Without the right index, Cube may fail to plan the join and return an error like:
-
- ```
- Error during planning: Can't find index to join table ...
- Consider creating index ... ON ... (orders__product_id)
- ```
-
- Therefore, notice that we have indexed the **join keys on both sides**:
-
- ```
- - `products.products_preagg` → index on `id`
- - `orders.orders_preagg` → index on `product_id`
- ```
-
-- `orders_with_products_rollup` combines both pre-aggregations inside **Cube Store** using the type `rollup_join`.
-
- The `rollups:` property lists which pre-aggregations to join together:
-
- ```yaml
- rollups:
- - products.products_preagg
- - orders_preagg
- ```
-
-- We also added a `time_dimension` with **day-level granularity** in `orders_with_products_rollup`.
-
- We expect users to ask questions at a daily level, such as “How many orders were placed per product each day?”. Setting the `time_dimension` to **day** ensures Cube builds and queries this data efficiently.
-
-
- `rollup_join` is an ephemeral pre-aggregation. It uses the referenced pre-aggregations at query time, so freshness is controlled by them, not the rollup_join itself.
-
-
-- Notice that we’ve set the `refresh_key` to **1 hour** on both referenced pre-aggregations (`products_preagg` and `orders_preagg`) to keep the data up to date. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/pre-aggregations#refreshing-pre-aggregations).
-
-### How `rollup_join` works in Embeddable
-
-In this example, we’ll find the total **number of orders** for each **product**. The **product name** comes from the `products` model, while the **orders count** comes from the `orders` model.
-
-
-
-**Things to notice:**
-- The query’s FROM clause references both pre-aggregations. This is how Cube joins pre-aggregated datasets from different data sources inside Cube Store.
-
-### Benefits of using `rollup_join`
-
-- Enables **cross-database joins** inside Cube Store
-- Leverages **indexed pre-aggregations** for efficient distributed joins
-- Avoids the need for ETL or database federation
-- Provides consistent, scalable analytics across data sources
-
-Learn more about rollup_join [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#rollup_join).
-
## Next Steps
-The next step is to setup Embeddable’s [Caching API](/data-modeling/caching/caching-api) to refresh pre-aggregations for each of your security contexts.
+- The next step is to set up Embeddable’s [Caching API](/data-modeling/caching/level-2-cache/caching-api) to refresh pre-aggregations for each of your security contexts. Without it, pre-aggregations will only refresh on demand.
+
+- If you’d like to go deeper, you can also continue to the [Advanced Pre-aggregations](/data-modeling/caching/level-2-cache/advanced-pre-aggregations) guide, which covers more complex topics and optimisation techniques.
diff --git a/pages/data-modeling/caching/level-2-cache/prerequisites.mdx b/pages/data-modeling/caching/level-2-cache/prerequisites.mdx
new file mode 100644
index 0000000..b2688c4
--- /dev/null
+++ b/pages/data-modeling/caching/level-2-cache/prerequisites.mdx
@@ -0,0 +1,287 @@
+# Prerequisites for pre-aggregations
+
+Before enabling pre-aggregations, it’s worth reviewing the performance of the underlying database queries. In many cases, straightforward database optimisations are sufficient, and pre-aggregations may not be required.
+
+This guide highlights the areas that most commonly affect query performance and are worth reviewing **before** introducing pre-aggregations.
+
+## Indexes
+
+Indexes are the single most effective way to improve query performance. They allow the database to locate rows efficiently without scanning the entire table.
+
+### What is an index?
+
+Without an index, the database scans every row to find matches:
+
+```
+events table
+-------------------------
+id | account_id | status
+1 | 42 | active
+2 | 17 | active
+3 | 42 | inactive
+4 | 99 | active
+```
+
+```sql
+WHERE account_id = 42
+```
+
+**Things to notice:**
+
+- Every row is checked.
+
+- With an index on `account_id`, the database uses a lookup structure to jump directly to matching values instead of scanning rows one by one.
+
+- Indexes are **sorted**, which allows the database to navigate efficiently.
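+
+Creating such an index is typically a one-line statement (PostgreSQL syntax shown; other databases are similar):
+
+```sql
+-- Lets WHERE account_id = ? lookups use an index instead of a full scan
+CREATE INDEX idx_events_account_id ON events (account_id);
+```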
+
+#### Conceptual example
+
+Assume an index on `account_id` with these values:
+
+```
+12, 17, 42, 42, 58, 73, 99
+```
+
+```
+ [42]
+ / \
+ [17] [73]
+ / \ / \
+ [12] [42] [58] [99]
+```
+
+```sql
+WHERE account_id = 58
+```
+
+- The database follows the sorted structure and reaches `58` in a small number of comparisons.
+
+- The database does **not** scan values sequentially.
+
+#### Why this matters
+
+- Full table scans check every row
+- Index lookups follow a small number of comparisons
+- Lookup cost remains predictable as data grows
+
+### Which columns should be indexed?
+
+Index columns commonly used in:
+
+#### Filters
+
+```sql
+WHERE account_id = 42 AND status = 'active'
+```
+
+Index:
+
+- `account_id`
+- `status` *(if selective)*
+
+#### Joins
+
+Always join on indexed columns.
+
+```sql
+JOIN users ON events.user_id = users.id
+```
+
+Index:
+
+- `events.user_id`
+- `users.id`
+
+#### Grouping
+
+```sql
+GROUP BY account_id, event_date
+```
+
+Index:
+
+- `account_id`
+- `event_date`
+
+#### Composite indexes
+
+If filters commonly appear together:
+
+```sql
+WHERE account_id = 42 AND created_at >= ?
+```
+
+Create a composite index:
+
+```
+(account_id, created_at)
+```
+
+Column order should match common filter patterns.
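+
+For example, a composite index matching that filter pattern might look like this (PostgreSQL syntax shown):
+
+```sql
+-- Supports WHERE account_id = ? AND created_at >= ? in one index lookup
+CREATE INDEX idx_events_account_created
+  ON events (account_id, created_at);
+```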
+
+### Index pitfalls
+
+#### Too many indexes
+
+- Inserts and updates become slower
+- Storage usage increases
+
+Each index must be maintained as data changes.
+
+#### Indexing low-value columns
+
+Indexes are effective only when they significantly reduce the number of rows scanned.
+
+Avoid indexing columns with very few distinct values, such as:
+
+- Boolean flags
+- Status columns where most rows share the same value
+
+If a filter matches most rows, an index adds overhead without improving performance.
+
+#### Breaking index usage
+
+Functions and casts on a column prevent index usage because they change the indexed value, forcing the database to evaluate every row instead of using the index.
+
+```sql
+WHERE CAST(user_id AS INTEGER) = 42
+```
+
+```sql
+WHERE LOWER(email) = 'test@example.com'
+```
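+
+If you must filter on a transformed value, one option (where your database supports it, e.g. PostgreSQL expression indexes) is to index the expression itself:
+
+```sql
+-- Indexes the result of LOWER(email), so WHERE LOWER(email) = ? can use it
+CREATE INDEX idx_users_email_lower ON users (LOWER(email));
+```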
+
+### Primary key vs explicit indexes
+
+- **Primary key index**
+ - Created automatically
+ - Optimised for point lookups
+ - Rarely helps aggregation queries
+- **Explicit indexes**
+ - Created using `CREATE INDEX`
+ - Used for filters, joins, and groupings
+
+Pre-aggregations typically benefit from explicit indexes, not from the primary key.
+
+## Partitioning Large Tables
+
+Partitioning becomes useful as tables grow and queries filter by time.
+
+### What is partitioning?
+
+Partitioning splits a large table into smaller physical segments, most commonly by time.
+
+#### Example: partitioning a 1M-row table
+
+Assume an `events` table with **1,000,000 rows** spanning one year.
+
+Without partitioning:
+
+```
+events (1,000,000 rows)
+└── All data stored together
+```
+
+A query like:
+
+```sql
+WHERE created_at >= '2024-12-01'
+```
+
+must scan a large portion of the table.
+
+---
+
+With monthly time-based partitioning:
+
+```
+events
+├── events_2024_01 (~83k rows)
+├── events_2024_02 (~83k rows)
+├── ...
+├── events_2024_12 (~83k rows)
+```
+
+The same query scans only:
+
+```
+events_2024_12
+```
+
+- All other partitions are skipped.
+
+- This behaviour is known as **partition pruning**.
+
+#### Why this matters
+
+- Far fewer rows are scanned
+- Less data is read from disk
+- Performance remains predictable as data grows
+
+Partitioning is especially effective when queries consistently filter by time.
+
+#### Best practices
+
+- Partition by time
+- Use daily or monthly partitions
+- Automate partition creation and cleanup
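+
+As a sketch of what this can look like in practice (PostgreSQL declarative partitioning; other databases use different syntax):
+
+```sql
+-- Parent table partitioned by month on created_at
+CREATE TABLE events (
+    id         bigint,
+    account_id bigint,
+    created_at timestamptz NOT NULL
+) PARTITION BY RANGE (created_at);
+
+-- One partition per month; queries filtering on created_at
+-- scan only the matching partitions (partition pruning)
+CREATE TABLE events_2024_12 PARTITION OF events
+    FOR VALUES FROM ('2024-12-01') TO ('2025-01-01');
+```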
+
+## Review Query Execution Plans
+
+Indexes and partitioning only help **if the database actually uses them**.
+
+Query execution plans show how a query is executed and are the most reliable way to understand where time and resources are spent.
+
+Most databases support:
+
+```sql
+EXPLAIN
+```
+
+Use execution plans to:
+
+- Confirm indexes are being used
+- Verify partition pruning
+- Identify full table scans
+- Validate performance improvements
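+
+For example, in PostgreSQL (`EXPLAIN ANALYZE` also executes the query and reports actual timings):
+
+```sql
+EXPLAIN ANALYZE
+SELECT account_id, COUNT(*)
+FROM events
+WHERE created_at >= '2024-01-01'
+GROUP BY account_id;
+```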
+
+### Red flags
+
+```
+Seq Scan on events
+Filter: (created_at >= '2024-01-01' AND status = 'active')
+```
+
+A **sequential scan** means:
+
+- The entire table is read
+- Filters are applied after scanning all rows
+- Query cost grows linearly as the table grows
+
+On large tables, this usually indicates:
+
+- Missing or unusable indexes
+- Functions preventing index usage
+- Partitioning not being applied
+
+Sequential scans are expected on small tables. They are problematic on large, frequently queried tables.
+
+### Good signs
+
+```
+Index Scan using idx_events_created_at on events
+Index Cond: (created_at >= '2024-01-01')
+```
+
+This indicates:
+
+- Indexes are being used
+- Filters are applied early
+- Fewer rows are scanned
+- Performance scales predictably
+
+Execution plans should be reviewed before and after changes to confirm impact.
+
+## Next Steps
+
+Once these optimisations are in place, the next step is to use [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) to further speed up your queries.
\ No newline at end of file
diff --git a/pages/data-modeling/defining-models.mdx b/pages/data-modeling/defining-models.mdx
index 7fda613..c3b42a3 100644
--- a/pages/data-modeling/defining-models.mdx
+++ b/pages/data-modeling/defining-models.mdx
@@ -110,7 +110,7 @@ You can also define other parameters, including [joins](./joins) between models
- `description`: A short note or summary of what the model represents or how it should be used. Useful for keeping track of intentions or business context.
-- `pre_aggregations`: Optional configurations for creating and managing materialized views or rollups. This helps improve performance by caching common aggregations [learn more](/data-modeling/caching/pre-aggregations). Note that pre-aggregations are **not** currently supported in the **Data Model Editor**. To use them, please export your data models and add them to your code repository (learn more [here](/data-modeling/getting-setup#defining-models-in-code)).
+- `pre_aggregations`: Optional configurations for creating and managing materialized views or rollups. This helps improve performance by caching common aggregations ([learn more](/data-modeling/caching/level-2-cache/pre-aggregations)). Note that pre-aggregations are **not** currently supported in the **Data Model Editor**. To use them, please export your data models and add them to your code repository (learn more [here](/data-modeling/getting-setup#defining-models-in-code)).
- `views`: Views are very useful if you want to create different subsets of your cube models for different use-cases, or if you want fine-grained control of which join path should be used when. Learn more [here](https://cube.dev/docs/product/data-modeling/concepts#views).
diff --git a/pages/data/using-cube-cloud-with-dbt.mdx b/pages/data/using-cube-cloud-with-dbt.mdx
index 0a176e8..acaeb75 100644
--- a/pages/data/using-cube-cloud-with-dbt.mdx
+++ b/pages/data/using-cube-cloud-with-dbt.mdx
@@ -1,6 +1,6 @@
# Using Cube Cloud with dbt
-If you already use [**dbt**](https://docs.getdbt.com/docs/build/projects) for your data modelling, you don’t need to start over when adopting [**Cube Cloud**](https://cube.dev/docs/product/getting-started#getting-started-with-cube-cloud). Cube Cloud integrates directly with [**dbt**](https://docs.getdbt.com/docs/build/projects) projects, letting you reuse your existing models and build metrics, joins, and [pre-aggregations](/data-modeling/caching/pre-aggregations#what-are-pre-aggregations) on top of them. Finally, you can create [views](/data-modeling/views) to organize and present your data model, and expose it to Embeddable using the [Data Provider API](/data/cube-cloud#data-provider-api).
+If you already use [**dbt**](https://docs.getdbt.com/docs/build/projects) for your data modelling, you don’t need to start over when adopting [**Cube Cloud**](https://cube.dev/docs/product/getting-started#getting-started-with-cube-cloud). Cube Cloud integrates directly with [**dbt**](https://docs.getdbt.com/docs/build/projects) projects, letting you reuse your existing models and build metrics, joins, and [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations#what-are-pre-aggregations) on top of them. Finally, you can create [views](/data-modeling/views) to organize and present your data model, and expose it to Embeddable using the [Data Provider API](/data/cube-cloud#data-provider-api).
Refer to the diagram below to see how dbt and Cube Cloud work together. From data modelling in dbt to semantic enrichment in Cube.
diff --git a/pages/development/loading-data.mdx b/pages/development/loading-data.mdx
index 2fe976f..09f1512 100644
--- a/pages/development/loading-data.mdx
+++ b/pages/development/loading-data.mdx
@@ -172,7 +172,7 @@ OFFSET
],
```
-- `timezone` is the time zone you want to aggregate the **time dimensions** by. Must be a string in [tz database format](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), e.g. 'America/Los_Angeles', 'Europe/Paris', 'Australia/Sydney', 'UTC', etc. Defaults to 'UTC'. If you're using [Caching API](/data-modeling/caching/caching-api), note that [pre-aggregations](/data-modeling/caching/pre-aggregations) are time zone aware, and so, for example, a query run with timezone 'Europe/Paris' cannot use a pre-aggregation built with time zone 'America/New_York'.
+- `timezone` is the time zone you want to aggregate the **time dimensions** by. Must be a string in [tz database format](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones), e.g. 'America/Los_Angeles', 'Europe/Paris', 'Australia/Sydney', 'UTC', etc. Defaults to 'UTC'. If you're using the [Caching API](/data-modeling/caching/level-2-cache/caching-api), note that [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) are time zone aware, and so, for example, a query run with timezone 'Europe/Paris' cannot use a pre-aggregation built with time zone 'America/New_York'.
Our [Vanilla Components library](https://github.com/embeddable-hq/vanilla-components-v1/) now supports time zones out of the box as of version `1.2.1`. All components that may have time/date-related data (e.g. line charts, date pickers, etc.) have been updated to accept time zones via Client Context. This allows you to set a time zone on a per-user basis from your application.
Note that the JavaScript `Date` object operates in the local time zone of the user's environment or in UTC and doesn't natively support time zones. When developing your own components, it's easy for dates to accidentally get converted out of the intended time zone, so it's important to check thoroughly to be sure they're passing the correct timestamp and time zone combination to `loadData`.
diff --git a/pages/index.mdx b/pages/index.mdx
index 34d24d2..c7c9fc2 100644
--- a/pages/index.mdx
+++ b/pages/index.mdx
@@ -41,7 +41,7 @@ Embeddable makes it easy to **build**, **edit**, **deploy**, and **scale custome
-- **Performance & Scalability** - Leverage caching and [advanced pre-aggregations](/data-modeling/caching/pre-aggregations) to keep your data fast, even at scale.
+- **Performance & Scalability** - Leverage caching and [advanced pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) to keep your data fast, even at scale.
## How It Works