embeddable-hq · Nauman1011 · Jan 27, 2026 · Jan 23, 2026 · Jan 27, 2026
@@ -43,7 +43,7 @@ The version you save can be:
  - promoted to Development/Staging/Production via the [Publish flow](/dashboards/publishing) 
  - targeted via the [Tokens API](/deployment/tokens-api) (`savedVersion`), 
  - accessed in the [Embeddable API](/deployment/embeddables-api) metadata, and 
- - used by the [Caching API](/data-modeling/caching/caching-api) when refreshing [pre‑aggregations](/data-modeling/caching/pre-aggregations) for each security context, based on the **refresh_key** you’ve set.  
+ - used by the [Caching API](/data-modeling/caching/level-2-cache/caching-api) when refreshing [pre‑aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) for each security context, based on the **refresh_key** you’ve set.  
 
 ## Version Picker
 

@@ -1,6 +1,5 @@
 {
-    "in-memory": "In-memory cache",
-    "pre-aggregations": "Pre-aggregations",
-    "caching-api": "Caching API"
+    "in-memory": "Level 1 cache: in-memory",
+    "level-2-cache": "Level 2 cache: pre-aggregations"
   }
 
@@ -8,4 +8,4 @@ To deliver fast and responsive analytics, Embeddable leverages Cube's **Level 1
 
 You can tell Cube to evaluate and invalidate the Level 1 Cache using **Refresh Keys**. Learn more [here](https://cube.dev/docs/product/caching#refresh-keys). 
 
-Cube does not recommend changing the default in-memory caching configuration unless necessary. Instead, to speed up query performance, you should use [pre-aggregations](/data-modeling/caching/pre-aggregations).
+To further improve query performance, we recommend using [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations).
@@ -0,0 +1,7 @@
+{
+    "prerequisites": "Prerequisites",
+    "pre-aggregations": "Pre-aggregations 101",
+    "advanced-pre-aggregations": "Advanced Pre-aggregations",
+    "caching-api": "Caching API"
+}
+
@@ -0,0 +1,309 @@
+# Advanced Pre-aggregations
+
+This guide covers a set of advanced pre-aggregation topics that help you optimise performance and handle more complex data scenarios.
+
+## Handling incremental data loads
+
+Sometimes your source data is updated incrementally for example: only the last few days are reloaded or updated while older data remains unchanged. In these cases, it’s more efficient to build your pre-aggregations incrementally instead of rebuilding the entire dataset.
+
+Using the `customers` cube example:
+
+```yaml
+pre_aggregations:
+  - name: daily_count_by_countries
+    measures:
+      - CUBE.count
+    dimensions:
+      - CUBE.country
+    time_dimension: CUBE.signed_up_at
+    granularity: day
+    partition_granularity: day
+    build_range_start:
+      sql: SELECT NOW() - INTERVAL '365 day'
+    build_range_end:
+      sql: SELECT NOW()
+    refresh_key:
+      every: 1 day
+      incremental: true
+      update_window: 3 day
+```
+
+**Things to notice:**
+
+- Most queries focus on the past year, so we limit the build range to 365 days using `build_range_start` and `build_range_end`. Learn more [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#build_range_start-and-build_range_end).
+- `partition_granularity: day` splits the pre-aggregation into daily partitions, making it possible to refresh only the days that change instead of rebuilding the whole year.
+- Partitioned pre-aggregations require both a `time_dimension` and a `granularity`. See the Cube docs on [supported values](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#partition_granularity).
+- With `incremental: true` and `update_window: 3 day`, Cube refreshes only the last three partitions each day. Learn more about [`update_window`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#update_window) and [`incremental`](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#incremental) .
+
+<Callout> 
+Without `update_window`, Cube refreshes partitions strictly according to `partition_granularity` (in this case, just the last day).
+</Callout>
+
+## Indexes
+
+Indexes make data retrieval faster. Think of an index as a shortcut that points directly to the relevant rows instead of searching through all the data. This speeds up queries that filter, group, or join on specific fields.
+
+In the context of pre-aggregations, indexes help [Cube Store](https://cube.dev/docs/product/deployment#cube-store) quickly locate and read only the data needed for a query improving performance, especially on large datasets.
+
+Indexes are particularly useful when:
+
+- For larger pre-aggregations, indexes are often required to achieve optimal performance, especially when a query doesn’t use all dimensions from the pre-aggregation.
+- Queries frequently filter on **high-cardinality dimensions**, such as `product_id` or `date`. Indexes help Cube Store find matching rows faster in these cases.
+- You plan to join one pre-aggregation with another, such as in a [`rollup_join`](/data-modeling/caching/level-2-cache/advanced-pre-aggregations#rollup_join).
+
+<Callout emoji="💡">
+Adding indexes doesn’t change your data, it simply makes Cube Store more efficient at finding it.
+</Callout>
+
+### Using indexes in pre-aggregations
+
+Let’s start with a simple `products` model and define a `products_preagg` pre-aggregation.
+
+Here we add an index on `size` within our pre-aggregation, which Cube Store uses to quickly resolve joins and filters involving that indexed column.
+
+```yaml
+cubes:
+  - name: products
+    sql_table: my_db.main.products
+    data_source: default
+
+    dimensions:
+      - name: id
+        sql: id
+        type: number
+        primary_key: true
+        public: true
+
+      - name: name
+        sql: name
+        type: string
+
+      - name: size
+        sql: size
+        type: string
+
+
+    measures:
+      - name: count
+        type: count
+        title: "# of products"
+
+      - name: price
+        type: sum
+        title: Total USD
+        sql: price
+
+    joins:
+      - name: orders
+        sql: "{CUBE.id} = {orders.product_id}"
+        relationship: one_to_many
+
+    pre_aggregations:
+      - name: products_preagg
+        type: rollup
+        dimensions:
+          - size
+        measures:
+          - count
+          - price
+        indexes:
+          - name: product_index
+            columns:
+              - size
+```
+
+In this example:
+
+- The `products_preagg` pre-aggregation stores aggregated products data by size dimension.
+- The index `product_index` on `size` speeds up queries using that dimension.
+- Make sure the column you’re indexing is also included in the pre-aggregation dimensions; otherwise, Cube will return an error like:
+
+  > Error during create table: Column 'products__id' in index 'products_products_preagg_product_index' is not found in table 'products_products_preagg'
+  >   
+
+<Callout emoji="💡">
+Each index adds to the pre-aggregation build time, since all indexes are created during ingestion. Add only the ones you need.
+</Callout>
+
+Learn more about indexes [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#indexes).
+
+## Rollup_join
+
+- Cube can run SQL joins across different data sources. For example, you might have products in [PostgreSQL](/data/credentials#postgres) and orders in [MotherDuck](/data/credentials#motherduck).
+
+- All pre-aggregations so far have been of type rollup (which is the default pre-aggregation type). Cube also supports `rollup_join`, which combines data from two or more rollups coming from different data sources.
+
+- `rollup_join` joins pre-aggregated data inside [cube store](https://cube.dev/docs/product/deployment#cube-store), so you can query it together efficiently.
+
+<Callout>
+You don’t need a rollup_join to join cubes from the same data source. Just include the other cube’s dimensions and measures directly in your rollup definition as mentioned [here](/data-modeling/caching/level-2-cache/pre-aggregations#performing-joins-across-cubes-in-your-pre-aggregations)
+</Callout>
+
+Let’s extend the example from the [indexes](/data-modeling/caching/level-2-cache/advanced-pre-aggregations#indexes) section. We’ll keep the products model from the PostgreSQL (default) data source. Since it joins to the orders model on the id column, we’ll need to update the pre-aggregation to include id and name and add an index on it.
+
+```yaml
+
+    pre_aggregations:
+      - name: products_preagg
+        type: rollup
+        dimensions:
+          - id
+          - name
+          - size
+        measures:
+          - count
+          - price
+        indexes:
+          - name: product_index
+            columns:
+              - id
+        refresh_key:
+	        every: 1 hour
+```
+
+The new orders model from MotherDuck data source will be added to show how to run analytics across databases.
+
+
+```yaml
+cubes:
+  - name: orders
+    sql_table: public.orders
+    data_source: motherduck
+
+    dimensions:
+      - name: id
+        sql: id
+        type: number
+        primary_key: true
+
+      - name: created_at
+        sql: created_at
+        type: time
+
+      - name: product_id
+        sql: product_id
+        type: number
+        public: false
+
+    measures:
+      - name: count
+        type: count
+        title: "# of orders"
+
+    joins:
+      - name: products
+        sql: "{CUBE.product_id} = {products.id}"
+        relationship: many_to_one
+
+    pre_aggregations:
+      - name: orders_preagg
+        type: rollup
+        dimensions:
+          - product_id
+          - created_at
+        measures:
+          - count
+        time_dimension: CUBE.created_at
+        granularity: day
+        indexes:
+          - name: orders_index
+            columns:
+              - product_id
+        refresh_key:
+	        every: 1 hour
+
+      - name: orders_with_products_rollup
+        type: rollup_join
+        dimensions:
+          - products.name
+          - orders.created_at
+        measures:
+          - orders.count
+        time_dimension: orders.created_at
+        granularity: day
+        rollups:
+          - products.products_preagg
+          - orders_preagg
+```
+
+**Things to notice:**
+
+- `orders` uses the **MotherDuck** data source.
+- `products` uses **default** data source (for example, PostgreSQL). Learn more about connecting to multiple datasources [here](/data/credentials).
+- Always reference dimensions explicitly in your joins between models, especially when using a `rollup_join`:
+
+    ```yaml
+        joins:
+          - name: products 
+            sql: "{CUBE.product_id} = {products.id}"
+            relationship: many_to_one
+    ```
+
+    If you use `{CUBE}.product_id` or `{products}.id`, Cube will not recognise them as dimension references and will return an error like:
+
+    ```
+    From members are not found in [] for join ...
+    Please make sure join fields are referencing dimensions instead of columns.
+    ```
+
+- Indexes are required when using `rollup_join` pre-aggregations so Cube Store can join multiple pre-aggregations efficiently.
+
+    Without the right index, Cube may fail to plan the join and return an error like:
+
+    ```
+    Error during planning: Can't find index to join table ...
+    Consider creating index ... ON ... (orders__product_id)
+    ```
+
+    Therefore, notice that we have indexed the **join keys on both sides**:
+
+    ```
+    - `products.products_preagg` → index on `id`
+    - `orders.orders_preagg` → index on `product_id`
+    ```
+
+- `orders_with_products_rollup` combines both pre-aggregations inside **Cube Store** using the type `rollup_join`.
+
+    The `rollups:` property lists which pre-aggregations to join together:
+
+    ```yaml
+    rollups:
+      - products.products_preagg
+      - orders_preagg
+    ```
+
+- We also added a `time_dimension` with **day-level granularity** in `orders_with_products_rollup`.
+
+    We expect users to ask questions at a daily level, such as “How many orders were placed per product each day?”. Setting the `time_dimension` to **day** ensures Cube builds and queries this data efficiently.
+
+<Callout emoji="💡">
+  `rollup_join` is an ephemeral pre-aggregation. It uses the referenced pre-aggregations at query time, so freshness is controlled by them, not the rollup_join itself.
+</Callout>
+
+- Notice that we’ve set the `refresh_key` to **1 hour** on both referenced pre-aggregations (`products_preagg` and `orders_preagg`) to keep the data up to date. Learn more about refreshing pre-aggregations [here](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations).
+
+### How `rollup_join` works in Embeddable
+
+In this example, we’ll find the total **number of orders** for each **product**. The **product name** comes from the `products` model, while the **orders count** comes from the `orders` model.
+
+<VideoComponent
+    src="/video/rollup_join_example.mp4"
+    width="1250"
+    height="854"
+/>
+
+**Things to notice:**
+- The query’s FROM clause references both pre-aggregations. This is how Cube joins pre-aggregated datasets from different data sources inside Cube Store.
+
+### Benefits of using `rollup_join`
+
+- Enables **cross-database joins** inside Cube Store
+- Leverages **indexed pre-aggregations** for efficient distributed joins
+- Avoids the need for ETL or database federation
+- Provides consistent, scalable analytics across data sources
+
+Learn more about rollup_join [here](https://cube.dev/docs/product/data-modeling/reference/pre-aggregations#rollup_join).
+
+## Next Steps
+
+The next step is to setup Embeddable’s [Caching API](/data-modeling/caching/level-2-cache/caching-api) to refresh pre-aggregations for each of your security contexts. Without it, pre-aggregations will only refresh on demand.
@@ -1,6 +1,6 @@
 # Caching API
 
-Use Embeddable’s Caching API to tell Embeddable which security contexts need refreshing. The refresh frequency comes from the [refresh_key](/data-modeling/caching/pre-aggregations#refreshing-pre-aggregations) you set in your [pre-aggregations](/data-modeling/caching/pre-aggregations) within the data model.
+Use Embeddable’s Caching API to tell Embeddable which security contexts need refreshing. The refresh frequency comes from the [refresh_key](/data-modeling/caching/level-2-cache/pre-aggregations#refreshing-pre-aggregations) you set in your [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations) within the data model.
 
 <Bruno/>
Original file line number	Diff line number	Diff line change
Expand Up		@@ -8,4 +8,4 @@ To deliver fast and responsive analytics, Embeddable leverages Cube's **Level 1

		You can tell Cube to evaluate and invalidate the Level 1 Cache using Refresh Keys. Learn more [here](https://cube.dev/docs/product/caching#refresh-keys).

		Cube does not recommend changing the default in-memory caching configuration unless necessary. Instead, to speed up query performance, you should use [pre-aggregations](/data-modeling/caching/pre-aggregations).
		To further improve query performance, we recommend using [pre-aggregations](/data-modeling/caching/level-2-cache/pre-aggregations).