@@ -10,7 +10,7 @@ configuration options to consider. Please make sure to also check [the
 Pre-Aggregations reference in the data modeling
 section][ref-schema-ref-preaggs].

-## Refresh Strategy
+## Refresh strategy

 Refresh strategy can be customized by setting the
 [`refresh_key`][ref-schema-ref-preaggs-refresh-key] property for the
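
For context, a minimal sketch of a `refresh_key` that combines `every` and `sql`, assuming a hypothetical `orders` cube with a `count` measure, a `status` dimension, and an `updated_at` column (these names are assumptions, not taken from the configuration above):

```yaml
cubes:
  - name: orders
    sql_table: orders

    measures:
      - name: count
        type: count

    dimensions:
      - name: status
        sql: status
        type: string

    pre_aggregations:
      - name: orders_by_status
        measures:
          - CUBE.count
        dimensions:
          - CUBE.status
        refresh_key:
          # Run the check query below every hour; rebuild the rollup
          # only if its result changes
          every: 1 hour
          sql: SELECT MAX(updated_at) FROM orders
```

If only `every` were set, the pre-aggregation would simply be rebuilt on that interval; adding `sql` makes the rebuild conditional on the check query returning new results.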
@@ -144,7 +144,7 @@ When `every` and `sql` are used together, Cube will run the query from the `sql`
 property on an interval defined by the `every` property. If the query returns
 new results, then the pre-aggregation will be refreshed.

-## Rollup Only Mode
+## Rollup-only mode

 To make Cube _only_ serve requests from pre-aggregations, the
 [`CUBEJS_ROLLUP_ONLY`][ref-config-env-rolluponly] environment variable can be
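
Since this is a regular environment variable, rollup-only mode is typically enabled with a single line in the deployment configuration, for example in a `.env` file (the boolean value shown is an assumption of the usual format):

```bash
# Serve queries exclusively from pre-aggregations in Cube Store
CUBEJS_ROLLUP_ONLY=true
```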
@@ -240,47 +240,66 @@ Alternatively, if you want to explicitly introduce key partitioning, you can use
 Each orchestrator ID can use a different pre-aggregation schema, so you may define those based on the partitioning key you want to introduce.
 This technique, together with multi-router Cube Store approach, allows you to achieve linear scaling on the partitioning key of your choice.

-## Using Indexes
+## Using indexes
+
+[Indexes][ref-ref-indexes] are sorted copies of pre-aggregation data.
+
+**When you define a pre-aggregation without any explicit indexes, the default
+index is created.** In this index, dimensions come first, time dimensions come
+second, and measures come last.
+
+When you define additional indexes, you don't incur any additional costs on
+the data warehouse side. However, the pre-aggregation build time for a
+particular pre-aggregation increases with each index.

 ### When to use indexes?

-When you define pre-aggregation without any indexes, the default index will be created.
-For the default index, dimensions come first, time dimensions come second, and measures come last.
-At query time, if the default index can't be selected for merge sort scan, then hash aggregation would be used.
-It usually means that the full table needs to be scanned to get query results.
-And it's usually no big deal if the pre-aggregation table is only several MB in size.
-Once you go over, indexes are usually required to achieve optimal performance.
-Especially if not all columns from pre-aggregation are used in a particular query.
-You can read more about indexes [here][ref-schema-ref-preaggs-index].
-
-### Best Practices
-
-To maximize performance, you can introduce an index per type of query so the set
-of dimensions used in the query overlap as much as possible with the ones
-defined in the index.
-As indexes are sorted copies of the data, you don't incur any additional costs on the data warehouse side, however, you multiply your build time for a given pre-aggregation with every index added.
-Measures are traditionally only used in indexes if you
-plan to filter a measured value and the cardinality of the possible values of
-the measure is low.
-
-The order in which columns are specified in the index is **very** important;
+At query time, if the default index can't be selected for a merge sort scan,
+then a less performant hash aggregation would be used. It usually means that
+the full table needs to be scanned to get query results.
+
+It usually doesn't make much difference if the pre-aggregation table is only
+several MBs in size. However, for larger pre-aggregations, indexes are usually
+required to achieve optimal performance, especially if not all dimensions from
+a pre-aggregation are used in a particular query.
+
+### Best practices
+
+Most pre-aggregations represent [additive][ref-additivity] rollups. For such
+rollups, **the rule of thumb is that, for most queries, there should be
+at least one index that makes a particular query scan a very small amount of
+data,** which makes it very fast. (There are exceptions to this rule, like
+top-k queries or queries with only low selectivity range filters. Optimization
+for these use cases usually involves remodeling data and queries.)
+
+To maximize performance, you can introduce an index per query type so
+that the set of dimensions used in a query overlaps as much as possible with
+the set of dimensions in the index. Measures are usually only used in indexes
+if you plan to filter on a measure value and the cardinality of the possible
+values of the measure is low.
+
+The order in which dimensions are specified in the index is **very** important;
 suboptimal ordering can lead to diminished performance. To improve the
-performance of an index the main thing to consider is the order of the columns
-defined in it.
+performance of an index, the main thing to consider is its order of dimensions.
+The rule of thumb for dimension order is as follows:

-The key property of additive rollups is that for most queries, there's at least one index that makes a particular query scan very little amount of data which makes it very fast.
-There however exceptions to this rule like TopK queries, use of low selectivity range filters without high selectivity single value filters, etc.
-Optimization of those use cases usually should be handled by remodeling data and queries.
+- Dimensions used in high selectivity, single-value filters come first.
+- Dimensions used in `GROUP BY` come second.
+- Everything else used in the query comes in the end, including dimensions
+  used in low selectivity, multiple-value filters.

-The rule of thumb for index column order is:
+It might sound counter-intuitive to have dimensions used in `GROUP BY` before
+dimensions used in multiple-value filters. However, Cube Store always performs
+scans on sorted data, and if `GROUP BY` matches index ordering, merge
+sort-based algorithms are used for querying, which are usually much faster
+than hash-based `GROUP BY` when index ordering doesn't match the query.

-- Single value filters come first
-- `GROUP BY` columns come second
-- Everything else used in the query comes afterward
+If in doubt, always [use `EXPLAIN` and `EXPLAIN ANALYZE`](#explain-queries)
+to figure out the final query plan.

-**Example:**
+#### Example

-Suppose you have a pre-aggregation that has millions of rows with the following
+Suppose you have a pre-aggregation that has millions of rows and the following
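
To make the dimension-ordering rule above concrete, here is a hedged sketch of an index that follows it, assuming a hypothetical `orders` cube where queries filter `product_category` to a single value, group by `zip_code`, and filter `product_name` with multiple values:

```yaml
cubes:
  - name: orders
    # ...

    pre_aggregations:
      - name: orders_by_product
        measures:
          - CUBE.order_total
        dimensions:
          - CUBE.product_category
          - CUBE.product_name
          - CUBE.zip_code
        indexes:
          # Single-value filter dimension first, GROUP BY dimension second,
          # multiple-value filter dimension last
          - name: category_zip_product_index
            columns:
              - CUBE.product_category
              - CUBE.zip_code
              - CUBE.product_name
```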
@@ ... @@
-`product_category` column comes first as it's a single value filter. Then
-`zip_code` as it's `GROUP BY` column. `product_name` comes last as it's a
-multiple value filter.
+### Aggregating indexes
+
+Aggregating indexes can be defined as well. Such indexes contain **only**
+dimensions and pre-aggregated measures from the pre-aggregation definition.
+
+Queries with the following characteristics can target aggregating indexes:

-It might sound counter-intuitive to have `GROUP BY` columns before filter ones,
-however Cube Store always performs scans on sorted data, and if `GROUP BY`
-matches index ordering, merge sort-based algorithms are used for querying, which
-are usually much faster than hash-based group by in case of index ordering
-doesn't match the query. If in doubt, always use `EXPLAIN` and `EXPLAIN ANALYZE`
-in Cube Store to figure out the final query plan.
+- They cannot make use of any `filters` other than for dimensions that are
+  included in that index.
+- **All** dimensions used in the query must be defined in the aggregating
+  index.

-### Aggregated indexes
+Queries that do not have the characteristics above can still make use of
+regular indexes so that their performance can still be optimized.

-Aggregated indexes can be defined as well. You can read more about them
-[here][ref-schema-ref-preaggs-index].
+**In other words, an aggregating index is a rollup of data in a rollup table.**
+Data needs to be downloaded from the upstream data source as many times as
+you have pre-aggregations. Compared to having multiple pre-aggregations,
+having a single pre-aggregation with multiple aggregating indexes gives you
+pretty much the same performance on the Cube Store side at a fraction of the
+cost on the data warehouse side.

-Example:
+Aggregating indexes are defined by using the [`type` option][ref-ref-index-type]
+in the index definition:

 <CodeTabs>
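
As a sketch of what such a definition generally looks like (the cube and member names are assumptions, reusing the hypothetical `orders` cube from above), an index is marked as aggregating with the `type: aggregate` option:

```yaml
cubes:
  - name: orders
    # ...

    pre_aggregations:
      - name: orders_by_product
        measures:
          - CUBE.order_total
        dimensions:
          - CUBE.product_category
          - CUBE.product_name
          - CUBE.zip_code
        indexes:
          # Aggregating index: stores only zip_code plus the pre-aggregated
          # order_total measure
          - name: zip_code_index
            columns:
              - CUBE.zip_code
            type: aggregate
```

A query that groups by `zip_code` and only aggregates `order_total` could then be served entirely from this index, which matches the shape of the `zip_code_index` data shown below.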
@@ -512,20 +550,20 @@ cubes:
 </CodeTabs>

-And the data for `zip_code_index` would look like the following:
+The data for `zip_code_index` would look as follows:

 | zip_code | order_total |
 | -------- | ----------- |
 | 88523    | 3800        |
 | 88524    | 5000        |

-## Inspecting Pre-Aggregations
+## Inspecting pre-aggregations

 Cube Store partially supports the MySQL protocol. This allows you to execute
 simple queries using a familiar SQL syntax. You can connect using the MySQL CLI
 client, for example:

-```bash{promptUser: user}
+```bash
 mysql -h <CUBESTORE_IP> --user=cubestore -pcubestore
 ```
@@ -558,7 +596,7 @@ SELECT * FROM information_schema.tables;
 These pre-aggregations are stored as Parquet files under the `.cubestore/`
 folder in the project root during development.

-### EXPLAIN queries
+### `EXPLAIN` queries

 Cube Store's MySQL protocol also supports `EXPLAIN` and `EXPLAIN ANALYZE`
 queries both of which are useful for determining how much processing a query
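
For illustration, an `EXPLAIN` statement can be issued from the same MySQL client session against a pre-aggregation table. The table name below is hypothetical; actual names can be listed with `SELECT * FROM information_schema.tables` first:

```sql
-- Hypothetical pre-aggregation table name; look up real names in
-- information_schema.tables before running this
EXPLAIN SELECT zip_code, SUM(order_total)
FROM dev_pre_aggregations.orders_by_product
GROUP BY zip_code;
```

Swapping `EXPLAIN` for `EXPLAIN ANALYZE` additionally executes the query, which is useful for seeing how the plan behaves in practice.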
@@ -610,7 +648,7 @@ Sometimes, there can be exceptions to this rule.
 For example, a total count query run on top of the index will perform `HashAggregate` strategy on top of `MergeSort` nodes even if all required indexes are in place.
 This query would be optimal as well.

-## Pre-Aggregations Storage
+## Pre-aggregations storage

 The default pre-aggregations storage in Cube is its own purpose-built storage
 layer: Cube Store.
@@ -800,7 +838,7 @@ With all of the above set up, making a query such as the following will now use