File: docs/docs-new/pages/product/caching/using-pre-aggregations.mdx
Lines changed: 24 additions & 13 deletions
@@ -228,15 +228,22 @@ cube(`orders`, {
 
 ### When to use indexes?
 
-Indexes are great when you filter large amounts of data across one or several
-dimension columns. You can read more about them
-[here][ref-schema-ref-preaggs-index].
+When you define a pre-aggregation without any indexes, a default index is created.
+For the default index, dimensions come first, time dimensions come second, and measures come last.
+At query time, if the default index can't be selected for a merge sort scan, hash aggregation is used instead.
+This usually means that the full table needs to be scanned to get query results.
+That's usually not a problem if the pre-aggregation table is only several MB in size.
+Once the table grows beyond that, indexes are usually required to achieve optimal performance,
+especially if not all columns from the pre-aggregation are used in a particular query.
+You can read more about indexes [here][ref-schema-ref-preaggs-index].
 
 ### Best Practices
 
 To maximize performance, you can introduce an index per type of query so the set
 of dimensions used in the query overlap as much as possible with the ones
-defined in the index. Measures are traditionally only used in indexes if you
+defined in the index.
+As indexes are sorted copies of the data, you don't incur any additional costs on the data warehouse side. However, each added index multiplies the build time for that pre-aggregation.
+Measures are traditionally only used in indexes if you
 plan to filter a measured value and the cardinality of the possible values of
 the measure is low.
 
@@ -245,6 +252,10 @@ suboptimal ordering can lead to diminished performance. To improve the
 performance of an index the main thing to consider is the order of the columns
 defined in it.
 
+The key property of additive rollups is that, for most queries, there's at least one index that lets the query scan very little data, which makes it very fast.
+There are, however, exceptions to this rule, such as TopK queries and low-selectivity range filters used without high-selectivity single-value filters.
+Those use cases are usually best optimized by remodeling the data and queries.
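
For readers of this diff, the default index ordering described in the added text (dimensions, then time dimensions, then measures) can be sketched in plain JavaScript. This is only an illustration: `defaultIndexColumns`, `leadingColumnsCovered`, and the sample column names are hypothetical and not part of Cube's API, and the leading-prefix rule is an assumption about how a sorted index narrows a scan, not a description of Cube Store's actual index selection.

```javascript
// Hypothetical helper, not part of Cube's API: lists the columns of the
// default index in the order the added text describes
// (dimensions, then time dimension, then measures).
function defaultIndexColumns(preAgg) {
  const dimensions = preAgg.dimensions || [];
  const timeDimensions = preAgg.timeDimension ? [preAgg.timeDimension] : [];
  const measures = preAgg.measures || [];
  return [...dimensions, ...timeDimensions, ...measures];
}

// Assumption for illustration only: a sorted index helps a filter when the
// filtered columns form a leading prefix of the index columns. Returns how
// many leading index columns the filter covers.
function leadingColumnsCovered(indexColumns, filterColumns) {
  const wanted = new Set(filterColumns);
  let covered = 0;
  for (const column of indexColumns) {
    if (!wanted.has(column)) break;
    covered += 1;
  }
  return covered;
}

// Sample pre-aggregation shape; the column names are made up.
const preAgg = {
  dimensions: ['status', 'category'],
  timeDimension: 'created_at',
  measures: ['count'],
};

const columns = defaultIndexColumns(preAgg);
const coveredByDefault = leadingColumnsCovered(columns, ['category']);
const coveredByCustom = leadingColumnsCovered(['category', 'status'], ['category']);

console.log(columns);
console.log(coveredByDefault, coveredByCustom);
```

In this sketch, a filter on `category` alone covers no leading column of the default index because `status` comes first, while a hypothetical index that starts with `category` covers one. That is the intuition behind defining an index per type of query and paying attention to column order.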