Commit 274674d

docs: Clarify index column ordering
1 parent 1134df4 commit 274674d

1 file changed: +13 -16 lines changed
docs/content/Caching/Using-Pre-Aggregations.mdx

Lines changed: 13 additions & 16 deletions
@@ -242,15 +242,12 @@ the measure is low.
 The order in which columns are specified in the index is **very** important;
 suboptimal ordering can lead to diminished performance. To improve the
 performance of an index the main thing to consider is the order of the columns
-defined in it. The order of the columns should be defined in accordance with the
-[Index Selectivity](https://blog.toadworld.com/2018/09/05/unselective-indexes-selectivity).
-In a nutshell, index selectivity measures how many unique values a database has.
-An index is said to have a **higher** selectivity as the number of unique values
-goes up and a **lower** selectivity as the number of unique values goes down.
-Once we know the selectivity of the columns we'll be adding to the index, we
-need to place them from **lowest** to **highest** selectivity (this is probably
-contrary to what you've read about the topic in traditional row-based
-databases).
+defined in it.
+
+The rule of thumb for index column order is:
+- Single value filters come first
+- `GROUP BY` columns come second
+- Everything else used in the query comes afterward
 
 **Example:**
 
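Reading aid, not part of the commit: the rule of thumb added in the hunk above can be sketched as a small helper. `orderIndexColumns` and its parameter names are hypothetical and exist only for this illustration; Cube does not provide such a function, and the index columns are still declared by hand in the data model.

```javascript
// Hypothetical helper (not a Cube API): orders index columns by the rule of
// thumb "single value filters, then GROUP BY columns, then everything else".
function orderIndexColumns({ singleValueFilters = [], groupBy = [], others = [] }) {
  // A Set keeps first-seen order and drops duplicates, so a column that shows
  // up in more than one role keeps its earliest position.
  return [...new Set([...singleValueFilters, ...groupBy, ...others])];
}

// Roles taken from the example index discussed in this diff.
const columns = orderIndexColumns({
  singleValueFilters: ['product_category'],
  groupBy: ['zip_code'],
  others: ['product_name'],
});

console.log(columns);
```

With these inputs the result matches the column order that the commit introduces for `category_productname_zipcode_index`.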
@@ -388,7 +385,7 @@ cube('orders', {
 
   indexes: {
     category_productname_zipcode_index: {
-      columns: [product_category, product_name, zip_code],
+      columns: [product_category, zip_code, product_name],
     },
   },
 },
@@ -409,8 +406,8 @@ cubes:
 - name: category_productname_zipcode_index
   columns:
     - product_category
-    - product_name
     - zip_code
+    - product_name
 ```
 
 </CodeTabs>
@@ -425,11 +422,11 @@ Then the data within `category_productname_zipcode_index` would look like:
 | Furniture | Plastic Chair | 88524 | 2023-01-01 11:00:00 | 3000 |
 | Electronics | Keyboard | 88524 | 2023-01-01 11:00:00 | 2000 |
 
-The columns are ordered from **lowest** to **highest** selectivity. We can
-expect there to be a lower number of product categories, hence a lower number of
-unique records resulting in a lower selectivity. Although `zip_code` may have
-lower or higher selectivity, the dimensions used to filter must come first than
-the dimensions in the `GROUP BY` part of the query.
+The `product_category` column comes first because it's a single value filter.
+Then comes `zip_code` because it's a `GROUP BY` column.
+`product_name` comes last because it's a multiple value filter.
+
+It might sound counter-intuitive to have `GROUP BY` columns before filter ones. However, Cube Store always performs scans on sorted data, and if the `GROUP BY` matches the index ordering, merge sort-based algorithms are used for querying. These are usually much faster than the hash-based group by that is used when the index ordering doesn't match the query. If in doubt, always use `EXPLAIN` and `EXPLAIN ANALYZE` in Cube Store to figure out the final query plan.
 
 ### Aggregated indexes
 
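Reading aid, not part of the commit: the last added paragraph in the hunk above relies on the general idea that grouping over input already sorted by the grouping key is cheaper than grouping over unsorted input. The sketch below illustrates that idea only. It is not Cube Store's implementation, and the function names, the `amount` field, and the sample rows are all hypothetical.

```javascript
// Illustration only: two generic grouping strategies, not Cube Store code.

// Input sorted by `key`: every group is one contiguous run, so a single running
// total is enough and a group is complete as soon as the key changes.
function sumBySortedKey(rows, key, value) {
  const result = [];
  let current = null;
  for (const row of rows) {
    if (current !== null && current[key] === row[key]) {
      current[value] += row[value];
    } else {
      current = { [key]: row[key], [value]: row[value] };
      result.push(current);
    }
  }
  return result;
}

// Input in arbitrary order: every group has to stay in a hash table until the
// whole input has been read.
function sumByHashedKey(rows, key, value) {
  const totals = new Map();
  for (const row of rows) {
    totals.set(row[key], (totals.get(row[key]) ?? 0) + row[value]);
  }
  return [...totals].map(([k, v]) => ({ [key]: k, [value]: v }));
}

// Hypothetical rows, assumed to be already narrowed to one product_category
// and sorted by zip_code, as an index ordered this way would provide them.
const rows = [
  { zip_code: '88523', amount: 1000 },
  { zip_code: '88523', amount: 500 },
  { zip_code: '88524', amount: 2000 },
];

console.log(sumBySortedKey(rows, 'zip_code', 'amount'));
console.log(sumByHashedKey(rows, 'zip_code', 'amount'));
```

Both functions return the same totals here. The difference is in how much state each one holds while scanning, which is why the paragraph recommends checking the actual plan with `EXPLAIN` and `EXPLAIN ANALYZE` rather than assuming which strategy is used.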