Commit 7c5ab4c

rchkv, rpaik, adnanrahic, and igorlukanin authored
docs(recipes): Refreshing Select Partitions (#3498)
* init
* update
* update
* add recipe
* init data-updater script
* use data updater script
* move scripts to folder
* update Dockerfile
* update sql script
* remove unnecessary schema
* update recipe page and compose config
* improve error log
* update the recipe page
* update the recipe list
* update the recipe page
* Apply suggestions from code review

Co-authored-by: Ray Paik <[email protected]>
Co-authored-by: Adnan Rahić <[email protected]>

---------

Co-authored-by: Ray Paik <[email protected]>
Co-authored-by: Adnan Rahić <[email protected]>
Co-authored-by: Igor Lukanin <[email protected]>
1 parent 397b697 commit 7c5ab4c

14 files changed

+452
-0
lines changed

docs/content/Examples-Tutorials-Recipes/Recipes.md

Lines changed: 1 addition & 0 deletions
@@ -47,6 +47,7 @@ These recipes will show you the best practices of using Cube.js.
 - [Accelerating non-additive measures](/recipes/non-additivity)
 - [Using originalSql and rollup pre-aggregations effectively](/recipes/using-originalsql-and-rollups-effectively)
 - [Incrementally building pre-aggregations for a date range](/recipes/incrementally-building-pre-aggregations-for-a-date-range)
+- [Refreshing select partitions of a pre-aggregation](/recipes/refreshing-select-partitions)
 
 ### <--{"id" : "Recipes"}--> Code reusability
 
Lines changed: 230 additions & 0 deletions
@@ -0,0 +1,230 @@
---
title: Refreshing Select Partitions
permalink: /recipes/refreshing-select-partitions
category: Examples & Tutorials
subCategory: Query acceleration
menuOrder: 6
---

## Use case

We have a dataset with orders and we want to aggregate data while having decent
performance. Orders have a creation time, so we can use
[partitioning](https://cube.dev/docs/caching/using-pre-aggregations#partitioning)
by time to optimize pre-aggregation build and refresh times. The problem is that
an order's status can change long after the order was created. In this case, we
want to rebuild only the partitions associated with this order.

In the recipe below, we'll learn how to use the
[`refreshKey`](https://cube.dev/docs/schema/reference/pre-aggregations#parameters-refresh-key-sql)
together with
[`FILTER_PARAMS`](https://cube.dev/docs/schema/reference/cube#filter-params) to
refresh each partition separately.

## Data schema

Let's explore the `Orders` cube data that contains various information about
orders, including number and status:

| id  | number | status     | created_at          | updated_at          |
| --- | ------ | ---------- | ------------------- | ------------------- |
| 1   | 1      | processing | 2021-08-10 14:26:40 | 2021-08-10 14:26:40 |
| 2   | 2      | completed  | 2021-08-20 13:21:38 | 2021-08-22 13:10:38 |
| 3   | 3      | shipped    | 2021-09-01 10:27:38 | 2021-09-02 01:12:38 |
| 4   | 4      | completed  | 2021-09-20 10:27:38 | 2021-09-20 10:27:38 |

In our case, each order has `created_at` and `updated_at` properties. The
`updated_at` property is the last order update timestamp. To create a
pre-aggregation with partitions, we need to specify the
[`partitionGranularity` property](https://cube.dev/docs/schema/reference/pre-aggregations#partition-granularity).
Partitions will be split monthly by the `created_at` dimension.

```javascript
preAggregations: {
  orders: {
    type: `rollup`,
    external: true,
    dimensions: [CUBE.number, CUBE.status, CUBE.createdAt, CUBE.updatedAt],
    timeDimension: CUBE.createdAt,
    granularity: `day`,
    partitionGranularity: `month`, // this is where we specify the partition
    refreshKey: {
      sql: `SELECT max(updated_at) FROM public.orders` // check for updates of the updated_at property
    },
  },
},
```

As you can see, we defined a custom
[`refreshKey`](https://cube.dev/docs/schema/reference/pre-aggregations#parameters-refresh-key-sql)
that will check for new values of the `updated_at` property. The refresh key is
evaluated for each partition separately. However, this query isn't scoped to a
partition, so if we change orders created in August and bump their `updated_at`
property, the refresh key value will change **for all partitions**. Here's how it
looks in the Cube logs:

```bash
Executing SQL: 5b4c517f-b496-4c69-9503-f8cd2b4c73b6
--
SELECT max(updated_at) FROM public.orders
--
Performing query completed: 5b4c517f-b496-4c69-9503-f8cd2b4c73b6 (15ms)
Performing query: 5b4c517f-b496-4c69-9503-f8cd2b4c73b6
Performing query: 5b4c517f-b496-4c69-9503-f8cd2b4c73b6
Executing SQL: 5b4c517f-b496-4c69-9503-f8cd2b4c73b6
--
select min(("orders".created_at::timestamptz AT TIME ZONE 'UTC')) from public.orders AS "orders"
--
Executing SQL: 5b4c517f-b496-4c69-9503-f8cd2b4c73b6
--
select max(("orders".created_at::timestamptz AT TIME ZONE 'UTC')) from public.orders AS "orders"
--
```

Note that the refresh key query is the same for both partitions. That's why **all
partitions** will be updated.
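
To see why an unscoped refresh key invalidates every partition, here is a minimal in-memory sketch. It is an illustration only, not Cube's internal implementation: the `orders` array mirrors the seed rows of the example, and `globalRefreshKey`, `refreshKeysPerPartition`, and `changedPartitions` are hypothetical helper names.

```javascript
// Illustrative model only: Cube's actual refresh logic is not shown here.
// Rows mirror the sample orders; timestamps are ISO strings, so they sort lexicographically.
const orders = [
  { id: 1, createdAt: '2021-08-10T14:26:40Z', updatedAt: '2021-08-10T14:26:40Z' },
  { id: 2, createdAt: '2021-08-20T13:21:38Z', updatedAt: '2021-08-20T13:21:38Z' },
  { id: 3, createdAt: '2021-09-01T10:27:38Z', updatedAt: '2021-09-01T10:27:38Z' },
  { id: 4, createdAt: '2021-09-20T10:27:38Z', updatedAt: '2021-09-20T10:27:38Z' },
];
const partitions = ['2021-08', '2021-09']; // monthly partitions by created_at

// Equivalent of `SELECT max(updated_at) FROM public.orders` (no partition filter)
function globalRefreshKey(rows) {
  return rows.map((row) => row.updatedAt).sort().pop();
}

// The unscoped key is evaluated once per partition but always yields the same value
function refreshKeysPerPartition(rows) {
  return Object.fromEntries(partitions.map((p) => [p, globalRefreshKey(rows)]));
}

// A partition is rebuilt when its refresh key value differs from the stored one
function changedPartitions(before, after) {
  return partitions.filter((p) => before[p] !== after[p]);
}

const before = refreshKeysPerPartition(orders);

// Order 1 (created in August) changes status at the end of September
const updated = orders.map((row) =>
  row.id === 1 ? { ...row, updatedAt: '2021-09-30T06:45:28Z' } : row
);
const after = refreshKeysPerPartition(updated);

console.log(changedPartitions(before, after)); // both partitions are reported as changed
```

In this model a single changed row moves the global maximum, so the key stored for the September partition goes stale even though none of its rows changed.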

How do we fix this and update only the partition for August? We can use
[`FILTER_PARAMS`](https://cube.dev/docs/schema/reference/cube#filter-params) for
that!

Let's update our pre-aggregation definition:

```javascript
preAggregations: {
  orders: {
    type: `rollup`,
    external: true,
    dimensions: [CUBE.number, CUBE.status, CUBE.createdAt, CUBE.updatedAt],
    timeDimension: CUBE.createdAt,
    granularity: `day`,
    partitionGranularity: `month`,
    refreshKey: {
      sql: `SELECT max(updated_at) FROM public.orders WHERE ${FILTER_PARAMS.Orders.createdAt.filter('created_at')}`
    },
  },
},
```

Cube will filter data by the `created_at` property and then apply the refresh key for the `updated_at` property.
Here's how it looks in the Cube logs:

```bash
Executing SQL: e1155b2f-859b-4e61-a760-17af891f5f0b
--
select min(("updated_orders".created_at::timestamptz AT TIME ZONE 'UTC')) from public.orders AS "updated_orders"
--
Executing SQL: e1155b2f-859b-4e61-a760-17af891f5f0b
--
select max(("updated_orders".created_at::timestamptz AT TIME ZONE 'UTC')) from public.orders AS "updated_orders"
--
Performing query completed: e1155b2f-859b-4e61-a760-17af891f5f0b (10ms)
Performing query completed: e1155b2f-859b-4e61-a760-17af891f5f0b (13ms)
Performing query: e1155b2f-859b-4e61-a760-17af891f5f0b
Performing query: e1155b2f-859b-4e61-a760-17af891f5f0b
Executing SQL: e1155b2f-859b-4e61-a760-17af891f5f0b
--
SELECT max(updated_at) FROM public.orders WHERE created_at >= '2021-08-01T00:00:00.000Z'::timestamptz AND created_at <= '2021-08-31T23:59:59.999Z'::timestamptz
--
Executing SQL: e1155b2f-859b-4e61-a760-17af891f5f0b
--
SELECT max(updated_at) FROM public.orders WHERE created_at >= '2021-09-01T00:00:00.000Z'::timestamptz AND created_at <= '2021-09-30T23:59:59.999Z'::timestamptz
```

Note that Cube checks the refresh key value using a date range over the
`created_at` property. With this refresh key, only the partition that contains the changed order will be updated.
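
The per-partition queries in the log above can be reproduced with a small sketch. It assumes monthly partitions with UTC boundaries, as the logged timestamps suggest. `monthRange` and `partitionRefreshKeySql` are hypothetical helpers written for illustration and are not part of Cube's API.

```javascript
// Hypothetical helpers for illustration; Cube generates these filters internally.

// Return the first and last millisecond of a month in UTC as ISO strings.
function monthRange(year, month) {
  const start = new Date(Date.UTC(year, month - 1, 1));
  const end = new Date(Date.UTC(year, month, 1) - 1); // 1 ms before the next month starts
  return [start.toISOString(), end.toISOString()];
}

// Build the refresh key query scoped to one monthly partition.
function partitionRefreshKeySql(year, month) {
  const [from, to] = monthRange(year, month);
  return (
    'SELECT max(updated_at) FROM public.orders ' +
    `WHERE created_at >= '${from}'::timestamptz AND created_at <= '${to}'::timestamptz`
  );
}

console.log(partitionRefreshKeySql(2021, 8));
console.log(partitionRefreshKeySql(2021, 9));
```

Because each query only sees rows whose `created_at` falls inside its month, a change to an August order alters the August key and leaves the September key untouched.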

## Result

We have received orders from two partitions of a pre-aggregation and only one of
them has been updated when an order changed its status:

```javascript
// Orders before update:
[
  {
    "Orders.number": "1",
    "Orders.status": "processing",
    "Orders.createdAt": "2021-08-10T14:26:40.000",
    "Orders.updatedAt": "2021-08-10T14:26:40.000"
  },
  {
    "Orders.number": "2",
    "Orders.status": "completed",
    "Orders.createdAt": "2021-08-20T13:21:38.000",
    "Orders.updatedAt": "2021-08-20T13:21:38.000"
  },
  {
    "Orders.number": "3",
    "Orders.status": "shipped",
    "Orders.createdAt": "2021-09-01T10:27:38.000",
    "Orders.updatedAt": "2021-09-01T10:27:38.000"
  },
  {
    "Orders.number": "4",
    "Orders.status": "completed",
    "Orders.createdAt": "2021-09-20T10:27:38.000",
    "Orders.updatedAt": "2021-09-20T10:27:38.000"
  }
]
// Pre-aggregations for orders before update:
{
  "dev_pre_aggregations.orders__orders": {
    "targetTableName": "(
      SELECT * FROM dev_pre_aggregations.orders__orders20210801_qgajzwit_mdtjpixm_1glan84 UNION ALL
      SELECT * FROM dev_pre_aggregations.orders__orders20210901_bvzl43q1_py2oudte_1glan84)",
    "refreshKeyValues": [
      {},
      {}
    ]
  }
}
```

```javascript
// Orders after update:
[
  {
    "Orders.number": "1",
    "Orders.status": "shipped",
    "Orders.createdAt": "2021-08-10T14:26:40.000",
    "Orders.updatedAt": "2021-09-30T06:45:28.000"
  },
  {
    "Orders.number": "2",
    "Orders.status": "completed",
    "Orders.createdAt": "2021-08-20T13:21:38.000",
    "Orders.updatedAt": "2021-08-20T13:21:38.000"
  },
  {
    "Orders.number": "3",
    "Orders.status": "shipped",
    "Orders.createdAt": "2021-09-01T10:27:38.000",
    "Orders.updatedAt": "2021-09-01T10:27:38.000"
  },
  {
    "Orders.number": "4",
    "Orders.status": "completed",
    "Orders.createdAt": "2021-09-20T10:27:38.000",
    "Orders.updatedAt": "2021-09-20T10:27:38.000"
  }
]
// Pre-aggregations for orders after update:
{
  "dev_pre_aggregations.orders__orders": {
    "targetTableName": "(
      SELECT * FROM dev_pre_aggregations.orders__orders20210801_lx4b2bkg_mdtjpixm_1glana3 UNION ALL
      SELECT * FROM dev_pre_aggregations.orders__orders20210901_bvzl43q1_py2oudte_1glan84)",
    "refreshKeyValues": [
      {},
      {}
    ]
  }
}
```
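
One way to confirm which partition was rebuilt is to compare the partition table names in the two `targetTableName` values above. The sketch below assumes that a rebuilt partition gets a new table name while an untouched one keeps its name, as the output above suggests. `partitionTables` and `rebuiltPartitions` are hypothetical helpers, not part of Cube.

```javascript
// Hypothetical helpers for illustration; table names are copied from the output above.
const beforeUpdate = `(
  SELECT * FROM dev_pre_aggregations.orders__orders20210801_qgajzwit_mdtjpixm_1glan84 UNION ALL
  SELECT * FROM dev_pre_aggregations.orders__orders20210901_bvzl43q1_py2oudte_1glan84)`;
const afterUpdate = `(
  SELECT * FROM dev_pre_aggregations.orders__orders20210801_lx4b2bkg_mdtjpixm_1glana3 UNION ALL
  SELECT * FROM dev_pre_aggregations.orders__orders20210901_bvzl43q1_py2oudte_1glan84)`;

// Map each partition start date (YYYYMMDD) to its full table name.
function partitionTables(targetTableName) {
  const tables = {};
  const pattern = /orders__orders(\d{8})_\w+/g;
  for (const match of targetTableName.matchAll(pattern)) {
    tables[match[1]] = match[0];
  }
  return tables;
}

// List partitions whose table name differs between two snapshots.
function rebuiltPartitions(before, after) {
  const oldTables = partitionTables(before);
  const newTables = partitionTables(after);
  return Object.keys(newTables).filter((date) => oldTables[date] !== newTables[date]);
}

console.log(rebuiltPartitions(beforeUpdate, afterUpdate)); // [ '20210801' ]
```

Only the August partition is reported, which matches the order that changed its status.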

## Source code

Please feel free to check out the
[full source code](https://github.com/cube-js/cube.js/tree/master/examples/recipes/refreshing-select-partitions)
or run it with the `docker-compose up` command. You'll see the result, including
queried data, in the console.

examples/recipes/README.md

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ These recipes will show you the best practices of using Cube.js.
 
 - [Accelerating Non-Additive Measures](https://cube.dev/docs/recipes/non-additivity)
 - [Joining Data from Multiple Data Sources](https://cube.dev/docs/recipes/joining-multiple-data-sources)
+- [Refreshing Select Partitions](https://cube.dev/docs/recipes/refreshing-select-partitions)
 
 ### Code reusability
 
Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
CUBEJS_DB_HOST=postgres
CUBEJS_DB_PORT=5432
CUBEJS_DB_NAME=localDB
CUBEJS_DB_USER=postgres
CUBEJS_DB_PASS=example
CUBEJS_DB_TYPE=postgres
CUBEJS_API_SECRET=SECRET
CUBEJS_DEV_MODE=true
CUBEJS_CUBESTORE_HOST=cubestore
CUBEJS_EXTERNAL_DEFAULT=true
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
FROM node:14-alpine

COPY . .
RUN apk --no-cache add curl \
  && npm install
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
module.exports = {};
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
const { Pool } = require('pg');

// Connection settings match the `postgres` service in the compose config
const pool = new Pool({
  host: `postgres`,
  port: 5432,
  user: `postgres`,
  password: `example`,
  database: `localDB`,
});

// Assign a random status to order 1 and bump its `updated_at` timestamp
const updateStatusQuery = `
  UPDATE
    orders
  SET
    status = (array ['shipped', 'processing', 'completed']) [floor(random() * 3 + 1)],
    updated_at = NOW()
  WHERE
    id = 1;
`;

pool.query(updateStatusQuery, (err) => {
  if (err) {
    console.error(err);
  } else {
    console.log('Order successfully updated');
  }

  // Close the pool so the process can exit
  pool.end();
});
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
#!/bin/bash

host=cube
port=4000
readyzUrl=readyz

# Wait for the Cube API to become ready
until curl -s "$host":"$port"/"$readyzUrl" > /dev/null; do
  sleep 1
done

sleep 5

node node/data-updater/data-updater.js
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
-- This script only contains the table creation statements and does not fully represent the table in the database. It's still missing: indices, triggers. Do not use it as a backup.
DROP TABLE IF EXISTS "public"."orders";

-- Sequence and defined type
CREATE SEQUENCE IF NOT EXISTS orders_id_seq;

-- Table Definition
CREATE TABLE "public"."orders" (
    "id" int4 NOT NULL DEFAULT nextval('orders_id_seq'::regclass),
    "number" text,
    "status" text,
    "created_at" timestamp NOT NULL DEFAULT now(),
    "updated_at" timestamp NOT NULL DEFAULT now(),
    PRIMARY KEY ("id")
);

INSERT INTO "public"."orders" ("id", "number", "status", "created_at", "updated_at") VALUES
(1, '1', 'processing', '2021-08-10 14:26:40.387848', '2021-08-10 14:26:40.387848'),
(2, '2', 'completed', '2021-08-20 13:21:38.773825', '2021-08-20 13:21:38.773825'),
(3, '3', 'shipped', '2021-09-01 10:27:38.773825', '2021-09-01 10:27:38.773825'),
(4, '4', 'completed', '2021-09-20 10:27:38.773825', '2021-09-20 10:27:38.773825');
Lines changed: 49 additions & 0 deletions
@@ -0,0 +1,49 @@
version: '2.2'

services:
  cubestore:
    image: cubejs/cubestore:arm64-experimental-v2021-07-29
    environment:
      - CUBESTORE_REMOTE_DIR=/cube/data
    volumes:
      - .cubestore:/cube/data

  cube:
    image: cubejs/cube:latest
    ports:
      - 4000:4000
      - 3000:3000
    env_file: .env
    volumes:
      - .:/cube/conf
    depends_on:
      - cubestore
    links:
      - cubestore

  postgres:
    image: postgres
    restart: always
    ports:
      - 5432:5432
    environment:
      POSTGRES_PASSWORD: example
      POSTGRES_DB: localDB
    volumes:
      - ./db-scripts:/docker-entrypoint-initdb.d

  query:
    image: cfmanteiga/alpine-bash-curl-jq
    depends_on:
      - cube
    volumes:
      - .:/query
    entrypoint: ["sh", "query/queries/run.sh"]

  node:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - .:/node
    entrypoint: ["sh", "node/data-updater/data-updater.sh"]
