| 1 | +# Calculating filtered aggregates |
| 2 | + |
| 3 | +## Use case |
| 4 | + |
| 5 | +Sometimes you need to aggregate facts in one cube while filtering them by a dimension |
| 6 | +that comes from another, joined cube. For example, you might want to calculate |
| 7 | +the total sales for a retailer. Each retailer has multiple stores, and each store records |
| 8 | +its own sales. |
| 9 | + |
| 10 | +If you set sales goals at the retailer level and are only interested in |
| 11 | +the sales made on or after a certain date, you need to calculate the total sales |
| 12 | +so that it only includes sales made on or after that date. |
| 13 | + |
| 14 | +## Data modeling |
| 15 | + |
| 16 | +We can model this scenario by creating a cube for each entity: `retailer`, `store`, |
| 17 | +and `sales`. The `retailer` cube has a one-to-many relationship with the `store` cube, |
| 18 | +and the `store` cube has a one-to-many relationship with the `sales` cube: |
| 19 | + |
| 20 | +```yml |
| 21 | +cubes: |
| 22 | +  - name: retailer |
| 23 | +    sql: > |
| 24 | +      SELECT 101 AS id, 'Retailer 1' AS name, 10 AS sales_goal, '2025-02-01Z'::TIMESTAMP AS goal_start UNION ALL |
| 25 | +      SELECT 102 AS id, 'Retailer 2' AS name, 10 AS sales_goal, '2025-02-01Z'::TIMESTAMP AS goal_start UNION ALL |
| 26 | +      SELECT 103 AS id, 'Retailer 3' AS name, 10 AS sales_goal, '2025-02-01Z'::TIMESTAMP AS goal_start |
| 27 | + |
| 28 | +    joins: |
| 29 | +      - name: store |
| 30 | +        sql: '{CUBE.id} = {store.retailer_id}' |
| 31 | +        relationship: one_to_many |
| 32 | + |
| 33 | +    dimensions: |
| 34 | +      - name: id |
| 35 | +        sql: "{CUBE}.id" |
| 36 | +        type: number |
| 37 | +        primary_key: true |
| 38 | + |
| 39 | +      - name: name |
| 40 | +        sql: "{CUBE}.name" |
| 41 | +        type: string |
| 42 | + |
| 43 | +      - name: goal_start |
| 44 | +        sql: "{CUBE}.goal_start" |
| 45 | +        type: time |
| 46 | + |
| 47 | +      - name: sales |
| 48 | +        sql: "{store.total_sales}" |
| 49 | +        type: number |
| 50 | +        sub_query: true |
| 51 | + |
| 52 | +      - name: sales_for_goal |
| 53 | +        sql: "{store.total_sales_for_goal}" |
| 54 | +        type: number |
| 55 | +        sub_query: true |
| 56 | + |
| 57 | +    measures: |
| 58 | +      - name: sales_goal |
| 59 | +        sql: "{CUBE}.sales_goal" |
| 60 | +        type: sum |
| 61 | + |
| 62 | +      - name: sales_goal_achieved |
| 63 | +        type: number |
| 64 | +        sql: "({CUBE.sales_for_goal} / NULLIF({CUBE.sales_goal}, 0))" |
| 65 | + |
| 66 | +  - name: store |
| 67 | +    sql: > |
| 68 | +      SELECT 201 AS id, 'Store 1' AS name, 101 AS retailer_id UNION ALL |
| 69 | +      SELECT 202 AS id, 'Store 2' AS name, 101 AS retailer_id UNION ALL |
| 70 | +      SELECT 203 AS id, 'Store 3' AS name, 101 AS retailer_id UNION ALL |
| 71 | +      SELECT 204 AS id, 'Store 4' AS name, 102 AS retailer_id UNION ALL |
| 72 | +      SELECT 205 AS id, 'Store 5' AS name, 102 AS retailer_id UNION ALL |
| 73 | +      SELECT 206 AS id, 'Store 6' AS name, 102 AS retailer_id UNION ALL |
| 74 | +      SELECT 207 AS id, 'Store 7' AS name, 103 AS retailer_id UNION ALL |
| 75 | +      SELECT 208 AS id, 'Store 8' AS name, 103 AS retailer_id UNION ALL |
| 76 | +      SELECT 209 AS id, 'Store 9' AS name, 103 AS retailer_id |
| 77 | + |
| 78 | +    joins: |
| 79 | +      - name: sales |
| 80 | +        sql: '{CUBE.id} = {sales.store_id}' |
| 81 | +        relationship: one_to_many |
| 82 | + |
| 83 | +    dimensions: |
| 84 | +      - name: id |
| 85 | +        sql: "{CUBE}.id" |
| 86 | +        type: number |
| 87 | +        primary_key: true |
| 88 | + |
| 89 | +      - name: name |
| 90 | +        sql: "{CUBE}.name" |
| 91 | +        type: string |
| 92 | + |
| 93 | +      - name: retailer_id |
| 94 | +        sql: "{CUBE}.retailer_id" |
| 95 | +        type: number |
| 96 | + |
| 97 | +      - name: goal_start |
| 98 | +        sql: "{retailer.goal_start}" |
| 99 | +        type: time |
| 100 | + |
| 101 | +      - name: sales |
| 102 | +        sql: "{sales.total_sales}" |
| 103 | +        type: number |
| 104 | +        sub_query: true |
| 105 | + |
| 106 | +      - name: sales_for_goal |
| 107 | +        sql: "{sales.total_sales_for_goal}" |
| 108 | +        type: number |
| 109 | +        sub_query: true |
| 110 | + |
| 111 | +    measures: |
| 112 | +      - name: total_sales |
| 113 | +        sql: "{CUBE.sales}" |
| 114 | +        type: sum |
| 115 | + |
| 116 | +      - name: total_sales_for_goal |
| 117 | +        sql: "{CUBE.sales_for_goal}" |
| 118 | +        type: sum |
| 119 | + |
| 120 | +  - name: sales |
| 121 | +    sql: > |
| 122 | +      SELECT 301 AS id, 201 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 1 AS sales UNION ALL |
| 123 | +      SELECT 302 AS id, 202 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 1 AS sales UNION ALL |
| 124 | +      SELECT 303 AS id, 203 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 1 AS sales UNION ALL |
| 125 | +      SELECT 304 AS id, 204 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 3 AS sales UNION ALL |
| 126 | +      SELECT 305 AS id, 205 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 3 AS sales UNION ALL |
| 127 | +      SELECT 306 AS id, 206 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 3 AS sales UNION ALL |
| 128 | +      SELECT 307 AS id, 207 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 5 AS sales UNION ALL |
| 129 | +      SELECT 308 AS id, 208 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 5 AS sales UNION ALL |
| 130 | +      SELECT 309 AS id, 209 AS store_id, '2025-01-01Z'::TIMESTAMP AS order_date, 5 AS sales UNION ALL |
| 131 | +      SELECT 310 AS id, 201 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 1 AS sales UNION ALL |
| 132 | +      SELECT 311 AS id, 202 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 1 AS sales UNION ALL |
| 133 | +      SELECT 312 AS id, 203 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 1 AS sales UNION ALL |
| 134 | +      SELECT 313 AS id, 204 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 3 AS sales UNION ALL |
| 135 | +      SELECT 314 AS id, 205 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 3 AS sales UNION ALL |
| 136 | +      SELECT 315 AS id, 206 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 3 AS sales UNION ALL |
| 137 | +      SELECT 316 AS id, 207 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 5 AS sales UNION ALL |
| 138 | +      SELECT 317 AS id, 208 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 5 AS sales UNION ALL |
| 139 | +      SELECT 318 AS id, 209 AS store_id, '2025-02-01Z'::TIMESTAMP AS order_date, 5 AS sales |
| 140 | + |
| 141 | +    dimensions: |
| 142 | +      - name: id |
| 143 | +        sql: "{CUBE}.id" |
| 144 | +        type: number |
| 145 | +        primary_key: true |
| 146 | + |
| 147 | +      - name: store_id |
| 148 | +        sql: "{CUBE}.store_id" |
| 149 | +        type: number |
| 150 | + |
| 151 | +      - name: order_date |
| 152 | +        sql: "{CUBE}.order_date" |
| 153 | +        type: time |
| 154 | + |
| 155 | +      - name: goal_start |
| 156 | +        sql: "{store.goal_start}" |
| 157 | +        type: time |
| 158 | + |
| 159 | +      - name: sales |
| 160 | +        sql: "{CUBE}.sales" |
| 161 | +        type: number |
| 162 | + |
| 163 | +    measures: |
| 164 | +      - name: total_sales |
| 165 | +        sql: "{CUBE.sales}" |
| 166 | +        type: sum |
| 167 | + |
| 168 | +      - name: total_sales_for_goal |
| 169 | +        sql: "{CUBE.sales}" |
| 170 | +        type: sum |
| 171 | +        filters: |
| 172 | +          - sql: "{CUBE.order_date} >= {CUBE.goal_start}" |
| 173 | +``` |
| 174 | + |
| 175 | +The total sales for a store and the total sales for a retailer are calculated via [subquery |
| 176 | +dimensions][ref-subquery-dimension]. Along the join path (`retailer.store.sales`), |
| 177 | +this forms the _upstream flow_ of data: the aggregate over sales is passed from the |
| 178 | +`sales` cube to the `store` cube and then to the `retailer` cube. |
| 179 | + |
| 180 | +At the same time, the `goal_start` date is passed from the `retailer` cube to the `store` |
| 181 | +cube and then to the `sales` cube, creating the _downstream flow_ of data. This way, the |
| 182 | +`total_sales_for_goal` measure in the `sales` cube can be filtered by the `goal_start` date. |
| 183 | + |
| 184 | +This pattern of passing measures (aggregates) _upstream_ along the join path and passing |
| 185 | +dimensions _downstream_ is an effective way to solve many data modeling tasks, including |
| 186 | +calculating filtered aggregates. |
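
The two flows can be traced by hand on the sample rows. The sketch below is plain Python, not Cube-generated SQL, and the names `rollup`, `retailer_goal_start`, `store_retailer`, and `sales` are hypothetical helpers introduced only for illustration. It looks up each sale's `goal_start` through its store (downstream), applies the `>=` filter, and adds the amount to the owning retailer's totals (upstream):

```python
from collections import defaultdict
from datetime import date

# Sample rows mirroring the inline SQL in the data model (hypothetical Python names).
retailer_goal_start = {rid: date(2025, 2, 1) for rid in (101, 102, 103)}
store_retailer = {
    201: 101, 202: 101, 203: 101,
    204: 102, 205: 102, 206: 102,
    207: 103, 208: 103, 209: 103,
}
store_amount = {201: 1, 202: 1, 203: 1, 204: 3, 205: 3, 206: 3, 207: 5, 208: 5, 209: 5}
# Every store has one sale on 2025-01-01 and one on 2025-02-01.
sales = [
    (store_id, order_date, amount)
    for order_date in (date(2025, 1, 1), date(2025, 2, 1))
    for store_id, amount in store_amount.items()
]


def rollup(retailer_goal_start, store_retailer, sales):
    """Return {retailer_id: (total_sales, total_sales_for_goal)}."""
    totals = defaultdict(lambda: [0, 0])
    for store_id, order_date, amount in sales:
        retailer_id = store_retailer[store_id]
        # Downstream: the retailer's goal_start reaches the sale via its store.
        goal_start = retailer_goal_start[retailer_id]
        # Upstream: the sale's amount is added to the retailer's totals.
        totals[retailer_id][0] += amount
        if order_date >= goal_start:  # same condition as the measure filter
            totals[retailer_id][1] += amount
    return {rid: tuple(pair) for rid, pair in totals.items()}


result = rollup(retailer_goal_start, store_retailer, sales)
print(result)
```

Under these assumptions, each retailer's filtered total is half of its overall total, because only the February sales meet the `goal_start` condition.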
| 187 | + |
| 188 | + |
| 189 | +## Result |
| 190 | + |
| 191 | +Querying the `retailer` cube will return the total sales for each retailer and the total |
| 192 | +sales for each retailer on or after the `goal_start` date. The `sales_goal_achieved` measure |
| 193 | +will show the fraction of the sales goal that has been achieved: |
| 194 | + |
| 195 | +<Screenshot src="https://ucarecdn.com/80b74f38-a1ef-4b94-a3ff-146f06dc539f/"/> |
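
As a sanity check on the arithmetic behind `sales_goal_achieved`, the following sketch recomputes the ratio from figures derived by hand from the sample data (filtered sales of 3, 9, and 15 against a goal of 10 each). The helper `goal_achieved` is hypothetical. It imitates `NULLIF(sales_goal, 0)` by returning `None` for a zero goal, and it uses floating-point division, whereas a SQL engine may treat integer operands differently:

```python
def goal_achieved(sales_for_goal, sales_goal):
    """Imitate sales_for_goal / NULLIF(sales_goal, 0): None when the goal is zero."""
    if sales_goal == 0:
        return None
    return sales_for_goal / sales_goal


# (sales_for_goal, sales_goal) per retailer, derived by hand from the sample rows.
figures = {
    "Retailer 1": (3, 10),
    "Retailer 2": (9, 10),
    "Retailer 3": (15, 10),
}
ratios = {name: goal_achieved(s, g) for name, (s, g) in figures.items()}
print(ratios)
```

A ratio above 1 means the retailer has exceeded its goal, and the zero-goal guard avoids a division-by-zero error for retailers without a goal.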
| 196 | + |
| 197 | + |
| 198 | +[ref-subquery-dimension]: /product/data-modeling/concepts/calculated-members#subquery-dimensions |