-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Describe the bug
Querying the pre-aggregate using the REST API will intermittently generate incorrect results.
Upon inspecting the logs, it appears that:
- The generated SQL statement is correct
- However, the cubestore query planner forms incorrect aliases, causing measures to be swapped around
This behavior is not consistent -- similar REST API queries that should return the same result will occasionally be correct.
To Reproduce
Using the following REST API query:
{
"query": {
"measures": [
"sales.this_year_sales",
"sales.this_year_units",
"sales.yoy_sales_chg",
"sales.yoy_units_chg"
],
"dimensions": [
"sales.product_type"
],
"filters": [
{
"member": "sales.product_type",
"operator": "equals",
"values": ["PET_FOOD"]
},
{
"member": "sales.date",
"operator": "inDateRange",
"values": ["2025-01-18","2025-01-24"]
}
],
"limit": 5,
"order": {
"sales.this_year_sales": "desc"
},
"offset": 0
}
}
- Execute the query - observe that results may or may not be correct
- Modify the
limitparameter value, which should have no effect on the query results - Execute the modified query - observe that results may or may not be correct
- Repeat steps 2 and 3 as necessary until incorrect results are returned (personal observations yield a ~50% success rate)
In the screenshots section below, observe that the (implicitly added) last_year_sales and last_year_units values are swapped, causing the yoy_sales_chg and yoy_units_chg to return incorrect values.
For example, lets say that the values stored in the pre-aggregate are as follows:
| date_day | product_type | this_year_sales | last_year_sales | this_year_units | last_year_units |
|---|---|---|---|---|---|
| 2025-01-21 | PET_FOOD | 1000 | 900 | 10 | 9 |
In this example, the expected values are:
- yoy_sales_chg = 1000 - 900 = 100
- yoy_units_chg = 10 - 9 = 1
Instead, the following is returned:
- yoy_sales_chg = 1000 - 9 = 991
- yoy_units_chg = 10 - 900 = -890
Expected behavior
Query results should be consistent and correct.
Screenshots
The generated SQL query in the logs looks correct (formatted for readability):
SELECT
`sales__product_type` `sales__product_type`,
sum(`sales__this_year_sales`) `sales__this_year_sales`,
sum(`sales__this_year_units`) `sales__this_year_units`,
(sum(`sales__this_year_sales`) - sum(`sales__last_year_sales`)) `sales__yoy_sales_chg`,
(sum(`sales__this_year_units`) - sum(`sales__last_year_units`)) `sales__yoy_units_chg`
FROM cube_pre_aggregations.sales_sales_rollup_20250101_mwnaosch_js50fjs4_1jq1ihq AS `sales__sales_rollup`
WHERE (`sales__product_type` = ?)
AND (`sales__date_day` >= to_timestamp(?) AND `sales__date_day` <= to_timestamp(?))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5
However, the following log was observed (formatted for readability) which appears to internally swap the last_year_sales and last_year_units leaf measures. I suspect this is leading to incorrectly calculated yoy_sales_chg and yoy_units_chg values:
2025-01-31T19:28:29.313Z WARN [cubestore::queryplanner::query_executor] <pid:41 sel1> Slow Partition Query (262.100554ms):
Projection, [
sales__product_type,
sales__this_year_sales,
sales__this_year_units,
sales__yoy_sales_chg,
sales__yoy_units_chg
]
Projection, [
sales__sales_rollup.sales__product_type:sales__product_type,
sales__this_year_sales,
sales__this_year_units,
SUM(sales__sales_rollup.sales__last_year_sales):SUM(sales__sales_rollup.sales__last_year_units),
SUM(sales__sales_rollup.sales__last_year_units):SUM(sales__sales_rollup.sales__last_year_sales)
]
ClusterAggregateTopK, limit: 5
Filter
Scan sales__sales_rollup,
source: CubeTable(index: default:3:[5, 6, 41]),
fields: [
sales__product_type,
sales__date_day,
sales__last_year_sales,
sales__last_year_units,
sales__this_year_sales,
sales__this_year_units
]
Minimally reproducible Cube Schema
cubes:
- name: sales
sql_table: "{{ env_var('CUBEJS_DB_BQ_TABLE_NAME') | safe }}"
data_source: default
refresh_key:
every: 30 minutes
pre_aggregations:
- name: sales_rollup
external: true
dimensions:
- date
- product_type
measures:
- this_year_sales
- last_year_sales
- this_year_units
- last_year_units
time_dimension: date
granularity: day
partition_granularity: month
refresh_key:
every: 30 minutes
build_range_start:
sql: select timestamp('2025-01-17')
build_range_end:
sql: select timestamp('2025-01-25')
dimensions:
- name: date
sql: TIMESTAMP(date)
type: time
- name: product_type
sql: product_type
type: string
- name: product_name
sql: product_name
type: string
- name: product_id
sql: product_id
type: string
measures:
- name: this_year_sales
sql: this_year_sales
type: sum
- name: last_year_sales
sql: last_year_sales
type: sum
- name: this_year_units
sql: this_year_units
type: sum
- name: last_year_units
sql: last_year_units
type: sum
- name: yoy_sales_chg
sql: "{this_year_sales} - {last_year_sales}"
type: number
- name: yoy_units_chg
sql: "{this_year_units} - {last_year_units}"
type: numberVersion:
Behavior was observed using Docker images with tags between v1.1.8 and v1.3.20:
- cubejs/cube:v1.3.20
- cubejs/cubestore:v1.3.20
This is not reproducible using cubejs/cubestore:v1.1.7, and appears to have been introduced in v1.1.8.
Additional context
- Running in production mode (1 api, 1 refresh worker, 1 router, 2 workers)
- Raw data is stored Google BigQuery
- Pre-aggregations are stored in GCS using export bucket strategy
_edit: retested using latest (1.2.20) on 2025-03-10 and issue is still present
_edit: retested using latest (1.3.8) on 2025-04-25 and issue is still present
_edit: retested using latest (1.3.20) on 2025-06-09 and issue is still present