Skip to content

Intermittently incorrect values when using calculated measures on pre-aggregated leaf measuresΒ #9186

@Kelvin4702

Description

@Kelvin4702

Describe the bug
Querying the pre-aggregate using the REST API will intermittently generate incorrect results.
Upon inspecting the logs, it appears that:

  • The generated SQL statement is correct
  • However, the cubestore query planner forms incorrect aliases, causing measures to be swapped around

This behavior is not consistent -- similar REST API queries that should return the same result will occasionally be correct.

To Reproduce
Using the following REST API query:

{
  "query": {
    "measures": [
      "sales.this_year_sales",
      "sales.this_year_units",
      "sales.yoy_sales_chg",
      "sales.yoy_units_chg"
    ],
    "dimensions": [
      "sales.product_type"
    ],
    "filters": [
      {
        "member": "sales.product_type",
        "operator": "equals",
        "values": ["PET_FOOD"]
      },
      {
        "member": "sales.date",
        "operator": "inDateRange",
        "values": ["2025-01-18","2025-01-24"]
      }
    ],
    "limit": 5,
    "order": {
      "sales.this_year_sales": "desc"
    },
    "offset": 0
  }
}
  1. Execute the query - observe that results may or may not be correct
  2. Modify the limit parameter value, which should have no effect on the query results
  3. Execute the modified query - observe that results may or may not be correct
  4. Repeat steps 2 and 3 as necessary until incorrect results are returned (personal observations yield a ~50% success rate)

In the screenshots section below, observe that the (implicitly added) last_year_sales and last_year_units values are swapped, causing the yoy_sales_chg and yoy_units_chg to return incorrect values.

For example, lets say that the values stored in the pre-aggregate are as follows:

date_day product_type this_year_sales last_year_sales this_year_units last_year_units
2025-01-21 PET_FOOD 1000 900 10 9

In this example, the expected values are:

  • yoy_sales_chg = 1000 - 900 = 100
  • yoy_units_chg = 10 - 9 = 1

Instead, the following is returned:

  • yoy_sales_chg = 1000 - 9 = 991
  • yoy_units_chg = 10 - 900 = -890

Expected behavior
Query results should be consistent and correct.

Screenshots
The generated SQL query in the logs looks correct (formatted for readability):

SELECT
    `sales__product_type` `sales__product_type`,
    sum(`sales__this_year_sales`) `sales__this_year_sales`,
    sum(`sales__this_year_units`) `sales__this_year_units`,
    (sum(`sales__this_year_sales`) - sum(`sales__last_year_sales`)) `sales__yoy_sales_chg`,
    (sum(`sales__this_year_units`) - sum(`sales__last_year_units`)) `sales__yoy_units_chg`
FROM cube_pre_aggregations.sales_sales_rollup_20250101_mwnaosch_js50fjs4_1jq1ihq AS `sales__sales_rollup`
WHERE (`sales__product_type` = ?)
  AND (`sales__date_day` >= to_timestamp(?) AND `sales__date_day` <= to_timestamp(?))
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5

However, the following log was observed (formatted for readability) which appears to internally swap the last_year_sales and last_year_units leaf measures. I suspect this is leading to incorrectly calculated yoy_sales_chg and yoy_units_chg values:

2025-01-31T19:28:29.313Z WARN  [cubestore::queryplanner::query_executor] <pid:41 sel1> Slow Partition Query (262.100554ms):
Projection, [
  sales__product_type,
  sales__this_year_sales,
  sales__this_year_units,
  sales__yoy_sales_chg,
  sales__yoy_units_chg
]
  Projection, [
    sales__sales_rollup.sales__product_type:sales__product_type,
    sales__this_year_sales,
    sales__this_year_units,
    SUM(sales__sales_rollup.sales__last_year_sales):SUM(sales__sales_rollup.sales__last_year_units),
    SUM(sales__sales_rollup.sales__last_year_units):SUM(sales__sales_rollup.sales__last_year_sales)
  ]
    ClusterAggregateTopK, limit: 5
      Filter
        Scan sales__sales_rollup, 
          source: CubeTable(index: default:3:[5, 6, 41]),
          fields: [
            sales__product_type, 
            sales__date_day,
            sales__last_year_sales,
            sales__last_year_units,
            sales__this_year_sales,
            sales__this_year_units
          ]

Minimally reproducible Cube Schema

cubes:
  - name: sales
    sql_table: "{{ env_var('CUBEJS_DB_BQ_TABLE_NAME') | safe }}"
    data_source: default
    refresh_key:
      every: 30 minutes

    pre_aggregations:
      - name: sales_rollup
        external: true
        dimensions:
          - date
          - product_type
        measures:
          - this_year_sales
          - last_year_sales
          - this_year_units
          - last_year_units
        time_dimension: date
        granularity: day
        partition_granularity: month
        refresh_key:
          every: 30 minutes
        build_range_start:
          sql: select timestamp('2025-01-17')
        build_range_end:
          sql: select timestamp('2025-01-25')

    dimensions:
      - name: date
        sql: TIMESTAMP(date)
        type: time

      - name: product_type
        sql: product_type
        type: string

      - name: product_name
        sql: product_name
        type: string

      - name: product_id
        sql: product_id
        type: string

    measures:
      - name: this_year_sales
        sql: this_year_sales
        type: sum

      - name: last_year_sales
        sql: last_year_sales
        type: sum

      - name: this_year_units
        sql: this_year_units
        type: sum

      - name: last_year_units
        sql: last_year_units
        type: sum

      - name: yoy_sales_chg
        sql: "{this_year_sales} - {last_year_sales}"
        type: number

      - name: yoy_units_chg
        sql: "{this_year_units} - {last_year_units}"
        type: number

Version:
Behavior was observed using Docker images with tags between v1.1.8 and v1.3.20:

  • cubejs/cube:v1.3.20
  • cubejs/cubestore:v1.3.20

This is not reproducible using cubejs/cubestore:v1.1.7, and appears to have been introduced in v1.1.8.

Additional context

  • Running in production mode (1 api, 1 refresh worker, 1 router, 2 workers)
  • Raw data is stored Google BigQuery
  • Pre-aggregations are stored in GCS using export bucket strategy

_edit: retested using latest (1.2.20) on 2025-03-10 and issue is still present
_edit: retested using latest (1.3.8) on 2025-04-25 and issue is still present
_edit: retested using latest (1.3.20) on 2025-06-09 and issue is still present

Metadata

Metadata

Assignees

Labels

cube storeIssues relating to Cube Store

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions