Minio as persistence fails during pre-aggregation #9206

@LeftoversTodayAppAdmin

Describe the bug
When using Minio as storage for pre-aggregations, I can see Cube write files to the temp-uploads folder in the Minio bucket, but the build then fails with the error below, which is emitted from the line of code shown next. Cube also generates many copies of the same file in temp-uploads.

Line of code emitting the error:

format!("File {} can't be listed after upload. Either there's Cube Store cluster misconfiguration, or storage can't provide the required consistency.", remote_path),

Error: Error during upload of dev_pre_aggregations.order_main20240928_ookkyqrs_dp11nujr_1jqgh50-0.csv.gz create table: CREATE TABLE dev_pre_aggregations.order_main20240928_ookkyqrs_dp11nujr_1jqgh50 (order__customfieldsvendoridentifiervarchar(255),order__updatedat_daytimestamp,order__countint) WITH (build_range_end = '2024-09-28T23:59:59.999'): Internal: File temp-uploads/dev_pre_aggregations.order_main20240928_ookkyqrs_dp11nujr_1jqgh50-0.csv.gz can't be listed after upload. Either there's Cube Store cluster misconfiguration, or storage can't provide the required consistency.
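
For readers unfamiliar with this kind of check, the sketch below simulates an "upload, then confirm the object appears in a listing" step against an in-memory store. It is not Cube Store's implementation: `InMemoryRemoteFs`, `hide_new_keys_from_listing`, and `upload_and_verify` are hypothetical names invented for illustration, and only the error text is taken from the report above. It shows how an object can exist in the bucket and still trigger this error when the listing does not return it.

```python
class InMemoryRemoteFs:
    """Hypothetical stand-in for an object store such as a Minio bucket."""

    def __init__(self, hide_new_keys_from_listing=False):
        # Every object that was written, keyed by its remote path.
        self.objects = {}
        # Keys that a listing call is able to see.
        self._listed = set()
        # Assumption for illustration: when True, writes succeed but never
        # show up in listings (e.g. a consistency or configuration problem).
        self.hide_new_keys_from_listing = hide_new_keys_from_listing

    def put(self, key, data):
        self.objects[key] = data
        if not self.hide_new_keys_from_listing:
            self._listed.add(key)

    def list(self, prefix):
        return sorted(k for k in self._listed if k.startswith(prefix))


def upload_and_verify(fs, remote_path, data):
    """Upload an object, then require it to be visible in a listing."""
    fs.put(remote_path, data)
    if remote_path not in fs.list(remote_path):
        raise RuntimeError(
            f"File {remote_path} can't be listed after upload. "
            "Either there's Cube Store cluster misconfiguration, "
            "or storage can't provide the required consistency."
        )


if __name__ == "__main__":
    fs = InMemoryRemoteFs(hide_new_keys_from_listing=True)
    try:
        upload_and_verify(fs, "temp-uploads/example-0.csv.gz", b"payload")
    except RuntimeError as err:
        # The object was stored, yet the check still fails.
        print("stored:", "temp-uploads/example-0.csv.gz" in fs.objects)
        print(err)
```

In this simulation the write itself succeeds, which matches the observation that files do appear under temp-uploads; the failure comes only from the follow-up listing.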

Minio integration was added here: #3738
cc: @PieterVanZyl-Dev @paveltiunov

To Reproduce
Steps to reproduce the behavior:

  1. Use the following config in docker:
  cubestore_router:
    restart: always
    image: cubejs/cubestore:v1.2.3-non-avx
    environment:
      - CUBESTORE_WORKERS=cubestore_worker_1:10001,cubestore_worker_2:10002
      - CUBESTORE_META_PORT=9999
      - CUBESTORE_SERVER_NAME=cubestore_router:9999
      - CUBESTORE_DATA_DIR=/cube/data
      - CUBESTORE_MINIO_SERVER_ENDPOINT=http://leftoverstoday-dev-01:9000
      - CUBESTORE_MINIO_BUCKET=cube
      - CUBESTORE_MINIO_REGION=''
      - CUBESTORE_MINIO_ACCESS_KEY_ID=minio
      - CUBESTORE_MINIO_SECRET_ACCESS_KEY=<KEY>
    volumes:
      - .cubestore:/cube/data
  2. Create a pre-aggregation.
  3. Open the Playground and run the query.
  4. Cube tries to generate the pre-aggregation files.
  5. The files are created successfully in the Minio bucket under the temp-uploads folder, but when Cube immediately tries to read them back, it fails and creates another copy of the file, repeating indefinitely.
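
A side note on the config in step 1, offered as something to rule out rather than a diagnosis: in the list form of `environment`, Docker Compose generally passes everything after the first `=` through as-is, so `CUBESTORE_MINIO_REGION=''` may reach the container as the two-character string `''` instead of an empty value. Whether that affects the listing failure here is unverified. The sketch below flags such entries; `find_literal_quote_values` is a hypothetical helper, not part of Cube or Compose.

```python
def find_literal_quote_values(environment):
    """Return {name: value} for list-form entries whose value is wrapped in quotes."""
    suspicious = {}
    for entry in environment:
        name, sep, value = entry.partition("=")
        if not sep:
            # Bare name with no "=": nothing to inspect.
            continue
        wrapped = len(value) >= 2 and value[0] == value[-1] and value[0] in "'\""
        if wrapped:
            suspicious[name] = value
    return suspicious


if __name__ == "__main__":
    # Entries copied from the report; other variables omitted for brevity.
    environment = [
        "CUBESTORE_MINIO_SERVER_ENDPOINT=http://leftoverstoday-dev-01:9000",
        "CUBESTORE_MINIO_BUCKET=cube",
        "CUBESTORE_MINIO_REGION=''",
    ]
    for name, value in find_literal_quote_values(environment).items():
        print(f"{name} has a quoted value: {value}")
```

If the quotes do turn out to be passed literally, writing the entry as `CUBESTORE_MINIO_REGION=` or switching to the mapping form of `environment` are the usual ways to express an empty value.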

Expected behavior
The file is created once and read back successfully.

Screenshots
Image

Minimally reproducible Cube Schema
cubes:
  - name: order
    sql_table: vendure.order
    data_source: default

    joins: []

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true

      - name: type
        sql: type
        type: string

      - name: code
        sql: code
        type: string

      - name: state
        sql: state
        type: string

      - name: couponcodes
        sql: "{CUBE}.`couponCodes`"
        type: string

      - name: shippingaddress
        sql: "{CUBE}.`shippingAddress`"
        type: string

      - name: billingaddress
        sql: "{CUBE}.`billingAddress`"
        type: string

      - name: currencycode
        sql: "{CUBE}.`currencyCode`"
        type: string

      - name: aggregateorderid
        sql: "{CUBE}.`aggregateOrderId`"
        type: string

      - name: customerid
        sql: "{CUBE}.`customerId`"
        type: string

      - name: taxzoneid
        sql: "{CUBE}.`taxZoneId`"
        type: string

      - name: customfieldstotalWeightLbs
        sql: "{CUBE}.`customFieldsTotalWeightLbs`"
        type: number

      - name: customfieldssavingsDollars
        sql: "{CUBE}.`customFieldsSavingsDollars`"
        type: number

      - name: customfieldsvendoridentifier
        sql: "{CUBE}.`customFieldsVendoridentifier`"
        type: string

      - name: customfieldssnapebt
        sql: "{CUBE}.`customFieldsSnapebt`"
        type: string

      - name: customfieldsdob
        sql: "{CUBE}.`customFieldsDob`"
        type: string

      - name: customfieldsphone
        sql: "{CUBE}.`customFieldsPhone`"
        type: string

      - name: createdat
        sql: "{CUBE}.`createdAt`"
        type: time

      - name: updatedat
        sql: "{CUBE}.`updatedAt`"
        type: time

      - name: orderplacedat
        sql: "{CUBE}.`orderPlacedAt`"
        type: time

    measures:
      - name: count
        type: count

      - name: subtotal
        sql: "{CUBE}.`subTotal`"
        type: sum
      
      - name: weight
        sql: "{CUBE}.`customfieldstotalWeightLbs`"
        type: sum
      
      - name: dollars
        sql: "{CUBE}.`customFieldsSavingsDollars`"
        type: sum

    pre_aggregations:
      # Pre-aggregation definitions go here.
      # Learn more in the documentation: https://cube.dev/docs/caching/pre-aggregations/getting-started
      - name: main
        measures:
          - order.count
          - order.weight
          - order.dollars
        dimensions:
          - order.customfieldsvendoridentifier
          - order.state
        refreshKey:
          every: 1 hour
          updateWindow: 3 day
          incremental: true
        partitionGranularity: day
        timeDimension: order.orderplacedat
        granularity: day

Version:
cubejs/cube:v1.2.3
cubejs/cubestore:v1.2.3-non-avx

Labels:
enhancement
help wanted