
Pre-aggregation Build Failure with CubeStore #10244

@Moksh45

Description

Problem

I'm experiencing a pre-aggregation build failure in my production Cube.js setup. When querying metrics through GraphQL, I receive an error indicating that CubeStore cannot find the temporary CSV file in the remote file system during pre-aggregation creation.

Error Message:

Error during create table: CREATE TABLE my_app_pre_aggregations.metrics_daily20250601_s2jpx1wd_qabmhjuk_1kjo49s (...) WITH (build_range_end = '2025-06-17T21:44:17.000') LOCATION ?: Create table failed: CorruptData: File temp-uploads/my_app_pre_aggregations.metrics_daily20250601_s2jpx1wd_qabmhjuk_1kjo49s-0.csv.gz doesn't exist in remote file system
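When triaging logs, a small parser can pull out the table name, the missing temp-upload path, and the build range end from a message like the one above. This is a minimal sketch: `parseCreateTableError` is a hypothetical helper (not part of Cube), and its regular expressions assume the exact wording of this single error message, so other CubeStore messages may need different patterns.

```javascript
// Hypothetical helper (not a Cube API): extract fields from a CubeStore
// "Create table failed" message. Patterns assume the wording shown in this issue.
function parseCreateTableError(message) {
  // Table name follows "CREATE TABLE " and ends at the first whitespace.
  const tableMatch = message.match(/CREATE TABLE (\S+)/);
  // Missing file path sits between "File " and " doesn't exist ...".
  const fileMatch = message.match(/File (\S+) doesn't exist in remote file system/);
  // build_range_end is quoted inside the WITH (...) clause.
  const rangeMatch = message.match(/build_range_end = '([^']+)'/);
  return {
    table: tableMatch ? tableMatch[1] : null,
    missingFile: fileMatch ? fileMatch[1] : null,
    buildRangeEnd: rangeMatch ? rangeMatch[1] : null,
  };
}

const message =
  "Error during create table: CREATE TABLE my_app_pre_aggregations.metrics_daily20250601_s2jpx1wd_qabmhjuk_1kjo49s (...) " +
  "WITH (build_range_end = '2025-06-17T21:44:17.000') LOCATION ?: Create table failed: CorruptData: " +
  "File temp-uploads/my_app_pre_aggregations.metrics_daily20250601_s2jpx1wd_qabmhjuk_1kjo49s-0.csv.gz " +
  "doesn't exist in remote file system";

const parsed = parseCreateTableError(message);
// In this message, the missing file is the table name under temp-uploads/
// with a "-0.csv.gz" suffix.
console.log(parsed.missingFile === `temp-uploads/${parsed.table}-0.csv.gz`); // true
```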

Environment

  • Deployment: Kubernetes (production)
  • Cube.js Version: Latest (using official Cube.js Docker image)
  • Database: MySQL
  • CubeStore: Deployed as a separate router service (cubestore-router.my-app-service.svc.cluster.local:3030)
  • Cache Driver: memory (set in cube.js config)
  • Environment: Production (NODE_ENV=production, CUBEJS_DEV_MODE=false)

Configuration

Cube.js Configuration (cube.js)

// `config` and `configService` are application-specific modules; their imports are omitted here
export default {
  apiSecret: config.apiSecret,
  basePath: '/api',
  driverFactory: () => configService.getDatabaseConfig(),
  schemaPath: 'schema',
  webSockets: config.webSockets,
  devServer: config.devMode,
  
  // Pre-aggregations configuration
  preAggregationsSchema: config.preAggregationsSchema,
  
  // Cache and Queue Configuration
  cacheAndQueueDriver: 'memory',
  
  // Orchestrator options
  orchestratorOptions: {
    redisPrefix: config.redisPrefix,
    queryCacheOptions: {
      refreshKeyRenewalThreshold: 30,
      backgroundRenew: true,
    },
    preAggregationsOptions: {
      queueOptions: {
        executionTimeout: 1800, // 30 minutes
      },
    },
  },
  
  scheduledRefreshTimer: true,
  // ... authentication and query rewrite logic
};

Environment Variables (Kubernetes ConfigMap)

CUBEJS_CUBESTORE_HOST: cubestore-router.my-app-service.svc.cluster.local
CUBEJS_CUBESTORE_PORT: "3030"
CUBEJS_DEV_MODE: "false"
CUBEJS_LOG_LEVEL: trace
CUBEJS_TELEMETRY: "false"
CUBEJS_DB_TYPE: mysql
NODE_ENV: production

Note: No external storage configuration (S3/GCS/MinIO) is currently set. No CUBESTORE_REMOTE_DIR, CUBESTORE_S3_BUCKET, or similar variables are configured.
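To confirm which storage-related variables a given pod actually sees, a quick check over an environment object can help. This is a minimal sketch under stated assumptions: `reportCubeStoreEnv` is a hypothetical helper, the variable list only contains names mentioned in this issue (it is not exhaustive), and it checks presence, not validity. The `CUBESTORE_*` variables are CubeStore settings, so the check is most meaningful when run against the CubeStore pods' environment.

```javascript
// Hypothetical diagnostic: report which CubeStore-related variables are set.
// Names are the ones mentioned in this issue; the list is not exhaustive.
const CUBESTORE_VARS = [
  "CUBEJS_CUBESTORE_HOST",
  "CUBEJS_CUBESTORE_PORT",
  "CUBESTORE_REMOTE_DIR",
  "CUBESTORE_S3_BUCKET",
  "CUBESTORE_NO_UPLOAD",
];

function reportCubeStoreEnv(env) {
  const set = {};
  const missing = [];
  for (const name of CUBESTORE_VARS) {
    // Treat empty strings the same as unset variables.
    if (env[name] !== undefined && env[name] !== "") {
      set[name] = env[name];
    } else {
      missing.push(name);
    }
  }
  return { set, missing };
}

// A subset of the ConfigMap values above; in a real pod, pass process.env instead.
const report = reportCubeStoreEnv({
  CUBEJS_CUBESTORE_HOST: "cubestore-router.my-app-service.svc.cluster.local",
  CUBEJS_CUBESTORE_PORT: "3030",
  CUBEJS_DB_TYPE: "mysql",
});
console.log(report.missing);
// [ 'CUBESTORE_REMOTE_DIR', 'CUBESTORE_S3_BUCKET', 'CUBESTORE_NO_UPLOAD' ]
```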

Related Cube.js Schema

user_metrics.js (Pre-aggregation Definition)

cube(`user_metrics`, {
  sql: `
    SELECT
      t.*,
      t.id = (
        SELECT MIN(t2.id)
        FROM transactions t2
        WHERE t2.account_id = t.account_id
      ) as is_first_account_row,
      (
        SELECT COALESCE(SUM(
          COALESCE(fees, 0) - COALESCE(cost, 0)
        ), 0)
        FROM fee_transactions
        WHERE account_id = t.account_id
      ) as account_fees
    FROM transactions t
  `,
  sqlAlias: `user`,

  // ... dimensions and measures ...

  pre_aggregations: {
    user_daily: {
      type: `rollup`,
      sqlAlias: `user_daily`,
      measures: [
        `user_metrics.total_transactions`,
        `user_metrics.total_sales`,
        `user_metrics.fees`,
        `user_metrics.monthly_revenue`,
        `user_metrics.last_transaction`,
        `user_metrics.total_volume`
      ],
      dimensions: [
        `user_metrics.org_id`,
        `user_metrics.org_user_id`,
        `user_metrics.user_id`,
        `user_metrics.account_id`,
        `user_metrics.transaction_type`,
        `user_metrics.currency`,
        `user_metrics.status`
      ],
      time_dimension: `user_metrics.transaction_date`,
      granularity: `day`,
      partition_granularity: `month`,
      refresh_key: {
        every: `1 day`
      },
    },
    // ... other pre-aggregations ...
  }
});
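With `partition_granularity: month`, the rollup is built as one table per month, and the `20250601` fragment in the failing table name appears to correspond to the June 2025 partition, whose range ends at the `build_range_end` shown in the error. The hypothetical `monthlyPartitions` function below only illustrates that naming pattern for a date range; it is an assumption-based sketch, not Cube's partitioning code, and the start date in the usage line is made up.

```javascript
// Illustrative only (not Cube's internal code): list month-start dates for a
// range, formatted like the "20250601" fragment in the failing table name.
function monthlyPartitions(startIso, endIso) {
  const start = new Date(startIso);
  const end = new Date(endIso);
  // Begin at the first day of the start month, in UTC.
  let cursor = new Date(Date.UTC(start.getUTCFullYear(), start.getUTCMonth(), 1));
  const partitions = [];
  while (cursor <= end) {
    const y = cursor.getUTCFullYear();
    const m = String(cursor.getUTCMonth() + 1).padStart(2, "0");
    partitions.push(`${y}${m}01`);
    // Advance to the first day of the next month (Date.UTC rolls over December).
    cursor = new Date(Date.UTC(y, cursor.getUTCMonth() + 1, 1));
  }
  return partitions;
}

console.log(monthlyPartitions("2025-04-15T00:00:00Z", "2025-06-17T21:44:17Z"));
// [ '20250401', '20250501', '20250601' ]
```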

Query That Triggers the Error

curl 'https://api.example.com/api/1.0/my-app-service/graphql' \
  -H 'authorization: Bearer <token>' \
  -H 'content-type: application/json' \
  --data-raw '{"query":"query { cube(limit: 10) { user_metrics { user_id total_sales total_transactions fees } } }"}'
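To reproduce the failing request from a script rather than curl, the same payload can be built programmatically. This is a minimal sketch: `buildCubeQuery` and `send` are hypothetical helpers, member names are passed through exactly as written in the query above, and the URL and token are the placeholders from the curl command. `send` relies on the global `fetch` available in Node.js 18 and later and is defined but not called here.

```javascript
// Hypothetical helper: build the same GraphQL body the curl command sends.
function buildCubeQuery(cubeName, members, limit) {
  const query = `query { cube(limit: ${limit}) { ${cubeName} { ${members.join(" ")} } } }`;
  return JSON.stringify({ query });
}

const body = buildCubeQuery("user_metrics", ["user_id", "total_sales", "total_transactions", "fees"], 10);
console.log(body);

// Sends the payload with the same headers as the curl command (Node.js 18+ global fetch).
async function send(token) {
  const res = await fetch("https://api.example.com/api/1.0/my-app-service/graphql", {
    method: "POST",
    headers: { authorization: `Bearer ${token}`, "content-type": "application/json" },
    body,
  });
  return res.json();
}
```

The printed body matches the `--data-raw` argument byte for byte, which makes it easy to vary the member list when narrowing down which pre-aggregation is being hit.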

Questions

  1. Is external storage (S3/GCS/MinIO) required for CubeStore in production? Our current setup doesn't have any external storage configured.

  2. Should we configure CUBESTORE_REMOTE_DIR? If so, what's the recommended setup for Kubernetes deployments?

  3. Is CUBESTORE_NO_UPLOAD=true a viable workaround? We saw this in staging config but want to understand the implications.

  4. What's the proper CubeStore configuration for a multi-pod Kubernetes deployment? We have separate deployments for:

    • Cube API server (api-server)
    • Cube refresh worker (refresh-worker)
    • CubeStore router (separate service)
  5. Are there any missing environment variables that would allow CubeStore to properly handle the temp-uploads directory?

Additional Context

  • We have multiple pre-aggregations defined across different cubes (admin_metrics, org_metrics, user_metrics, account_metrics)
  • The error occurs consistently when trying to build pre-aggregations
  • Our staging environment has CUBESTORE_NO_UPLOAD=true set, but we're not sure if this is the right approach for production

Any guidance on the proper CubeStore configuration for production Kubernetes deployments would be greatly appreciated!

Metadata

Labels

cube store: Issues relating to Cube Store
question: The issue is a question. Please use Stack Overflow for questions.