Description
Problem
I'm experiencing a pre-aggregation build failure in my production Cube.js setup. When querying metrics through GraphQL, I receive an error indicating that CubeStore cannot find the temporary CSV file in the remote file system during pre-aggregation creation.
Error Message:
Error during create table: CREATE TABLE my_app_pre_aggregations.metrics_daily20250601_s2jpx1wd_qabmhjuk_1kjo49s (...) WITH (build_range_end = '2025-06-17T21:44:17.000') LOCATION ?: Create table failed: CorruptData: File temp-uploads/my_app_pre_aggregations.metrics_daily20250601_s2jpx1wd_qabmhjuk_1kjo49s-0.csv.gz doesn't exist in remote file system
Environment
- Deployment: Kubernetes (production)
- Cube.js Version: Latest (using official Cube.js Docker image)
- Database: MySQL
- CubeStore: Deployed as separate router service (cubestore-router.my-app-service.svc.cluster.local:3030)
- Cache Driver: memory (set in cube.js config)
- Environment: Production (NODE_ENV=production, CUBEJS_DEV_MODE=false)
Configuration
Cube.js Configuration (cube.js)
export default {
  apiSecret: config.apiSecret,
  basePath: '/api',
  driverFactory: () => configService.getDatabaseConfig(),
  schemaPath: 'schema',
  webSockets: config.webSockets,
  devServer: config.devMode,
  // Pre-aggregations configuration
  preAggregationsSchema: config.preAggregationsSchema,
  // Cache and Queue Configuration
  cacheAndQueueDriver: 'memory',
  // Orchestrator options
  orchestratorOptions: {
    redisPrefix: config.redisPrefix,
    queryCacheOptions: {
      refreshKeyRenewalThreshold: 30,
      backgroundRenew: true,
    },
    preAggregationsOptions: {
      queueOptions: {
        executionTimeout: 1800, // 30 minutes
      },
    },
  },
  scheduledRefreshTimer: true,
  // ... authentication and query rewrite logic
};
Environment Variables (Kubernetes ConfigMap)
CUBEJS_CUBESTORE_HOST: cubestore-router.my-app-service.svc.cluster.local
CUBEJS_CUBESTORE_PORT: "3030"
CUBEJS_DEV_MODE: "false"
CUBEJS_LOG_LEVEL: trace
CUBEJS_TELEMETRY: "false"
CUBEJS_DB_TYPE: mysql
NODE_ENV: production
Note: No external storage configuration (S3/GCS/MinIO) is currently set. No CUBESTORE_REMOTE_DIR, CUBESTORE_S3_BUCKET, or similar variables are configured.
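To make the note above checkable, here is a minimal sketch of a helper that reports which remote-storage variables are present in an environment map. The function name `describeRemoteStorage` and the variable list are hypothetical and for illustration only: CUBESTORE_REMOTE_DIR, CUBESTORE_S3_BUCKET and CUBESTORE_NO_UPLOAD come from this issue, while CUBESTORE_GCS_BUCKET and CUBESTORE_MINIO_BUCKET are my assumption of what "similar variables" covers. It only inspects a plain object and says nothing about what CubeStore itself requires.

```javascript
// Hypothetical helper: reports which remote-storage variables are set in an
// environment map. The list below is an assumption, not an authoritative or
// exhaustive list of what CubeStore reads.
const REMOTE_STORAGE_VARS = [
  'CUBESTORE_REMOTE_DIR',
  'CUBESTORE_S3_BUCKET',
  'CUBESTORE_GCS_BUCKET',
  'CUBESTORE_MINIO_BUCKET',
];

function describeRemoteStorage(env) {
  // Keep only variables that are present and non-empty.
  const configured = REMOTE_STORAGE_VARS.filter(
    (name) => typeof env[name] === 'string' && env[name].trim() !== ''
  );
  return {
    configured,
    hasRemoteStorage: configured.length > 0,
    noUpload: env.CUBESTORE_NO_UPLOAD === 'true',
  };
}

// The ConfigMap from this issue, written as a plain object.
const issueEnv = {
  CUBEJS_CUBESTORE_HOST: 'cubestore-router.my-app-service.svc.cluster.local',
  CUBEJS_CUBESTORE_PORT: '3030',
  CUBEJS_DEV_MODE: 'false',
  CUBEJS_LOG_LEVEL: 'trace',
  CUBEJS_TELEMETRY: 'false',
  CUBEJS_DB_TYPE: 'mysql',
  NODE_ENV: 'production',
};

console.log(describeRemoteStorage(issueEnv));
```

For the ConfigMap above, the helper reports no remote-storage variable set. In practice it would make more sense to pass `process.env` from inside the CubeStore containers, since that is where the CUBESTORE_* variables would be read.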
Related Cube.js Schema
user_metrics.js (Pre-aggregation Definition)
cube(`user_metrics`, {
  sql: `
    SELECT
      t.*,
      t.id = (
        SELECT MIN(t2.id)
        FROM transactions t2
        WHERE t2.account_id = t.account_id
      ) as is_first_account_row,
      (
        SELECT COALESCE(SUM(
          COALESCE(fees, 0) - COALESCE(cost, 0)
        ), 0)
        FROM fee_transactions
        WHERE account_id = t.account_id
      ) as account_fees
    FROM transactions t
  `,
  sqlAlias: `user`,
  // ... dimensions and measures ...
  pre_aggregations: {
    user_daily: {
      type: `rollup`,
      sqlAlias: `user_daily`,
      measures: [
        `user_metrics.total_transactions`,
        `user_metrics.total_sales`,
        `user_metrics.fees`,
        `user_metrics.monthly_revenue`,
        `user_metrics.last_transaction`,
        `user_metrics.total_volume`
      ],
      dimensions: [
        `user_metrics.org_id`,
        `user_metrics.org_user_id`,
        `user_metrics.user_id`,
        `user_metrics.account_id`,
        `user_metrics.transaction_type`,
        `user_metrics.currency`,
        `user_metrics.status`
      ],
      time_dimension: `user_metrics.transaction_date`,
      granularity: `day`,
      partition_granularity: `month`,
      refresh_key: {
        every: `1 day`
      },
    },
    // ... other pre-aggregations ...
  }
});
Query That Triggers the Error
curl 'https://api.example.com/api/1.0/my-app-service/graphql' \
  -H 'authorization: Bearer <token>' \
  -H 'content-type: application/json' \
  --data-raw '{"query":"query { cube(limit: 10) { user_metrics { user_id total_sales total_transactions fees } } }"}'
Questions
- Is external storage (S3/GCS/MinIO) required for CubeStore in production? Our current setup doesn't have any external storage configured.
- Should we configure CUBESTORE_REMOTE_DIR? If so, what's the recommended setup for Kubernetes deployments?
- Is CUBESTORE_NO_UPLOAD=true a viable workaround? We saw this in our staging config but want to understand the implications.
- What's the proper CubeStore configuration for a multi-pod Kubernetes deployment? We have separate deployments for:
  - Cube API server (api-server)
  - Cube refresh worker (refresh-worker)
  - CubeStore router (separate service)
- Are there any missing environment variables that would allow CubeStore to properly handle the temp-uploads directory?
Additional Context
- We have multiple pre-aggregations defined across different cubes (admin_metrics, org_metrics, user_metrics, account_metrics)
- The error occurs consistently when trying to build pre-aggregations
- Our staging environment has CUBESTORE_NO_UPLOAD=true set, but we're not sure if this is the right approach for production
Any guidance on the proper CubeStore configuration for production Kubernetes deployments would be greatly appreciated!