
[cubestore] Infinite loop during metastore initialization (v1.3.16) #9610

@mahendranmahendran

Environment

  • Cube.js Version: v1.3.16
  • Cube Store Version: v1.3.16
  • Deployment: Docker Compose
  • Infrastructure: ClickHouse 23.8, Redis 7
  • OS: Ubuntu 22.04.5 LTS (kernel 5.15.0-140-generic)

Error Description

Cube Store enters an infinite loop during metastore initialization, repeatedly logging:

2025-05-23T06:01:10.578Z INFO [cubestore::metastore::rocks_fs] Creating metastore from scratch...
2025-05-23T06:01:25.602Z INFO [cubestore::metastore::rocks_fs] Creating cachestore from scratch...

It never progresses beyond this point, and health checks fail once start_period has elapsed.

Steps to Reproduce

  1. Fresh install with docker-compose.yml (see below)
  2. Run docker-compose up -d cubestore
  3. Observe logs with docker-compose logs cubestore

Expected vs. Actual Behavior

  • Expected: Cube Store initializes within 2 minutes, responds to health checks.
  • Actual: Cube Store keeps logging snapshot messages in a loop and never becomes healthy.

Debugging Attempts

Already tried:

  • Different versions (v1.3.16, v1.2.0)
  • Directory permission fixes (chown -R 1000:1000)
  • Reduced workers to 1
  • Disabled health checks temporarily

Critical Files

docker-compose.yml

version: '3.8'

services:
  clickhouse:
    image: clickhouse/clickhouse-server:23.8
    container_name: clickhouse
    hostname: clickhouse
    restart: unless-stopped
    ports:
      - "8123:8123"
      - "9000:9000"
    volumes:
      - /opt/ch-dbaas/data/clickhouse:/var/lib/clickhouse
      - /opt/ch-dbaas/config/clickhouse/config.d:/etc/clickhouse-server/config.d
      - /opt/ch-dbaas/config/clickhouse/users.d:/etc/clickhouse-server/users.d
      - /opt/ch-dbaas/scripts:/scripts
    environment:
      - CLICKHOUSE_LOG_LEVEL=debug
    healthcheck:
      test: ["CMD", "clickhouse-client", "--query", "SELECT 1"]
      interval: 10s
      timeout: 5s
      retries: 3

  redis:
    image: redis:7-alpine
    container_name: redis
    command: redis-server --save 60 1 --loglevel verbose
    volumes:
      - /opt/ch-dbaas/data/redis:/data
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s

  redis-commander:
    image: rediscommander/redis-commander:latest
    container_name: redis-commander
    environment:
      - REDIS_HOSTS=local:redis:6379
    ports:
      - "8081:8081"
    depends_on:
      redis:
        condition: service_healthy

  cubestore:
    image: cubejs/cubestore:v1.2.0  # Older stable version
    environment:
      - CUBESTORE_DIR=/cube/data
      - CUBESTORE_NO_HEARTBEAT=true
    volumes:
      - /opt/ch-dbaas/data/cubestore:/cube/data
    healthcheck:
      test: ["CMD", "sh", "-c", "test -f /cube/data/meta/ROOT"]
      interval: 30s
      timeout: 5s
      start_period: 60s

  cube:
    image: cubejs/cube:v1.3.16
    container_name: cube
    restart: unless-stopped
    ports:
      - "3000:3000"  # Playground
      - "4000:4000"  # API
    environment:
      - CUBEJS_DB_TYPE=clickhouse
      - CUBEJS_DB_HOST=clickhouse
      - CUBEJS_DB_PORT=9000
      - CUBEJS_DB_USER=admin
      - CUBEJS_DB_PASS=AdminPass123
      - CUBEJS_CUBESTORE_HOST=cubestore
      - CUBEJS_CACHE_AND_QUEUE_DRIVER=cubestore
      - CUBEJS_SCHEMA_PATH=/cube/conf/schema
      - CUBEJS_DEV_MODE=true
      - CUBEJS_API_SECRET_FILE=/run/secrets/cube_token
    volumes:
      - /opt/ch-dbaas/cube/conf:/cube/conf
      - /opt/ch-dbaas/cube/schema:/cube/conf/schema
    secrets:
      - cube_token
    depends_on:
      clickhouse:
        condition: service_healthy
      cubestore:
        condition: service_healthy
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:4000/readyz"]
      interval: 30s
      timeout: 10s
      retries: 5
      start_period: 60s
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G

secrets:
  cube_token:
    file: /opt/ch-dbaas/cube_secret.txt

networks:
  default:
    driver: bridge
    ipam:
      config:
        - subnet: 172.28.0.0/16
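
One thing the configuration above leaves open is how readiness is detected. The cubestore health check tests for /cube/data/meta/ROOT, whereas the startup logs in this report show the stores being created under /cube/.cubestore/data and a status probe server listening on port 3031. A minimal sketch of probing that status port instead follows. It assumes the probe path is /readyz (not confirmed anywhere in this report), that curl is available wherever the script runs, and that it runs on the same Compose network, since port 3031 is not published. The helper names probe_url and wait_ready are hypothetical.

```shell
# probe_url HOST [PORT] [PATH]: build the status probe URL.
# Port 3031 is taken from the startup log; /readyz is an assumed path.
probe_url() {
  printf 'http://%s:%s%s\n' "$1" "${2:-3031}" "${3:-/readyz}"
}

# wait_ready URL [ATTEMPTS]: poll URL once per second until curl gets a
# successful HTTP response, giving up after ATTEMPTS tries (default 30).
wait_ready() {
  attempts=${2:-30}
  while [ "$attempts" -gt 0 ]; do
    curl -fsS -o /dev/null "$1" && return 0
    attempts=$((attempts - 1))
    sleep 1
  done
  return 1
}

# Usage (not run here), from a container on the same Compose network:
#   wait_ready "$(probe_url cubestore)" 60 && echo "cubestore is ready"
```

If the probe succeeds while the file test keeps failing, the health check path rather than initialization itself would be the thing to look at.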

Logs

root@globalhost1 ~/opt/ch-dbaas # docker-compose logs cubestore --no-log-prefix
2025-05-23T06:33:41.301Z INFO  [cubestored] <pid:1> Cube Store version 1.2.0
2025-05-23T06:33:41.311Z INFO  [cubestore::http::status] <pid:1> Serving status probes at 0.0.0.0:3031
2025-05-23T06:33:41.312Z INFO  [cubestore::metastore::rocks_fs] <pid:1> Creating metastore from scratch in /cube/.cubestore/data/metastore
2025-05-23T06:33:41.340Z INFO  [cubestore::mysql] <pid:1> MySQL port open on 0.0.0.0:3306
2025-05-23T06:33:41.340Z INFO  [cubestore::http] <pid:1> Http Server is listening on 0.0.0.0:3030
2025-05-23T06:33:56.341Z INFO  [cubestore::metastore::rocks_fs] <pid:1> Creating cachestore from scratch in /cube/.cubestore/data/cachestore
2025-05-23T06:34:11.341Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (313.166µs)
2025-05-23T06:34:26.367Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (4.77µs)
2025-05-23T06:34:41.341Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (166.567µs)
2025-05-23T06:34:56.367Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (3.003µs)
2025-05-23T06:35:11.341Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (370.027µs)
2025-05-23T06:35:26.367Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (6.648µs)
2025-05-23T06:35:41.342Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (404.68µs)
2025-05-23T06:35:56.368Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (6.172µs)
2025-05-23T06:36:11.342Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (152.673µs)
2025-05-23T06:36:26.368Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (6.739µs)
2025-05-23T06:36:41.343Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (397.83µs)
2025-05-23T06:36:56.368Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (6.276µs)
2025-05-23T06:37:11.344Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (446.367µs)
2025-05-23T06:37:26.369Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (6.215µs)
2025-05-23T06:37:41.344Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (436.017µs)
2025-05-23T06:37:56.369Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (2.272µs)
2025-05-23T06:38:11.345Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (434.987µs)
2025-05-23T06:38:26.369Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (6.189µs)
2025-05-23T06:38:41.346Z INFO  [cubestore::metastore::rocks_store] <pid:1> Uploading metastore check point
2025-05-23T06:38:41.373Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting metastore snapshot: done (27.962021ms)
2025-05-23T06:38:56.370Z INFO  [cubestore::metastore::rocks_store] <pid:1> Uploading cachestore check point
2025-05-23T06:38:56.399Z INFO  [cubestore::metastore::rocks_store] <pid:1> Persisting cachestore snapshot: done (29.512643ms)
root@globalhost1 ~/opt/ch-dbaas # 
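
To check whether the repeated messages above come from a retry loop or from a fixed-interval background task, the spacing between them can be measured. The sketch below uses a made-up function name, snapshot_intervals. It assumes each line starts with an ISO-8601 timestamp (as with --no-log-prefix) and that the excerpt does not cross midnight.

```shell
# snapshot_intervals: read Cube Store log lines on stdin and print the gap,
# rounded to whole seconds, between consecutive
# "Persisting metastore snapshot" lines.
snapshot_intervals() {
  awk '/Persisting metastore snapshot/ {
    split($1, part, /[T:Z]/)          # part[2]=HH, part[3]=MM, part[4]=SS.mmm
    now = part[2] * 3600 + part[3] * 60 + part[4]
    if (seen) printf "%.0f\n", now - prev
    prev = now
    seen = 1
  }'
}

# Usage (not run here):
#   docker-compose logs cubestore --no-log-prefix | snapshot_intervals
```

Fed the excerpt above, it prints 30 for every consecutive pair. A steady cadence like that would be consistent with periodic snapshotting, though it does not by itself show whether initialization completed.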

docker-info.txt
Client: Docker Engine - Community
 Version:    28.1.1
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.23.0
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.35.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 6
  Running: 4
  Paused: 0
  Stopped: 2
 Images: 11
 Server Version: 28.1.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.5-0-g59923ef
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-140-generic
 Operating System: Ubuntu 22.04.5 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 8
 Total Memory: 62.57GiB
 Name: globalhost1
 ID: 50aed89e-1d62-44c9-8d2b-decc6673af14
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  ::1/128
  127.0.0.0/8
 Live Restore Enabled: false


Additional Context

  • Occurs on fresh installs and existing deployments.
  • Disk I/O is normal (iotop shows no bottlenecks).
  • No relevant errors in dmesg.

Labels: cube store, question
