
Etcd storage appears to not be migrated from v2 to v3 #5220

@Srokap

Description


I noticed my cluster was not running and the etcd service was failing. The failure was related to the enable-v2 flag described in #5209, so I removed the --enable-v2=true flag from /var/snap/microk8s/8384/args/etcd to match the fix from #5212. However, I now get a new error: illegal v2store content.

According to https://etcd.io/docs/v3.6/upgrades/upgrade_3_6/, a storage migration should be performed first. I cannot find any trace of a migration command (ETCDCTL_API=3 etcdctl migrate) in the microk8s repository, so perhaps the migration step is missing?
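For reference, and purely as an assumption about what such a step might have looked like: the migrate subcommand shipped with older etcd releases and was removed from etcdctl in later 3.x versions, so an offline run against this cluster's data directory would have needed an older binary and would have looked roughly like:

```shell
# Hypothetical sketch only: 'etcdctl migrate' is not present in current
# etcdctl builds, so this would require an older (<= 3.4-era) binary.
# Data directory taken from the microk8s args file quoted below.
DATA_DIR=/var/snap/microk8s/common/var/run/etcd

# Offline v2 -> v3 keyspace migration (etcd must be stopped first).
MIGRATE_CMD="ETCDCTL_API=3 etcdctl migrate --data-dir=$DATA_DIR"
echo "$MIGRATE_CMD"
```

This is only an illustration of the missing step, not a command I have verified against the current snap.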

Below are the logs of the failed etcd startup, taken from /var/log/syslog:

Sep  8 13:04:12 ckube-1 systemd[1]: Started Service for snap application microk8s.daemon-etcd.
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.556709Z","caller":"embed/config.go:1209","msg":"Running http and grpc server on single port. This is not recommended for production."}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.557227Z","caller":"embed/config.go:1320","msg":"it isn't recommended to use default name, please set a value for --name. Note that etcd might run into issue when multiple members have the same default name","name":"default"}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.557365Z","caller":"etcdmain/etcd.go:64","msg":"Running: ","args":["/snap/microk8s/8384/etcd","--data-dir=/var/snap/microk8s/common/var/run/etcd","--advertise-client-urls=https://192.168.5.70:12379","--listen-client-urls=https://0.0.0.0:12379","--client-cert-auth","--trusted-ca-file=/var/snap/microk8s/8384/certs/ca.crt","--cert-file=/var/snap/microk8s/8384/certs/server.crt","--key-file=/var/snap/microk8s/8384/certs/server.key"]}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.557580Z","caller":"etcdmain/etcd.go:107","msg":"server has already been initialized","data-dir":"/var/snap/microk8s/common/var/run/etcd","dir-type":"member"}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.557738Z","caller":"embed/config.go:1209","msg":"Running http and grpc server on single port. This is not recommended for production."}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"warn","ts":"2025-09-08T13:04:12.557857Z","caller":"embed/config.go:1320","msg":"it isn't recommended to use default name, please set a value for --name. Note that etcd might run into issue when multiple members have the same default name","name":"default"}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.557974Z","caller":"embed/etcd.go:138","msg":"configuring peer listeners","listen-peer-urls":["http://localhost:2380"]}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.558546Z","caller":"embed/etcd.go:146","msg":"configuring client listeners","listen-client-urls":["https://0.0.0.0:12379"]}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.558843Z","caller":"embed/etcd.go:323","msg":"starting an etcd server","etcd-version":"3.6.4","git-sha":"5400cdc","go-version":"go1.23.11","go-os":"linux","go-arch":"amd64","max-cpu-set":4,"max-cpu-available":4,"member-initialized":true,"name":"default","data-dir":"/var/snap/microk8s/common/var/run/etcd","wal-dir":"","wal-dir-dedicated":"","member-dir":"/var/snap/microk8s/common/var/run/etcd/member","force-new-cluster":false,"heartbeat-interval":"100ms","election-timeout":"1s","initial-election-tick-advance":true,"snapshot-count":10000,"max-wals":5,"max-snapshots":5,"snapshot-catchup-entries":5000,"initial-advertise-peer-urls":["http://localhost:2380"],"listen-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.5.70:12379"],"listen-client-urls":["https://0.0.0.0:12379"],"listen-metrics-urls":[],"experimental-local-address":"","cors":["*"],"host-whitelist":["*"],"initial-cluster":"","initial-cluster-state":"new","initial-cluster-token":"","quota-backend-bytes":2147483648,"max-request-bytes":1572864,"max-concurrent-streams":4294967295,"pre-vote":true,"feature-gates":"","initial-corrupt-check":false,"corrupt-check-time-interval":"0s","compact-check-time-interval":"1m0s","auto-compaction-mode":"periodic","auto-compaction-retention":"0s","auto-compaction-interval":"0s","discovery-url":"","discovery-proxy":"","discovery-token":"","discovery-endpoints":"","discovery-dial-timeout":"2s","discovery-request-timeout":"5s","discovery-keepalive-time":"2s","discovery-keepalive-timeout":"6s","discovery-insecure-transport":true,"discovery-insecure-skip-tls-verify":false,"discovery-cert":"","discovery-key":"","discovery-cacert":"","discovery-user":"","downgrade-check-interval":"5s","max-learners":1,"v2-deprecation":"write-only"}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.559479Z","logger":"bbolt","caller":"backend/backend.go:203","msg":"Opening db file (/var/snap/microk8s/common/var/run/etcd/member/snap/db) with mode -rw------- and with options: {Timeout: 0s, NoGrowSync: false, NoFreelistSync: true, PreLoadFreelist: false, FreelistType: hashmap, ReadOnly: false, MmapFlags: 8000, InitialMmapSize: 10737418240, PageSize: 0, NoSync: false, OpenFile: 0x0, Mlock: false, Logger: 0xc0003d0118}"}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.577801Z","logger":"bbolt","caller":"bbolt@v1.4.2/db.go:321","msg":"Opening bbolt db (/var/snap/microk8s/common/var/run/etcd/member/snap/db) successfully"}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.577867Z","caller":"storage/backend.go:80","msg":"opened backend db","path":"/var/snap/microk8s/common/var/run/etcd/member/snap/db","took":"18.489834ms"}
Sep  8 13:04:12 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:12.577900Z","caller":"etcdserver/bootstrap.go:220","msg":"restore consistentIndex","index":491955990}
Sep  8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"error","ts":"2025-09-08T13:04:13.111288Z","caller":"etcdserver/bootstrap.go:409","msg":"illegal v2store content","error":"detected disallowed custom content in v2store for stage --v2-deprecation=write-only","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.recoverSnapshot\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:409\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrapBackend\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:225\ngo.etcd.io/etcd/server/v3/etcdserver.bootstrap\n\tgo.etcd.io/etcd/server/v3/etcdserver/bootstrap.go:80\ngo.etcd.io/etcd/server/v3/etcdserver.NewServer\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:307\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:262\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:207\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:114\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
Sep  8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"error","ts":"2025-09-08T13:04:13.118940Z","caller":"etcdserver/server.go:309","msg":"bootstrap failed","error":"detected disallowed custom content in v2store for stage --v2-deprecation=write-only","stacktrace":"go.etcd.io/etcd/server/v3/etcdserver.NewServer\n\tgo.etcd.io/etcd/server/v3/etcdserver/server.go:309\ngo.etcd.io/etcd/server/v3/embed.StartEtcd\n\tgo.etcd.io/etcd/server/v3/embed/etcd.go:262\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcd\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:207\ngo.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:114\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
Sep  8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:13.119030Z","caller":"embed/etcd.go:426","msg":"closing etcd server","name":"default","data-dir":"/var/snap/microk8s/common/var/run/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.5.70:12379"]}
Sep  8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"info","ts":"2025-09-08T13:04:13.119151Z","caller":"embed/etcd.go:428","msg":"closed etcd server","name":"default","data-dir":"/var/snap/microk8s/common/var/run/etcd","advertise-peer-urls":["http://localhost:2380"],"advertise-client-urls":["https://192.168.5.70:12379"]}
Sep  8 13:04:13 ckube-1 microk8s.daemon-etcd[136524]: {"level":"fatal","ts":"2025-09-08T13:04:13.119188Z","caller":"etcdmain/etcd.go:183","msg":"discovery failed","error":"detected disallowed custom content in v2store for stage --v2-deprecation=write-only","stacktrace":"go.etcd.io/etcd/server/v3/etcdmain.startEtcdOrProxyV2\n\tgo.etcd.io/etcd/server/v3/etcdmain/etcd.go:183\ngo.etcd.io/etcd/server/v3/etcdmain.Main\n\tgo.etcd.io/etcd/server/v3/etcdmain/main.go:40\nmain.main\n\tgo.etcd.io/etcd/server/v3/main.go:31\nruntime.main\n\truntime/proc.go:272"}
Sep  8 13:04:13 ckube-1 systemd[1]: snap.microk8s.daemon-etcd.service: Main process exited, code=exited, status=1/FAILURE
Sep  8 13:04:13 ckube-1 systemd[1]: snap.microk8s.daemon-etcd.service: Failed with result 'exit-code'.
Sep  8 13:04:13 ckube-1 systemd[1]: snap.microk8s.daemon-etcd.service: Scheduled restart job, restart counter is at 1.
Sep  8 13:04:13 ckube-1 systemd[1]: Stopped Service for snap application microk8s.daemon-etcd.

Summary

The new version of etcd expects the storage to have been migrated off the v2 store, but that migration appears never to be executed when upgrading existing microk8s deployments.

What Should Happen Instead?

microk8s should perform the etcd storage migration before the v2 store is removed.

Reproduction Steps

I have 3 clusters; the 2 that use etcd both exhibit this issue, while the one using Dqlite is unaffected.

My snap reports

snap-id:      EaXqgt1lyCaxKaQCU349mlodBkDCXRcg
tracking:     latest/stable
refresh-date: today at 10:42 UTC
installed:    v1.34.0 (8384) 183MB classic

The current content of my args file (/var/snap/microk8s/8384/args/etcd):

--data-dir=${SNAP_COMMON}/var/run/etcd
--advertise-client-urls=https://${DEFAULT_INTERFACE_IP_ADDR}:12379
--listen-client-urls=https://0.0.0.0:12379
--client-cert-auth
--trusted-ca-file=${SNAP_DATA}/certs/ca.crt
--cert-file=${SNAP_DATA}/certs/server.crt
--key-file=${SNAP_DATA}/certs/server.key
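With those arguments the server should be reachable over TLS on port 12379. As a sketch (assuming snap revision 8384 and an etcdctl binary available on the host), the same certificates can be used to check whether etcd is serving at all:

```shell
# Assumed paths for this snap revision; adjust if yours differs.
SNAP_DATA=/var/snap/microk8s/8384
ETCD_FLAGS="--endpoints=https://127.0.0.1:12379 \
--cacert=$SNAP_DATA/certs/ca.crt \
--cert=$SNAP_DATA/certs/server.crt \
--key=$SNAP_DATA/certs/server.key"

# Reports health per endpoint when the server is up; errors out otherwise.
if command -v etcdctl >/dev/null 2>&1; then
  ETCDCTL_API=3 etcdctl $ETCD_FLAGS endpoint health
fi
```

In the current broken state this check fails immediately, matching the service failure shown in the inspection report below.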

Introspection Report

Inspecting system
Inspecting Certificates
Inspecting services
  Service snap.microk8s.daemon-cluster-agent is running
  Service snap.microk8s.daemon-containerd is running
  Service snap.microk8s.daemon-kubelite is running
  Service snap.microk8s.daemon-flanneld is running
 FAIL:  Service snap.microk8s.daemon-etcd is not running
For more details look at: sudo journalctl -u snap.microk8s.daemon-etcd
  Service snap.microk8s.daemon-apiserver-kicker is running
  Copy service arguments to the final report tarball
Inspecting AppArmor configuration
Gathering system information
  Copy processes list to the final report tarball
  Copy disk usage information to the final report tarball
  Copy memory usage information to the final report tarball
  Copy server uptime to the final report tarball
  Copy openSSL information to the final report tarball
  Copy snap list to the final report tarball
  Copy VM name (or none) to the final report tarball
  Copy current linux distribution to the final report tarball
  Copy asnycio usage and limits to the final report tarball
  Copy inotify max_user_instances and max_user_watches to the final report tarball
  Copy network configuration to the final report tarball
Inspecting kubernetes cluster
  Inspect kubernetes cluster

WARNING:  Maximum number of inotify user instances is less than the recommended value of 1024.
          Increase the limit with:
                 echo fs.inotify.max_user_instances=1024 | sudo tee -a /etc/sysctl.conf
                 sudo sysctl --system
WARNING:  Maximum number of inotify user watches is less than the recommended value of 1048576.
          Increase the limit with:
                 echo fs.inotify.max_user_watches=1048576 | sudo tee -a /etc/sysctl.conf
                 sudo sysctl --system
Building the report tarball
  Report tarball is at /var/snap/microk8s/8384/inspection-report-20250908_132532.tar.gz

Can you suggest a fix?

At some point the v2-to-v3 storage migration should be performed, before the v2 store is turned off.
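One possible manual workaround (an assumption on my side, not a verified procedure): etcd's --v2-deprecation flag has historically accepted a write-only-drop-data level that discards custom v2store content at startup. If the etcd bundled with the snap still accepts it, temporarily adding this line to /var/snap/microk8s/8384/args/etcd, restarting the daemon once, and then removing the line again might clear the offending v2 data:

```
--v2-deprecation=write-only-drop-data
```

This would destroy whatever lives in the v2 store, so it should only be attempted after backing up the data directory (/var/snap/microk8s/common/var/run/etcd).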


Are you interested in contributing with a fix?

Yes, but would need some guidance on where to put the migration script.

inspection-report-20250908_132532.tar.gz
