Description
Bug report criteria
- This bug report is not security related, security issues should be disclosed privately via [email protected].
- This is not a support request or question, support requests or questions should be raised in the etcd discussion forums.
- You have read the etcd bug reporting guidelines.
- Existing open issues along with etcd frequently asked questions have been checked and this is not a duplicate.
What happened?
I have come across a 3-replica etcd v3.5.6 cluster running on Kubernetes (bitnami/etcd helm chart v8.6.0) which reaches a state where two of the replicas report correct data, while one is out of sync yet still serves read requests and returns incorrect data.
I have no name!@mayastor-etcd-1:/opt/bitnami/etcd$ etcdctl endpoint status --cluster -w table
+-------------------------------------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------------------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| http://mayastor-etcd-0.mayastor-etcd-headless.mayastor.svc.cluster.local:2379 | 482c4e3b4b340c15 | 3.5.6 | 2.9 MB | false | false | 4 | 41386 | 41386 | |
| http://mayastor-etcd.mayastor.svc.cluster.local:2379 | 482c4e3b4b340c15 | 3.5.6 | 2.9 MB | false | false | 4 | 41386 | 41386 | |
| http://mayastor-etcd-1.mayastor-etcd-headless.mayastor.svc.cluster.local:2379 | 6b985f3365cade67 | 3.5.6 | 14 MB | false | false | 4 | 41386 | 41386 | |
| http://mayastor-etcd.mayastor.svc.cluster.local:2379 | 6b985f3365cade67 | 3.5.6 | 14 MB | false | false | 4 | 41386 | 41386 | |
| http://mayastor-etcd-2.mayastor-etcd-headless.mayastor.svc.cluster.local:2379 | 946609bde8186189 | 3.5.6 | 2.9 MB | true | false | 4 | 41386 | 41386 | |
| http://mayastor-etcd.mayastor.svc.cluster.local:2379 | 482c4e3b4b340c15 | 3.5.6 | 2.9 MB | false | false | 4 | 41386 | 41386 | |
+-------------------------------------------------------------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
Logs from instance 'mayastor-etcd-0'
{"level":"info","ts":"2025-08-06T10:36:13.435Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":15452}
{"level":"info","ts":"2025-08-06T10:36:13.436Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":15452,"took":"489.47µs","hash":637051441}
{"level":"info","ts":"2025-08-06T10:36:13.436Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":637051441,"revision":15452,"compact-revision":15423}
{"level":"info","ts":"2025-08-06T10:41:13.440Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":15482}
{"level":"info","ts":"2025-08-06T10:41:13.442Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":15482,"took":"1.046749ms","hash":1245049365}
{"level":"info","ts":"2025-08-06T10:41:13.442Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":1245049365,"revision":15482,"compact-revision":15452}
{"level":"info","ts":"2025-08-06T10:46:13.443Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":15511}
{"level":"info","ts":"2025-08-06T10:46:13.444Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":15511,"took":"771.238µs","hash":1343676884}
{"level":"info","ts":"2025-08-06T10:46:13.444Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":1343676884,"revision":15511,"compact-revision":15482}
Logs from 'mayastor-etcd-1'
{"level":"warn","ts":"2025-08-06T10:36:13.435Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"28.024µs","request":"header:<ID:7028316349384640945 > compaction:<revision:15452 > ","response":"","error":"mvcc: required revision is a future revision"}
{"level":"warn","ts":"2025-08-06T10:41:13.438Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"10.861µs","request":"header:<ID:7028316349384641026 > compaction:<revision:15482 > ","response":"","error":"mvcc: required revision is a future revision"}
{"level":"warn","ts":"2025-08-06T10:46:13.442Z","caller":"etcdserver/util.go:123","msg":"failed to apply request","took":"26.761µs","request":"header:<ID:7028316349384641111 > compaction:<revision:15511 > ","response":"","error":"mvcc: required revision is a future revision"}
Logs from 'mayastor-etcd-2'
{"level":"info","ts":"2025-08-06T10:36:13.432Z","caller":"v3compactor/revision.go:86","msg":"starting auto revision compaction","revision":15452,"revision-compaction-retention":100}
{"level":"info","ts":"2025-08-06T10:36:13.435Z","caller":"v3compactor/revision.go:94","msg":"completed auto revision compaction","revision":15452,"revision-compaction-retention":100,"took":"3.158624ms"}
{"level":"info","ts":"2025-08-06T10:36:13.435Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":15452}
{"level":"info","ts":"2025-08-06T10:36:13.437Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":15452,"took":"818.858µs","hash":2543608690}
{"level":"info","ts":"2025-08-06T10:36:13.437Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":2543608690,"revision":15452,"compact-revision":15423}
{"level":"info","ts":"2025-08-06T10:41:13.436Z","caller":"v3compactor/revision.go:86","msg":"starting auto revision compaction","revision":15482,"revision-compaction-retention":100}
{"level":"info","ts":"2025-08-06T10:41:13.439Z","caller":"v3compactor/revision.go:94","msg":"completed auto revision compaction","revision":15482,"revision-compaction-retention":100,"took":"2.874542ms"}
{"level":"info","ts":"2025-08-06T10:41:13.439Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":15482}
{"level":"info","ts":"2025-08-06T10:41:13.441Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":15482,"took":"1.027483ms","hash":4177801239}
{"level":"info","ts":"2025-08-06T10:41:13.441Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":4177801239,"revision":15482,"compact-revision":15452}
{"level":"info","ts":"2025-08-06T10:46:13.439Z","caller":"v3compactor/revision.go:86","msg":"starting auto revision compaction","revision":15511,"revision-compaction-retention":100}
{"level":"info","ts":"2025-08-06T10:46:13.442Z","caller":"v3compactor/revision.go:94","msg":"completed auto revision compaction","revision":15511,"revision-compaction-retention":100,"took":"2.913678ms"}
{"level":"info","ts":"2025-08-06T10:46:13.442Z","caller":"mvcc/index.go:214","msg":"compact tree index","revision":15511}
{"level":"info","ts":"2025-08-06T10:46:13.444Z","caller":"mvcc/kvstore_compaction.go:66","msg":"finished scheduled compaction","compact-revision":15511,"took":"842.829µs","hash":702843617}
{"level":"info","ts":"2025-08-06T10:46:13.444Z","caller":"mvcc/hash.go:137","msg":"storing new hash","hash":702843617,"revision":15511,"compact-revision":15482}
Complete logs here.
The instance mayastor-etcd-1 never recovers and stays in that state perpetually.
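For completeness: we did not capture it at the time, but the divergence should also be visible by comparing the KV hash of each member (optionally pinned to a common revision with --rev):
$ etcdctl endpoint hashkv --cluster -w table
# a hash that differs only on mayastor-etcd-1 would confirm its keyspace has diverged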
What did you expect to happen?
I expected that the etcd instance which is out of sync would recover, and would not serve read requests.
How can we reproduce it (as minimally and precisely as possible)?
This was seen after a helm upgrade. etcd itself did not go through a version upgrade; the upgrade was of a chart that uses bitnami's etcd chart as a dependency, namely the mayastor chart (link). The upgrade was from mayastor v2.7.3 to v2.7.4.
On helm upgrades, etcd often restarts, though not always. etcd restarts on the first helm upgrade irrespective of whether anything in the etcd configuration changed (this is expected behaviour of the bitnami/etcd chart, because it reads from the jwt-token kubernetes secret; I can elaborate further on this if required). In this case, it had restarted. A restart means a StatefulSet rollout, one pod at a time, in the order mayastor-etcd-0, mayastor-etcd-1, mayastor-etcd-2.
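For reference, the upgrade was performed roughly like this (release name, repository and namespace are specific to our environment and shown only as an illustration):
$ helm upgrade mayastor mayastor/mayastor -n mayastor --version 2.7.4 --reuse-values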
Anything else we need to know?
We know from the etcd dumps we took while debugging this issue that the instance mayastor-etcd-1 has outdated data. However, we cannot share the dumps as the data is sensitive.
Is there some knob/configuration we can tweak so that once an etcd instance knows its revision/generation is behind the others, it stops serving reads and prefers to be strongly consistent?
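To illustrate what we mean, the behaviour differs by read consistency level (endpoint and key below are placeholders):
$ etcdctl --endpoints=http://mayastor-etcd-1.mayastor-etcd-headless.mayastor.svc.cluster.local:2379 get /some/key --consistency=s
# serializable read: answered directly from this member's local store, so it returns whatever (possibly stale) data the member holds
$ etcdctl --endpoints=http://mayastor-etcd-1.mayastor-etcd-headless.mayastor.svc.cluster.local:2379 get /some/key --consistency=l
# linearizable read (the default): the member first confirms it has applied up to the leader's commit index before answering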
Thank you folks!
Etcd version (please run commands below)
$ etcd --version
# paste output here
$ etcdctl version
# paste output here
Etcd configuration (command line flags or environment variables)
paste your configuration here
Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)
$ etcdctl member list -w table
# paste output here
$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here