
etcd failed to send a snapshot. #21418

@Black-max12138



What happened?

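We have a three-node etcd cluster. The leader (265ab714481be3b3) repeatedly fails to send a database snapshot to member adb3a32b44be819c. Every attempt ends with "ioutil: short read" after about 1.7 seconds and is then retried. The leader's log is as follows: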

{"level":"warn","ts":"2026-03-03T12:31:17.794522Z","caller":"etcdserver/snapshot_merge.go:72","msg":"failed to send database snapshot to writer","size":"274 MB","error":"EOF"}
{"level":"warn","ts":"2026-03-03T12:31:17.794889Z","caller":"rafthttp/snapshot_sender.go:102","msg":"failed to send database snapshot","snapshot-index":1427265,"remote-peer-id":"adb3a32b44be819c","bytes":486880304,"size":"487 MB","error":"ioutil: short read"}
{"level":"info","ts":"2026-03-03T12:31:17.795075Z","caller":"etcdserver/server.go:2218","msg":"sent merged snapshot","from":"265ab714481be3b3","to":"adb3a32b44be819c","bytes":486880304,"size":"487 MB","took":"1.66390738s"}
{"level":"info","ts":"2026-03-03T12:31:18.130443Z","caller":"etcdserver/server.go:2200","msg":"sending merged snapshot","from":"265ab714481be3b3","to":"adb3a32b44be819c","bytes":486880304,"size":"487 MB"}
{"level":"info","ts":"2026-03-03T12:31:18.130728Z","caller":"rafthttp/snapshot_sender.go:84","msg":"sending database snapshot","snapshot-index":1427265,"remote-peer-id":"adb3a32b44be819c","bytes":486880304,"size":"487 MB"}
{"level":"warn","ts":"2026-03-03T12:31:19.871649Z","caller":"etcdserver/snapshot_merge.go:72","msg":"failed to send database snapshot to writer","size":"274 MB","error":"EOF"}
{"level":"warn","ts":"2026-03-03T12:31:19.872006Z","caller":"rafthttp/snapshot_sender.go:102","msg":"failed to send database snapshot","snapshot-index":1427265,"remote-peer-id":"adb3a32b44be819c","bytes":486880304,"size":"487 MB","error":"ioutil: short read"}
{"level":"info","ts":"2026-03-03T12:31:19.872133Z","caller":"etcdserver/server.go:2218","msg":"sent merged snapshot","from":"265ab714481be3b3","to":"adb3a32b44be819c","bytes":486880304,"size":"487 MB","took":"1.74165973s"}
{"level":"info","ts":"2026-03-03T12:31:20.129962Z","caller":"etcdserver/server.go:2200","msg":"sending merged snapshot","from":"265ab714481be3b3","to":"adb3a32b44be819c","bytes":486880304,"size":"487 MB"}
{"level":"info","ts":"2026-03-03T12:31:20.130151Z","caller":"rafthttp/snapshot_sender.go:84","msg":"sending database snapshot","snapshot-index":1427265,"remote-peer-id":"adb3a32b44be819c","bytes":486880304,"size":"487 MB"}

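Meanwhile, the etcd database file (db) on disk is only 274079744 bytes (about 274 MB), much smaller than the 487 MB (486880304 bytes) snapshot size reported in the log and in endpoint status. The 274 MB figure also matches the size reported in every "failed to send database snapshot to writer" warning: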

-rw-r----- 1 3001 2000      9235 Mar  3 14:50 0000000000000012-0000000000154a8e.snap
-rw-r----- 1 3001 2000      9235 Mar  3 14:53 0000000000000012-0000000000155e1b.snap
-rw-r----- 1 3001 2000      9235 Mar  3 15:08 0000000000000014-0000000000158eb8.snap
-rw-r----- 1 3001 2000      9234 Mar  3 15:12 0000000000000015-000000000015a241.snap
-rw-r----- 1 3001 2000      9235 Mar  3 15:15 0000000000000015-000000000015b5d3.snap
-rw------- 1 3001 2000 274079744 Mar  3 15:08 db

+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|        ENDPOINT         |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://111.9.0.26:2379 | 265ab714481be3b3 |  3.5.11 |  487 MB |      true |      false |        23 |    1427238 |            1427238 |        |
| https://111.9.0.27:2379 | adb3a32b44be819c |  3.5.11 |  414 MB |     false |      false |        20 |    1412792 |            1412792 |        |
| https://111.9.0.28:2379 | 7cec064e78275691 |  3.5.11 |  476 MB |     false |      false |        23 |    1427238 |            1427238 |        |
+-------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
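The table identifies adb3a32b44be819c as the lagging member: its raft index is 1412792 against the leader's 1427238, and its raft term is 20 against 23. The short Go sketch below only does that subtraction from the table values; memberStatus and raftLag are illustrative names, not etcdctl output fields. A gap of this size is presumably why the leader falls back to a full snapshot (snapshot-index 1427265 in the log) instead of sending log entries, although the report does not include the leader's configuration.

```go
package main

import "fmt"

// memberStatus holds the endpoint status columns relevant to replication.
type memberStatus struct {
	id        string
	raftTerm  uint64
	raftIndex uint64
}

// raftLag returns how many entries a member is behind the leader,
// or 0 if it is not behind.
func raftLag(leaderIndex, memberIndex uint64) uint64 {
	if memberIndex >= leaderIndex {
		return 0
	}
	return leaderIndex - memberIndex
}

func main() {
	leader := memberStatus{"265ab714481be3b3", 23, 1427238}
	members := []memberStatus{
		leader,
		{"adb3a32b44be819c", 20, 1412792},
		{"7cec064e78275691", 23, 1427238},
	}
	for _, m := range members {
		fmt.Printf("%s term=%d lag=%d entries\n",
			m.id, m.raftTerm, raftLag(leader.raftIndex, m.raftIndex))
	}
	// Prints:
	// 265ab714481be3b3 term=23 lag=0 entries
	// adb3a32b44be819c term=20 lag=14446 entries
	// 7cec064e78275691 term=23 lag=0 entries
}
```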

What did you expect to happen?

The leader successfully sends the snapshot to adb3a32b44be819c, that member catches up, and the cluster runs normally.

How can we reproduce it (as minimally and precisely as possible)?

N/A

Anything else we need to know?

No response

Etcd version (please run commands below)

$ etcd --version
# paste output here

$ etcdctl version
# paste output here

3.5.11 (all three members report this version in the endpoint status table above)

Etcd configuration (command line flags or environment variables)


paste your configuration here

Etcd debug information (please run commands below, feel free to obfuscate the IP address or FQDN in the output)

$ etcdctl member list -w table
# paste output here

$ etcdctl --endpoints=<member list> endpoint status -w table
# paste output here

Relevant log output
