Stream replicas stuck in UNSYNCED state leading to undeleted messages [v2.12.3] #7791

@superlevure

Observed behavior

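We have a stream, stream-a, with 3 replicas that sometimes ends up in the UNSYNCED state, with one of the replicas retaining messages that no longer have any interest. In the output below, nats-1 still holds 1 message (9,720 bytes) with first sequence 84316325, while nats-0 and nats-2 are empty with first sequence 93692581.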

nats stream info stream-a:

Information for Stream stream-a created 2026-01-19 15:44:14

             Description: XX
                Subjects: a.b.c.d.e.*.g
                Replicas: 3
                 Storage: File

Options:

               Retention: Interest
         Acknowledgments: true
          Discard Policy: Old
        Duplicate Window: 2m0s
    Allows Batch Publish: false
         Allows Counters: false
       Allows Msg Delete: true
  Allows Per-Message TTL: false
            Allows Purge: true
        Allows Schedules: false
          Allows Rollups: false

Limits:

        Maximum Messages: unlimited
     Maximum Per Subject: unlimited
           Maximum Bytes: unlimited
             Maximum Age: 14d0h0m0s
    Maximum Message Size: unlimited
       Maximum Consumers: unlimited

Cluster Information:

                    Name: nats
           Cluster Group: S-R3F-U6PRz9cW
                  Leader: nats-0 (1d4h57m57s)
                 Replica: nats-1, current, seen 535ms ago
                 Replica: nats-2, current, seen 535ms ago

State:

            Host Version: 2.12.3
      Required API Level: 0 hosted at level 2
                Messages: 0
                   Bytes: 0 B
          First Sequence: 93,692,581
           Last Sequence: 93,692,580 @ 2026-02-03 10:12:30
        Active Consumers: 11

nats server stream-check --stream stream-a --csv:

Stream Replica,Raft,Account,Account ID,Node,Messages,Bytes,Subjects,Deleted,Consumers,First,Last,Status,Leader,Peers,
stream-a,S-R3F-U6PRz9cW,default,default,nats-0*,0,0,0,0,11,93692581,93692580,UNSYNCED,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
stream-a,S-R3F-U6PRz9cW,default,default,nats-1,1,9720,1,9376255,11,84316325,93692580,UNSYNCED,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
stream-a,S-R3F-U6PRz9cW,default,default,nats-2,0,0,0,0,11,93692581,93692580,UNSYNCED,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
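For anyone who wants to spot this condition automatically, here is a minimal Python sketch that compares each replica row against the leader row. It assumes the column names shown in the CSV header above. The Peers column is dropped from the embedded sample for brevity, and the helper names `divergent_replicas` and `sequence_span_consistent` are hypothetical, not part of any NATS tooling.

```python
import csv
import io

# Rows copied from the `nats server stream-check --csv` output above,
# with the trailing Peers column dropped for brevity.
STREAM_CHECK_CSV = """\
Stream Replica,Raft,Account,Account ID,Node,Messages,Bytes,Subjects,Deleted,Consumers,First,Last,Status,Leader
stream-a,S-R3F-U6PRz9cW,default,default,nats-0*,0,0,0,0,11,93692581,93692580,UNSYNCED,nats-0
stream-a,S-R3F-U6PRz9cW,default,default,nats-1,1,9720,1,9376255,11,84316325,93692580,UNSYNCED,nats-0
stream-a,S-R3F-U6PRz9cW,default,default,nats-2,0,0,0,0,11,93692581,93692580,UNSYNCED,nats-0
"""

# Columns compared between each replica and the leader.
COMPARED = ("Messages", "Bytes", "Deleted", "First", "Last")


def divergent_replicas(csv_text):
    """Return {node: {column: (replica_value, leader_value)}} for replicas
    whose state differs from the leader's."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # The leader row is the one whose Node (minus the trailing '*') equals Leader.
    leader = next(r for r in rows if r["Node"].rstrip("*") == r["Leader"])
    report = {}
    for row in rows:
        if row is leader:
            continue
        diffs = {
            col: (int(row[col]), int(leader[col]))
            for col in COMPARED
            if row[col] != leader[col]
        }
        if diffs:
            report[row["Node"]] = diffs
    return report


def sequence_span_consistent(row):
    """Check that Last - First + 1 equals Messages + Deleted for one row."""
    span = int(row["Last"]) - int(row["First"]) + 1
    return span == int(row["Messages"]) + int(row["Deleted"])


if __name__ == "__main__":
    for node, diffs in divergent_replicas(STREAM_CHECK_CSV).items():
        print(node, diffs)
```

On the sample above only nats-1 is reported. Its sequence range is internally consistent (1 message plus 9,376,255 deleted spans sequences 84316325 through 93692580), so the replica appears to be lagging on removals rather than reporting a corrupt range.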

nats server consumer-check --stream stream-a --csv:

Consumer,Stream,Raft,Account,Account ID,Node,"Delivered (S\,C)","ACK Floor (S\,C)",Counters,Status,Leader,Stream Cluster Leader,Peers,
consumer-a,stream-a,C-R3F-31pNRJCA,default,default,nats-0*,"93692580 [93692581\, 93692580] 100% | 0",0 | 0,"(ap:0\, nr:0\, nw:5\, np:0)","EMPTY\, / INTERSECT",nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-a,stream-a,C-R3F-31pNRJCA,default,default,nats-1,"93692580 [84316325\, 93692580] 100% | 0",0 | 0,"(ap:0\, nr:0\, nw:0\, np:0)","EMPTY\,",nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-a,stream-a,C-R3F-31pNRJCA,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 0",0 | 0,"(ap:0\, nr:0\, nw:0\, np:0)","EMPTY\,",nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-b,stream-a,C-R3F-3D86X3pe,default,default,nats-0,"93692580 [93692581\, 93692580] 100% | 16470483",92948054 | 16470483,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-1,nats-0,"nats-2(current=false\,offline=false)      nats-1(current=true \,offline=false)",
consumer-b,stream-a,C-R3F-3D86X3pe,default,default,nats-1*,"93692580 [84316325\, 93692580] 100% | 16470483",92948054 | 16470483,"(ap:0\, nr:0\, nw:4\, np:0)",IN SYNC,nats-1,nats-0,"nats-2(current=true \,offline=false)      nats-0(current=true \,offline=false)",
consumer-b,stream-a,C-R3F-3D86X3pe,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 16470483",92948054 | 16470483,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-1,nats-0,"nats-1(current=true \,offline=false)      nats-0(current=false\,offline=false)",
consumer-c,stream-a,C-R3F-45MdMs8B,default,default,nats-0*,"93692580 [93692581\, 93692580] 100% | 0",0 | 0,"(ap:0\, nr:0\, nw:4\, np:0)","EMPTY\, / INTERSECT",nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-c,stream-a,C-R3F-45MdMs8B,default,default,nats-1,"93692580 [84316325\, 93692580] 100% | 0",0 | 0,"(ap:0\, nr:0\, nw:0\, np:0)","EMPTY\,",nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-c,stream-a,C-R3F-45MdMs8B,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 0",0 | 0,"(ap:0\, nr:0\, nw:0\, np:0)","EMPTY\,",nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-d,stream-a,C-R3F-GDR31AZn,default,default,nats-0*,"93692580 [93692581\, 93692580] 100% | 41013591",93647295 | 41013591,"(ap:0\, nr:0\, nw:5\, np:0)",IN SYNC / INTERSECT,nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-d,stream-a,C-R3F-GDR31AZn,default,default,nats-1,"93692580 [84316325\, 93692580] 100% | 41013591",93647295 | 41013591,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-d,stream-a,C-R3F-GDR31AZn,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 41013591",93647295 | 41013591,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-e,stream-a,C-R3F-JilVDpuc,default,default,nats-0*,"93692580 [93692581\, 93692580] 100% | 13305090",93594853 | 13305090,"(ap:0\, nr:0\, nw:2\, np:0)",IN SYNC / INTERSECT,nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-e,stream-a,C-R3F-JilVDpuc,default,default,nats-1,"93692580 [84316325\, 93692580] 100% | 13305090",93594853 | 13305090,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-e,stream-a,C-R3F-JilVDpuc,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 13305090",93594853 | 13305090,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-f,stream-a,C-R3F-MZYbu1ZB,default,default,nats-0*,"0 [93692581\, 93692580] 0  % | 0",0 | 0,"(ap:0\, nr:0\, nw:0\, np:0)","EMPTY\, / INTERSECT",nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-f,stream-a,C-R3F-MZYbu1ZB,default,default,nats-1,"0 [84316325\, 93692580] 0  % | 0",0 | 0,"(ap:0\, nr:0\, nw:0\, np:0)","EMPTY\,",nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-f,stream-a,C-R3F-MZYbu1ZB,default,default,nats-2,"0 [93692581\, 93692580] 0  % | 0",0 | 0,"(ap:0\, nr:0\, nw:0\, np:0)","EMPTY\,",nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-g,stream-a,C-R3F-RqMl7jmD,default,default,nats-0,"93692580 [93692581\, 93692580] 100% | 13305090",93602883 | 13305090,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-1,nats-0,"nats-2(current=false\,offline=false)      nats-1(current=true \,offline=false)",
consumer-g,stream-a,C-R3F-RqMl7jmD,default,default,nats-1*,"93692580 [84316325\, 93692580] 100% | 13305090",93602883 | 13305090,"(ap:0\, nr:0\, nw:4\, np:0)",IN SYNC,nats-1,nats-0,"nats-2(current=true \,offline=false)      nats-0(current=true \,offline=false)",
consumer-g,stream-a,C-R3F-RqMl7jmD,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 13305090",93602883 | 13305090,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-1,nats-0,"nats-1(current=true \,offline=false)      nats-0(current=false\,offline=false)",
consumer-h,stream-a,C-R3F-Z0uSjhbt,default,default,nats-0*,"93692580 [93692581\, 93692580] 100% | 9027779",93682210 | 9027779,"(ap:0\, nr:0\, nw:3\, np:0)",IN SYNC / INTERSECT,nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-h,stream-a,C-R3F-Z0uSjhbt,default,default,nats-1,"93692580 [84316325\, 93692580] 100% | 9027779",93682210 | 9027779,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-h,stream-a,C-R3F-Z0uSjhbt,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 9027779",93682210 | 9027779,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-i,stream-a,C-R3F-eFO8Adrd,default,default,nats-0,"93692580 [93692581\, 93692580] 100% | 93692547",93692580 | 93692547,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-1,nats-0,"nats-2(current=false\,offline=false)      nats-1(current=true \,offline=false)",
consumer-i,stream-a,C-R3F-eFO8Adrd,default,default,nats-1*,"93692580 [84316325\, 93692580] 100% | 93692547",93692580 | 93692547,"(ap:0\, nr:0\, nw:2\, np:0)",IN SYNC,nats-1,nats-0,"nats-2(current=true \,offline=false)      nats-0(current=true \,offline=false)",
consumer-i,stream-a,C-R3F-eFO8Adrd,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 93692547",93692580 | 93692547,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-1,nats-0,"nats-1(current=true \,offline=false)      nats-0(current=false\,offline=false)",
consumer-j,stream-a,C-R3F-neL5nT53,default,default,nats-0*,"93692580 [93692581\, 93692580] 100% | 1075030",93692580 | 1075030,"(ap:0\, nr:0\, nw:1\, np:0)",IN SYNC / INTERSECT,nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-j,stream-a,C-R3F-neL5nT53,default,default,nats-1,"93692580 [84316325\, 93692580] 100% | 1075030",93692580 | 1075030,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-j,stream-a,C-R3F-neL5nT53,default,default,nats-2,"93692580 [93692581\, 93692580] 100% | 1075030",93692580 | 1075030,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-k,stream-a,C-R3F-z9m72F9o,default,default,nats-0*,"93692579 [93692581\, 93692580] 100% | 12803270",93682672 | 12803270,"(ap:0\, nr:0\, nw:6\, np:0)",IN SYNC / INTERSECT,nats-0,nats-0,"nats-2(current=true \,offline=false)      nats-1(current=true \,offline=false)",
consumer-k,stream-a,C-R3F-z9m72F9o,default,default,nats-1,"93692579 [84316325\, 93692580] 100% | 12803270",93682672 | 12803270,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-2(current=false\,offline=false)      nats-0(current=true \,offline=false)",
consumer-k,stream-a,C-R3F-z9m72F9o,default,default,nats-2,"93692579 [93692581\, 93692580] 100% | 12803270",93682672 | 12803270,"(ap:0\, nr:0\, nw:0\, np:0)",IN SYNC,nats-0,nats-0,"nats-1(current=false\,offline=false)      nats-0(current=true \,offline=false)",

Scaling the stream down to 1 replica and then back up to 3 resolves the issue, until it happens again.
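The workaround can be scripted roughly as follows. This is only a sketch: it assumes the `nats` CLI is on the PATH with a context that can edit the stream, and that `nats stream edit` accepts `--replicas` and `--force` in the installed CLI version (check `nats stream edit --help`). The function names are hypothetical.

```python
import subprocess


def replica_cycle_commands(stream, low=1, high=3):
    """Build the two `nats stream edit` invocations for the
    scale-down / scale-up workaround (flags are an assumption)."""
    return [
        ["nats", "stream", "edit", stream, f"--replicas={n}", "--force"]
        for n in (low, high)
    ]


def run_replica_cycle(stream, dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    for cmd in replica_cycle_commands(stream):
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError if the CLI exits non-zero.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_replica_cycle("stream-a", dry_run=True)
```

In a real run it is probably wise to wait until `nats stream info` shows the stream has settled at 1 replica before scaling back up. The sketch does not do that.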

This issue was originally posted on Slack: https://natsio.slack.com/archives/CM3T6T7JQ/p1765895249243939

Expected behavior

  • The stream's replicas should stay IN SYNC
  • The stream should delete messages that no longer have any interest

Server and client version

  • server version: 2.12.3

Host environment

Kubernetes based deployment, using the official helm chart.

Steps to reproduce

Unfortunately, this issue happens roughly once a month, and it's not clear to us how to reproduce it consistently.

Labels: defect (suspected defect such as a bug or regression)