
Messages lost with JetStream work-queue retention and R3 when max-deliver is reached [v2.12.4, v2.12.3] #7817

@jgriegershs

Description


Observed behavior

A varying number of dead-lettered messages disappear from the work queue, i.e., they can no longer be retrieved from the stream.

Example with 200 published messages, where every 10th message is deliberately neither acked nor nacked:

=== Analyzing API Monitor Logs ===

20 dead-lettered messages
Obtaining Stream stats

╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│                                                Stream Report                                                │
├────────────┬─────────┬───────────┬───────────┬──────────┬────────┬──────┬─────────┬───────────┬─────────────┤
│ Stream     │ Storage │ Placement │ Consumers │ Messages │ Bytes  │ Lost │ Deleted │ API Level │ Replicas    │
├────────────┼─────────┼───────────┼───────────┼──────────┼────────┼──────┼─────────┼───────────┼─────────────┤
│ wq         │ Memory  │           │ 1         │ 9        │ 837 B  │ 0    │ 72      │ 0         │ s1*, s2, s3 │
│ monitoring │ Memory  │           │ 1         │ 229      │ 79 KiB │ 0    │ 0       │ 0         │ s1, s2*, s3 │
╰────────────┴─────────┴───────────┴───────────┴──────────┴────────┴──────┴─────────┴───────────┴─────────────╯

╭───────────────────────────────────────────────────────────────────────────╮
│                      9 Subjects in stream monitoring                      │
├───────────────────────────────────────────────────────────────────┬───────┤
│ Subject                                                           │ Count │
├───────────────────────────────────────────────────────────────────┼───────┤
│ $JS.EVENT.ADVISORY.CONSUMER.CREATED.monitoring.dlq-monitor        │ 1     │
│ $JS.EVENT.ADVISORY.STREAM.CREATED.monitoring                      │ 1     │
│ $JS.EVENT.ADVISORY.STREAM.LEADER_ELECTED.wq                       │ 1     │
│ $JS.EVENT.ADVISORY.CONSUMER.LEADER_ELECTED.monitoring.dlq-monitor │ 1     │
│ $JS.EVENT.ADVISORY.CONSUMER.CREATED.wq.c-wq-0                     │ 1     │
│ $JS.EVENT.ADVISORY.CONSUMER.LEADER_ELECTED.wq.c-wq-0              │ 1     │
│ $JS.EVENT.ADVISORY.STREAM.CREATED.wq                              │ 1     │
│ $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.wq.c-wq-0              │ 20    │
│ $JS.EVENT.ADVISORY.API                                            │ 204   │
╰───────────────────────────────────────────────────────────────────┴───────╯

  Checking seq 10 on stream wq.. NOT FOUND
  Checking seq 20 on stream wq.. NOT FOUND
  Checking seq 30 on stream wq.. NOT FOUND
  Checking seq 40 on stream wq.. NOT FOUND
  Checking seq 50 on stream wq.. NOT FOUND
  Checking seq 60 on stream wq.. NOT FOUND
  Checking seq 70 on stream wq.. NOT FOUND
  Checking seq 80 on stream wq.. NOT FOUND
  Checking seq 90 on stream wq.. NOT FOUND
  Checking seq 100 on stream wq.. NOT FOUND
  Checking seq 110 on stream wq.. NOT FOUND
  Checking seq 120 on stream wq.. OK
  Checking seq 130 on stream wq.. OK
  Checking seq 140 on stream wq.. OK
  Checking seq 150 on stream wq.. OK
  Checking seq 160 on stream wq.. OK
  Checking seq 170 on stream wq.. OK
  Checking seq 180 on stream wq.. OK
  Checking seq 190 on stream wq.. OK
  Checking seq 200 on stream wq.. OK

DNFs: 11 out of 20 dead-lettered messages missing

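In this example run, 11 of the 20 dead-lettered messages are missing from the work queue, even though they were neither acked nor expired via TTL. All 20 MAX_DELIVERIES advisories were emitted, and the missing messages are exactly the lowest dead-lettered sequences (10 through 110).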
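The DNF tally reduces to set arithmetic over stream sequences. A minimal Go sketch, assuming (as in this run) that the stream starts at sequence 1 and holds only the 200 test messages, so the never-acknowledged messages are sequences 10, 20, and so on up to 200; `deadLetteredSeqs` and `missing` are hypothetical helpers, not code from the test repository:

```go
package main

import "fmt"

// deadLetteredSeqs returns the stream sequences the consumer never
// acknowledges: every n-th message out of total.
// Assumption: stream sequences start at 1 and follow publish order.
func deadLetteredSeqs(total, n uint64) []uint64 {
	var seqs []uint64
	for seq := n; seq <= total; seq += n {
		seqs = append(seqs, seq)
	}
	return seqs
}

// missing returns the dead-lettered sequences that a lookup on the
// work-queue stream did not find.
func missing(deadLettered []uint64, found map[uint64]bool) []uint64 {
	var out []uint64
	for _, seq := range deadLettered {
		if !found[seq] {
			out = append(out, seq)
		}
	}
	return out
}

func main() {
	dl := deadLetteredSeqs(200, 10)

	// Sequences reported as "OK" in the run above (120 through 200).
	found := map[uint64]bool{}
	for seq := uint64(120); seq <= 200; seq += 10 {
		found[seq] = true
	}

	miss := missing(dl, found)
	fmt.Printf("DNFs: %d out of %d dead-lettered messages missing\n", len(miss), len(dl))
	fmt.Println(miss)
}
```

The actual lookup (whether a sequence is still present in `wq`) is performed against the server by the test scripts; here it is replaced by the results reported above.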

See Test Results.

Expected behavior

The dead-lettered messages remain in the work queue, and exactly one MAX_DELIVERIES advisory is emitted per dead-lettered message.
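The one-advisory-per-message expectation can be checked against the advisories captured in the `monitoring` stream. A minimal Go sketch, assuming the MAX_DELIVERIES advisory payload is JSON with `stream`, `consumer`, `stream_seq` and `deliveries` fields and type `io.nats.jetstream.advisory.v1.max_deliver` (our reading of the JetStream advisory schema; verify against your server version); `checkAdvisories` and the sample payloads are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// maxDeliverAdvisory holds the subset of the MAX_DELIVERIES advisory
// payload used here. Field names are assumed from the advisory schema.
type maxDeliverAdvisory struct {
	Type       string `json:"type"`
	Stream     string `json:"stream"`
	Consumer   string `json:"consumer"`
	StreamSeq  uint64 `json:"stream_seq"` // links the advisory to a sequence in "wq"
	Deliveries uint64 `json:"deliveries"`
}

// checkAdvisories decodes raw advisory payloads and counts advisories per
// stream sequence. Any count other than 1 violates the expected behavior.
func checkAdvisories(payloads [][]byte) (map[uint64]int, error) {
	counts := make(map[uint64]int)
	for _, p := range payloads {
		var adv maxDeliverAdvisory
		if err := json.Unmarshal(p, &adv); err != nil {
			return nil, err
		}
		counts[adv.StreamSeq]++
	}
	return counts, nil
}

func main() {
	// Illustrative payloads only, not captured from the test run.
	payloads := [][]byte{
		[]byte(`{"type":"io.nats.jetstream.advisory.v1.max_deliver","stream":"wq","consumer":"c-wq-0","stream_seq":10,"deliveries":3}`),
		[]byte(`{"type":"io.nats.jetstream.advisory.v1.max_deliver","stream":"wq","consumer":"c-wq-0","stream_seq":20,"deliveries":3}`),
	}
	counts, err := checkAdvisories(payloads)
	if err != nil {
		panic(err)
	}
	for seq, n := range counts {
		if n != 1 {
			fmt.Printf("seq %d: %d advisories\n", seq, n)
		}
	}
	fmt.Println(len(counts), "distinct dead-lettered sequences")
}
```

Each `stream_seq` collected this way is then the sequence to look up in `wq`; in the run above that yields 20 distinct sequences, of which only 9 were still present.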

Server and client version

Reproduced with server images:

  • nats:2.12.4-scratch
  • nats:2.12.4-alpine
  • nats:2.12.3-scratch
  • nats:2.12.3-alpine

Producer and NATS configuration: NATS CLI v0.3.1

Consumer client:

  • github.com/nats-io/nats.go v1.48.0
  • go 1.25.7

See the configuration in the Test Setup Repository.

Host environment

Reproduced in containerized environments only:

1. Windows 11

OS: Version 23H2 (OS Build 22631.6491)

1.1 Docker Desktop (Linux Docker VM on WSL2)

Client:
 Version:           29.1.4
 API version:       1.52
 Go version:        go1.25.5
 Git commit:        0e6fee6
 Built:             Thu Jan  8 19:59:26 2026
 OS/Arch:           windows/amd64
 Context:           desktop-linux

Server: Docker Desktop 4.59.0 (217644)
 Engine:
  Version:          29.2.0
  API version:      1.53 (minimum version 1.44)
  Go version:       go1.25.6
  Git commit:       9c62384
  Built:            Mon Jan 26 19:26:07 2026
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v2.2.1
  GitCommit:        dea7da592f5d1d2b7755e3a161be07f43fad8f75
 runc:
  Version:          1.3.4
  GitCommit:        v1.3.4-0-gd6d73eb8
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

1.2 K2s K8s distro (Debian VM)

Debian node:

  • K8s v1.34.3
  • 12 CPUs
  • 16 GiB RAM
  • Debian GNU/Linux 12 (bookworm)
  • Kernel: 6.1.0-42-cloud-amd64
  • Runtime: cri-o://1.34.4

2. Linux Mint

OS:

System:
  Host: LMVM Kernel: 6.17.0-14-generic arch: x86_64 bits: 64 compiler: gcc
    v: 13.3.0 clocksource: tsc
  Desktop: Cinnamon v: 6.6.7 tk: GTK v: 3.24.41 wm: Muffin v: 6.6.3 vt: 7
    dm: LightDM v: 1.30.0 Distro: Linux Mint 22.3 Zena base: Ubuntu 24.04 noble

Docker:

Client: Docker Engine - Community
 Version:           29.2.1
 API version:       1.53
 Go version:        go1.25.6
 Git commit:        a5c7197
 Built:             Mon Feb  2 17:17:26 2026
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          29.2.1
  API version:      1.53 (minimum version 1.44)
  Go version:       go1.25.6
  Git commit:       6bc6209
  Built:            Mon Feb  2 17:17:26 2026
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v2.2.1
  GitCommit:        dea7da592f5d1d2b7755e3a161be07f43fad8f75
 runc:
  Version:          1.3.4
  GitCommit:        v1.3.4-0-gd6d73eb8
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Steps to reproduce

See the GitHub repository nats-js-wq-issue for further details and scripts to reproduce the issue.

In short, the setup looks as follows:

                                    NATS Cluster: "nats_cluster"
    ┌────────────────────────────────────────────────────────────────────────────────────────────────┐
    │                                                                                                │
    │       ┌───────────┐                   ┌───────────┐                   ┌───────────┐            │
    │       │    s1     │◄─────────────────►│    s2     │◄─────────────────►│    s3     │            │
    │       │  :4222    │                   │           │                   │           │            │
    │       │  :8222    │                   │           │                   │           │            │
    │       │  :6222    │                   │  :6222    │                   │  :6222    │            │
    │       └─────┬─────┘                   └─────┬─────┘                   └─────┬─────┘            │
    │             │                               │                               │                  │
    │             └───────────────────────────────┼───────────────────────────────┘                  │
    │                                             │                                                  │
    │                                      ┌──────┴──────┐                                           │
    │                                      │  JetStream  │                                           │
    │                                      └──────┬──────┘                                           │
    │                                             │                                                  │
    │             ┌───────────────────────────────┴───────────────────────────────┐                  │
    │             │                                                               │                  │
    │   ┌─────────┴─────────────────────────────────────────────┐    ┌────────────┴───────────────┐  │
    │   │ Stream "monitoring" | R=3 | memory                    │    │ Stream "wq" | R=3 | memory │  │
    │   │ subjects: $JS.EVENT.ADVISORY.>                        │    │ subjects: wq.*             │  │
    │   │ retention: limits                                     │    │ retention: work-queue      │  │
    │   └─────────────────────────┬─────────────────────────────┘    └──────────────────┬─────────┘  │
    │                             │                                                     │            │
    │   ┌─────────────────────────┴─────────────────────────────┐    ┌──────────────────┴─────────┐  │
    │   │ Consumer "dlq-monitor"                                │    │ Consumer "c-wq-0"          │  │
    │   │ filter: $JS.EVENT.ADVISORY.CONSUMER.MAX_DELIVERIES.>  │    │ filter: wq.0               │  │
    │   └───────────────────────────────────────────────────────┘    └────────────────────────────┘  │
    │                                                                                                │
    └────────────────────────────────────────────────────────────────────────────────────────────────┘
                  │
                  │ client port :4222
                  ▼
            ┌───────────┐
            │  Clients  │
            └───────────┘

Labels

defect: Suspected defect such as a bug or regression
