[BUG] Kafka source loses messages when sink returns 401/403 with acknowledgements enabled #24543

@yoelk

A note for the community

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Problem

When using a Kafka source with end-to-end acknowledgements enabled, events that are rejected by the sink (e.g., HTTP 401/403 authentication errors) are permanently lost instead of being reprocessed or sent to a dead letter queue.

Impact

  • Severity: Critical - Data loss
  • Affected versions: v0.52.0, v0.53.0 (likely earlier versions too)
  • Production impact: Confirmed data loss in production environment with Microsoft Sentinel integration

Reproduction

A complete, working reproduction setup is available in the reproduction/ directory of the kafka-data-loss-reproduction branch (linked under Configuration):

  • Automated test script that demonstrates the bug
  • Docker Compose environment with Kafka, Vector, and mock HTTP sink
  • Successfully reproduces the bug: 1 out of 3 messages permanently lost

Test Results:

Expected: 3 messages sent (IDs 1, 2, 3)
Actual:   2 messages received (IDs 1, 3)
Result:   Message 2 (rejected with 401) was PERMANENTLY LOST

Root Cause

The bug is in src/sources/kafka.rs (lines 622-628):

ack = ack_stream.next() => match ack {
    Some((status, entry)) => {
        if status == BatchStatus::Delivered
            && let Err(error) = consumer.store_offset(&entry.topic, entry.partition, entry.offset) {
                emit!(KafkaOffsetUpdateError { error });
            }
    }
}

The problem:

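  1. Only events with BatchStatus::Delivered trigger consumer.store_offset()
  2. Rejected events (401/403) do NOT store their offsets, which is correct on its own
  3. BUT later events in the same partition that are delivered successfully DO store their offsets
  4. Kafka's auto-commit (every 5 seconds) commits the latest stored offset for each partition
  5. A committed offset marks everything before it in the partition as consumed, so this effectively skips over the rejected messages
  6. Result: Rejected messages are never redelivered and are permanently lost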
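
The sequence above can be replayed with a small, self-contained model. This is only a sketch of the bookkeeping, not Vector's or rdkafka's actual code: `Status`, `PartitionOffsets`, and `resume_offset_after` are hypothetical names, and the model assumes that storing offset N means "resume from N + 1" and that messages 1, 2, 3 sit at offsets 0, 1, 2 of a single partition.

```rust
/// Simplified stand-in for the delivery status of one message.
/// (Hypothetical type; only delivered vs. rejected matters here.)
#[derive(Clone, Copy, PartialEq, Debug)]
enum Status {
    Delivered,
    Rejected,
}

/// Toy model of one partition's offset bookkeeping.
#[derive(Default)]
struct PartitionOffsets {
    /// Next offset to read, as last stored by the application.
    stored: Option<i64>,
    /// Next offset to read, as last committed to the broker.
    committed: Option<i64>,
}

impl PartitionOffsets {
    /// Mirrors the logic described in the issue: store only on Delivered.
    fn on_ack(&mut self, status: Status, offset: i64) {
        if status == Status::Delivered {
            // Assumption: storing offset N means "resume from N + 1".
            self.stored = Some(offset + 1);
        }
        // Rejected acks change nothing, so a later Delivered ack
        // simply overwrites the stored position past them.
    }

    /// Stand-in for the periodic auto-commit of the stored offset.
    fn auto_commit(&mut self) {
        if self.stored.is_some() {
            self.committed = self.stored;
        }
    }
}

/// Replays a list of acks, commits once, and returns the offset a
/// restarted consumer would resume from (None if nothing was committed).
fn resume_offset_after(acks: &[(Status, i64)]) -> Option<i64> {
    let mut partition = PartitionOffsets::default();
    for &(status, offset) in acks {
        partition.on_ack(status, offset);
    }
    partition.auto_commit();
    partition.committed
}
```

Under these assumptions, replaying Delivered(0), Rejected(1), Delivered(2) leaves the committed position at 3, so the rejected message at offset 1 is never read again. If the commit happened before the third ack, the position would be 1 and the rejected message would still be redelivered after a restart.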

Expected Behavior

When acknowledgements are enabled, Vector should do one of the following with rejected events:

  1. Send them to a Dead Letter Queue (DLQ) for manual review
  2. Reprocess them with backoff
  3. At minimum, NOT advance the Kafka offset past them
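
One possible way to get the third behavior is to track acknowledgements per partition and only advance the committable position across a gap-free run of delivered offsets. The sketch below illustrates that idea in isolation. `ContiguousAcks` and its methods are hypothetical and are not part of Vector or rdkafka, and the sketch does not cover retries or a DLQ.

```rust
use std::collections::BTreeSet;

/// Tracks delivered offsets for one partition and only advances the
/// committable position across a contiguous run of delivered offsets.
/// (Hypothetical helper for illustration.)
struct ContiguousAcks {
    /// Lowest offset not yet confirmed as delivered.
    next_uncommitted: i64,
    /// Delivered offsets that sit beyond a gap and cannot be committed yet.
    pending: BTreeSet<i64>,
}

impl ContiguousAcks {
    fn new(start_offset: i64) -> Self {
        Self {
            next_uncommitted: start_offset,
            pending: BTreeSet::new(),
        }
    }

    /// Records a delivered offset and returns the position that is safe
    /// to resume from: every offset below it has been delivered.
    /// Rejected offsets are never recorded, so they hold the position back.
    fn mark_delivered(&mut self, offset: i64) -> i64 {
        if offset >= self.next_uncommitted {
            self.pending.insert(offset);
        }
        // Drain the contiguous prefix, stopping at the first gap.
        while self.pending.remove(&self.next_uncommitted) {
            self.next_uncommitted += 1;
        }
        self.next_uncommitted
    }
}
```

With offsets 0 and 2 delivered and offset 1 rejected, the safe position stays at 1, so a restarted consumer would read offset 1 again (and would also re-read offset 2, which is consistent with at-least-once delivery). The trade-off in this sketch is that a message that is rejected forever blocks the partition and lets `pending` grow, which is why it would likely need to be combined with a retry limit or a DLQ.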

Configuration

See reproduction setup at: 
https://github.com/yoelk/vector/tree/kafka-data-loss-reproduction/reproduction

Version

0.52.0

Debug Output

Full debug outputs of a single run can be seen here:
https://github.com/yoelk/vector/blob/kafka-data-loss-reproduction/reproduction/reproduction_debug_output.txt
