Labels
type: bug (A code related bug.)
Description
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
When using a Kafka source with end-to-end acknowledgements enabled, events that are rejected by the sink (e.g., HTTP 401/403 authentication errors) are permanently lost instead of being reprocessed or sent to a dead letter queue.
Impact
- Severity: Critical - Data loss
- Affected versions: v0.52.0, v0.53.0 (likely earlier versions too)
- Production impact: Confirmed data loss in production environment with Microsoft Sentinel integration
Reproduction
A complete, working reproduction setup is available in the reproduction/ directory:
- Automated test script that demonstrates the bug
- Docker Compose environment with Kafka, Vector, and mock HTTP sink
- Successfully reproduces the bug: 1 out of 3 messages permanently lost
Test Results:
```
Expected: 3 messages sent (IDs 1, 2, 3)
Actual:   2 messages received (IDs 1, 3)
Result:   Message 2 (rejected with 401) was PERMANENTLY LOST
```
Root Cause
The bug is in `src/sources/kafka.rs` (lines 622-628):
```rust
ack = ack_stream.next() => match ack {
    Some((status, entry)) => {
        if status == BatchStatus::Delivered
            && let Err(error) = consumer.store_offset(&entry.topic, entry.partition, entry.offset) {
            emit!(KafkaOffsetUpdateError { error });
        }
    }
}
```
The problem:
- Only `BatchStatus::Delivered` events trigger `consumer.store_offset()`
- Rejected events (401/403) do NOT store their offsets (correct)
- BUT subsequent successful events DO store their offsets
- Kafka's auto-commit (every 5 seconds) commits the latest stored offset
- This effectively skips over the rejected messages
- Result: rejected messages are permanently lost
Expected Behavior
When acknowledgements are enabled, rejected events should either:
- Be sent to a Dead Letter Queue (DLQ) for manual review
- Be reprocessed with backoff
- At minimum, NOT advance the Kafka offset past rejected events
Configuration
See reproduction setup at:
https://github.com/yoelk/vector/tree/kafka-data-loss-reproduction/reproduction
Version
0.52.0
Debug Output
Full debug outputs of a single run can be seen here:
https://github.com/yoelk/vector/blob/kafka-data-loss-reproduction/reproduction/reproduction_debug_output.txt
Example Data
No response
Additional Context
No response
References
No response