How does disk buffer cleanup work in Vector when using EFS with stateless pods #24572
Replies: 2 comments
Does anyone have any idea?
@wangzhihaocom There is documentation in this file: lib/vector-buffers/src/variants/disk_v2/mod.rs
My reading is that if an instance is terminated abruptly, i.e. without a graceful shutdown, the data might still be sitting in buffer files and not yet written to the destination. Whether those events are then replayed and sent via another node depends on how you have acknowledgements configured: if the source has not yet acknowledged the data and it hasn't been deleted from the source, another node would see those events and process them.
Someone may need to confirm or deny that, but the documentation for how it works is in the file linked above.
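As a rough illustration of why a data file may not shrink after traffic stops, here is a toy Python model of an append-only segmented buffer. It is a sketch under assumptions, not Vector's implementation: it assumes that records are only ever appended to data files, that the writer rolls over to a new file once a size limit is reached, and that a file is deleted as a whole only after the writer has moved past it and every record in it has been acknowledged. All names (`ToyDiskBuffer`, `DataFile`, `max_file_size`) and the 100-byte limit are hypothetical; the real behavior should be checked against the file linked above.

```python
from dataclasses import dataclass, field


@dataclass
class DataFile:
    """Toy stand-in for one append-only buffer data file (hypothetical)."""
    file_id: int
    records: list = field(default_factory=list)  # size in bytes of each record
    acked: int = 0        # how many records have been acknowledged
    sealed: bool = False  # True once the writer has rolled over to a newer file

    @property
    def size(self) -> int:
        # Acknowledging records never shrinks the file; it is append-only.
        return sum(self.records)


class ToyDiskBuffer:
    """Assumed model: append-only files, deleted whole once fully acknowledged."""

    def __init__(self, max_file_size: int):
        self.max_file_size = max_file_size
        self.files = [DataFile(0)]

    def write(self, record_size: int) -> None:
        current = self.files[-1]
        # Roll over to a new file when the current one would exceed the limit.
        if current.records and current.size + record_size > self.max_file_size:
            current.sealed = True
            current = DataFile(current.file_id + 1)
            self.files.append(current)
        current.records.append(record_size)

    def acknowledge(self, count: int = 1) -> None:
        """Mark the oldest unacknowledged records as delivered."""
        for _ in range(count):
            oldest = next((f for f in self.files if f.acked < len(f.records)), None)
            if oldest is None:
                break
            oldest.acked += 1
        # Only sealed, fully acknowledged files are removed from disk.
        self.files = [
            f for f in self.files
            if not (f.sealed and f.acked == len(f.records))
        ]

    def disk_usage(self) -> int:
        return sum(f.size for f in self.files)


buf = ToyDiskBuffer(max_file_size=100)
for _ in range(5):
    buf.write(30)            # file 0 holds three records, file 1 holds two
print(buf.disk_usage())      # 150
buf.acknowledge(2)
print(buf.disk_usage())      # 150: file 0 is only partly acknowledged
buf.acknowledge(1)
print(buf.disk_usage())      # 60: file 0 is fully acknowledged and deleted
buf.acknowledge(2)
print(buf.disk_usage())      # 60: file 1 is drained but still the active file
```

Under these assumptions, disk usage drops in whole-file steps rather than gradually, and the file currently being written stays on disk even when everything in it has been delivered.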
Hi Vector team
I’m trying to better understand how Vector disk buffers work internally, especially buffer cleanup behavior, when running Vector in Kubernetes with EFS-backed persistence and Spot instances.
Environment / Setup
Purpose:
Handle frequent pod termination (Spot interruptions, rescheduling)
Avoid data loss during abrupt shutdowns
Sinks: ClickHouse
Buffer type: disk
Acknowledgements: true
Each pod uses a pod-specific data_dir to avoid concurrent writers:
Filesystem layout (inside a running pod)
EFS directory structure
Observed Behavior:
Data is successfully written to ClickHouse
Incoming traffic has been stopped
However:
The size of buffer-data-0.dat does not decrease. (Does this file store all of the buffered data?)
Disk usage does not immediately decrease
Files remain even when there is no more traffic
Questions
1. Disk buffer lifecycle
What exactly is stored in the buffer-data-0*.dat files? Do they contain all of the buffer data?
I stopped the traffic, but the file size did not decrease. Should the file size decrease when there is no incoming traffic?
2. Handling Spot instance termination
If a pod is terminated because the Spot instance is taken away, what is the recommended way to handle unflushed buffer data?
3. Current design approach
In my current setup:
When a Spot instance is terminated, the old pod is killed.
A new pod is created.
I copy the old pod’s buffer data from:
to the new pod’s directory:
The idea is that the new pod will continue flushing the old pod’s buffer data.
Is this the correct approach for handling buffer persistence when using Spot instances?
Any help is much appreciated.