Prevent Vector from ingesting old log lines upon restart after maintenance #18041
Replies: 4 comments, 2 replies
-
How will you distinguish a Vector restart after maintenance from a restart after a crash?
-
Great question! The trick is to handle this outside of Vector, in the maintenance workflow code. To be specific to our use case, which you are very familiar with @gurudeepdialpad :-), the Ansible playbook that makes the machine ready to start Vector will delete the checkpoint files. That playbook only executes on a FRESH start, so nothing changes for crashes: when Vector restarts after a crash, it resumes from where it left off, which is the correct and intended behaviour.
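A minimal sketch of that fresh-start step, written as a standalone Python helper rather than the actual playbook task. It assumes (not stated in the thread) that Vector's global `data_dir` is `/var/lib/vector` and that the file source stores its checkpoints in `<data_dir>/<source_id>/checkpoints.json`; check both against your Vector version and config. The source id used below is hypothetical.

```python
"""Hypothetical pre-start hook for a FRESH (post-maintenance) Vector start.

Assumptions, not taken from the thread: Vector's data_dir is /var/lib/vector
and the file source keeps its checkpoints in <data_dir>/<source_id>/checkpoints.json.
"""
from pathlib import Path

DEFAULT_DATA_DIR = Path("/var/lib/vector")  # assumed data_dir; adjust to your config


def clear_file_source_checkpoints(source_id: str, data_dir: Path = DEFAULT_DATA_DIR) -> bool:
    """Delete the checkpoint file of one file source.

    Returns True if a checkpoint file was removed, False if none existed.
    Run this only while Vector is stopped, and only on a fresh start,
    never as part of crash recovery.
    """
    checkpoint = Path(data_dir) / source_id / "checkpoints.json"  # assumed layout
    if checkpoint.is_file():
        checkpoint.unlink()
        return True
    return False


if __name__ == "__main__":
    import sys

    # Usage: python clear_checkpoints.py <source_id> [<source_id> ...]
    for sid in sys.argv[1:]:
        removed = clear_file_source_checkpoints(sid)
        print(f"{sid}: {'removed' if removed else 'no checkpoint found'}")
```

Inside an Ansible playbook, the same step could just as well be an `ansible.builtin.file` task with `state: absent` on that path, placed before the task that starts Vector.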
-
TL;DR: the checkpoint takes precedence here.
The main catch is that without the checkpoint file, it's as if Vector is "discovering" a new log file (from the config) upon (re)start, so it follows the "read_from" config. With the checkpoint, Vector is NOT discovering the file, and irrespective of the "read_from" config it follows the checkpoint.
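The precedence rule above can be expressed as a small toy model. This is not Vector's code: it is a hypothetical function that only mirrors the behaviour described in this thread, and it keys checkpoints by path for simplicity, whereas Vector's file source identifies files by a fingerprint.

```python
"""Toy model of how a restarted file tailer picks its start offset.

Not Vector's implementation: it only mirrors the rule described above,
and keys checkpoints by path instead of by file fingerprint.
"""
from typing import Dict, Literal

ReadFrom = Literal["beginning", "end"]


def start_offset(path: str, file_size: int, checkpoints: Dict[str, int], read_from: ReadFrom) -> int:
    """Return the byte offset at which reading resumes after a (re)start."""
    if path in checkpoints:
        # Checkpoint found: the file is not "new", so read_from is ignored.
        return checkpoints[path]
    # No checkpoint: the file is "discovered", so read_from decides.
    return 0 if read_from == "beginning" else file_size


if __name__ == "__main__":
    saved = {"/var/log/app.log": 1_000}
    # Crash restart, checkpoint intact: resume at the saved offset.
    print(start_offset("/var/log/app.log", 5_000, saved, "end"))  # 1000
    # Fresh start after the checkpoints were deleted: read_from applies.
    print(start_offset("/var/log/app.log", 5_000, {}, "end"))  # 5000
```

Under this model, deleting the checkpoints before a fresh start and using "read_from" set to "end" skips everything written during maintenance, while a crash restart (checkpoint still present) resumes where it left off.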
-
That sounds like a reasonable approach to me, @atibdialpad! Thanks for sharing. Maybe it'll help other Vector users with similar scenarios.
-
I have a use case where Vector is tailing logs from files (file source) on a Linux machine. There are times when I put the machine under maintenance, at which point we stop Vector (and many other processes). Once the maintenance is done, Vector is restarted and the machine is marked operational.
Sometimes during maintenance, a lot of errors get logged in some of the log files Vector was tailing, so when Vector comes back up after maintenance it ingests all those old logs (because it resumes from its checkpoints), and these old, unimportant logs cause false alarms in our monitoring system. The idea is: "errors are expected during maintenance, and we do not want to ingest or alert on them."
To solve this issue, I am thinking of doing the following:
I have yet to test this, but does that sound okay, @jszwedko?