Vector runs in K8s as our internal log ingestor/aggregator/normalizer; the service sits behind a MetalLB LoadBalancer IP. The normalized events are then sent on to Kafka and beyond.
We ran into an issue ingesting events through the socket source (UDP mode) after upgrading from 0.43.1 to 0.51.1.
At our current load we were seeing roughly 15-17k events per second (Vector's internal metrics for received events on component_kind="source" and Kafka's metrics for the sink's topic agree on the numbers). After upgrading with the same config in place, we only see ~6-8k eps in both, i.e. roughly a 2x drop in both the Vector and Kafka metrics. From some testing, we noticed that this issue appears starting with 0.46.0.
Extra info:

- We looked at the container logs and enabled debug logging, but couldn't pin down any useful errors/traces to include in an issue.
- We tested a range of versions (0.45.0 up to 0.53.0): everything looks good on 0.45.0, and the "issue" starts on 0.46.0 and onwards.
- We stripped the config down to just the sources and sinks (metrics only).
- The containers typically never reach even ~50% resource usage.
- We initially suspected a cluster/node issue (investigation still ongoing), but switching back and forth between 0.43.1 and the later versions always reproduces the same results.
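Since the stripped-down config isn't attached below, here is a hypothetical sketch of a minimal reproduction of the setup described above (source/sink names, the listen port, and the buffer size are placeholders, not our actual values). The `receive_buffer_bytes` option on the socket source is worth ruling out, because a too-small kernel receive buffer makes UDP drops completely silent from the application's point of view:

```yaml
# Hypothetical minimal config; names and values are placeholders.
sources:
  udp_in:
    type: socket
    mode: udp
    address: 0.0.0.0:514
    # Requests a larger SO_RCVBUF on the UDP socket. The kernel silently
    # caps this at net.core.rmem_max unless that sysctl is raised.
    receive_buffer_bytes: 8388608   # 8 MiB

sinks:
  debug_out:
    type: console
    inputs: ["udp_in"]
    encoding:
      codec: json
```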
Have there been any changes in 0.46.0 and onwards that could have this effect? I realize the shortcomings of using UDP right now, but we don't have a choice at this time.
Could the containers be dropping packets silently, even though there is no serious load?
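One way to check for silent drops (a generic Linux diagnostic sketch, not Vector-specific): the kernel's per-protocol counters in `/proc/net/snmp` record UDP datagrams that were discarded before any application ever saw them.

```shell
# Print the kernel-wide UDP counter header and values.
# "RcvbufErrors" climbing while Vector is under load means datagrams were
# dropped because the socket's receive buffer (SO_RCVBUF) filled up;
# "InErrors" counts all receive-side drops.
awk '/^Udp:/ { if (!h) { h = $0 } else { print h; print $0 } }' /proc/net/snmp
```

Running this inside the Vector container (e.g. via `kubectl exec`) before and after a load burst and diffing the counters would show whether the kernel, rather than Vector itself, is discarding datagrams; `netstat -su` reports the same counters with friendlier labels.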
Vector Config
No response
Vector Logs
No response