
chore(buffers): Rework gauge metric handling #23561

Merged
pront merged 22 commits into master from bruceg/rework-buffer-counters
Aug 18, 2025

Conversation

@bruceg
Member

@bruceg bruceg commented Aug 8, 2025

Summary

This is a step-by-step refactoring of the changes made in #23453 and #23507 in order to remove the need for the shared BUFFER_COUNTERS dashmap. While the whole change is rather large, I attempted to keep each commit from altering externally-visible behavior, with the exception of additional labels in the buffer create gauges. As such, reviewing each individual commit is likely more useful than reviewing the whole code change.

If you compare this PR to the code base as it was immediately before the two PRs referenced above (you have to manually scroll down to the buffer source files, since the direct GitHub link doesn't quite work), the improvements are evident:

  1. The gauges now have an additional buffer_id label to disambiguate the metrics with the same ID but different stages. Note that the buffers are created in the context of a component and, as such, should (may?) already have the component ID in their tags. If not, this is a real bug, maybe even the real bug. Since the buffer_id corresponds to a component ID, maybe that particular tag name is less than ideal.
  2. The buffer usage tracker also maintains a current values set, allowing us to set the gauges instead of incrementing them, resulting in more reliable values there.
  3. All the counters are updated using a saturating_add, which prevents them from overflowing, and a clamp, which prevents them from going negative.
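
Items 2 and 3 can be illustrated with a minimal, standard-library-only sketch. The names `CurrentUsage`, `add`, and `remove` are hypothetical and not taken from this PR; the real tracker lives in `buffer_usage_data.rs` and may be structured quite differently. The sketch assumes an unsigned counter, so `saturating_sub` stands in for the clamp that keeps values from going negative. It keeps an absolute current value that a caller could pass to a gauge's set operation instead of emitting deltas.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical tracker for one buffer stage (illustrative, not the PR's type).
#[derive(Debug, Default)]
pub struct CurrentUsage {
    events: AtomicU64,
}

impl CurrentUsage {
    /// Adds `count` events, saturating at `u64::MAX` instead of wrapping.
    /// Returns the new absolute value, which could be used to set a gauge.
    pub fn add(&self, count: u64) -> u64 {
        self.update(|current| current.saturating_add(count))
    }

    /// Removes `count` events, stopping at zero instead of underflowing.
    pub fn remove(&self, count: u64) -> u64 {
        self.update(|current| current.saturating_sub(count))
    }

    /// Applies `f` atomically and returns the value that was stored.
    fn update(&self, f: impl Fn(u64) -> u64) -> u64 {
        // `fetch_update` retries the closure until its compare-and-swap
        // succeeds, then yields the previous value.
        let previous = self
            .events
            .fetch_update(Ordering::Relaxed, Ordering::Relaxed, |current| {
                Some(f(current))
            })
            .expect("the closure always returns Some");
        // Recompute the stored value from the previous one.
        f(previous)
    }
}
```

Because each call returns the absolute value, a lost or duplicated update cannot make a gauge drift permanently: the next set overwrites it with the tracker's current state.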

Note: This is a rework of #23542 to fix some bugs I introduced in my line of changes, so see that PR for more context.

Vector configuration

N/A

How did you test this PR?

Tested using the unit tests; all of the above should be behavior-preserving.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Notes

  • Please read our Vector contributor resources.
  • Do not hesitate to use @vectordotdev/vector to reach out to us regarding this PR.
  • Some CI checks run only after we manually approve them.
    • We recommend adding a pre-push hook, please see this template.
    • Alternatively, we recommend running the following locally before pushing to the remote branch:
      • cargo fmt --all
      • cargo clippy --workspace --all-targets -- -D warnings
      • cargo nextest run --workspace (alternatively, you can run cargo test --all)
  • After a review is requested, please avoid force pushes to help us review incrementally.
    • Feel free to push as many commits as you want. They will be squashed into one before merging.
    • For example, you can run git merge origin master and git push.
  • If this PR introduces changes to Vector dependencies (modifies Cargo.lock), please
    run cargo vdev build licenses to regenerate the license inventory and commit the changes (if any). More details here.

bruceg added 14 commits August 8, 2025 10:39
Note that the tests on the state of the gauges have been dropped as they now
would only test the proper operation of the metrics recorder.

Corresponds to the existing `buffer_discarded_events_total` counter and
`buffer_byte_size` gauge.

The "safe" conversion from u64 values into f64 clamped integers that cannot be
exactly represented to the maximum safe integer value. These converted values
were always used to `set` a gauge value. When looking at that gauge, however,
retaining the magnitude is more important than keeping it exactly representable
as an integer, so the safe conversion is actually lossy in the wrong direction.
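
To make that trade-off concrete, here is a small sketch comparing the two conversions. `clamped_to_f64` is a hypothetical reconstruction of the "safe" behavior described above, not the code removed from `cast_utils.rs`, and `MAX_SAFE_INTEGER` is assumed to be the conventional 2^53 - 1. The plain `as f64` cast is standard Rust and rounds to the nearest representable value.

```rust
/// Conventional "maximum safe integer" for an f64: 2^53 - 1.
/// Every integer from 0 up to this value is exactly representable.
pub const MAX_SAFE_INTEGER: u64 = (1 << 53) - 1;

/// Hypothetical "safe" conversion: clamps to `MAX_SAFE_INTEGER` first,
/// so large inputs lose their magnitude entirely.
pub fn clamped_to_f64(value: u64) -> f64 {
    value.min(MAX_SAFE_INTEGER) as f64
}

/// Plain cast: may round large inputs, but keeps their magnitude.
pub fn plain_to_f64(value: u64) -> f64 {
    value as f64
}
```

For `u64::MAX`, the clamped version reports roughly 9.0e15 while the plain cast reports roughly 1.8e19, a difference of about three orders of magnitude. For small inputs such as 42 the two agree exactly.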
@bruceg bruceg requested a review from a team as a code owner August 8, 2025 16:49
@bruceg bruceg added the type: tech debt, domain: buffers, and no-changelog labels Aug 8, 2025
@bruceg bruceg requested a review from Copilot August 8, 2025 16:51
Contributor

Copilot AI left a comment


Pull Request Overview

This pull request refactors the gauge metric handling in the buffer system to eliminate the need for a shared BUFFER_COUNTERS DashMap, improving reliability and adding better labeling. The changes shift from delta-based gauge updates to absolute value tracking using a current metric state.

Key changes:

  • Removes the shared BUFFER_COUNTERS DashMap and replaces it with absolute value tracking
  • Adds buffer_id labels to buffer gauge metrics for better disambiguation
  • Implements thread-safe counter updates with overflow protection using saturating arithmetic
  • Introduces a current state tracker in BufferUsageData to maintain accurate gauge values
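
As a rough illustration of the labeling and absolute-value points in this list, the sketch below uses a toy in-memory recorder. `GaugeKey`, `ToyRecorder`, and `key` are hypothetical names invented for this example; the real code emits gauges through Vector's metrics recorder, not through a struct like this. It shows that two buffers sharing a stage index stay distinct once `buffer_id` is part of the label set, and that a later set simply overwrites the earlier value.

```rust
use std::collections::HashMap;

/// Hypothetical label set for a buffer gauge (illustrative only).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct GaugeKey {
    pub buffer_id: String,
    pub stage: usize,
}

/// Toy stand-in for a metrics recorder: one value per label set.
#[derive(Debug, Default)]
pub struct ToyRecorder {
    gauges: HashMap<GaugeKey, f64>,
}

impl ToyRecorder {
    /// Absolute update: overwrites whatever was recorded before.
    pub fn set(&mut self, key: GaugeKey, value: f64) {
        self.gauges.insert(key, value);
    }

    /// Returns the last value set for this label set, if any.
    pub fn get(&self, key: &GaugeKey) -> Option<f64> {
        self.gauges.get(key).copied()
    }
}

/// Convenience constructor for a label set.
pub fn key(buffer_id: &str, stage: usize) -> GaugeKey {
    GaugeKey {
        buffer_id: buffer_id.to_owned(),
        stage,
    }
}
```

With this recorder, setting `("sink_a", 0)` to 10.0, then `("sink_b", 0)` to 3.0, then `("sink_a", 0)` to 7.0 leaves the two buffers at 7.0 and 3.0 respectively. Keyed on the stage alone, both would have shared a single series.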

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lib/vector-buffers/src/lib.rs | Removes the now-unused cast_utils module reference |
| lib/vector-buffers/src/internal_events.rs | Refactors events to include buffer_id labels and use absolute gauge values instead of delta updates |
| lib/vector-buffers/src/cast_utils.rs | Removes the entire file containing safe casting utilities that are no longer needed |
| lib/vector-buffers/src/buffer_usage_data.rs | Adds current state tracking and thread-safe counter updates with overflow protection |

Contributor

@graphcareful graphcareful left a comment


Just want to leave a note: this PR was super easy to review when looking at it one commit at a time.

@vparfonov
Contributor

LGTM, I closed my PR, this solution looks better.

@pront
Member

pront commented Aug 13, 2025

> LGTM, I closed my PR, this solution looks better.

@vparfonov perhaps you can help us verify with your setup that this PR yields the same metrics as origin/master? Thanks!

@bruceg bruceg requested review from pront and vparfonov and removed request for vparfonov August 13, 2025 17:04
@datadog-vectordotdev

datadog-vectordotdev bot commented Aug 13, 2025

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: de17764

Member

@pront pront left a comment


LGTM. It would also be nice to get validation per #23561 (comment)

@vparfonov
Contributor

> LGTM. It would also be nice to get validation per #23561 (comment)

Sure, I will test tomorrow morning and report back here with the results.

@vparfonov
Contributor

vparfonov commented Aug 14, 2025

@bruceg @pront I've tested these changes; they work just fine. Thanks

@bruceg
Member Author

bruceg commented Aug 14, 2025

@vparfonov this testing you are able to do, would there be any way that could be worked into a unit test? It would be good to add that as evidence things are working as intended.

@pront pront enabled auto-merge August 18, 2025 17:44
@pront
Member

pront commented Aug 18, 2025

> @bruceg @pront I've tested these changes; they work just fine. Thanks

👍 thanks

> @vparfonov this testing you are able to do, would there be any way that could be worked into a unit test? It would be good to add that as evidence things are working as intended.

We can do it as a follow-up if possible.

@pront pront added this pull request to the merge queue Aug 18, 2025
Merged via the queue into master with commit efe2b0c Aug 18, 2025
86 checks passed
@pront pront deleted the bruceg/rework-buffer-counters branch August 18, 2025 18:41

Labels

  • domain: buffers (Anything related to Vector's memory/disk buffers)
  • no-changelog (Changes in this PR do not need user-facing explanations in the release changelog)
  • type: tech debt (A code change that does not add user value.)


5 participants