Skip to content

fix(buffers): buffer_id handling in buffer usage metrics reporting#23507

Merged
pront merged 6 commits intovectordotdev:masterfrom
vparfonov:issue-21702-buffer-id
Aug 6, 2025
Merged

fix(buffers): buffer_id handling in buffer usage metrics reporting#23507
pront merged 6 commits intovectordotdev:masterfrom
vparfonov:issue-21702-buffer-id

Conversation

@vparfonov
Copy link
Contributor

Summary

This change improves how the buffer_id is used in buffer usage metrics. It ensures the buffer_id is properly owned and has the right lifetime to be safely included as a label in emitted metrics.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
@vparfonov vparfonov requested a review from a team as a code owner August 1, 2025 17:21
@thomasqueirozb thomasqueirozb changed the title fix(buffers):buffer_id handling in buffer usage metrics reporting fix(buffers): buffer_id handling in buffer usage metrics reporting Aug 1, 2025
Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
@bruceg
Copy link
Member

bruceg commented Aug 1, 2025

I've dug into the counter data usage here, and I'm wondering if the whole BUFFER_COUNTERS data couldn't be moved into the async reporting loop in BufferUsage::install, where it would not need any index on the buffer_id. It might not even need atomics, since there would not be conflicting usage.

Thinking further, this would suggest those values could be lifted into struct CategoryMetrics, with the update mechanism integrated into the consume method there.

What do you think?

@vparfonov
Copy link
Contributor Author

I've dug into the counter data usage here, and I'm wondering if the whole BUFFER_COUNTERS data couldn't be moved into the async reporting loop in BufferUsage::install, where it would not need any index on the buffer_id. It might not even need atomics, since there would not be conflicting usage.

Thinking further, this would suggest those values could be lifted into struct CategoryMetrics, with the update mechanism integrated into the consume method there.

What do you think?

Maybe it makes sense — the main win would be removing AtomicI64.
But in that case, metrics would only update every 2 seconds in the reporting loop, instead of on each event.
Not sure if that trade‑off is worth it.

@bruceg
Copy link
Member

bruceg commented Aug 1, 2025

Maybe it makes sense — the main win would be removing AtomicI64.

Actually, the main I was thinking of is removing the global Dashmap, which removes a layer of locking and hashing.

But in that case, metrics would only update every 2 seconds in the reporting loop, instead of on each event. Not sure if that trade‑off is worth it.

If I'm not mistaken, the events that update the counters are only emitted from the reporting loop, and so such a move does not change the behavior. Are the counters updated elsewhere?

@vparfonov
Copy link
Contributor Author

Maybe it makes sense — the main win would be removing AtomicI64.

Actually, the main I was thinking of is removing the global Dashmap, which removes a layer of locking and hashing.

But in that case, metrics would only update every 2 seconds in the reporting loop, instead of on each event. Not sure if that trade‑off is worth it.

If I'm not mistaken, the events that update the counters are only emitted from the reporting loop, and so such a move does not change the behavior. Are the counters updated elsewhere?

i mean this one:

/// Metrics are reported every 2 seconds.

@bruceg
Copy link
Member

bruceg commented Aug 1, 2025

Right. I was saying that that loop is the only place that the metrics I am referring to are emitted. That is, it appears they are currently only emitted every two seconds, and I didn't see anything in the changes here or previously that alter that behavior.

@vparfonov
Copy link
Contributor Author

Right. I was saying that that loop is the only place that the metrics I am referring to are emitted. That is, it appears they are currently only emitted every two seconds, and I didn't see anything in the changes here or previously that alter that behavior.

Ok, let me review the code again to be sure. From what I understand, since metrics are only emitted every 2 seconds (regardless of when the underlying counts change), we could move the counters inside the loop. I need some time to figured out how to avoid using HashMap. It does sound reasonable and should make code better
I’ll think more about how to wire that up — thanks a lot for clarifying, I really appreciate the explanation.

@bruceg
Copy link
Member

bruceg commented Aug 3, 2025

To be clear, I think this PR addresses a real deficiency in the solution to #21702 that was merged in #23453, and as such it should possibly be merged even with the above deficiencies. I just think there is a much cleaner solution hiding in the combination of these two PRs.

@vparfonov
Copy link
Contributor Author

To be clear, I think this PR addresses a real deficiency in the solution to #21702 that was merged in #23453, and as such it should possibly be merged even with the above deficiencies. I just think there is a much cleaner solution hiding in the combination of these two PRs.

@bruceg
Yes, I agree — let’s move forward with this. As a next step, your proposal sounds reasonable. Since it involves more significant changes, I’ll open a draft PR with my proposal so we can continue the discussion there.

Please take a look #23518

Copy link
Member

@pront pront left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per #23507 (comment) - approving this to make sure we have the fix in before the next release. I do agree this PR #23518 addresses the issue in a better/cleaner way. We can obviously review that PR independently later.

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
@pront pront added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Aug 5, 2025
@pront pront enabled auto-merge August 5, 2025 13:29
@pront pront added this pull request to the merge queue Aug 6, 2025
Merged via the queue into vectordotdev:master with commit a1aeae8 Aug 6, 2025
54 checks passed
@pront pront deleted the issue-21702-buffer-id branch August 6, 2025 12:54
vparfonov added a commit to vparfonov/vector that referenced this pull request Aug 6, 2025
…ectordotdev#23507)

* buffer_id handling in buffer usage metrics reporting

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* fix 'cargo clippy' warn

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* add changelog fragments to changelog.d

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* cargo fmt

* remove changelog - no need to explain bug fix for unreleased change

---------

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
openshift-merge-bot bot pushed a commit to ViaQ/vector that referenced this pull request Aug 7, 2025
…ectordotdev#23507)

* buffer_id handling in buffer usage metrics reporting

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* fix 'cargo clippy' warn

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* add changelog fragments to changelog.d

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* cargo fmt

* remove changelog - no need to explain bug fix for unreleased change

---------

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
vparfonov added a commit to vparfonov/vector that referenced this pull request Aug 7, 2025
…ectordotdev#23507)

* buffer_id handling in buffer usage metrics reporting

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* fix 'cargo clippy' warn

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* add changelog fragments to changelog.d

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* cargo fmt

* remove changelog - no need to explain bug fix for unreleased change

---------

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
thomasqueirozb added a commit that referenced this pull request Aug 7, 2025
@pront pront added domain: buffers Anything related to Vector's memory/disk buffers domain: metrics Anything related to Vector's metrics events labels Aug 8, 2025
vparfonov added a commit to vparfonov/vector that referenced this pull request Aug 11, 2025
…ectordotdev#23507)

* buffer_id handling in buffer usage metrics reporting

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* fix 'cargo clippy' warn

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* add changelog fragments to changelog.d

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>

* cargo fmt

* remove changelog - no need to explain bug fix for unreleased change

---------

Signed-off-by: Vitalii Parfonov <vparfono@redhat.com>
Co-authored-by: Pavlos Rontidis <pavlos.rontidis@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

domain: buffers Anything related to Vector's memory/disk buffers domain: metrics Anything related to Vector's metrics events no-changelog Changes in this PR do not need user-facing explanations in the release changelog

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants