prevent duplicate targets after OTel Collector Prometheus relabeling#4381
Conversation
|
Hi @swiatekm @jaronoff97 This is the bugfix-related PR that I've split out from the original changes. During testing, I noticed that the unit test exhibits flaky behavior - it sometimes passes and sometimes fails. I suspect this is due to race conditions caused by asynchronous event processing and cache synchronization delays.If this flaky test issue persists during your review, would it be appropriate to include a fix for this unit test in this PR as well? |
If you are able to figure out why it fails, I'd love a fix! |
|
Hello @swiatekm Evidence: |
|
@mike9421 Feel free to include that timeout increase in this PR, then, and we'll see if it helps universally. |
swiatekm
left a comment
There was a problem hiding this comment.
This largely looks good to me. What I'd like to understand as well is the performance impact. Can you run this benchmark first on main, and then on this PR, and post the benchstat comparison results?
Hi @swiatekm I've run benchmark tests before and after the changes to compare performance impact: Before changes: a7855fa8.txt Summary:
Benchmark output:
|
|
I did 10 runs of both main and this branch, here's the benchstat: So we have around a 15% increase in CPU usage and memory for targets with relabeling enabled. This is a fair amount, but considering this is fixing a bug and we're going to improve the overall efficiency of the whole system by doing target relabeling in the target allocator in a subsequent change, I think this is acceptable. WDYT @jaronoff97 @pavolloffay ? |
|
This issue is rendering Target Allocator unusable for us as well, could you please review this MR or propose a workaround in the meantime? @swiatekm |
There is no workaround other than not rewriting the address label. The bug only happens as a result of that. For this PR, I've already approved it, but it has a performance impact, so I'd like opinions from my fellow maintainers before merging it. The earliest it might be released is two weeks from now, in |
jaronoff97
left a comment
There was a problem hiding this comment.
I think the perf hit is worth the actual functionality fix. We've done a fair amount of work in making the TA more efficient, so I think a minor regression is acceptable.
Description:
Extract hash calculation fixes into a separate PR
Link to tracking Issue(s):
Testing:
Documentation: