compute-domain-daemon: skip SIGUSR1 when DNS mapping update only removes nodes#919
compute-domain-daemon: skip SIGUSR1 when DNS mapping update only removes nodes#919AkshatDudeja77 wants to merge 4 commits intoNVIDIA:mainfrom
Conversation
Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
a620c5d to
f8ad4c0
Compare
That's a great goal. But how do we detect this situation with sets being our central data structures? Write two sets down, on a piece of paper. This could look like this: What do you need to do in order to find out if set A has an item removed compared to set B? Do you need to look at more than just set size? I genuinely wonder what your goals and incentives are. This is I think your sixth pull request of this style against this repository, and I believe we need to think about providing candid feedback soon :). These are wild times! |
jgehrcke
left a comment
There was a problem hiding this comment.
logically incorrect, as expressed in regular comment
|
For context, I still find it quite likely that this is AI-generated noise. Last time I looked into this, I noted: |
|
@jgehrcke thank you for your point about comparing sets rather than just sizes, I finally do understand what you're saying. My intention with the current change was only to avoid sending SIGUSR1 when the update strictly removes nodes (i.e. when the mapping shrinks). In cases where For transparency: I used AI to draft the initial idea here, but I’ve reviewed and adjusted the code and tests myself. |
…ves nodes (fixes NVIDIA#913) Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
f8ad4c0 to
a7ddddf
Compare
|
@AkshatDudeja77 Can you write some unit testcases around it |
Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
13461a6 to
e40c7ed
Compare
@rajatchopra Refactored the SIGUSR1 decision into a helper and added unit tests covering no-change, pure-removal, additions, replacements, remove+add, and fresh-process cases. This now explicitly checks for newly added peers rather than relying on size or subset assumptions. Please let me know if there are additional edge cases you'd like covered |
internal/common/util.go
Outdated
| // output didn't work. | ||
| func StartDebugSignalHandlers() { | ||
| go func() { | ||
| if runtime.GOOS == "windows" { |
There was a problem hiding this comment.
@AkshatDudeja77 As commented previously by @jgehrcke @rajatchopra please remove this change.
There was a problem hiding this comment.
Removed the unrelated Windows-specific change from this PR.
| // addresses compared to the old set (then we don't need to force | ||
| // the daemon to re-resolve & re-connect). | ||
| if !updated || fresh { | ||
| if !shouldSendSIGUSR1(oldIPs, dnsNameManager.ipToDNSName, updated, fresh) { |
There was a problem hiding this comment.
Please retain rest of the comment here other than the TODO part and update based on the current logic.
There was a problem hiding this comment.
Updated the comment to reflect the current logic and removed the outdated TODO.
…ge, update comment) Signed-off-by: AkshatDudeja77 <akshat.dudeja77@gmail.com>
| return false | ||
| } | ||
|
|
||
| for ip := range newIPs { |
There was a problem hiding this comment.
Let's use IPSet.Diff() here:
It returns added and removed, and we can check if added is length 0, and removed is larger than zero.
| } | ||
| } | ||
|
|
||
| func shouldSendSIGUSR1(oldIPs, newIPs map[string]struct{}, updated, fresh bool) bool { |
There was a problem hiding this comment.
Add a docstring that describes what fresh and updated mean -- Before moving code, from the context, it was vaguely clear what they represent. Here, it's not at all obvious.
Fixes #913
The IMEXDaemonUpdateLoopWithDNSNames currently sends SIGUSR1 to the IMEX
daemon whenever the DNS/IP mapping changes.
However, when the update only removes nodes (and does not add new ones),
forcing the daemon to re-resolve and reconnect is unnecessary.
This change tracks the previous mapping size and skips sending SIGUSR1 when
the updated mapping strictly removes nodes.
Before:
SIGUSR1 triggered whenever the mapping changed.
After:
SIGUSR1 triggered only when nodes are added or mappings otherwise change,
but not when nodes are only removed.
This implements the TODO in the code suggesting that reconnects can be
skipped when the new IP set only removes addresses compared to the old set.