
Conversation

@santhoshmprabhu
Contributor

Reason for Change:
We previously added a fetch of secondary IPs to the NMAgent client in CNS. This PR makes that fetch asynchronous, with the ability to adjust the fetch frequency dynamically. Specifically:

  1. The minimum and maximum intervals between fetches are configurable.
  2. The fetch frequency is halved (the interval is doubled) each time no diff is seen, up to the maximum interval.
  3. The minimum interval is restored as soon as a diff is seen.

This is implemented with a time.Ticker, wrapped in a small interface so it can be mocked in tests; a rough sketch of the refresh loop is included below. The PR also adds checks for empty responses from NMAgent (raised in an earlier review comment), along with unit tests.
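A minimal sketch of the doubling/reset loop described above, using hypothetical names (refresher, fetchFn, minInterval, maxInterval) rather than the PR's actual types; the real change also hides the time.Ticker behind a small wrapper interface so tests can drive it with a fake clock.

```go
package main

import (
	"context"
	"time"
)

// refresher periodically calls fetchFn and adapts the interval between calls.
type refresher struct {
	minInterval time.Duration // used whenever the latest fetch shows a diff
	maxInterval time.Duration // upper bound when nothing changes
	interval    time.Duration
	fetchFn     func(ctx context.Context) (changed bool, err error)
}

func (r *refresher) run(ctx context.Context) {
	r.interval = r.minInterval
	ticker := time.NewTicker(r.interval)
	defer ticker.Stop()

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			changed, err := r.fetchFn(ctx)
			if err != nil {
				continue // keep the current interval on transient errors
			}
			if changed {
				// A diff was seen: drop back to the minimum interval.
				r.interval = r.minInterval
			} else if r.interval < r.maxInterval {
				// No diff: double the interval, capped at the maximum.
				r.interval *= 2
				if r.interval > r.maxInterval {
					r.interval = r.maxInterval
				}
			}
			ticker.Reset(r.interval)
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	r := &refresher{
		minInterval: 100 * time.Millisecond,
		maxInterval: 2 * time.Second,
		fetchFn: func(context.Context) (bool, error) {
			return false, nil // pretend nothing changed, so the interval keeps doubling
		},
	}
	r.run(ctx)
}
```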

Issue Fixed:
NA

Requirements:

Notes:

@santhoshmprabhu added the cns and go labels Sep 19, 2024
@santhoshmprabhu self-assigned this Sep 19, 2024
@santhoshmprabhu requested a review from a team as a code owner September 19, 2024 21:28
@santhoshmprabhu changed the title from "Sanprabhu/cilium node subnet intelligent refresh" to "Make throttling nmagent fetches for nodesubnet more dynamic" Sep 19, 2024
@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@rbtr requested a review from timraymond September 20, 2024 20:48
@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@timraymond left a comment


I think I'm going to recant some of what I said in the last PR now that I understand why we're doing this debouncing. As I mentioned during the design review, rate limits and wireserver are a fact of life, so it now feels weird to have arbitrary logic treating the nmagent client carefully because it might blow our rate limit. The design intention of the NMAgent client was always that users can do nmagent.DoSomethingForMe() and it will block while it deals with all of the weirdness that happens with NMAgent and Wireserver. This includes async response handling... and I'm thinking it should also include respecting rate limits. If the request is going to blow the rate limit, the WireserverTransport in the NMAgent client needs to block that request until the request will succeed. Otherwise, we're just setting the user up for failure.

Doing this with tickers is definitely the right way (in particular the whole ticker resetting thing is how I've done respecting rate limits in the past). However, it should be done in https://github.com/Azure/azure-container-networking/blob/master/nmagent/internal/wireserver.go so that it's shared across the client.
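For illustration of the suggestion above only (this is not code from this PR or from nmagent/internal/wireserver.go; throttledTransport, newThrottledTransport, and minGap are hypothetical names), a rate-limit-respecting transport could block each outgoing request on a shared ticker, so every user of the NMAgent client is throttled in one place:

```go
package main

import (
	"net/http"
	"time"
)

// throttledTransport delays each outgoing request until a ticker slot is available,
// spacing requests at least minGap apart regardless of which caller issues them.
type throttledTransport struct {
	inner  http.RoundTripper
	ticker *time.Ticker // one tick per allowed request
}

func newThrottledTransport(inner http.RoundTripper, minGap time.Duration) *throttledTransport {
	return &throttledTransport{inner: inner, ticker: time.NewTicker(minGap)}
}

func (t *throttledTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	// Block until the request is cancelled or the next slot opens up.
	select {
	case <-req.Context().Done():
		return nil, req.Context().Err()
	case <-t.ticker.C:
	}
	return t.inner.RoundTrip(req)
}

func main() {
	// Example: an HTTP client whose requests are spaced at least one second apart.
	client := &http.Client{Transport: newThrottledTransport(http.DefaultTransport, time.Second)}
	_ = client
}
```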

@santhoshmprabhu requested a review from a team as a code owner September 24, 2024 18:50
@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@timraymond previously approved these changes Oct 9, 2024
@santhoshmprabhu added this pull request to the merge queue Oct 9, 2024
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 9, 2024
@santhoshmprabhu added this pull request to the merge queue Oct 14, 2024
@santhoshmprabhu removed this pull request from the merge queue due to a manual request Oct 14, 2024
@santhoshmprabhu
Contributor Author

/azp run Azure Container Networking PR

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@santhoshmprabhu added this pull request to the merge queue Oct 14, 2024
Merged via the queue into master with commit b5046a0 Oct 14, 2024
14 checks passed
@santhoshmprabhu deleted the sanprabhu/cilium-node-subnet-intelligent-refresh branch October 14, 2024 23:45
sivakami-projects pushed a commit that referenced this pull request Oct 23, 2025
* feat(CNS): Early work on better throttling in NMAgent fetch for nodesubnet

* feat(CNS): Update NMAgent fetches to be async with binary exponential backoff

* chore: check for empty nmagent response

* test: update test for empty response

* style: make linter happy

* chore: fix some comments

* fix: Fix bug in refresh

* refactor: Address comments

* refactor: ignore primary ip

* refactor: move refresh out of ipfetcher

* test: add ip fetcher tests

* fix: remove broken import

* fix: fix import

* fix: fix linting

* fix: fix some failing tests

* chore: Remove unused function

* test: test updates

* fix: address comments

* chore: add missed file

* chore: add comment about static interval

* feat: address Evan's comment to require Equal method on cached results

* chore: add missed file

* feat: more efficient equality

* refactor: address Evan's comment

* refactor: address Tim's comment

* fix: undo accidental commit

* fix: make linter happy

* fix: make linter happy
