Commit ebddca1

perf: [NPM] [LINUX] add NetPols in background (#1969)
* wip: apply dirty NetPols every 500ms in Linux
* only build npm linux image
* fix: check for empty cache
* feat: toggle for netpol interval. default 500 ms
* ci: remove stages "build binaries" and "run windows tests"
* wip: max batched netpols (toggle-specified)
* ci: remove manifest build/push for win npm
* wip: handle ipset deletion properly and max batch for delete too
* fix: correct remove policy
* fix: only remove policy if it was in kernel
* finalize toggles, allowing ability to turn off iptablesInBackground
* ci: conf + cyc use PR's configmaps
* fix: lints
* fix dp toggle: iptablesInBackground
* fix lock typo and config logging
* fix background thread. add comments. only add tmp ref when enabled
* copy pod selector list
* fix: removepolicy needs namespace too
* rename opInfo to event
* fix: fix references and prevent concurrent map read/write
* tmp: debug logging
* fix: missing set references by swap keys and values
* Revert "tmp: debug logging" This reverts commit 70ed34c.
* fix: add podSelectorList to fake NetPol
* log: do not print error when failing to delete non-existent nft rule
* log: verbose iptables bootup
* log: use fmt.Errorf for clean logging
* log: never return error for iptables in background and fix some lints
* fix: activate/deactivate azure chain rules
* fix: correctly decrement netpols in kernel
* ci: run UTs again
* ci: update profiles. default to placefirst=false
* address comment: rename batch to pendingPolicy
* refactor: make dirty cache OS-specific
* test: UTs
* test: put UT cfg back to placefirst to not break things
* ci: update cyclonus workflows
* fmt: address comment & lint
* fmt: rename numInKernel to policiesInKernel
* log: switch to fmt.Errorf
* fmt: whitespace
* feat: resiliency to errors while reconciling dirty netpols
* log: temporarily print everything for ipset restore
* fix: remove nomatch from ipset -D for cidr blocks
* test: UTs for non-happy path
* test: fix hns fake
* fix: don't change windows. let it delete ipsets when removing policies
* fix windows lint
* fix: ignore chain doesn't exist errors for iptables -D
* feat: latency and failure metrics
* test: update exit code for UT
* metrics: new metrics should go in node-metrics path
* style: simplify nesting
* style: move identical windows & linux code to shared file
* ci: remove v1 conformance and cyclonus
* feat: add NetPols in background from the DP (revert background code in pMgr)
* style: remove "background" from iptables metrics
* revert changes in ipsetmanager, const.go, and dp.Remove/UpdatePolicy
* style: whitespace
* perf: use len() instead of creating slice from map
* remove verbosity for iptables bootup
* build: add return statement
* build: fix variable shadowing
* build: fix more import shadowing
* build: windows pointer issue and UT issue
* test: fix UT for iptables error code 2
* ci: enable linux scale test
* ci: revert to master pipeline.yaml
* revert changes to chain-management. do changes in PR #2012
* log: change wording
* test: UTs for netpol in background
* log: wording
* feat: apply ipsets for each netpol individually
* config: rearrange ConfigMap & update capz yaml
* fix: windows bootup phase logic for addpolicy
* feat: restrict netpol in background to linux + nftables
* test: skip nftables check for UT
* style: netpols[0] instead of loop
* log: address log comments
* style: lint for long line

---------

Co-authored-by: Vamsi Kalapala <[email protected]>
1 parent 99843e9 commit ebddca1
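
The commit message describes queuing NetworkPolicy add events and applying them in the background on an interval (500 ms by default), with a cap on how many may be pending. The following is a minimal sketch of that general pattern, not the code from this commit: `pendingQueue`, its fields, and the `apply` callback are hypothetical names, and flushing immediately when the cap is reached is an assumption about the intended behaviour.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pendingQueue is a hypothetical stand-in for a queue of NetworkPolicies
// that are waiting to be written to the dataplane.
type pendingQueue struct {
	mu         sync.Mutex
	pending    []string                   // policy keys, e.g. "namespace/name"
	maxPending int                        // cap before an immediate flush (assumed behaviour)
	apply      func(batch []string) error // hypothetical callback that programs the kernel
}

// add enqueues a policy key and flushes right away once maxPending is reached.
func (q *pendingQueue) add(key string) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	q.pending = append(q.pending, key)
	if len(q.pending) >= q.maxPending {
		return q.flushLocked()
	}
	return nil
}

// flush applies everything that is currently pending.
func (q *pendingQueue) flush() error {
	q.mu.Lock()
	defer q.mu.Unlock()
	return q.flushLocked()
}

// flushLocked applies the pending batch and clears it. The caller must hold q.mu.
func (q *pendingQueue) flushLocked() error {
	if len(q.pending) == 0 {
		return nil // nothing to do; never call apply with an empty batch
	}
	batch := q.pending
	q.pending = nil
	if err := q.apply(batch); err != nil {
		return fmt.Errorf("failed to apply %d pending policies: %w", len(batch), err)
	}
	return nil
}

// run flushes the queue on every tick until stop is closed.
func (q *pendingQueue) run(interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case <-ticker.C:
			if err := q.flush(); err != nil {
				fmt.Println(err) // log and keep going; the next tick tries again with new work
			}
		}
	}
}

func main() {
	q := &pendingQueue{
		maxPending: 100,
		apply: func(batch []string) error {
			fmt.Printf("applying %d policies: %v\n", len(batch), batch)
			return nil
		},
	}
	stop := make(chan struct{})
	go q.run(50*time.Millisecond, stop)

	if err := q.add("default/allow-dns"); err != nil {
		fmt.Println(err)
	}
	time.Sleep(120 * time.Millisecond) // give the background loop time to tick
	close(stop)
}
```

Batching this way trades a short delay before a policy takes effect for fewer dataplane writes when many policies arrive at once. The sketch drops a failed batch after logging it; a real implementation would need a retry or per-policy fallback.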

33 files changed: +875 -269 lines changed

.github/workflows/cyclonus-netpol-extended-nightly-test.yaml

Lines changed: 3 additions & 4 deletions
@@ -15,11 +15,10 @@ jobs:
 # run cyclonus tests in parallel for NPM with the given ConfigMaps
 profile:
 [
-v1-default.yaml,
-v1-place-azure-chain-first.yaml,
-v2-default.yaml,
 v2-apply-on-need.yaml,
-v2-place-azure-after-kube-services.yaml,
+v2-background.yaml,
+v2-foreground.yaml,
+v2-place-first.yaml,
 ]
 steps:
 - name: Checkout

.github/workflows/cyclonus-netpol-test.yaml

Lines changed: 7 additions & 1 deletion
@@ -20,7 +20,13 @@ jobs:
 strategy:
 matrix:
 # run cyclonus tests in parallel for NPM with the given ConfigMaps
-profile: [v1-default.yaml, v1-place-azure-chain-first.yaml, v2-default.yaml, v2-apply-on-need.yaml, v2-place-azure-after-kube-services.yaml]
+profile:
+[
+v2-apply-on-need.yaml,
+v2-background.yaml,
+v2-foreground.yaml,
+v2-place-first.yaml,
+]
 steps:
 - name: Checkout
 uses: actions/checkout@v3

.pipelines/npm/npm-conformance-tests.yaml

Lines changed: 16 additions & 16 deletions
@@ -90,21 +90,21 @@ jobs:
 displayName: "Run Kubernetes Network Policy Test Suite"
 strategy:
 matrix:
-v1-default:
-AZURE_CLUSTER: "conformance-v1-default"
-PROFILE: "v1-default"
+v2-foreground:
+AZURE_CLUSTER: "conformance-v2-foreground"
+PROFILE: "v2-foreground"
 IS_STRESS_TEST: "false"
-v2-default:
-AZURE_CLUSTER: "conformance-v2-default"
-PROFILE: "v2-default"
+v2-background:
+AZURE_CLUSTER: "conformance-v2-background"
+PROFILE: "v2-background"
 IS_STRESS_TEST: "false"
-v2-default-ws22:
-AZURE_CLUSTER: "conformance-v2-default-ws22"
+v2-ws22:
+AZURE_CLUSTER: "conformance-v2-ws22"
 PROFILE: "v2-default-ws22"
 IS_STRESS_TEST: "false"
-v2-default-stress:
-AZURE_CLUSTER: "conformance-v2-default-stress"
-PROFILE: "v2-default"
+v2-linux-stress:
+AZURE_CLUSTER: "conformance-v2-linux-stress"
+PROFILE: "v2-background"
 IS_STRESS_TEST: "true"
 pool:
 name: $(BUILD_POOL_NAME_DEFAULT)
@@ -117,7 +117,7 @@ jobs:
 TAG: $[ dependencies.setup.outputs['EnvironmentalVariables.TAG'] ]
 FQDN: empty
 steps:
-- checkout: none
+- checkout: self
 - download: current
 artifact: Test

@@ -200,7 +200,7 @@ jobs:
 fi

 az aks get-credentials -n $(AZURE_CLUSTER) -g $(RESOURCE_GROUP) --file ./kubeconfig
-./kubectl --kubeconfig=./kubeconfig apply -f https://raw.githubusercontent.com/Azure/azure-container-networking/master/npm/examples/windows/azure-npm.yaml
+./kubectl --kubeconfig=./kubeconfig apply -f $(Pipeline.Workspace)/s/npm/examples/windows/azure-npm.yaml
 ./kubectl --kubeconfig=./kubeconfig set image daemonset/azure-npm-win -n kube-system azure-npm=$IMAGE_REGISTRY/azure-npm:windows-amd64-ltsc2022-$(TAG)

 else
@@ -219,13 +219,13 @@ jobs:
 az aks get-credentials -n $(AZURE_CLUSTER) -g $(RESOURCE_GROUP) --file ./kubeconfig

 # deploy azure-npm
-./kubectl --kubeconfig=./kubeconfig apply -f https://raw.githubusercontent.com/Azure/azure-container-networking/master/npm/azure-npm.yaml
+./kubectl --kubeconfig=./kubeconfig apply -f $(Pipeline.Workspace)/s/npm/azure-npm.yaml

 # swap azure-npm image with one built during run
 ./kubectl --kubeconfig=./kubeconfig set image daemonset/azure-npm -n kube-system azure-npm=$IMAGE_REGISTRY/azure-npm:linux-amd64-$(TAG)

 # swap NPM profile with one specified as parameter
-./kubectl --kubeconfig=./kubeconfig apply -f https://raw.githubusercontent.com/Azure/azure-container-networking/master/npm/profiles/$(PROFILE).yaml
+./kubectl --kubeconfig=./kubeconfig apply -f $(Pipeline.Workspace)/s/npm/profiles/$(PROFILE).yaml
 ./kubectl --kubeconfig=./kubeconfig rollout restart ds azure-npm -n kube-system
 fi
@@ -437,7 +437,7 @@ jobs:
 chmod +x kubectl

 # deploy azure-npm
-./kubectl --kubeconfig=./kubeconfig apply -f https://raw.githubusercontent.com/Azure/azure-container-networking/master/npm/examples/windows/azure-npm.yaml
+./kubectl --kubeconfig=./kubeconfig apply -f $(Pipeline.Workspace)/s/npm/examples/windows/azure-npm.yaml

 # swap azure-npm image with one built during run
 ./kubectl --kubeconfig=./kubeconfig set image daemonset/azure-npm-win -n kube-system azure-npm=$IMAGE_REGISTRY/azure-npm:windows-amd64-ltsc2022-$(TAG)

.pipelines/npm/npm-scale-test.yaml

Lines changed: 4 additions & 4 deletions
@@ -78,10 +78,10 @@ jobs:
 FQDN: empty
 strategy:
 matrix:
-# v2-linux:
-# PROFILE: "sc-lin"
-# NUM_NETPOLS: 800
-# INITIAL_CONNECTIVITY_TIMEOUT: 60
+v2-linux:
+PROFILE: "sc-lin"
+NUM_NETPOLS: 800
+INITIAL_CONNECTIVITY_TIMEOUT: 60
 ws22:
 PROFILE: "sc-ws22"
 NUM_NETPOLS: 50

network/hnswrapper/hnsv2wrapperfake.go

Lines changed: 6 additions & 4 deletions
@@ -150,10 +150,12 @@ func (f Hnsv2wrapperFake) ModifyNetworkSettings(network *hcn.HostComputeNetwork,
 if setpol.PolicyType != hcn.SetPolicyTypeIpSet && setpol.Values != "" {
 // Check Nested SetPolicy members
 members := strings.Split(setpol.Values, ",")
-for _, memberID := range members {
-_, ok := networkCache.Policies[memberID]
-if !ok {
-return newErrorFakeHNS(fmt.Sprintf("Member Policy %s not found for hcn.RequestTypeUpdate", memberID))
+if setpol.Values != "" {
+for _, memberID := range members {
+_, ok := networkCache.Policies[memberID]
+if !ok {
+return newErrorFakeHNS(fmt.Sprintf("Member Policy %s not found for hcn.RequestTypeUpdate", memberID))
+}
 }
 }
 }

npm/azure-npm.yaml

Lines changed: 11 additions & 8 deletions
@@ -148,19 +148,22 @@ metadata:
 data:
 azure-npm.json: |
 {
-"ResyncPeriodInMinutes": 15,
-"ListeningPort": 10091,
-"ListeningAddress": "0.0.0.0",
-"ApplyMaxBatches": 100,
-"ApplyIntervalInMilliseconds": 500,
-"MaxBatchedACLsPerPod": 30,
+"ResyncPeriodInMinutes": 15,
+"ListeningPort": 10091,
+"ListeningAddress": "0.0.0.0",
+"ApplyIntervalInMilliseconds": 500,
+"ApplyMaxBatches": 100,
+"MaxBatchedACLsPerPod": 30,
+"NetPolInvervalInMilliseconds": 500,
+"MaxPendingNetPols": 100,
 "Toggles": {
 "EnablePrometheusMetrics": true,
 "EnablePprof": true,
 "EnableHTTPDebugAPI": true,
 "EnableV2NPM": true,
-"PlaceAzureChainFirst": true,
+"PlaceAzureChainFirst": false,
 "ApplyIPSetsOnNeed": false,
-"ApplyInBackground": true
+"ApplyInBackground": true,
+"NetPolInBackground": true
 }
 }

npm/cmd/start.go

Lines changed: 13 additions & 0 deletions
@@ -125,6 +125,19 @@ func start(config npmconfig.Config, flags npmconfig.Flags) error {
 // update the dataplane config
 npmV2DataplaneCfg.MaxBatchedACLsPerPod = config.MaxBatchedACLsPerPod

+npmV2DataplaneCfg.NetPolInBackground = config.Toggles.NetPolInBackground
+if config.NetPolInvervalInMilliseconds > 0 {
+npmV2DataplaneCfg.NetPolInterval = time.Duration(config.NetPolInvervalInMilliseconds * int(time.Millisecond))
+} else {
+npmV2DataplaneCfg.NetPolInterval = time.Duration(npmconfig.DefaultConfig.NetPolInvervalInMilliseconds * int(time.Millisecond))
+}
+
+if config.MaxPendingNetPols > 0 {
+npmV2DataplaneCfg.MaxPendingNetPols = config.MaxPendingNetPols
+} else {
+npmV2DataplaneCfg.MaxPendingNetPols = npmconfig.DefaultConfig.MaxPendingNetPols
+}
+
 npmV2DataplaneCfg.ApplyInBackground = config.Toggles.ApplyInBackground
 if config.ApplyMaxBatches > 0 {
 npmV2DataplaneCfg.ApplyMaxBatches = config.ApplyMaxBatches

npm/config/config.go

Lines changed: 16 additions & 4 deletions
@@ -7,6 +7,8 @@ const (
 defaultApplyMaxBatches = 100
 defaultApplyInterval = 500
 defaultMaxBatchedACLsPerPod = 30
+defaultMaxPendingNetPols = 100
+defaultNetPolInterval = 500
 defaultListeningPort = 10091
 defaultGrpcPort = 10092
 defaultGrpcServicePort = 9002
@@ -35,14 +37,20 @@ var DefaultConfig = Config{
 ApplyIntervalInMilliseconds: defaultApplyInterval,
 MaxBatchedACLsPerPod: defaultMaxBatchedACLsPerPod,

+MaxPendingNetPols: defaultMaxPendingNetPols,
+NetPolInvervalInMilliseconds: defaultNetPolInterval,
+
 Toggles: Toggles{
 EnablePrometheusMetrics: true,
 EnablePprof: true,
 EnableHTTPDebugAPI: true,
 EnableV2NPM: true,
-PlaceAzureChainFirst: util.PlaceAzureChainFirst,
+PlaceAzureChainFirst: util.PlaceAzureChainAfterKubeServices,
 ApplyIPSetsOnNeed: false,
-ApplyInBackground: true,
+// ApplyInBackground is currently used in Windows to apply the following in background: IPSets and NetPols for new/updated Pods
+ApplyInBackground: true,
+// NetPolInBackground is currently used in Linux to apply NetPol controller Add events in the background
+NetPolInBackground: true,
 },
 }

@@ -69,8 +77,10 @@ type Config struct {
 // MaxBatchedACLsPerPod is the maximum number of ACLs that can be added to a Pod at once in Windows.
 // The zero value is valid.
 // A NetworkPolicy's ACLs are always in the same batch, and there will be at least one NetworkPolicy per batch.
-MaxBatchedACLsPerPod int `json:"MaxBatchedACLsPerPod,omitempty"`
-Toggles Toggles `json:"Toggles,omitempty"`
+MaxBatchedACLsPerPod int `json:"MaxBatchedACLsPerPod,omitempty"`
+MaxPendingNetPols int `json:"MaxPendingNetPols,omitempty"`
+NetPolInvervalInMilliseconds int `json:"NetPolInvervalInMilliseconds,omitempty"`
+Toggles Toggles `json:"Toggles,omitempty"`
 }

 type Toggles struct {
@@ -82,6 +92,8 @@ type Toggles struct {
 ApplyIPSetsOnNeed bool
 // ApplyInBackground applies for Windows only
 ApplyInBackground bool
+// NetPolInBackground
+NetPolInBackground bool
 }

 type Flags struct {
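
To illustrate how the new `Config` fields and `Toggles` interact with a ConfigMap such as `azure-npm.json`, here is a minimal sketch that decodes JSON on top of a set of defaults. It uses `encoding/json` directly, which is an assumption and may not match how NPM actually loads its configuration; the lowercase `config` and `toggles` types are trimmed-down, hypothetical copies of the structs in this diff. The field name `NetPolInvervalInMilliseconds` keeps the spelling used in the source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toggles is a trimmed-down, hypothetical copy of the Toggles struct.
type toggles struct {
	PlaceAzureChainFirst bool
	ApplyInBackground    bool
	NetPolInBackground   bool
}

// config is a trimmed-down, hypothetical copy of the Config struct.
type config struct {
	MaxPendingNetPols            int     `json:"MaxPendingNetPols,omitempty"`
	NetPolInvervalInMilliseconds int     `json:"NetPolInvervalInMilliseconds,omitempty"`
	Toggles                      toggles `json:"Toggles,omitempty"`
}

// parseConfig decodes raw JSON over the defaults from this diff, so any key
// missing from the JSON keeps its default value.
func parseConfig(raw []byte) (config, error) {
	cfg := config{
		MaxPendingNetPols:            100,
		NetPolInvervalInMilliseconds: 500,
		Toggles: toggles{
			PlaceAzureChainFirst: false,
			ApplyInBackground:    true,
			NetPolInBackground:   true,
		},
	}
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return config{}, fmt.Errorf("failed to parse config: %w", err)
	}
	return cfg, nil
}

func main() {
	raw := []byte(`{
		"MaxPendingNetPols": 10,
		"Toggles": {"NetPolInBackground": false}
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("%+v\n", cfg)
}
```

Decoding into a pre-populated struct means a ConfigMap only needs to list the values it overrides, which is the case for `azure-npm-capz.yaml` in this commit: it sets `NetPolInBackground` to false without specifying an interval or pending limit.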

npm/examples/windows/azure-npm-capz.yaml

Lines changed: 6 additions & 6 deletions
@@ -145,17 +145,17 @@ data:
 "ResyncPeriodInMinutes": 15,
 "ListeningPort": 10091,
 "ListeningAddress": "0.0.0.0",
+"ApplyIntervalInMilliseconds": 500,
+"ApplyMaxBatches": 100,
+"MaxBatchedACLsPerPod": 30,
 "Toggles": {
 "EnablePrometheusMetrics": true,
 "EnablePprof": true,
 "EnableHTTPDebugAPI": true,
 "EnableV2NPM": true,
 "PlaceAzureChainFirst": true,
-"ApplyIPSetsOnNeed": false
+"ApplyIPSetsOnNeed": false,
+"ApplyInBackground": true,
+"NetPolInBackground": false
 },
-"Transport": {
-"Address": "azure-npm.kube-system.svc.cluster.local",
-"Port": 10092,
-"ServicePort": 9001
-}
 }

npm/examples/windows/azure-npm.yaml

Lines changed: 11 additions & 8 deletions
@@ -140,19 +140,22 @@ metadata:
 data:
 azure-npm.json: |
 {
-"ResyncPeriodInMinutes": 15,
-"ListeningPort": 10091,
-"ListeningAddress": "0.0.0.0",
-"ApplyMaxBatches": 100,
-"ApplyIntervalInMilliseconds": 500,
-"MaxBatchedACLsPerPod": 30,
+"ResyncPeriodInMinutes": 15,
+"ListeningPort": 10091,
+"ListeningAddress": "0.0.0.0",
+"ApplyIntervalInMilliseconds": 500,
+"ApplyMaxBatches": 100,
+"MaxBatchedACLsPerPod": 30,
+"NetPolInvervalInMilliseconds": 500,
+"MaxPendingNetPols": 100,
 "Toggles": {
 "EnablePrometheusMetrics": true,
 "EnablePprof": true,
 "EnableHTTPDebugAPI": true,
 "EnableV2NPM": true,
-"PlaceAzureChainFirst": true,
+"PlaceAzureChainFirst": false,
 "ApplyIPSetsOnNeed": false,
-"ApplyInBackground": true
+"ApplyInBackground": true,
+"NetPolInBackground": true
 }
 }
