[Windows] Reconcile host-network Pods after agent is restarted#6944
[Windows] Reconcile host-network Pods after agent is restarted#6944antoninbas merged 1 commit intoantrea-io:mainfrom
Conversation
|
@wenyingd there is a golang-lint issue, please fix it. |
547d964 to
da952ef
Compare
updated. |
|
/test-all |
| } | ||
|
|
||
| // getPodsListOptions returns the Pods running on the current Node. Note, the host-network Pods are not filtered | ||
| // out on Windows because they are also managed by antrea as long as "spec.SecurityContext.windowsOptions.hostProcess" |
There was a problem hiding this comment.
what will happen if "spec.SecurityContext.windowsOptions.hostProcess" is configured and the Pod is not filtered out?
There was a problem hiding this comment.
Nothing was performed since we didn't create OVS interfaces for such Pods.
Having offline sync with @tnqn , we would not perform other filters on Windows in reconcile logic except for the node name to get local Pods and resourceVersion=0 to avoid performance issue, this is to try our best to ensure the reconciled Pod set can cover those Antrea have handled.
da952ef to
065af4a
Compare
|
/test-all |
luolanzone
left a comment
There was a problem hiding this comment.
LGTM, one nit.
@tnqn @antoninbas can you take a look? thanks.
065af4a to
1193dc7
Compare
|
/test-all |
|
/test-windows-e2e |
|
/test-windows-networkpolicy |
antoninbas
left a comment
There was a problem hiding this comment.
could we add an e2e test for this as well?
|
|
||
| // Re-install Pod1 flows | ||
| podFlowsInstalled := make(chan string, 2) | ||
| expReInstalledPodCount := 3 |
There was a problem hiding this comment.
nit: expectedReinstalledPodCount
reinstall is one word
There was a problem hiding this comment.
updated.
I will add an e2e case to verify the host-network Pod traffic is working even if antrea-agent is restarted on Windows.
1193dc7 to
f6c3466
Compare
|
/test-windows-all |
| func (c *CNIConfig) getInfraContainer() string { | ||
| return getInfraContainer(c.ContainerId, c.Netns) | ||
| } | ||
|
|
There was a problem hiding this comment.
| | Pod Type | Spec.HostNetwork | windowsOptions.hostProcess | Kubelet CMDADD | OVS interface needed? | Reconcile needed? | | |
| | 1. Normal Pod (non-HostNetwork) | false | false / not set | Yes | Yes | Yes | | |
| | 2. Windows HostNetwork Pod | true | false / not set | Yes | Yes | Yes | | |
| | 3. Windows HostProcess Pod | (any) | true | No | No | No (no real network config) | |
@wenyingd Can we add a table to the code comments to explain the different behaviors of Windows Pods for developers?
f6c3466 to
c3d9724
Compare
|
/test-windows-e2e |
1 similar comment
|
/test-windows-e2e |
|
/test-windows-all |
c3d9724 to
127aaf4
Compare
|
/test-windows-e2e |
test/e2e/connectivity_test.go
Outdated
| assert.NoErrorf(t, err, "The Openflow entries should be consistent after Antrea agent restarts") | ||
|
|
||
| data.runPingMesh(t, podInfos, toolboxContainerName, true) | ||
| time.Sleep(10 * time.Minute) |
There was a problem hiding this comment.
I think you had this for debugging purposes, that's a very long sleep :)
There was a problem hiding this comment.
Yes, I will remove it after finishing the debug on the new e2e case.
| // This case only run on Windows. The Pods in the test includes both host-network Pod and none host-network Pod, | ||
| // because CNI on Windows is also responsible for the host-network Pod's networking as long as it is not using | ||
| // host-process containers. | ||
| func testWindowsPodConnectivityAfterAntreaRestart(t *testing.T, data *TestData) { |
There was a problem hiding this comment.
I guess one issue with the test as currently written is that if there is a failure, it's not clear which one of the 2 Pods may be responsible (assuming only 1 Pod is broken after restart).
Maybe we could have 1 Linux Pod as the client, and then use it to ping each Windows Pod independently? This way we could split this test into 2 subtests, one for the regular Pod and one for the hostNetwork Pod. Would that make sense?
There was a problem hiding this comment.
Make sense, updated.
127aaf4 to
1cf8dcf
Compare
1cf8dcf to
6234815
Compare
|
/test-windows-all |
6234815 to
471d2a7
Compare
|
/test-windows-all |
|
/test-all |
|
The new added e2e is succeeded, the failed case in windows-e2e is See: http://10.164.243.223/view/Windows/job/antrea-windows-e2e-for-pull-request/89 |
antoninbas
left a comment
There was a problem hiding this comment.
couple nits, otherwise LGTM
| if err := data.createToolboxPodOnNode(clientPod.Name, clientPod.Namespace, clientPod.NodeName, false); err != nil { | ||
| t.Fatalf("Error when creating Pod '%s': %v", clientPod.Name, err) | ||
| } | ||
| defer deletePodWrapper(t, data, clientPod.Namespace, clientPod.Name) |
There was a problem hiding this comment.
feel free to omit this, especially for a simple Linux Pod like this one, the Pod will be automatically deleted when the test Namespace is deleted at the end of the test.
There was a problem hiding this comment.
I would keep this code since personally I prefer to clean up the new resources created dedicated for the case after the test.
test/e2e/connectivity_test.go
Outdated
| data.runPingMesh(t, testPodInfos, toolboxContainerName, true) | ||
|
|
||
| // Count the OVS flows. | ||
| oriOVSFlows := data.dumpOVSFlows(t, winWorkerNode) |
There was a problem hiding this comment.
spell out original
originalOVSFlows or initialOVSFlows
There was a problem hiding this comment.
Sure, updated accordingly.
test/e2e/connectivity_test.go
Outdated
| } | ||
| } | ||
|
|
||
| // verifyWindowsPodConnectivity checks the Pods' connectivity after antrea-agent is restarted on Windows. |
There was a problem hiding this comment.
checks Pod connectivity
test/e2e/connectivity_test.go
Outdated
| } | ||
|
|
||
| // verifyWindowsPodConnectivity checks the Pods' connectivity after antrea-agent is restarted on Windows. | ||
| // The Pods in the test includes both host-network Pod and the generic Pod, because CNI on Windows is also |
There was a problem hiding this comment.
We test both the generic Pod case and the host-network Pod case, because ...
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea has received the CmdAdd request for such Pods when they are created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <wenyingd@vmware.com>
471d2a7 to
a8a7ef7
Compare
|
/test-all |
|
@wenyingd will you backport this? |
Sure, I will back port it to release-2.1 and release-2.2. |
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <wenyingd@vmware.com>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <wenyingd@vmware.com>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <wenyingd@vmware.com>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <wenyingd@vmware.com>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <wenyingd@vmware.com>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers. Antrea received the CmdAdd request for such Pods when they were created, so they should be included in the Pod reconcile list after agent is restarted. Signed-off-by: Wenying Dong <wenyingd@vmware.com>
This change is to support reconciling the host-network Pods on Windows because k8s expects to let CNI manage such Pods as long as they are not using host-process containers.
Antrea has received the CmdAdd request for such Pods when they are created, so they should be included in the Pod reconcile list after agent is restarted.
Fix: #6943