-
Notifications
You must be signed in to change notification settings - Fork 23
Automator: merge upstream changes to openshift-service-mesh/istio@master #574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
openshift-merge-bot
merged 10 commits into
openshift-service-mesh:master
from
openshift-service-mesh-bot:none-master-merge_upstream_istio_master-6253864e
Feb 11, 2026
Merged
Automator: merge upstream changes to openshift-service-mesh/istio@master #574
openshift-merge-bot
merged 10 commits into
openshift-service-mesh:master
from
openshift-service-mesh-bot:none-master-merge_upstream_istio_master-6253864e
Feb 11, 2026
+370
−81
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
* Reduce flakiness of the TestServiceRestart test
In my local environment the test flakes 2-3 times per 1000 runs. Looking
at the causes of the flakes I found two:
1. The echo process gets terminated prematurely - before the pod is
removed from the list of service endpoints
2. DNS resolution fails with timeout - I've only seen this once, so it's
quite rare and it's not an Istio issue, so I'm going to ignore this
cause for now (we can potentially harden echo implementation by
allowing retries of DNS failures, though)
The way a restart works in k8s deployment is roughly happens in the
following order:
- deployment brings up a new pod
- deployment waits for the new pod to become healthy
- deployment terminates the old pod
In parallel with termination of the new pod, if deployment pods are
behind a service, the terminating pod gets removed from the list of
the service endpoints.
The important thing here is that terminating pod and removing it from
the service endpoints list happens concurrently, so you could end up in
a situation, when the processes in the pod already terminated, but the
pod is still listed as a service endpoint and can get requests routed to
it. Obviously, such requests will fail.
In echo app implementation used in our tests we protect against that by
delaying echo process termination. The way it works is roughly as
follows:
1. Echo receives a SIGTERM from Kubelet - this is Kubelet's way to ask
the pod politely to shut down.
2. Echo starts a 1 second timer:
- if during this 1 second interval echo receives a request it rests
the timer and start waiting for another 1 second
3. If Echo does not receive a request before timer elapses, Echo shuts
down.
And basically the intention here is to wait for as long as we get
requests routed to the pod. Once requests stop, we assume that the pod
has been deleted from service endpoints list and it's safe to shut down.
The `TestServiceRestart` test currently sends a request every 100ms, so
during the 1 second the Echo waits for requests to come we will have ~10
attempts - that seems like enough, but let's count.
In the test setup normally we have 2 deployments (v1 and v2) each with 1
pod, so 2 pods in total. However, during the restart in one of the
deployments we first bring up a new pod and only after that start
terminating the old pod, so for a period of time we will have 3 pods to
choose from.
That means that each request, assuming uniformity, has about 2/3 chance
of *not* getting routed to the terminating pod. And assuming
independence all ~10 attempts can miss the terminating pod with about
1-2% chance, so it's small, but not insignificant.
If all 10 attempts miss the terminating pod, it will stop listening on
its ports and shut down. If by the time it happens the pod is not yet
removed from the service endpoint list it can still get new requests and
those requests, if routed to the pod, will fail, failing the test along
the way.
This PR changes the call interval from 100ms to 40ms. Due to an
exponential nature of the probaility calculation function, this change
results in a significantly lower chance of hitting the flake (e.g. more
than 100 times lower chance, though the calculations I did are rather
rough).
Related to #58226
Signed-off-by: Mikhail Krinkin <[email protected]>
* Fix spelling and formatting
Signed-off-by: Mikhail Krinkin <[email protected]>
---------
Signed-off-by: Mikhail Krinkin <[email protected]>
Signed-off-by: xin.li <[email protected]>
Contributor
|
/retest |
Contributor
|
/test istio-integration-telemetry |
Contributor
|
/test unit |
Contributor
|
/test istio-integration-sail-pilot |
…echo client/server (#58918) * test: add support for customizing min TLS version and ECDH curves in echo client/server Signed-off-by: Jacek Ewertowski <[email protected]> * Break line to fix linter error Signed-off-by: Jacek Ewertowski <[email protected]> * Set TLS min version and curve preferences in the deployment template Signed-off-by: Jacek Ewertowski <[email protected]> * Deduplicate implementation of parse TLS settings Signed-off-by: Jacek Ewertowski <[email protected]> * Support only standard curve names Signed-off-by: Jacek Ewertowski <[email protected]> * Update curve names in flag descriptions Signed-off-by: Jacek Ewertowski <[email protected]> * Return error on invalid TLS version or curve preference Signed-off-by: Jacek Ewertowski <[email protected]> * Fix ineffassign issue Signed-off-by: Jacek Ewertowski <[email protected]> * Wrap parse errors Signed-off-by: Jacek Ewertowski <[email protected]> --------- Signed-off-by: Jacek Ewertowski <[email protected]>
Contributor
|
/retest |
a1b74c8 to
c21412e
Compare
Contributor
|
/retest |
1 similar comment
Contributor
|
/retest |
* wip: add tls-inspector when using tls dynmaic dns so we have sni information for the dfp Signed-off-by: Ian Rudie <[email protected]> * release note Signed-off-by: Ian Rudie <[email protected]> * simplify, test Signed-off-by: Ian Rudie <[email protected]> --------- Signed-off-by: Ian Rudie <[email protected]>
* upstream/master: gatewayapi: make the cors origin stricter with wildcards (#59026) Automator: update istio/client-go@master dependency in istio/istio@master (#59036) Automator: update istio/client-go@master dependency in istio/istio@master (#59034) add tls-inspector for listener with only TLS ports (#59028) test: add support for customizing min TLS version and ECDH curves in echo client/server (#58918) Automator: update go-control-plane in istio/istio@master (#59025) improve bugreport output (#58977) update EOL information (#58991) Reduce flakiness of the TestServiceRestart test (#59022)
c21412e to
d1b8fcd
Compare
Contributor
|
/retest |
e1ade98
into
openshift-service-mesh:master
13 checks passed
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Generated by Automator - 2026-02-11T05:04:32+00:00