Fix flakes in integration suite #991
Merged
Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed
Note to contributors: remember to re-generate client set if there are any API changes
Summary Of Changes
Additional Context
The "retain" tests were flaky, and sometimes passed only by accident. They created and deleted their topology
objects in back-to-back calls, without asserting that the object had been created (from etcd's point of view).
When the object deletion logic started "too soon", before etcd had recorded the object creation, the tests passed
because the controller skips deletion if the object is not found. They were also very sensitive to test pollution
(which was already present; it was not introduced by the "retain" contribution). Other controllers could overwrite
the retain policy with its default, delete, causing the "retain" tests to rightfully fail.
To fix the test pollution, the refactor changed the manager setup to use label selectors for the controller
informer cache, in addition to the existing namespace filtering. Due to this change, the manager has to start
later in the test setup, because the tests use the topology object name as a label, and this value is not known
to the setup logic until very close to the test.
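As a rough illustration of why label scoping removes the cross-test interference, the sketch below models the filtering with plain Go and no Kubernetes dependencies. The `object` type, the `test` label key, and the object and namespace names are hypothetical; the real change configures the manager's informer cache rather than filtering a slice.

```go
package main

import "fmt"

// object is a hypothetical stand-in for the metadata of a topology object.
type object struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// matches reports whether obj is in the given namespace and carries every
// key/value pair listed in selector (equality-based matching).
func matches(obj object, namespace string, selector map[string]string) bool {
	if obj.Namespace != namespace {
		return false
	}
	for key, want := range selector {
		if got, ok := obj.Labels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

// visible returns the names of the objects that a cache scoped by namespace
// and label selector would hold.
func visible(objs []object, namespace string, selector map[string]string) []string {
	var names []string
	for _, o := range objs {
		if matches(o, namespace, selector) {
			names = append(names, o.Name)
		}
	}
	return names
}

func main() {
	objs := []object{
		{Name: "retain-vhost", Namespace: "test-ns", Labels: map[string]string{"test": "retain-vhost"}},
		{Name: "delete-vhost", Namespace: "test-ns", Labels: map[string]string{"test": "delete-vhost"}},
		{Name: "retain-vhost", Namespace: "other-ns", Labels: map[string]string{"test": "retain-vhost"}},
	}
	// Only the object owned by this test, in this namespace, is visible.
	fmt.Println(visible(objs, "test-ns", map[string]string{"test": "retain-vhost"}))
}
```

Because the selector value is the topology object name, it can only be built once the test knows that name, which is consistent with the manager starting later in the setup.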
Finally, the last remaining source of test pollution was in the vhost "limits" tests. These tests left behind a
variable that is always used in the vhost object initialisation. When the "limits" tests ran before the "delete"
tests, the leftover value caused a flake. This did not always happen because we (correctly) randomise the test
execution order with ginkgo.
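One common way to avoid this kind of leftover state is to build each test's object from explicit arguments instead of a shared variable. The sketch below only illustrates that idea and is not necessarily how this PR fixed it; `vhostSpec`, `MaxConnections`, and `newVhostSpec` are hypothetical names.

```go
package main

import "fmt"

// vhostSpec is a hypothetical, simplified stand-in for the vhost object spec.
type vhostSpec struct {
	Name           string
	MaxConnections *int // nil means no limit is configured
}

// newVhostSpec builds a spec from its arguments only, so a value set by an
// earlier test cannot leak into a later one, whatever the execution order.
func newVhostSpec(name string, maxConnections *int) vhostSpec {
	return vhostSpec{Name: name, MaxConnections: maxConnections}
}

func main() {
	limit := 100
	limited := newVhostSpec("limits-vhost", &limit)
	plain := newVhostSpec("delete-vhost", nil)

	fmt.Println(*limited.MaxConnections)     // 100
	fmt.Println(plain.MaxConnections == nil) // true
}
```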