-
Notifications
You must be signed in to change notification settings - Fork 17
CLOUDP-316922 - Fix racy and slow auth tests like in openshift clusters #384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
MCK 1.3.0 Release NotesNew FeaturesMulti-Architecture SupportWe've added comprehensive multi-architecture support for the kubernetes operator. This enhancement enables deployment on IBM Power (ppc64le) and IBM Z (s390x) architectures alongside
Bug Fixes
Other Changes
|
@@ -76,6 +77,63 @@ def fetch(self, context: OIDCCallbackContext) -> OIDCCallbackResult: | |||
return OIDCCallbackResult(access_token=u.id_token) | |||
|
|||
|
|||
def _wait_for_mongodbuser_reconciliation() -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a kind of catch - all approach. Instead I could take the time to find all places where we update the secret/user and add this there. But I don't know where it is and might end up as a rabbit chase. I think that would be the correct approach - but I rather have this as a dedicated item instead.
|
||
def assert_scram_sha_authentication_fails( | ||
self, | ||
username: str, | ||
password: str, | ||
retries: int = 20, | ||
attempts: int = 50, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of auth times we were really close to the 1m (20*5) mark. That is not a good timeout
525b3a1
to
3a66cf5
Compare
|
||
if exitstatus != 0: | ||
try: | ||
automation_config_tester = tester.get_automation_config_tester() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a lot of times the ac would help a lot (especially in those auth change cases)
@@ -1029,6 +1029,8 @@ def assert_reaches_phase( | |||
# This can be an intermediate error, right before we check for this secret we create it. | |||
# The cluster might just be slow | |||
"failed to locate the api key secret", | |||
# etcd might be slow | |||
"etcdserver: request timed out", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the reason that the multi reconcile test keeps failing is due to a large amount of appdb concurrent creation of configmaps (the automation config).
("Error creating automation config map in cluster __default: etcdserver: request timed out")
Summary
fixes:
both were mostly failing on master merges on openshift.
PoW shows 3x passing in a row each one
Reliability improvements for authentication tests:
Added a
_wait_for_mongodbuser_reconciliation
function to ensure allMongoDBUser
resources reach the "Updated" phase before authentication attempts, preventing race conditions after user/password changes. This function is now called at the start of all authentication assertion methodsIncreased the default number of authentication attempts 50 across all relevant methods
Error handling and diagnostics:
fixed logging error, we were passing
msg
but that doesn't exist and thus we caused a panic when the test was failing - masking the errorEnhanced diagnostics collection in
tests/conftest.py
to also save the automation config JSON for each project when tests fail, aiding post-mortem analysis.Proof of Work
Checklist
skip-changelog
label if not needed