@eleo007 commented Nov 11, 2024

CHANGE DESCRIPTION

Problem:

  1. demand-backup-physical-sharded and demand-backup-physical are failing. In the e2e tests, we start the second restore right after the first restore finishes, and it fails because a resync operation is still running.
  2. pvc-resize fails on OpenShift.
  3. serviceless-external-nodes fails on OpenShift.

Cause:

  1. demand-backup-physical-sharded and demand-backup-physical fail because of a long PBM resync. After a physical restore finishes, the operator automatically starts a resync, which can take a long time, especially on storages with many backups.
  2. pvc-resize fails on OpenShift because, starting with 1.18.0, pvc-resize is disabled by default. On EKS and OpenShift we recreate the cluster in the middle of the pvc-resize test (due to limits on resize requests), but we did not enable pvc-resize again for the recreated cluster.
  3. serviceless-external-nodes fails on OpenShift because a check of the number of secrets was added to the test, and on OpenShift the namespace also contains dockercfg secrets:
k get secret
NAME                                              TYPE                      DATA   AGE
builder-dockercfg-xzpz2                           kubernetes.io/dockercfg   1      144m
default-dockercfg-dnthr                           kubernetes.io/dockercfg   1      144m
deployer-dockercfg-spbmz                          kubernetes.io/dockercfg   1      144m
internal-my-cluster-name-users                    Opaque                    10     7m55s
my-cluster-name-mongodb-encryption-key            Opaque                    1      7m54s
my-cluster-name-mongodb-keyfile                   Opaque                    1      7m54s
my-cluster-name-secrets                           Opaque                    10     7m55s
my-cluster-name-ssl                               kubernetes.io/tls         3      7m54s
my-cluster-name-ssl-internal                      kubernetes.io/tls         3      7m54s
percona-server-mongodb-operator-dockercfg-ndbth   kubernetes.io/dockercfg   1      7m58s

As a result, the secret count is higher than expected. The test also failed because of a securityContext diff.
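The "count only $cluster secrets" check from the Solution section can be sketched as a small Bash helper that reads `kubectl get secret` output like the listing above and counts only rows whose NAME contains the cluster name. `count_cluster_secrets` is a hypothetical helper, not code from this PR. It uses a substring match, so `internal-my-cluster-name-users` is counted while the OpenShift `*-dockercfg-*` secrets are not; a prefix match would be stricter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the secrets that belong to one cluster.
# Reads tabular `kubectl get secret` output on stdin (first row is the header)
# and prints how many NAME values contain the given cluster name.
count_cluster_secrets() {
  local cluster=$1
  awk -v c="$cluster" '
    NR > 1 && index($1, c) > 0 { n++ }  # skip header; substring match on NAME
    END { print n + 0 }                 # print 0 when nothing matched
  '
}
```

Used as `kubectl get secret | count_cluster_secrets my-cluster-name`, this prints 6 for the listing above, whereas counting every row gives 10 because of the four dockercfg secrets.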

Solution:

  1. Wait for the resync to finish before starting another restore, and increase timeouts.
  2. Enable pvc-resize for the recreated cluster.
  3. Add a diff file for OpenShift and count only the $cluster secrets in the check.
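Solution 1 amounts to polling until the resync is no longer running, bounded by a timeout. A minimal Bash sketch of such a poll loop follows. `wait_until` is a hypothetical helper, and the check command it runs is left to the caller as an assumption, since the PR text does not show how the tests detect that the PBM resync has finished.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a check command every INTERVAL seconds until it
# succeeds, or give up once roughly TIMEOUT seconds have been spent waiting.
# Usage: wait_until TIMEOUT INTERVAL command [args...]
wait_until() {
  local timeout=$1 interval=$2
  shift 2
  local waited=0
  until "$@"; do
    if (( waited >= timeout )); then
      echo "timed out after ${waited}s waiting for: $*" >&2
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}
```

For example, `wait_until 600 10 resync_finished` would be called before starting the second restore, where `resync_finished` is a hypothetical function that returns success once PBM reports no running operation. The first argument is the knob that corresponds to "increase timeouts".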

CHECKLIST

Jira

  • Is the Jira ticket created and referenced properly?
  • Does the Jira ticket have the proper statuses for documentation (Needs Doc) and QA (Needs QA)?
  • Does the Jira ticket link to the proper milestone (Fix Version field)?

Tests

  • Is an E2E test/test case added for the new feature/change?
  • Are unit tests added where appropriate?
  • Are OpenShift compare files changed for E2E tests (compare/*-oc.yml)?

Config/Logging/Testability

  • Are all needed new/changed options added to default YAML files?
  • Are all needed new/changed options added to the Helm Chart?
  • Did we add proper logging messages for operator actions?
  • Did we ensure compatibility with the previous version or cluster upgrade process?
  • Does the change support oldest and newest supported MongoDB version?
  • Does the change support oldest and newest supported Kubernetes version?

eleo007 and others added 2 commits November 11, 2024 11:20
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
@eleo007 closed this Nov 11, 2024
@eleo007 deleted the release-1.18.0_check_tests branch May 2, 2025 10:54
Labels

size/L 100-499 lines