
No longer drop zen db during restore, change zenservice check, update retry count, change iam-config-job check#2505

Merged
ibm-ci-bot merged 4 commits into scripts-dev from bluzarraga-patch-4
May 2, 2025


bluzarraga (Member) commented May 2, 2025

What this PR does / why we need it: The zenservice's zenstatus field can already be set to InMaintenance before the restore starts. In that case the zenservice is ready for the restore job, but the job stalls because it waits specifically for zenstatus to read Completed. Progress is a more representative value, since reaching 100% is what we are really looking for.
I am not sure why the zenservice would be set to InMaintenance before the zenservice restore job runs, since setting it is something the job itself is supposed to handle, but I have run into it in a CP4BA restore scenario. One possible explanation is a rerun (idempotency) scenario: if the job failed midway through its first attempt, the zenservice could easily be left stuck in InMaintenance when the job is attempted a second time.
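
As a rough illustration of the check described above, here is a minimal sketch that decides readiness from the progress value alone. The function name `zen_ready_for_restore` and its two string arguments are hypothetical and not taken from the actual restore script; how the values are read from the zenservice is left out on purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the actual script): succeed when the
# zenservice can be restored, judging by progress instead of zenstatus.
#   $1 = zenstatus value, e.g. "Completed" or "InMaintenance"
#   $2 = progress value, assumed to be formatted like "100%"
zen_ready_for_restore() {
  local status="$1" progress="$2"
  # status is deliberately not compared: a leftover InMaintenance from a
  # failed earlier run must not block the restore from starting.
  [ "$progress" = "100%" ]
}
```

With this shape, `zen_ready_for_restore "InMaintenance" "100%"` succeeds, whereas a check that required the status to equal Completed would keep waiting.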

I also needed to update the retry count logic so that it actually decrements the retry count; otherwise the job hangs indefinitely.
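
The retry fix can be sketched as a bounded polling loop. This is an illustrative sketch only: `wait_until`, its arguments, and the timeout message are assumptions made for the example, and the real script's loop may be structured differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the actual script): run a check command
# until it succeeds, giving up after a fixed number of attempts.
#   $1 = retry_count (maximum number of attempts)
#   $2 = delay in seconds between attempts
#   remaining arguments = the check command to run
wait_until() {
  local retry_count="$1" delay="$2"
  shift 2
  while ! "$@"; do
    # Without this decrement the loop can never reach its exit condition.
    retry_count=$((retry_count - 1))
    if [ "$retry_count" -le 0 ]; then
      echo "timed out waiting for: $*" >&2
      return 1
    fi
    sleep "$delay"
  done
  return 0
}
```

A call such as `wait_until 20 30 some_check` would then fail with a non-zero exit code after the attempts are used up instead of hanging, where `some_check` stands in for whatever readiness test the script performs.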

Update on the scenario:

  • The zen database got dropped as part of the zen restore script.
  • This prevents the pg_restore command from working, causing the zen restore job to fail and exit.
  • The restore job then restarts (I originally saw one pod restart) and hangs because the failed job left the zenservice InMaintenance.
  • The new restore job hangs indefinitely due to the faulty retry_count logic.

Which issue(s) this PR fixes:
Fixes https://github.ibm.com/IBMPrivateCloud/roadmap/issues/66587

Special notes for your reviewer:

  1. How the test is done?

How to backport this PR to other branch:

  1. Add label to this PR with the target branch name backport <branch-name>
  2. The PR will be automatically created in the target branch after merging this PR
  3. If this PR is already merged, you can still add the label with the target branch name backport <branch-name> and leave a comment /backport to trigger the backport action

ibm-ci-bot (Contributor) commented

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bluzarraga

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ibm-ci-bot added size/M and removed size/S labels May 2, 2025
bluzarraga changed the title from "change zenservice check, update retry count" to "[WIP] change zenservice check, update retry count" May 2, 2025
bluzarraga changed the title from "[WIP] change zenservice check, update retry count" to "No longer drop zen db during restore, change zenservice check, update retry count, change iam-config-job check" May 2, 2025
qpdpQ (Contributor) commented May 2, 2025

/lgtm

ibm-ci-bot added the lgtm label May 2, 2025
ibm-ci-bot merged commit 654134d into scripts-dev May 2, 2025
3 checks passed
YCShen1010 deleted the bluzarraga-patch-4 branch June 10, 2025 19:39
3 participants