<!-- Provide a brief summary of your changes -->
## Motivation and Context
<!-- Why is this change needed? What problem does it solve? -->
This PR fixes the failing sync job:
https://github.com/modelcontextprotocol/registry/actions/runs/18374711984/job/52345853050
**Root Cause**
1. Job detection failure: the workflow looked for the restore job with the
label selector `k8up.io/owned-by=restore`, but k8up names restore jobs
`restore-<restore-name>`, so the lookup never found the job.
2. Race condition: the cleanup step (which runs with `if: always()`) deleted
the `prod-to-staging-sync-credentials` secret even when the restore job had
not been created yet or was still running.
**The Fix**
1. Updated job detection: the workflow now matches jobs by the name prefix
`restore-$RESTORE_NAME` instead of by label, and polls for up to 120 seconds
(previously 60).
2. Added better diagnostics: if the job isn't found, the workflow now
   shows:
   - Restore resource status
   - Any existing restore jobs
   - k8up operator logs
3. Fixed cleanup race condition: The cleanup step now waits up to 2
minutes for any running restore jobs to complete before deleting
credentials
4. Improved error handling: more detailed logging when a restore job fails,
including a check for whether the credentials secret still exists.
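A minimal sketch of the bounded wait described in item 3, assuming bash, that both the restore jobs and the secret live in the `default` namespace, and that a job counts as "running" while its `.status.active` count is non-zero. The helper names (`wait_until`, `no_active_restore_jobs`, `cleanup_credentials`) are hypothetical; this is an illustration, not the PR's actual cleanup step.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: wait (bounded) for restore jobs before deleting credentials.

# wait_until <timeout_seconds> <interval_seconds> <command...>
# Re-runs <command> until it succeeds (return 0) or until <timeout_seconds>
# have been spent sleeping (return 1). <interval_seconds> should be >= 1.
wait_until() {
  local timeout=$1 interval=$2 waited=0
  shift 2
  while ! "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 0
}

# Succeeds when no job whose name starts with "restore-" has active pods.
# If kubectl itself fails, report "not idle" so the caller keeps waiting.
no_active_restore_jobs() {
  local job_table
  job_table=$(kubectl get jobs -n default --no-headers \
    -o custom-columns=NAME:.metadata.name,ACTIVE:.status.active) || return 1
  # custom-columns prints <none> for a missing .status.active field.
  ! printf '%s\n' "$job_table" \
    | awk '$1 ~ /^restore-/ && $2 != "<none>" && $2 + 0 > 0 { found = 1 } END { exit !found }'
}

# Wait up to 120 seconds, then delete the secret regardless of the outcome.
cleanup_credentials() {
  if ! wait_until 120 5 no_active_restore_jobs; then
    echo "WARNING: restore jobs still active after 120 seconds; deleting credentials anyway"
  fi
  kubectl delete secret prod-to-staging-sync-credentials -n default --ignore-not-found
}
```

The `if: always()` cleanup step would call `cleanup_credentials` instead of deleting the secret immediately. Polling `.status.active` rather than using `kubectl wait --for=condition=complete` means a failed job also ends the wait, since a failed job never reaches the `Complete` condition.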
**The changes ensure:**
- The workflow correctly finds the k8up restore job
- Credentials aren't deleted while jobs are still running
- Better visibility into failures for easier debugging
## How Has This Been Tested?
<!-- Have you tested this in a real application? Which scenarios were
tested? -->
## Breaking Changes
<!-- Will users need to update their code or configurations? -->
## Types of changes
<!-- What types of changes does your code introduce? Put an `x` in all
the boxes that apply: -->
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Breaking change (fix or feature that would cause existing
functionality to change)
- [ ] Documentation update
## Checklist
<!-- Go over all the following points, and put an `x` in all the boxes
that apply. -->
- [ ] I have read the [MCP
Documentation](https://modelcontextprotocol.io)
- [ ] My code follows the repository's style guidelines
- [ ] New and existing tests pass locally
- [ ] I have added appropriate error handling
- [ ] I have added or updated documentation as needed
## Additional context
<!-- Add any other context, implementation notes, or design decisions
-->
---------
Signed-off-by: Radoslav Dimitrov <[email protected]>
`.github/workflows/sync-db.yml`: 53 additions, 6 deletions
```diff
@@ -139,29 +139,48 @@ jobs:
 sleep 15
 
 # Find the job created by k8up for this restore
-for i in {1..30}; do
-JOB_NAME=$(kubectl get jobs -n default -l k8up.io/owned-by=restore -o jsonpath='{.items[?(@.metadata.ownerReferences[0].name=="'$RESTORE_NAME'")].metadata.name}' 2>/dev/null)
+# k8up creates jobs with name pattern "restore-<restore-name>"
+# Since our restore is named "restore-from-prod-*", the job will be "restore-restore-from-prod-*"
+for i in {1..60}; do
+JOB_NAME=$(kubectl get jobs -n default --no-headers 2>/dev/null | grep "^restore-$RESTORE_NAME" | awk '{print $1}' | head -1)
 if [ -n "$JOB_NAME" ]; then
 echo "Found restore job: $JOB_NAME"
 break
 fi
-echo "Waiting for job to be created... ($i/30)"
+echo "Waiting for job to be created... ($i/60)"
 sleep 2
 done
 
 if [ -z "$JOB_NAME" ]; then
-echo "ERROR: Restore job not found"
-kubectl get restore $RESTORE_NAME -n default -o yaml
+echo "ERROR: Restore job not found after 120 seconds"
+echo "Checking restore resource status:"
+kubectl get restore $RESTORE_NAME -n default
+kubectl describe restore $RESTORE_NAME -n default
+
+echo "Checking for any restore jobs:"
+kubectl get jobs -n default | grep restore || echo "No restore jobs found"
```
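The job lookup in this hunk can also be factored into a function and exercised without a cluster by stubbing `kubectl`. A minimal sketch, assuming bash; `match_restore_job` and `find_restore_job` are hypothetical names. One deliberate difference from the diff: `awk`'s `index()` compares the `restore-<restore-name>` prefix literally, whereas `grep "^restore-$RESTORE_NAME"` treats the name as a regular expression (a `.` in the name would match any character).

```shell
#!/usr/bin/env bash
# Hypothetical helpers mirroring the restore-job lookup from the workflow diff.

# Reads `kubectl get jobs --no-headers` output on stdin and prints the first
# job name that starts with "restore-<restore-name>" (literal comparison).
match_restore_job() {
  awk -v p="restore-$1" 'index($1, p) == 1 { print $1; exit }'
}

# find_restore_job <restore-name> [attempts] [interval_seconds]
# Polls until a matching job exists; prints its name and returns 0,
# or returns 1 after <attempts> tries (defaults: 60 tries, 2 s apart).
find_restore_job() {
  local restore_name=$1 attempts=${2:-60} interval=${3:-2} i job
  for ((i = 1; i <= attempts; i++)); do
    job=$(kubectl get jobs -n default --no-headers 2>/dev/null \
      | match_restore_job "$restore_name")
    if [ -n "$job" ]; then
      printf '%s\n' "$job"
      return 0
    fi
    # Progress goes to stderr so callers can capture just the job name.
    echo "Waiting for job to be created... ($i/$attempts)" >&2
    sleep "$interval"
  done
  return 1
}
```

In the workflow step this would replace the inline loop, e.g. `JOB_NAME=$(find_restore_job "$RESTORE_NAME")`, with the same diagnostics run when it returns non-zero.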