Commit 8ba4bc1

Merge pull request #3530 from MicrosoftDocs/repo_sync_working_branch
Confirm merge from repo_sync_working_branch to main to sync with https://github.com/MicrosoftDocs/azure-ai-docs (branch main)
2 parents 593bb4f + f30c109 commit 8ba4bc1

1 file changed: +48 -0 lines changed

articles/machine-learning/how-to-troubleshoot-kubernetes-extension.md

Lines changed: 48 additions & 0 deletions
@@ -78,6 +78,54 @@ When you request support, we recommend that you run the following command and se
 ```bash
 kubectl logs healthcheck -n azureml
 ```
+## Extension-operator pod in azure-arc/kube-system namespace is crashing due to OOMKill
+This issue happens if the extension's helm chart size is large and there are multiple Helm releases on the cluster. Here is a sample script to help clean up the helm history on the cluster:
+```
+#!/bin/bash
+
+# Set release name and namespace
+RELEASE_NAME=$1
+NAMESPACE=$2
+
+# Validate input
+if [[ -z "$RELEASE_NAME" || -z "$NAMESPACE" ]]; then
+  echo "Usage: $0 <release-name> <namespace>"
+  exit 1
+fi
+
+echo "Fetching Helm history for release: $RELEASE_NAME in namespace: $NAMESPACE"
+
+# Get stuck revisions (PENDING_ROLLBACK or PENDING_UPGRADE) using grep + awk for accurate parsing
+STUCK_REVISIONS=$(helm history "$RELEASE_NAME" -n "$NAMESPACE" | grep 'pending-' | awk '{print $1}')
+
+if [[ -z "$STUCK_REVISIONS" ]]; then
+  echo "No stuck Helm revisions found. Nothing to delete."
+  exit 0
+fi
+
+echo "Found stuck Helm revisions: $STUCK_REVISIONS"
+
+# Loop through each stuck revision and delete the corresponding secret
+for REVISION in $STUCK_REVISIONS; do
+  SECRET_NAME="sh.helm.release.v1.${RELEASE_NAME}.v${REVISION}"
+
+  echo "Deleting Helm history secret: $SECRET_NAME"
+
+  kubectl delete secret -n "$NAMESPACE" "$SECRET_NAME" --ignore-not-found
+done
+
+echo "Cleanup complete. Verify with 'helm history $RELEASE_NAME -n $NAMESPACE'"
+
+exit 0
+
+```
+
+How to run the script:
+```
+chmod +x delete_stuck_helm_secrets.sh
+
+./delete_stuck_helm_secrets.sh my-release my-namespace
+```
 
 ### Error Code of HealthCheck
 This table shows how to troubleshoot the error codes returned by the HealthCheck report.
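
The script added in this diff deletes secrets as soon as it finds a stuck revision. As a rough sketch of how one might preview that step first, the helper below only prints the secret names that would be targeted. It is not part of the commit: the function name `stuck_secret_names` and the sample history table are hypothetical, and it assumes, as the committed script does, that stuck revisions carry a status beginning with `pending-`.

```shell
#!/bin/bash
# Dry-run helper: read `helm history` table output on stdin and print the
# Helm release secret names that the cleanup script would delete.
# Assumption: stuck revisions show a status starting with "pending-",
# the same pattern the committed script greps for.
stuck_secret_names() {
  local release="$1"
  # NR > 1 skips the table header; $1 is the REVISION column.
  awk -v rel="$release" 'NR > 1 && /pending-/ { print "sh.helm.release.v1." rel ".v" $1 }'
}

# Hypothetical sample, for illustration only (not captured from a real cluster).
sample_history='REVISION  UPDATED                   STATUS           CHART       APP VERSION  DESCRIPTION
1         Mon Jan  1 00:00:00 2024  superseded       demo-1.0.0  1.0.0        Install complete
2         Tue Jan  2 00:00:00 2024  pending-upgrade  demo-1.1.0  1.1.0        Preparing upgrade
3         Wed Jan  3 00:00:00 2024  deployed         demo-1.1.0  1.1.0        Upgrade complete'

# Prints: sh.helm.release.v1.demo.v2
printf '%s\n' "$sample_history" | stuck_secret_names demo
```

Against a live cluster, the same function could presumably be fed from `helm history "$RELEASE_NAME" -n "$NAMESPACE"`, and the printed names reviewed before running the deletion script.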
