Commit deec3af

Merge pull request #261791 from neilverse/main
Troubleshooting guide to resolve csn storage stuck pod
2 parents 6d68ddc + d31d9ee commit deec3af

2 files changed: 65 additions, 0 deletions

articles/operator-nexus/TOC.yml

Lines changed: 2 additions & 0 deletions
@@ -155,6 +155,8 @@
   href: troubleshoot-kubernetes-cluster-dual-stack-configuration.md
 - name: Troubleshoot BMM reboot issues
   href: troubleshoot-bmm-node-reboot.md
+- name: Troubleshoot Resolve CSN storage pod stuck in ContainerCreating
+  href: troubleshoot-csn-storage-pod-container-stuck-in-creating.md
 - name: Enable node down cleaner
   href: troubleshoot-enable-node-down-cleaner.md
 - name: BareMetal Actions
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
---
title: Troubleshoot CSN storage pod container stuck in creating
description: Learn what to do when a CSN storage pod container remains in the ContainerCreating state.
ms.service: azure-operator-nexus
ms.custom: troubleshooting
ms.topic: troubleshooting
ms.date: 12/21/2023
ms.author: soumyamaitra
author: neilverse
---

# CSN storage pod container stuck in `ContainerCreating`

This article describes a rare issue that can leave CSN storage pods stuck in the `ContainerCreating` state and provides a workaround to resolve it.

## Cause

A runtime upgrade replaces the operating system of the bare metal nodes, which recreates the IQN (iSCSI Qualified Name) and, on rare occasions, can cause iSCSI login failures. The failure occurs on the particular nodes where the portal login doesn't succeed.

This guide briefly describes how to delete the `VolumeAttachment` object and restart the pod to resolve the issue.

## Process

Check the pod's events, for example with `kubectl describe pod`, to see why the pod remains in the `ContainerCreating` state. An affected pod reports a warning like the following:

```
Warning FailedMapVolume 52s (x19 over 23m) kubelet MapVolume.SetUpDevice failed for volume "pvc-b38dcc54-5e57-435a-88a0-f91eac594e18" : rpc error: code = Internal desc = required at least 2 portals but found 0 portals
```

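The check above can be scripted. The following is a minimal sketch, assuming `kubectl` access to the cluster and pods in the `nc-system` namespace (the namespace used by the workaround script in this guide). The function names `filter_map_volume_failures` and `show_map_volume_failures` are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash

# Keep only FailedMapVolume warning events from `kubectl describe pod`
# output read on stdin. Returns success even when nothing matches.
filter_map_volume_failures() {
    grep -E 'Warning[[:space:]]+FailedMapVolume' || true
}

# Describe one pod in nc-system and print only its volume-mapping warnings.
# Usage: show_map_volume_failures <pod-name>
show_map_volume_failures() {
    kubectl describe pod "$1" -n nc-system | filter_map_volume_failures
}
```

If the output contains the `required at least 2 portals but found 0 portals` message, the pod is likely affected by the issue this guide covers.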

This procedure focuses only on the bare metal machine (`baremetal_machine`) where the issue occurred.

Execute the following run command to resolve the pod stuck in `ContainerCreating`:

```azurecli
az networkcloud baremetalmachine run-command --bare-metal-machine-name <control-plane-baremetal-machine> \
    --subscription <subscription> \
    --resource-group <cluster-managed-resource-group> \
    --limit-time-seconds 60 \
--script "cG9kcz0kKGt1YmVjdGwgZ2V0IHBvZHMgLW4gbmMtc3lzdGVtIHxncmVwIC1pIGNvbnRhaW5lcmNyZWF0aW5nIHwgYXdrICd7cHJpbnQgJDF9JykKCmZvciBwb2RuYW1lIGluICRwb2RzOyBkbwogICAga3ViZWN0bCBkZXNjcmliZSBwbyAkcG9kbmFtZSAtbiBuYy1zeXN0ZW0KCiAgICBwdmNuYW1lPSQoa3ViZWN0bCBnZXQgcG8gJHBvZG5hbWUgLW4gbmMtc3lzdGVtIC1vIGpzb24gfCBqcSAtciAnLnNwZWMudm9sdW1lc1swXS5wZXJzaXN0ZW50Vm9sdW1lQ2xhaW0uY2xhaW1OYW1lJykKCiAgICBwdm5hbWU9JChrdWJlY3RsIGdldCBwdmMgJHB2Y25hbWUgLW4gbmMtc3lzdGVtIC1vIGpzb24gfCBqcSAtciAnLnNwZWMudm9sdW1lTmFtZScpCgogICAgbm9kZW5hbWU9JChrdWJlY3RsIGdldCBwbyAkcG9kbmFtZSAtbiBuYy1zeXN0ZW0gLW9qc29uIHwganEgLXIgJy5zcGVjLm5vZGVOYW1lJykKCiAgICB2b2xhdHRhY2hOYW1lPSQoa3ViZWN0bCBnZXQgdm9sdW1lYXR0YWNobWVudCB8IGdyZXAgLWkgJHB2bmFtZSB8IGF3ayAne3ByaW50ICQxfScpCgogICAga3ViZWN0bCBkZWxldGUgdm9sdW1lYXR0YWNobWVudCAkdm9sYXR0YWNoTmFtZQoKICAgIGt1YmVjdGwgY29yZG9uICRub2RlbmFtZSAtbiBuYy1zeXN0ZW07a3ViZWN0bCBkZWxldGUgcG8gLW4gbmMtc3lzdGVtICRwb2RuYW1lCmRvbmU="
```
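The `--script` argument in the command above carries a base64-encoded script, so it can be worth decoding the value for review before running it. The following is a minimal sketch, assuming a `base64` utility that accepts `--decode` (GNU coreutils and recent macOS versions do). The helper names `encode_script` and `decode_script` are hypothetical.

```shell
#!/usr/bin/env bash

# Encode a script file as single-line base64.
# Usage: encode_script <path-to-script>
encode_script() {
    # Strip the line wrapping that some base64 implementations add.
    base64 < "$1" | tr -d '\n'
}

# Decode a base64 string back to plain text for review.
# Usage: decode_script "<base64-string>"
decode_script() {
    printf '%s' "$1" | base64 --decode
}
```

For example, `decode_script "<value passed to --script>"` prints the script text so it can be compared with the listing in this guide.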

The run command executes the following script:

```console
pods=$(kubectl get pods -n nc-system |grep -i containercreating | awk '{print $1}')

for podname in $pods; do
    kubectl describe po $podname -n nc-system

    pvcname=$(kubectl get po $podname -n nc-system -o json | jq -r '.spec.volumes[0].persistentVolumeClaim.claimName')

    pvname=$(kubectl get pvc $pvcname -n nc-system -o json | jq -r '.spec.volumeName')

    nodename=$(kubectl get po $podname -n nc-system -ojson | jq -r '.spec.nodeName')

    volattachName=$(kubectl get volumeattachment | grep -i $pvname | awk '{print $1}')

    kubectl delete volumeattachment $volattachName

    kubectl cordon $nodename -n nc-system;kubectl delete po -n nc-system $podname
done
```

For each stuck pod, the script retrieves the PVC name from the pod, resolves the persistent volume bound to that PVC, and deletes the matching `VolumeAttachment` object. It then cordons the node and deletes the pod. The pod is later recreated on another node, along with a successful `VolumeAttachment` object.
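
To confirm the outcome, it may help to check that no pods remain in `ContainerCreating` and to see which nodes the script left cordoned. The following is a minimal sketch, assuming `kubectl` access and the default column layout of `kubectl get pods` and `kubectl get nodes`. The helper names are hypothetical, and whether or when to uncordon a node is outside the scope of this guide.

```shell
#!/usr/bin/env bash

# Count pods whose STATUS column is ContainerCreating in
# `kubectl get pods` output read on stdin (header row skipped).
count_container_creating() {
    awk 'NR > 1 && $3 == "ContainerCreating" { n++ } END { print n + 0 }'
}

# Print the names of nodes marked SchedulingDisabled (cordoned) in
# `kubectl get nodes` output read on stdin (header row skipped).
list_cordoned_nodes() {
    awk 'NR > 1 && $2 ~ /SchedulingDisabled/ { print $1 }'
}

# Run both checks against the live cluster.
check_after_workaround() {
    echo "Pods still in ContainerCreating: $(kubectl get pods -n nc-system | count_container_creating)"
    echo "Cordoned nodes:"
    kubectl get nodes | list_cordoned_nodes
}
```

Calling `check_after_workaround` prints both results. A count of `0` suggests the pods were rescheduled successfully.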
