This is part of the How-To guide collection. This guide covers KNE troubleshooting techniques and common issues.
The first step in troubleshooting general issues is familiarizing yourself with
common kubectl commands.
- get pods/services: Useful for determining basic state info
- describe pods: Useful for more verbose state info
- logs: Useful to get a dump of all pod logs
The -n <namespace> flag is necessary to specify the namespace to inspect. In
KNE there are several namespaces:
- one namespace per topology
- one namespace for meshnet CNI
- one namespace for metallb ingress
- one namespace per vendor controller
- the default kube namespaces
For an exhaustive list use the -A flag instead of -n.
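As a concrete sketch, the commands above are typically invoked per namespace as
follows (the multivendor namespace and r1 pod are illustrative names reused in
the examples later in this guide):

```shell
# Basic state info for one topology namespace.
kubectl get pods -n multivendor
kubectl get services -n multivendor

# More verbose state info, including recent events, for one pod.
kubectl describe pod r1 -n multivendor

# Dump of a pod's logs.
kubectl logs r1 -n multivendor
```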
Use the --progress option to monitor the state of the pods as KNE is coming up.
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
multivendor   r1     0/1     ImagePullBackOff   0          44m

This is due to an issue with your cluster fetching container images. Follow the
container image access steps and then
delete/recreate the topology. To check which image is causing the issue, use
the kubectl describe command on the problematic pod.
KNE should normally detect this condition and fail the deployment.
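To spot problem pods quickly in a larger listing, output like the above can be
filtered for any pod whose STATUS column is not healthy. This is a hypothetical
helper shown here against canned sample output; in practice you would pipe real
`kubectl get pods -A` output into the same awk program:

```shell
# Print namespace/name and status of every pod not in Running or Completed
# state. The here-doc stands in for real `kubectl get pods -A` output.
awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2 ": " $4 }' <<'EOF'
NAMESPACE     NAME   READY   STATUS             RESTARTS   AGE
multivendor   r1     0/1     ImagePullBackOff   0          44m
multivendor   r2     1/1     Running            0          44m
EOF
```

For the sample input above, this prints `multivendor/r1: ImagePullBackOff`.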
After creating a topology, some pods may get stuck in an Init:0/1 state while
others may be stuck Pending.
$ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
multivendor r1 0/1 Init:0/1 0 12m
multivendor r2 0/1 Pending 0 12m
...

You may also see logs for the init-container on one of the Init:0/1 pods
that look like this:
$ kubectl logs r1 init-r1 -n multivendor
Waiting for all 2 interfaces to be connected
Connected 1 interfaces out of 2
Connected 1 interfaces out of 2
Connected 1 interfaces out of 2
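Because the init container is waiting for meshnet to wire up the pod's
interfaces, one hedged first check is the health of the meshnet CNI itself (the
meshnet namespace matches the list earlier in this guide; the app=meshnet label
selector is an assumption about how the meshnet daemonset is labeled):

```shell
# A broken meshnet daemonset leaves init containers waiting for interfaces
# indefinitely, so confirm its pods are Running.
kubectl get pods -n meshnet

# Tail recent meshnet logs for wiring errors (app=meshnet is an assumed label).
kubectl logs -n meshnet -l app=meshnet --tail=20
```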
...

The Pending pods may have an error like the following:
$ kubectl describe pod r2 -n multivendor
...
Events:
Type Reason Age From Message
Warning  FailedScheduling  9s (x19 over 18m)  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.

To fix this issue you will need a VM instance with more vCPUs and memory. A
machine with 16 vCPUs should be sufficient. Optionally you can deploy a smaller
topology instead.
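Before resizing, it can help to see exactly what the scheduler sees. A sketch
of that comparison, using standard kubectl output options:

```shell
# Allocatable resources per node versus what running pods already request;
# the "Allocated resources" section shows current CPU/memory commitments.
kubectl describe nodes | grep -A 8 'Allocated resources'

# Raw CPU and memory capacity of each node.
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEM:.status.capacity.memory
```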
If you are unable to SSH to a node, work through the following checks.

$ ssh 192.168.18.100
ssh: connect to host 192.168.18.100 port 22: Connection refused

Validate the pod is running:
$ kubectl get pod r1 -n multivendor
NAME READY STATUS RESTARTS AGE
r1     1/1     Running   0          4m47s

Validate the service is exposed:
$ kubectl get services r1 -n multivendor
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service-r1   LoadBalancer   10.96.134.70   192.168.18.100   443:30001/TCP,22:30004/TCP,6030:30005/TCP   4m22s

Validate you can exec on the container:
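If the service instead shows no EXTERNAL-IP (stuck in a pending state), the
metallb ingress that assigns LoadBalancer addresses is a likely culprit. A
hedged check, assuming metallb is deployed in its default metallb-system
namespace:

```shell
# Confirm the metallb controller and speaker pods are Running.
kubectl get pods -n metallb-system

# Inspect the service events for address-assignment failures.
kubectl describe service service-r1 -n multivendor
```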
$ kubectl exec -it r1 -n multivendor -- Cli
Defaulted container "r1" out of: r1, init-r1 (init)
error: Internal error occurred: error executing command in
container: failed to exec in container: failed to start exec
"68abfa4f3742c86f49ec00dff629728d96e589c6848d5247e29d396365d6b697": OCI
runtime exec failed: exec failed: container_linux.go:370: starting container
process caused: exec: "Cli": executable file not found in $PATH: unknown

Fix the systemd path and cgroups directory and then delete/recreate the
topology.
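The exec failure above is usually a container runtime configuration issue
rather than a problem with the pod itself. Assuming your cluster nodes use
docker as the runtime, one way to inspect the cgroup configuration in play:

```shell
# Show the cgroup driver and version the runtime is using; kubeadm-based
# clusters generally expect the systemd cgroup driver.
docker info 2>/dev/null | grep -i 'cgroup'
```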
If you see something similar to the following on a cptx Juniper node:
$ kubectl exec -it r4 -n multivendor -- cli
Defaulted container "r4" out of: r4, init-r4 (init)
System is not yet ready...

then your host likely does not support nested virtualization. Run the following
to confirm; if the output is 0, the host does not support nested virtualization
(vmx is the Intel flag; on AMD hosts check for the svm flag instead).
$ grep -cw vmx /proc/cpuinfo
0

Enable nested virtualization or move to a new machine that supports it. When
done, run the following ensuring a non-zero output:
$ grep -cw vmx /proc/cpuinfo
16

Then delete your cluster and start again.