# Debugging examples

This guide walks through the typical debugging workflow of `kubectl-debug`.

> **Note:** The rest of this document assumes you have installed and properly configured `kubectl-debug` according to the [project README](/README.md).

If you have any real-world examples of using `kubectl-debug` to share, feel free to open a pull request.

Here is the config file used for the following commands, so that you can reproduce all the command outputs:

```yaml
agent_port: 10027
portForward: true
agentless: true
command:
- '/bin/bash'
- '-l'
```
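To reproduce this setup, you can write the config to disk with a small shell function. This is a minimal sketch: `write_debug_config` is a hypothetical helper, and the default location `~/.kube/debug-config` is an assumption based on our reading of the project README that may differ between versions. Pass another path as the first argument if yours differs.

```shell
# Hypothetical helper: write the example config used throughout this guide.
# The default path ~/.kube/debug-config is an assumption; override it with $1.
write_debug_config() {
  path=${1:-"$HOME/.kube/debug-config"}
  mkdir -p "$(dirname "$path")"
  cat > "$path" <<'EOF'
agent_port: 10027
portForward: true
agentless: true
command:
- '/bin/bash'
- '-l'
EOF
}
```

After running `write_debug_config`, the commands below should behave as shown (agentless mode with port forwarding, `bash -l` as the debug shell).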

## Basic

`kubectl-debug` uses [`nicolaka/netshoot`](https://github.com/nicolaka/netshoot) as the default debug image. The [project documentation](https://github.com/nicolaka/netshoot/blob/master/README.md) is a great guide to using its various tools to troubleshoot container networking.

We will take a few examples here to show how the powerful `netshoot` tools work in the `kubectl-debug` context.

Connect to the pod:

```shell
➜ ~ kubectl debug demo-pod

Agent Pod info: [Name:debug-agent-pod-da46a000-8429-11e9-a40c-8c8590147766, Namespace:default, Image:aylei/debug-agent:latest, HostPort:10027, ContainerPort:10027]
Waiting for pod debug-agent-pod-da46a000-8429-11e9-a40c-8c8590147766 to run...
pod demo-pod PodIP 10.233.111.78, agentPodIP 172.16.4.160
wait for forward port to debug agent ready...
Forwarding from 127.0.0.1:10027 -> 10027
Forwarding from [::1]:10027 -> 10027
Handling connection for 10027
 pulling image nicolaka/netshoot:latest...
latest: Pulling from nicolaka/netshoot
Digest: sha256:5b1f5d66c4fa48a931ff54f2f34e5771eff2bc5e615fef441d5858e30e9bb921
Status: Image is up to date for nicolaka/netshoot:latest
starting debug container...
container created, open tty...

 [1] 🐳 → hostname
demo-pod
```

Using **iftop** to inspect network traffic:

```shell
root @ /
 [2] 🐳 → iftop -i eth0
interface: eth0
IP address is: 10.233.111.78
MAC address is: 86:c3:ae:9d:46:2b
(CLI graph omitted)
```
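`iftop` is interactive. When you only need raw numbers, for example in a script, the Linux kernel exposes per-interface byte counters under `/sys/class/net/<interface>/statistics/`. The function below is a hypothetical sketch, not part of `kubectl-debug` or `netshoot`; it samples the counters twice and prints an approximate rate. Since the debug container joins the target's network namespace, `eth0` there should be the target Pod's interface.

```shell
# Hypothetical helper: approximate RX/TX throughput of an interface by
# sampling the kernel's sysfs byte counters twice, $2 seconds apart.
iface_rate() {
  dir="/sys/class/net/${1:-eth0}/statistics"
  secs=${2:-1}
  rx1=$(cat "$dir/rx_bytes"); tx1=$(cat "$dir/tx_bytes")
  sleep "$secs"
  rx2=$(cat "$dir/rx_bytes"); tx2=$(cat "$dir/tx_bytes")
  # Integer bytes per second over the sampling window.
  echo "rx $(( (rx2 - rx1) / secs )) B/s, tx $(( (tx2 - tx1) / secs )) B/s"
}
```

For example, `iface_rate eth0 5` averages over five seconds.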

Using **drill** to diagnose DNS:

```shell
root @ /
 [3] 🐳 → drill -V 5 demo-service
;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
;; flags: rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;; demo-service. IN A

;; ANSWER SECTION:

;; AUTHORITY SECTION:

;; ADDITIONAL SECTION:

;; Query time: 0 msec
;; WHEN: Sat Jun 1 05:05:39 2019
;; MSG SIZE rcvd: 0
;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 62711
;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
;; QUESTION SECTION:
;; demo-service. IN A

;; ANSWER SECTION:

;; AUTHORITY SECTION:
. 30 IN SOA a.root-servers.net. nstld.verisign-grs.com. 2019053101 1800 900 604800 86400

;; ADDITIONAL SECTION:

;; Query time: 58 msec
;; SERVER: 10.233.0.10
;; WHEN: Sat Jun 1 05:05:39 2019
;; MSG SIZE rcvd: 121
```
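In the output above, the bare name was queried as `demo-service.` and answered with `NXDOMAIN` from the root zone, so no search domain was appended. An in-cluster Service is normally resolvable under a name built from the search domains in the Pod's `/etc/resolv.conf` (for example `<service>.<namespace>.svc.cluster.local`, assuming the default `cluster.local` cluster domain). The function below is a hypothetical sketch that lists those candidate names so you can query each one explicitly.

```shell
# Hypothetical helper: print "<name>.<domain>" for every domain listed on
# the "search" line(s) of a resolv.conf file (default: /etc/resolv.conf).
fqdn_candidates() {
  awk -v n="$1" '$1 == "search" { for (i = 2; i <= NF; i++) print n "." $i }' \
    "${2:-/etc/resolv.conf}"
}
```

Usage inside the debug container: `for name in $(fqdn_candidates demo-service); do drill "$name"; done`.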

### `proc` filesystem and FUSE

It is common to use tools like `top` and `free` to inspect system metrics such as CPU and memory usage. Unfortunately, these commands display the metrics of the host system by default, because they read them from the `proc` filesystem (`/proc/*`), and its system-wide files such as `/proc/meminfo` and `/proc/stat` are not namespaced, so they report the host's values.

While this is acceptable (you can still find the metrics of the container processes among the host metrics), it can be misleading and counter-intuitive. A common solution is to use a [FUSE](https://en.wikipedia.org/wiki/Filesystem_in_Userspace) filesystem, which is out of the scope of the `kubectl-debug` plugin.

You may find [this blog post](https://fabiokung.com/2014/03/13/memory-inside-linux-containers/) useful if you want to investigate this problem in depth.
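To see the difference, compare a host-wide file with a per-process one. This is a minimal sketch that assumes PID 1 is the target container's main process (the exception is covered in the next section); `mem_views` is a hypothetical helper name.

```shell
# Hypothetical helper contrasting the host-wide and per-process views of memory.
mem_views() {
  pid=${1:-1}
  # /proc/meminfo is not namespaced: MemTotal is the memory of the whole node.
  grep '^MemTotal:' /proc/meminfo
  # /proc/<pid>/status describes a single process; VmRSS is its resident memory.
  grep -E '^(Name|VmRSS):' "/proc/$pid/status"
}
```

Running `mem_views` in the debug container shows the node's total memory next to the resident memory of the target's main process.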

## Access the root filesystem of the target container

The root filesystem of the target container is located at `/proc/{pid}/root/`, where `pid` is typically 1. Pods with [process namespace sharing](https://kubernetes.io/docs/tasks/configure-pod-container/share-process-namespace/) (`shareProcessNamespace: true`) enabled are an exception, because PID 1 then belongs to the Pod's pause container.

```shell
root @ /
 [4] 🐳 → tail /proc/1/root/log_
Hello, world!
```
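When process namespace sharing is enabled, you first need the PID of the target process. The following hypothetical sketch scans `/proc/*/comm` for an exact command name; the name `nginx` in the usage line is only an illustration.

```shell
# Hypothetical helper: print the first PID whose command name (/proc/<pid>/comm)
# equals $1. Note that the kernel truncates comm to 15 characters.
# Returns non-zero when no process matches.
pid_of() {
  for dir in /proc/[0-9]*; do
    if [ "$(cat "$dir/comm" 2>/dev/null)" = "$1" ]; then
      echo "${dir#/proc/}"
      return 0
    fi
  done
  return 1
}
```

For example: `ls "/proc/$(pid_of nginx)/root/"`.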

## Debug Pod in "CrashLoopBackOff"

Troubleshooting a Kubernetes Pod in `CrashLoopBackOff` can be tricky: the debug container process is reaped as soon as the target container's main process (PID 1) exits. To tackle this, `kubectl-debug` provides the `--fork` flag, which borrows the idea from the `oc debug` command: copy the current Pod and reproduce the issue in the forked Pod.

Under the hood, `kubectl debug --fork` copies the entire Pod spec and:

* strips all the labels, so that no Service routes traffic to the forked Pod;
* modifies the entrypoint of the target container so that it holds the PID namespace and keeps the Pod from crashing again.
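If you want to confirm that no Service can select the forked Pod, you can check its labels. This is a hedged sketch built on the standard `kubectl get ... -o jsonpath` output; `pod_is_unlabeled` is a hypothetical helper, and since we assume the rendering of an empty label map varies by `kubectl` version, empty output, `{}` and `map[]` are all accepted.

```shell
# Hypothetical helper: succeed when the given Pod carries no labels,
# i.e. no Service label selector can match it.
pod_is_unlabeled() {
  labels=$(kubectl get pod "$1" -o jsonpath='{.metadata.labels}')
  # Treat empty output or an empty map as "no labels".
  case "$labels" in
    ''|'{}'|'map[]') return 0 ;;
    *) return 1 ;;
  esac
}
```

For example: `pod_is_unlabeled demo-pod-e23c1b68-8439-11e9-883a-8c8590147766-debug && echo "not selectable by any Service"`.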

Here's an example:

```shell
➜ ~ kubectl debug demo-pod --fork
Agent Pod info: [Name:debug-agent-pod-dea9e7c8-8439-11e9-883a-8c8590147766, Namespace:default, Image:aylei/debug-agent:latest, HostPort:10027, ContainerPort:10027]
Waiting for pod debug-agent-pod-dea9e7c8-8439-11e9-883a-8c8590147766 to run...
Waiting for pod demo-pod-e23c1b68-8439-11e9-883a-8c8590147766-debug to run...
pod demo-pod PodIP 10.233.111.90, agentPodIP 172.16.4.160
wait for forward port to debug agent ready...
Forwarding from 127.0.0.1:10027 -> 10027
Forwarding from [::1]:10027 -> 10027
Handling connection for 10027
 pulling image nicolaka/netshoot:latest...
latest: Pulling from nicolaka/netshoot
Digest: sha256:5b1f5d66c4fa48a931ff54f2f34e5771eff2bc5e615fef441d5858e30e9bb921
Status: Image is up to date for nicolaka/netshoot:latest
starting debug container...
container created, open tty...

 [1] 🐳 → ps -ef
PID USER TIME COMMAND
 1 root 0:00 sh -c -- while true; do sleep 30; done;
 6 root 0:00 sleep 30
 7 root 0:00 /bin/bash -l
 15 root 0:00 ps -ef
```

You can `chroot` into the root filesystem of the target container to reproduce the error that causes the Pod to crash:

```shell
root @ /
 [4] 🐳 → chroot /proc/1/root

root @ /
 [#] 🐳 → ls
 bin entrypoint.sh home lib64 mnt root sbin sys tmp var
 dev etc lib media proc run srv usr

root @ /
 [#] 🐳 → ./entrypoint.sh
 (...errors)
```
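Keep in mind that `chroot` only switches the root filesystem: your debug shell does not carry the target container's environment variables, which the entrypoint may depend on. The kernel exposes a process's initial environment in `/proc/<pid>/environ` as a NUL-separated list. A minimal sketch with a hypothetical `proc_env` helper, to be run before `chroot` since the function is not defined inside the chrooted shell:

```shell
# Hypothetical helper: print a process's initial environment, one VAR=value
# per line. /proc/<pid>/environ separates entries with NUL bytes.
proc_env() {
  tr '\0' '\n' < "/proc/${1:-1}/environ"
}
```

For example, `proc_env 1` lists the variables the forked container was started with; you can `export` the relevant ones before running `./entrypoint.sh` inside the chroot.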

## Debug init container

Just like an ordinary container, an init container of a Pod can be debugged. In this case, you must specify the name of the init container:

```shell
➜ ~ kubectl debug demo-pod --container=init-pod
```
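If you don't remember the init container's name, it is listed in the Pod spec under `initContainers`. A hedged sketch using `kubectl`'s JSONPath `range` output; `init_containers` is a hypothetical helper name:

```shell
# Hypothetical helper: print the names of a Pod's init containers, one per line.
init_containers() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.initContainers[*]}{.name}{"\n"}{end}'
}
```

For example: `kubectl debug demo-pod --container="$(init_containers demo-pod | head -n 1)"`.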