enabled by default in the GKE cluster. It is also enabled by default in AKS as part of the
[AKS Linux Extension](https://learn.microsoft.com/en-us/azure/aks/faq#what-is-the-purpose-of-the-aks-linux-extension-i-see-installed-on-my-linux-vmss-instances).
+
# Background
There are tons of node problems that could possibly affect the pods running on the
-| [SystemLogMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemlogmonitor) | KernelDeadlock ReadonlyFilesystem FrequentKubeletRestart FrequentDockerRestart FrequentContainerdRestart | A system log monitor monitors system log and reports problems and metrics according to predefined rules. | [filelog](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-filelog.json), [kmsg](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json), [kernel](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-counter.json)[abrt](https://github.com/kubernetes/node-problem-detector/blob/master/config/abrt-adaptor.json)[systemd](https://github.com/kubernetes/node-problem-detector/blob/master/config/systemd-monitor-counter.json) | disable_system_log_monitor
+| [SystemLogMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemlogmonitor) | KernelDeadlock ReadonlyFilesystem FrequentKubeletRestart FrequentContainerdRestart | A system log monitor monitors system log and reports problems and metrics according to predefined rules. | [filelog](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-filelog.json), [kmsg](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json), [kernel](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-counter.json)[abrt](https://github.com/kubernetes/node-problem-detector/blob/master/config/abrt-adaptor.json)[systemd](https://github.com/kubernetes/node-problem-detector/blob/master/config/systemd-monitor-counter.json) | disable_system_log_monitor
| [SystemStatsMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemstatsmonitor) | None(Could be added in the future) | A system stats monitor for node-problem-detector to collect various health-related system stats as metrics. See the proposal [here](https://docs.google.com/document/d/1SeaUz6kBavI283Dq8GBpoEUDrHA2a795xtw0OvjM568/edit). | [system-stats-monitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/system-stats-monitor.json) | disable_system_stats_monitor
| [CustomPluginMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/custompluginmonitor) | On-demand(According to users configuration), existing example: NTPProblem | A custom plugin monitor for node-problem-detector to invoke and check various node problems with user-defined check scripts. See the proposal [here](https://docs.google.com/document/d/1jK_5YloSYtboj-DtfjmYKxfNnUxCAvohLnsH5aGCAYQ/edit#). | [example](https://github.com/kubernetes/node-problem-detector/blob/4ad49bbd84b8ced45ac825eac01ec93d9235935e/config/custom-plugin-monitor.json) | disable_custom_plugin_monitor
-|[HealthChecker](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/healthchecker)| KubeletUnhealthy ContainerRuntimeUnhealthy| A health checker for node-problem-detector to check kubelet and container runtime health. |[kubelet](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-kubelet.json)[docker](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-docker.json)[containerd](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-containerd.json)|
+|[HealthChecker](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/healthchecker)| KubeletUnhealthy ContainerRuntimeUnhealthy| A health checker for node-problem-detector to check kubelet and container runtime health. |[kubelet](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-kubelet.json)[containerd](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-containerd.json)|
# Exporter
@@ -105,7 +109,6 @@ certain backends. Some of them can be disabled at compile-time using a build tag
Node problem detector will start a separate custom plugin monitor for each configuration. You can
use different custom plugin monitors to monitor different node problems.
-
#### For Health Checkers
Health checkers are configured as custom plugins, using the config/health-checker-*.json config files.
@@ -118,9 +121,11 @@ connects the apiserver. This is ignored if `--enable-k8s-exporter` is `false`.
Refer to [heapster docs](https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#kubernetes) for a complete list of available options.
* `--address`: The address to bind the node problem detector server.
* `--port`: The port to bind the node problem detector server. Use 0 to disable.
@@ -149,7 +154,7 @@ For example, to run without auth, use the following config:
* Run `make` in the top directory. It will:
  * Build the binary.
-  * Build the docker image. The binary and `config/` are copied into the docker image.
+  * Build the container image. The binary and `config/` are copied into the container image.
If you do not need certain categories of problem daemons, you could choose to disable them at compilation time. This is the
best way of keeping your node-problem-detector runtime compact without unnecessary code (e.g. global
@@ -165,7 +170,7 @@ to see how to disable each problem daemon during compilation time.
## Push Image
-`make push` uploads the docker image to a registry. By default, the image will be uploaded to
+`make push` uploads the container image to a registry. By default, the image will be uploaded to
`staging-k8s.gcr.io`. It's easy to modify the `Makefile` to push the image
to another registry.
@@ -198,6 +203,7 @@ To run node-problem-detector standalone, you should set `inClusterConfig` to `fa
teach node-problem-detector how to access apiserver with `apiserver-override`.
To run node-problem-detector standalone with an insecure apiserver connection:
@@ -247,6 +253,7 @@ You can try node-problem-detector in a running cluster by injecting messages to
When adding new rules or developing node-problem-detector, it is probably easier to test it on the local workstation in the standalone mode. For the API server, an easy way is to use ```kubectl proxy``` to make a running cluster's API server available locally. You will get some errors because your local workstation is not recognized by the API server. But you should still be able to test your new rules regardless.
For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) rules:
2. ```kubectl proxy --port=8080``` (make a running cluster's API server available locally)
3. Update [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json)'s ```logPath``` to your local kernel log directory. For example, on some Linux systems, it is ```/run/log/journal``` instead of ```/var/log/journal```.
@@ -259,9 +266,10 @@ For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-
9. You can see disk-related system metrics in Prometheus format at [http://127.0.0.1:20257/metrics](http://127.0.0.1:20257/metrics).
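
To inspect the metrics mentioned in step 9 from a script rather than a browser, something like the following Python sketch may help. It is only an illustration: the `disk_` name prefix, the metric names in `SAMPLE`, and the helper names are assumptions, not taken from node-problem-detector itself. The parser handles only the simple `name{labels} value` form of the Prometheus text format.

```python
from urllib.request import urlopen


def parse_prometheus_text(text):
    """Parse Prometheus text exposition lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            name, rest = line.split("{", 1)
            labels, rest = rest.rsplit("}", 1)
        else:
            name, rest = line.split(None, 1)
            labels = ""
        # The first field after the name/labels is the value; an optional
        # timestamp may follow and is ignored here.
        value = float(rest.split()[0])
        samples.append((name, labels, value))
    return samples


def disk_samples(text):
    """Keep only samples whose name starts with 'disk_' (assumed prefix)."""
    return [s for s in parse_prometheus_text(text) if s[0].startswith("disk_")]


def fetch_metrics(url="http://127.0.0.1:20257/metrics"):
    """Fetch the raw metrics page; needs a locally running node-problem-detector."""
    with urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


# Hypothetical sample payload, used here instead of a live endpoint.
SAMPLE = """\
# HELP disk_io_time Example help text
# TYPE disk_io_time counter
disk_io_time{device_name="sda"} 1234
host_uptime 42.5
"""
```

With a local instance running, `disk_samples(fetch_metrics())` would return the disk-related samples, assuming the metric names really do share that prefix.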
**Note**:
-- You can see more rule examples under [test/kernel_log_generator/problems](https://github.com/kubernetes/node-problem-detector/tree/master/test/kernel_log_generator/problems).
-- For [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) message injection, all messages should have ```kernel: ``` prefix (also note there is a space after ```:```); or use [generator.sh](https://github.com/kubernetes/node-problem-detector/blob/master/test/kernel_log_generator/generator.sh).
-- To inject other logs into journald like systemd logs, use ```echo 'Some systemd message' | systemd-cat -t systemd```.
+
+* You can see more rule examples under [test/kernel_log_generator/problems](https://github.com/kubernetes/node-problem-detector/tree/master/test/kernel_log_generator/problems).
+* For [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) message injection, all messages should have ```kernel:``` prefix (also note there is a space after ```:```); or use [generator.sh](https://github.com/kubernetes/node-problem-detector/blob/master/test/kernel_log_generator/generator.sh).
+* To inject other logs into journald like systemd logs, use ```echo 'Some systemd message' | systemd-cat -t systemd```.
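
The prefix rule for KernelMonitor message injection is easy to get wrong by one space, so a small helper can make it explicit. The Python sketch below is illustrative only: the helper names are hypothetical, and writing to `/dev/kmsg` is an assumption about where injected kernel messages go on the test machine (it typically requires root).

```python
KERNEL_PREFIX = "kernel: "  # note the trailing space after the colon


def format_kernel_message(message):
    """Return message with the 'kernel: ' prefix, adding it only if missing."""
    message = message.strip()
    if message.startswith(KERNEL_PREFIX):
        return message
    if message.startswith("kernel:"):
        # Prefix present but the space is missing; normalize it.
        return KERNEL_PREFIX + message[len("kernel:"):].lstrip()
    return KERNEL_PREFIX + message


def inject(message, path="/dev/kmsg"):
    """Write the prefixed message to the kernel log device (assumed path; needs root)."""
    with open(path, "w") as kmsg:
        kmsg.write(format_kernel_message(message) + "\n")
```

For example, `format_kernel_message("BUG: test")` yields a line that starts with `kernel: ` followed by the original text; `inject` is left uncalled here because it changes system state.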
## Dependency Management
@@ -305,6 +313,7 @@ Kubernetes cluster to a healthy state. The following remedy systems exist:
NPD is tested via unit tests, [NPD e2e tests](https://github.com/kubernetes/node-problem-detector/blob/master/test/e2e/README.md), Kubernetes e2e tests and Kubernetes nodes e2e tests. Prow handles the [pre-submit tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-presubmits.yaml) and [CI tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-ci.yaml).