Commit fadcd8a

remove Docker.
1 parent f6bb4f7 commit fadcd8a

24 files changed: +30 -346 lines changed

README.md

Lines changed: 17 additions & 8 deletions
@@ -12,10 +12,12 @@ Now it is running as a
 [Kubernetes Addon](https://github.com/kubernetes/kubernetes/tree/master/cluster/addons)
 enabled by default in the GKE cluster. It is also enabled by default in AKS as part of the
 [AKS Linux Extension](https://learn.microsoft.com/en-us/azure/aks/faq#what-is-the-purpose-of-the-aks-linux-extension-i-see-installed-on-my-linux-vmss-instances).
+
 # Background
 
 There are tons of node problems that could possibly affect the pods running on the
 node, such as:
+
 * Infrastructure daemon issues: ntp service down;
 * Hardware issues: Bad CPU, memory or disk;
 * Kernel issues: Kernel deadlock, corrupted file system;
@@ -34,6 +36,7 @@ layers. Once upstream layers have visibility to those problems, we can discuss t
 
 node-problem-detector uses `Event` and `NodeCondition` to report problems to
 apiserver.
+
 * `NodeCondition`: Permanent problem that makes the node unavailable for pods should
 be reported as `NodeCondition`.
 * `Event`: Temporary problem that has limited impact on pod but is informative
@@ -45,6 +48,7 @@ A problem daemon is a sub-daemon of node-problem-detector. It monitors specific
 kinds of node problems and reports them to node-problem-detector.
 
 A problem daemon could be:
+
 * A tiny daemon designed for dedicated Kubernetes use-cases.
 * An existing node health monitoring daemon integrated with node-problem-detector.
 
@@ -61,10 +65,10 @@ List of supported problem daemons types:
 
 | Problem Daemon Types | NodeCondition | Description | Configs | Disabling Build Tag |
 |----------------|:---------------:|:------------|:--------|:--------------------|
-| [SystemLogMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemlogmonitor) | KernelDeadlock ReadonlyFilesystem FrequentKubeletRestart FrequentDockerRestart FrequentContainerdRestart | A system log monitor monitors system log and reports problems and metrics according to predefined rules. | [filelog](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-filelog.json), [kmsg](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json), [kernel](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-counter.json) [abrt](https://github.com/kubernetes/node-problem-detector/blob/master/config/abrt-adaptor.json) [systemd](https://github.com/kubernetes/node-problem-detector/blob/master/config/systemd-monitor-counter.json) | disable_system_log_monitor
+| [SystemLogMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemlogmonitor) | KernelDeadlock ReadonlyFilesystem FrequentKubeletRestart FrequentContainerdRestart | A system log monitor monitors system log and reports problems and metrics according to predefined rules. | [filelog](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-filelog.json), [kmsg](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json), [kernel](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor-counter.json) [abrt](https://github.com/kubernetes/node-problem-detector/blob/master/config/abrt-adaptor.json) [systemd](https://github.com/kubernetes/node-problem-detector/blob/master/config/systemd-monitor-counter.json) | disable_system_log_monitor
 | [SystemStatsMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/systemstatsmonitor) | None(Could be added in the future) | A system stats monitor for node-problem-detector to collect various health-related system stats as metrics. See the proposal [here](https://docs.google.com/document/d/1SeaUz6kBavI283Dq8GBpoEUDrHA2a795xtw0OvjM568/edit). | [system-stats-monitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/system-stats-monitor.json) | disable_system_stats_monitor
 | [CustomPluginMonitor](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/custompluginmonitor) | On-demand(According to users configuration), existing example: NTPProblem | A custom plugin monitor for node-problem-detector to invoke and check various node problems with user-defined check scripts. See the proposal [here](https://docs.google.com/document/d/1jK_5YloSYtboj-DtfjmYKxfNnUxCAvohLnsH5aGCAYQ/edit#). | [example](https://github.com/kubernetes/node-problem-detector/blob/4ad49bbd84b8ced45ac825eac01ec93d9235935e/config/custom-plugin-monitor.json) | disable_custom_plugin_monitor
-| [HealthChecker](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/healthchecker) | KubeletUnhealthy ContainerRuntimeUnhealthy| A health checker for node-problem-detector to check kubelet and container runtime health. | [kubelet](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-kubelet.json) [docker](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-docker.json) [containerd](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-containerd.json) |
+| [HealthChecker](https://github.com/kubernetes/node-problem-detector/tree/master/pkg/healthchecker) | KubeletUnhealthy ContainerRuntimeUnhealthy| A health checker for node-problem-detector to check kubelet and container runtime health. | [kubelet](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-kubelet.json) [containerd](https://github.com/kubernetes/node-problem-detector/blob/master/config/health-checker-containerd.json) |
 
 # Exporter
 
@@ -105,7 +109,6 @@ certain backends. Some of them can be disabled at compile-time using a build tag
 Node problem detector will start a separate custom plugin monitor for each configuration. You can
 use different custom plugin monitors to monitor different node problems.
 
-
 #### For Health Checkers
 
 Health checkers are configured as custom plugins, using the config/health-checker-*.json config files.
@@ -118,9 +121,11 @@ connects the apiserver. This is ignored if `--enable-k8s-exporter` is `false`.
 [`source`](https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#kubernetes)
 flag of [Heapster](https://github.com/kubernetes/heapster).
 For example, to run without auth, use the following config:
+
 ```
 http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false
 ```
+
 Refer to [heapster docs](https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#kubernetes) for a complete list of available options.
 * `--address`: The address to bind the node problem detector server.
 * `--port`: The port to bind the node problem detector server. Use 0 to disable.
@@ -149,7 +154,7 @@ For example, to run without auth, use the following config:
 
 * Run `make` in the top directory. It will:
 * Build the binary.
-* Build the docker image. The binary and `config/` are copied into the docker image.
+* Build the container image. The binary and `config/` are copied into the container image.
 
 If you do not need certain categories of problem daemons, you could choose to disable them at compilation time. This is the
 best way of keeping your node-problem-detector runtime compact without unnecessary code (e.g. global
@@ -165,7 +170,7 @@ to see how to disable each problem daemon during compilation time.
 
 ## Push Image
 
-`make push` uploads the docker image to a registry. By default, the image will be uploaded to
+`make push` uploads the container image to a registry. By default, the image will be uploaded to
 `staging-k8s.gcr.io`. It's easy to modify the `Makefile` to push the image
 to another registry.
 
@@ -198,6 +203,7 @@ To run node-problem-detector standalone, you should set `inClusterConfig` to `fa
 teach node-problem-detector how to access apiserver with `apiserver-override`.
 
 To run node-problem-detector standalone with an insecure apiserver connection:
+
 ```
 node-problem-detector --apiserver-override=http://APISERVER_IP:APISERVER_INSECURE_PORT?inClusterConfig=false
 ```
@@ -247,6 +253,7 @@ You can try node-problem-detector in a running cluster by injecting messages to
 When adding new rules or developing node-problem-detector, it is probably easier to test it on the local workstation in the standalone mode. For the API server, an easy way is to use ```kubectl proxy``` to make a running cluster's API server available locally. You will get some errors because your local workstation is not recognized by the API server. But you should still be able to test your new rules regardless.
 
 For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) rules:
+
 1. ```make``` (build node-problem-detector locally)
 2. ```kubectl proxy --port=8080``` (make a running cluster's API server available locally)
 3. Update [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json)'s ```logPath``` to your local kernel log directory. For example, on some Linux systems, it is ```/run/log/journal``` instead of ```/var/log/journal```.
@@ -259,9 +266,10 @@ For example, to test [KernelMonitor](https://github.com/kubernetes/node-problem-
 9. You can see disk-related system metrics in Prometheus format at [http://127.0.0.1:20257/metrics](http://127.0.0.1:20257/metrics).
 
 **Note**:
-- You can see more rule examples under [test/kernel_log_generator/problems](https://github.com/kubernetes/node-problem-detector/tree/master/test/kernel_log_generator/problems).
-- For [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) message injection, all messages should have ```kernel: ``` prefix (also note there is a space after ```:```); or use [generator.sh](https://github.com/kubernetes/node-problem-detector/blob/master/test/kernel_log_generator/generator.sh).
-- To inject other logs into journald like systemd logs, use ```echo 'Some systemd message' | systemd-cat -t systemd```.
+
+* You can see more rule examples under [test/kernel_log_generator/problems](https://github.com/kubernetes/node-problem-detector/tree/master/test/kernel_log_generator/problems).
+* For [KernelMonitor](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json) message injection, all messages should have ```kernel:``` prefix (also note there is a space after ```:```); or use [generator.sh](https://github.com/kubernetes/node-problem-detector/blob/master/test/kernel_log_generator/generator.sh).
+* To inject other logs into journald like systemd logs, use ```echo 'Some systemd message' | systemd-cat -t systemd```.
 
 ## Dependency Management
 
@@ -305,6 +313,7 @@ Kubernetes cluster to a healthy state. The following remedy systems exist:
 NPD is tested via unit tests, [NPD e2e tests](https://github.com/kubernetes/node-problem-detector/blob/master/test/e2e/README.md), Kubernetes e2e tests and Kubernetes nodes e2e tests. Prow handles the [pre-submit tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-presubmits.yaml) and [CI tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-ci.yaml).
 
 CI test results can be found below:
+
 1. [Unit tests](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-test)
 2. [NPD e2e tests](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-e2e-test)
 3. [Kubernetes e2e tests](https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-e2e-kubernetes-gce-gci)
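
The README hunks above pass `inClusterConfig=false` as a query parameter on the `--apiserver-override` URL. As a rough illustration of how such a URL splits into a host and an option, the Go sketch below parses one with the standard library. The helper name `parseOverride` is hypothetical, the address `127.0.0.1:8080` is borrowed from the `kubectl proxy --port=8080` step, and treating a missing parameter as in-cluster is an assumption; none of this is node-problem-detector's own parsing code.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// parseOverride is a hypothetical helper: it splits an apiserver override
// URL into its host and the value of the inClusterConfig query parameter.
// Assumption: a missing parameter is treated as true (in-cluster).
func parseOverride(raw string) (host string, inCluster bool, err error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", false, err
	}
	inCluster = true
	if v := u.Query().Get("inClusterConfig"); v != "" {
		inCluster, err = strconv.ParseBool(v)
		if err != nil {
			return "", false, fmt.Errorf("invalid inClusterConfig %q: %w", v, err)
		}
	}
	return u.Host, inCluster, nil
}

func main() {
	host, inCluster, err := parseOverride("http://127.0.0.1:8080?inClusterConfig=false")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("host=%s inClusterConfig=%t\n", host, inCluster)
}
```

The placeholder form `APISERVER_IP:APISERVER_PORT` from the README would not parse here, because Go's `net/url` rejects a non-numeric port, so the sketch uses a concrete address.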

cmd/healthchecker/options/options.go

Lines changed: 5 additions & 6 deletions
@@ -49,15 +49,15 @@ type HealthCheckerOptions struct {
 // AddFlags adds health checker command line options to pflag.
 func (hco *HealthCheckerOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.StringVar(&hco.Component, "component", types.KubeletComponent,
-		"The component to check health for. Supports kubelet, docker, kube-proxy, and cri")
+		"The component to check health for. Supports kubelet, kube-proxy, and cri")
 	// Deprecated: For backward compatibility on linux environment. Going forward "service" will be used instead of systemd-service
 	if runtime.GOOS == "linux" {
 		fs.MarkDeprecated("systemd-service", "please use --service flag instead")
 		fs.StringVar(&hco.Service, "systemd-service", "",
-			"The underlying service responsible for the component. Set to the corresponding component for docker and kubelet, containerd for cri.")
+			"The underlying service responsible for the component. Set to the corresponding component for kubelet, containerd for cri.")
 	}
 	fs.StringVar(&hco.Service, "service", "",
-		"The underlying service responsible for the component. Set to the corresponding component for docker and kubelet, containerd for cri.")
+		"The underlying service responsible for the component. Set to the corresponding component for kubelet, containerd for cri.")
 	fs.BoolVar(&hco.EnableRepair, "enable-repair", true, "Flag to enable/disable repair attempt for the component.")
 	fs.StringVar(&hco.CriCtlPath, "crictl-path", types.DefaultCriCtl,
 		"The path to the crictl binary. This is used to check health of cri component.")
@@ -79,9 +79,8 @@ func (hco *HealthCheckerOptions) AddFlags(fs *pflag.FlagSet) {
 // Returns error if invalid, nil otherwise.
 func (hco *HealthCheckerOptions) IsValid() error {
 	// Make sure the component specified is valid.
-	if hco.Component != types.KubeletComponent && hco.Component != types.DockerComponent &&
-		hco.Component != types.CRIComponent && hco.Component != types.KubeProxyComponent {
-		return fmt.Errorf("the component specified is not supported. Supported components are : <kubelet/docker/cri/kube-proxy>")
+	if hco.Component != types.KubeletComponent && hco.Component != types.CRIComponent && hco.Component != types.KubeProxyComponent {
+		return fmt.Errorf("the component specified is not supported. Supported components are : <kubelet/cri/kube-proxy>")
 	}
 	// Make sure the service is specified if repair is enabled.
 	if hco.EnableRepair && hco.Service == "" {
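
The effect of the `IsValid` change above is that `docker` is now rejected as a component. The following standalone Go sketch mimics that check with a lookup set instead of the chained comparisons. The constant names and their string values are stand-ins inferred from the flag help text, not the project's `types` package, and the error wording is the sketch's own.

```go
package main

import "fmt"

// Stand-ins for the component constants; the string values are assumptions
// inferred from the --component help text in this diff.
const (
	kubeletComponent   = "kubelet"
	criComponent       = "cri"
	kubeProxyComponent = "kube-proxy"
)

// supported lists the components accepted after this commit.
var supported = map[string]bool{
	kubeletComponent:   true,
	criComponent:       true,
	kubeProxyComponent: true,
}

// validateComponent is a hypothetical equivalent of the component check in
// IsValid: it returns an error for anything outside the supported set.
func validateComponent(component string) error {
	if !supported[component] {
		return fmt.Errorf("component %q is not supported; supported components are kubelet, cri, kube-proxy", component)
	}
	return nil
}

func main() {
	for _, c := range []string{"kubelet", "cri", "kube-proxy", "docker"} {
		if err := validateComponent(c); err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Println("accepted:", c)
	}
}
```

A set keeps the supported list in one place, which makes removing an entry such as `docker` a one-line change.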

cmd/logcounter/options/options.go

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ type LogCounterOptions struct {
 
 // AddFlags adds log counter command line options to pflag.
 func (fedo *LogCounterOptions) AddFlags(fs *pflag.FlagSet) {
-	fs.StringVar(&fedo.JournaldSource, "journald-source", "", "The source configuration of journald, e.g., kernel, kubelet, dockerd, etc")
+	fs.StringVar(&fedo.JournaldSource, "journald-source", "", "The source configuration of journald, e.g., kernel, kubelet, etc")
 	fs.StringVar(&fedo.LogPath, "log-path", "", "The log path that log watcher looks up")
 	fs.StringVar(&fedo.Lookback, "lookback", "", "The time log watcher looks up")
 	fs.StringVar(&fedo.Delay, "delay", "",

config/docker-monitor-counter.json

Lines changed: 0 additions & 33 deletions
This file was deleted.

config/docker-monitor-filelog.json

Lines changed: 0 additions & 20 deletions
This file was deleted.

config/docker-monitor.json

Lines changed: 0 additions & 36 deletions
This file was deleted.

config/health-checker-docker.json

Lines changed: 0 additions & 33 deletions
This file was deleted.

config/kernel-monitor-filelog.json

Lines changed: 0 additions & 6 deletions
@@ -41,12 +41,6 @@
       "type": "temporary",
       "reason": "KernelOops",
       "pattern": "divide error: 0000 \\[#\\d+\\] SMP"
-    },
-    {
-      "type": "permanent",
-      "condition": "KernelDeadlock",
-      "reason": "DockerHung",
-      "pattern": "task docker:\\w+ blocked for more than \\w+ seconds\\."
     }
   ]
 }

config/kernel-monitor.json

Lines changed: 0 additions & 6 deletions
@@ -89,12 +89,6 @@
       "condition": "CperHardwareErrorFatal",
       "reason": "CperHardwareErrorFatal",
       "pattern": ".*\\[Hardware Error\\]: event severity: fatal$"
-    },
-    {
-      "type": "permanent",
-      "condition": "KernelDeadlock",
-      "reason": "DockerHung",
-      "pattern": "task docker:\\w+ blocked for more than \\w+ seconds\\."
     }
   ]
 }
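
Both kernel monitor configs drop the same `DockerHung` rule. To show what that rule matched, the Go sketch below compiles the removed pattern (after JSON unescaping) and tries it against two sample lines. The sample lines and the helper `matchesDockerHung` are invented for illustration, and the diff does not show how node-problem-detector itself applies the pattern, so treat this as an approximation.

```go
package main

import (
	"fmt"
	"regexp"
)

// dockerHung is the pattern from the removed rule, after JSON unescaping
// ("\\w" in the config file becomes \w in the regular expression).
var dockerHung = regexp.MustCompile(`task docker:\w+ blocked for more than \w+ seconds\.`)

// matchesDockerHung is a hypothetical helper reporting whether a log line
// would have matched the removed rule.
func matchesDockerHung(line string) bool {
	return dockerHung.MatchString(line)
}

func main() {
	samples := []string{
		// Invented sample lines, shaped like kernel hung-task messages.
		"task docker:12345 blocked for more than 120 seconds.",
		"task containerd:12345 blocked for more than 120 seconds.",
	}
	for _, s := range samples {
		fmt.Printf("%t  %s\n", matchesDockerHung(s), s)
	}
}
```

Only the first sample matches, since the pattern is specific to a task named `docker`.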

config/systemd-monitor-counter.json

Lines changed: 0 additions & 20 deletions
@@ -14,11 +14,6 @@
       "reason": "NoFrequentKubeletRestart",
       "message": "kubelet is functioning properly"
     },
-    {
-      "type": "FrequentDockerRestart",
-      "reason": "NoFrequentDockerRestart",
-      "message": "docker is functioning properly"
-    },
     {
       "type": "FrequentContainerdRestart",
       "reason": "NoFrequentContainerdRestart",
@@ -42,21 +37,6 @@
       ],
       "timeout": "1m"
     },
-    {
-      "type": "permanent",
-      "condition": "FrequentDockerRestart",
-      "reason": "FrequentDockerRestart",
-      "path": "/home/kubernetes/bin/log-counter",
-      "args": [
-        "--journald-source=systemd",
-        "--log-path=/var/log/journal",
-        "--lookback=20m",
-        "--count=5",
-        "--pattern=Starting (Docker Application Container Engine|docker.service|docker.service - Docker Application Container Engine)...",
-        "--revert-pattern=Stopping (Docker Application Container Engine|docker.service|docker.service - Docker Application Container Engine)..."
-      ],
-      "timeout": "1m"
-    },
     {
       "type": "permanent",
       "condition": "FrequentContainerdRestart",
