Commit 5da72e8

author
Xuewei Zhang
committed
Add problem maker to simulate problems for e2e test
1 parent 40cb3e0 commit 5da72e8

File tree

9 files changed

+302
-2
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,6 @@
11
/bin/
22
/Dockerfile
3+
/test/bin/
34
/*.tar.gz
45
ci.env
56
pr.env

Makefile

Lines changed: 10 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -103,6 +103,13 @@ endif
103103
-tags "$(BUILD_TAGS)" \
104104
./cmd/nodeproblemdetector
105105

106+
./test/bin/problem-maker: $(PKG_SOURCES)
107+
CGO_ENABLED=$(CGO_ENABLED) GOOS=linux GO111MODULE=on go build \
108+
-mod vendor \
109+
-o test/bin/problem-maker \
110+
-tags "$(BUILD_TAGS)" \
111+
./test/e2e/problemmaker/problem_maker.go
112+
106113
Dockerfile: Dockerfile.in
107114
sed -e 's|@BASEIMAGE@|$(BASEIMAGE)|g' $< >$@
108115
ifneq ($(ENABLE_JOURNALD), 1)
@@ -129,8 +136,8 @@ build-binaries: ./bin/node-problem-detector ./bin/log-counter
129136
build-container: build-binaries Dockerfile
130137
docker build -t $(IMAGE) .
131138

132-
build-tar: ./bin/node-problem-detector ./bin/log-counter
133-
tar -zcvf $(TARBALL) bin/ config/ test/e2e-install.sh
139+
build-tar: ./bin/node-problem-detector ./bin/log-counter ./test/bin/problem-maker
140+
tar -zcvf $(TARBALL) bin/ config/ test/e2e-install.sh test/bin/problem-maker
134141
sha1sum $(TARBALL)
135142
md5sum $(TARBALL)
136143

@@ -156,4 +163,5 @@ push: push-container push-tar
156163
clean:
157164
rm -f bin/log-counter
158165
rm -f bin/node-problem-detector
166+
rm -f test/bin/problem-maker
159167
rm -f node-problem-detector-*.tar.gz

README.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -249,6 +249,26 @@ Kubernetes cluster to a healthy state. The following remedy systems exist:
249249
[this issue](https://github.com/kubernetes/node-problem-detector/issues/199)
250250
for an example production use case for Draino.
251251

252+
# Testing
253+
254+
NPD is tested via unit tests, [NPD e2e tests](https://github.com/kubernetes/node-problem-detector/blob/master/test/e2e/README.md), Kubernetes e2e tests and Kubernetes nodes e2e tests. Prow handles the [pre-submit tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-presubmits.yaml) and [CI tests](https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/node-problem-detector/node-problem-detector-ci.yaml).
255+
256+
CI test results can be found below:
257+
1. [Unit tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-test)
258+
2. [NPD e2e tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-e2e-test)
259+
3. [Kubernetes e2e tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-e2e-kubernetes-gce-gci)
260+
4. [Kubernetes nodes e2e tests](https://k8s-testgrid.appspot.com/sig-node-node-problem-detector#ci-npd-e2e-node)
261+
262+
## Running tests
263+
264+
Unit tests are run via `make test`.
265+
266+
See [NPD e2e test documentation](https://github.com/kubernetes/node-problem-detector/blob/master/test/e2e/README.md) for how to set up and run NPD e2e tests.
267+
268+
## Problem Maker
269+
270+
[Problem maker](https://github.com/kubernetes/node-problem-detector/blob/master/test/e2e/problemmaker/README.md) is a program used in NPD e2e tests to generate/simulate node problems. It is ONLY intended to be used by NPD e2e tests. Please do NOT run it on your workstation, as it could cause real node problems.
271+
252272
# Docs
253273

254274
* [Custom plugin monitor](docs/custom_plugin_monitor.md)

test/e2e/problemmaker/README.md

Lines changed: 20 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,20 @@
1+
# Problem Maker
2+
3+
Problem maker is a program to generate/simulate various kinds of node problems. It is used in NPD e2e tests to verify NPD's behavior when node problems happen:
4+
1. NPD should report the problems correctly.
5+
2. NPD should survive the problems as much as possible.
6+
7+
**Problem maker is NOT intended to be used anywhere else. Please do NOT run it directly on your workstation, as it can cause real OS failures.** For example, running `sudo problem-maker --problem Ext4FilesystemError` will cause an ext4 file system error, which could result in the boot disk being remounted read-only, requiring a reboot to recover from the failure.
8+
9+
You shouldn't need to run it anyway. If you want to test NPD, it's best to run the NPD e2e tests.
10+
11+
## Developing/Testing Problem Maker
12+
13+
If you want to extend the set of problems that problem maker can generate, you may want to run it to test its behavior. The recommended way to do so is in a VM:
14+
```
15+
sudo problem-maker --help
16+
sudo problem-maker --problem DockerHung
17+
sudo problem-maker --problem Ext4FilesystemError
18+
```
19+
20+
Problem maker tries to generate real node problems, which can cause real node failures. When there is no good way to generate a problem, problem maker simulates it by injecting logs instead. In most (if not all) scenarios, generating real problems is preferred over injecting logs, because log patterns can change when the kernel is upgraded, and NPD e2e tests are supposed to verify that NPD correctly understands the kernel under test.
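For anyone extending problem maker, the registration pattern used by the generators in this commit can be sketched roughly as below. This is an illustration only: `ExampleProblem`, `makeExampleProblem`, `emitLines`, and the lowercase `problemGenerators` map are hypothetical stand-ins, and the sketch records lines in memory instead of writing to `/dev/kmsg`, so it does not touch the host.

```go
package main

import (
	"fmt"
	"strings"
)

// problemGenerators mirrors the shape of makers.ProblemGenerators:
// a map from problem name to a function that produces the problem.
var problemGenerators = make(map[string]func())

// emitted collects injected lines. It is a hypothetical stand-in for the
// kernel log, so this sketch has no side effects on the machine.
var emitted []string

// emitLines is a hypothetical, side-effect-free replacement for
// writeKernelMessageOrDie: it splits msg on newlines and records each line.
func emitLines(msg string) {
	emitted = append(emitted, strings.Split(msg, "\n")...)
}

func init() {
	// Each generator registers itself under the name that would be
	// passed to --problem. "ExampleProblem" is a made-up name.
	problemGenerators["ExampleProblem"] = makeExampleProblem
}

// makeExampleProblem simulates a problem by injecting a fixed log pattern.
func makeExampleProblem() {
	const pattern = "example: first line\nexample: second line"
	emitLines(pattern)
}

func main() {
	generate, ok := problemGenerators["ExampleProblem"]
	if !ok {
		fmt.Println("unknown problem type")
		return
	}
	generate()
	fmt.Println(len(emitted), "lines emitted")
}
```

A real generator would live in the `makers` package and either trigger the actual failure or call `writeKernelMessageOrDie`, as the files in this commit do.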
Lines changed: 45 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,45 @@
1+
/*
2+
Copyright 2019 The Kubernetes Authors All rights reserved.
3+
4+
Licensed under the Apache License, Version 2.0 (the "License");
5+
you may not use this file except in compliance with the License.
6+
You may obtain a copy of the License at
7+
8+
http://www.apache.org/licenses/LICENSE-2.0
9+
10+
Unless required by applicable law or agreed to in writing, software
11+
distributed under the License is distributed on an "AS IS" BASIS,
12+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
See the License for the specific language governing permissions and
14+
limitations under the License.
15+
*/
16+
17+
package makers
18+
19+
func init() {
20+
ProblemGenerators["DockerHung"] = makeDockerHung
21+
}
22+
23+
func makeDockerHung() {
24+
const dockerHungPattern = `INFO: task docker:20744 blocked for more than 120 seconds.
25+
Tainted: G C 3.16.0-4-amd64 #1
26+
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
27+
docker D ffff8801a8f2b078 0 20744 1 0x00000000
28+
ffff8801a8f2ac20 0000000000000082 0000000000012f00 ffff880057a17fd8
29+
0000000000012f00 ffff8801a8f2ac20 ffffffff818bb4a0 ffff880057a17d80
30+
ffffffff818bb4a4 ffff8801a8f2ac20 00000000ffffffff ffffffff818bb4a8
31+
Call Trace:
32+
[<ffffffff81510915>] ? schedule_preempt_disabled+0x25/0x70
33+
[<ffffffff815123c3>] ? __mutex_lock_slowpath+0xd3/0x1c0
34+
[<ffffffff815124cb>] ? mutex_lock+0x1b/0x2a
35+
[<ffffffff814175bc>] ? copy_net_ns+0x6c/0x130
36+
[<ffffffff8108bdf4>] ? create_new_namespaces+0xf4/0x180
37+
[<ffffffff8108beec>] ? copy_namespaces+0x6c/0x90
38+
[<ffffffff810654f6>] ? copy_process.part.25+0x966/0x1c30
39+
[<ffffffff81066991>] ? do_fork+0xe1/0x390
40+
[<ffffffff811c442c>] ? __alloc_fd+0x7c/0x120
41+
[<ffffffff81514079>] ? stub_clone+0x69/0x90
42+
[<ffffffff81513d0d>] ? system_call_fast_compare_end+0x10/0x15`
43+
44+
writeKernelMessageOrDie(dockerHungPattern)
45+
}
Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
/*
2+
Copyright 2019 The Kubernetes Authors All rights reserved.
3+
4+
Licensed under the Apache License, Version 2.0 (the "License");
5+
you may not use this file except in compliance with the License.
6+
You may obtain a copy of the License at
7+
8+
http://www.apache.org/licenses/LICENSE-2.0
9+
10+
Unless required by applicable law or agreed to in writing, software
11+
distributed under the License is distributed on an "AS IS" BASIS,
12+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
See the License for the specific language governing permissions and
14+
limitations under the License.
15+
*/
16+
17+
package makers
18+
19+
import (
20+
"io/ioutil"
21+
22+
"github.com/golang/glog"
23+
)
24+
25+
func init() {
26+
ProblemGenerators["Ext4FilesystemError"] = makeFilesystemError
27+
}
28+
29+
const ext4ErrorTrigger = "/sys/fs/ext4/sda1/trigger_fs_error"
30+
31+
func makeFilesystemError() {
32+
msg := []byte("fake filesystem error from problem-maker")
33+
err := ioutil.WriteFile(ext4ErrorTrigger, msg, 0200)
34+
if err != nil {
35+
glog.Fatalf("Failed writing to %q: %v", ext4ErrorTrigger, err)
36+
}
37+
}
Lines changed: 46 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,46 @@
1+
/*
2+
Copyright 2019 The Kubernetes Authors All rights reserved.
3+
4+
Licensed under the Apache License, Version 2.0 (the "License");
5+
you may not use this file except in compliance with the License.
6+
You may obtain a copy of the License at
7+
8+
http://www.apache.org/licenses/LICENSE-2.0
9+
10+
Unless required by applicable law or agreed to in writing, software
11+
distributed under the License is distributed on an "AS IS" BASIS,
12+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
See the License for the specific language governing permissions and
14+
limitations under the License.
15+
*/
16+
17+
package makers
18+
19+
import (
20+
"io/ioutil"
21+
"strings"
22+
23+
"github.com/golang/glog"
24+
)
25+
26+
func init() {
27+
ProblemGenerators["OOMKill"] = makeOOMKill
28+
}
29+
30+
const kmsgPath = "/dev/kmsg"
31+
32+
func makeOOMKill() {
33+
const oomKillPattern = `Memory cgroup out of memory: Kill process 1012 (heapster) score 1035 or sacrifice child
34+
Killed process 1012 (heapster) total-vm:327128kB, anon-rss:306328kB, file-rss:11132kB, shmem-rss:12345kB`
35+
36+
writeKernelMessageOrDie(oomKillPattern)
37+
}
38+
39+
func writeKernelMessageOrDie(msg string) {
40+
for _, line := range strings.Split(msg, "\n") {
41+
err := ioutil.WriteFile(kmsgPath, []byte(line), 0644)
42+
if err != nil {
43+
glog.Fatalf("Failed writing to %q: %v", kmsgPath, err)
44+
}
45+
}
46+
}
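`writeKernelMessageOrDie` issues one write per line, which matters because, as far as the kernel's `/dev/kmsg` interface is documented, each write becomes its own log record. The sketch below shows the same line-by-line pattern against an `io.Writer` so it can be exercised without root or a real kernel log. `writeLines` and `recordingWriter` are hypothetical names, and returning an error instead of calling `glog.Fatalf` is a design choice of this sketch, not of the commit.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
)

// recordingWriter is a hypothetical test double that remembers the
// payload of every Write call it receives.
type recordingWriter struct {
	writes []string
}

func (r *recordingWriter) Write(p []byte) (int, error) {
	r.writes = append(r.writes, string(p))
	return len(p), nil
}

// writeLines splits msg on newlines and issues one write per line,
// returning the first error instead of terminating the process.
func writeLines(w io.Writer, msg string) error {
	for _, line := range strings.Split(msg, "\n") {
		if _, err := io.WriteString(w, line); err != nil {
			return fmt.Errorf("writing %q: %w", line, err)
		}
	}
	return nil
}

func main() {
	rec := &recordingWriter{}
	if err := writeLines(rec, "first line\nsecond line"); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(len(rec.writes), "separate writes")
}
```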
Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
/*
2+
Copyright 2019 The Kubernetes Authors All rights reserved.
3+
4+
Licensed under the Apache License, Version 2.0 (the "License");
5+
you may not use this file except in compliance with the License.
6+
You may obtain a copy of the License at
7+
8+
http://www.apache.org/licenses/LICENSE-2.0
9+
10+
Unless required by applicable law or agreed to in writing, software
11+
distributed under the License is distributed on an "AS IS" BASIS,
12+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
See the License for the specific language governing permissions and
14+
limitations under the License.
15+
*/
16+
17+
package makers
18+
19+
var ProblemGenerators = make(map[string]func())
20+
21+
func GetProblemTypes() []string {
22+
var problems []string
23+
for problem := range ProblemGenerators {
24+
problems = append(problems, problem)
25+
}
26+
return problems
27+
}
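Go does not guarantee map iteration order, so `GetProblemTypes` may return the names in a different order on each run, which affects the `--problem` help text and the error message in `main`. A minimal sketch of a deterministic variant follows; `sortedProblemTypes` and the lowercase `problemGenerators` map are hypothetical stand-ins for the exported names in this commit.

```go
package main

import (
	"fmt"
	"sort"
)

// problemGenerators stands in for makers.ProblemGenerators, populated
// here with no-op generators under the three names from this commit.
var problemGenerators = map[string]func(){
	"OOMKill":             func() {},
	"DockerHung":          func() {},
	"Ext4FilesystemError": func() {},
}

// sortedProblemTypes returns the registered problem names in lexical
// order, so help text and error messages are stable across runs.
func sortedProblemTypes(gens map[string]func()) []string {
	problems := make([]string, 0, len(gens))
	for name := range gens {
		problems = append(problems, name)
	}
	sort.Strings(problems)
	return problems
}

func main() {
	fmt.Println(sortedProblemTypes(problemGenerators))
}
```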
Lines changed: 96 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,96 @@
1+
/*
2+
Copyright 2019 The Kubernetes Authors All rights reserved.
3+
4+
Licensed under the Apache License, Version 2.0 (the "License");
5+
you may not use this file except in compliance with the License.
6+
You may obtain a copy of the License at
7+
8+
http://www.apache.org/licenses/LICENSE-2.0
9+
10+
Unless required by applicable law or agreed to in writing, software
11+
distributed under the License is distributed on an "AS IS" BASIS,
12+
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
See the License for the specific language governing permissions and
14+
limitations under the License.
15+
*/
16+
17+
package main
18+
19+
import (
20+
"flag"
21+
"fmt"
22+
"os"
23+
"strings"
24+
"time"
25+
26+
"github.com/golang/glog"
27+
"github.com/spf13/pflag"
28+
29+
"k8s.io/node-problem-detector/test/e2e/problemmaker/makers"
30+
)
31+
32+
func init() {
33+
pflag.CommandLine.AddGoFlagSet(flag.CommandLine)
34+
}
35+
36+
type options struct {
37+
// Command line options. See flag descriptions for the description
38+
Rate float32
39+
Duration time.Duration
40+
Problem string
41+
}
42+
43+
// AddFlags adds problem maker command line options to pflag.
44+
func (o *options) AddFlags(fs *pflag.FlagSet) {
45+
fs.Float32Var(&o.Rate, "rate", 1.0,
46+
"Number of times the problem should be generated per second")
47+
fs.DurationVar(&o.Duration, "duration", time.Duration(1)*time.Second,
48+
"Duration for problem maker to keep generating problems")
49+
50+
problems := makers.GetProblemTypes()
51+
fs.StringVar(&o.Problem, "problem", "",
52+
fmt.Sprintf("The type of problem to be generated. Supported types: %q",
53+
strings.Join(problems, ", ")))
54+
}
55+
56+
func main() {
57+
// Set glog flag so that it does not log to files.
58+
if err := flag.Set("logtostderr", "true"); err != nil {
59+
fmt.Printf("Failed to set logtostderr=true: %v\n", err)
60+
os.Exit(1)
61+
}
62+
63+
o := options{}
64+
o.AddFlags(pflag.CommandLine)
65+
pflag.Parse()
66+
67+
if o.Problem == "" {
68+
glog.Fatalf("Please specify the type of problem to make using the --problem argument.")
69+
}
70+
71+
problemGenerator, ok := makers.ProblemGenerators[o.Problem]
72+
if !ok {
73+
glog.Fatalf("Expected to see a problem type of one of %q, but got %q.",
74+
makers.GetProblemTypes(), o.Problem)
75+
}
76+
77+
periodMilli := int(1000.0 / o.Rate)
78+
ticker := time.NewTicker(time.Duration(periodMilli) * time.Millisecond)
79+
defer ticker.Stop()
80+
81+
done := make(chan bool)
82+
go func() {
83+
time.Sleep(o.Duration)
84+
done <- true
85+
}()
86+
87+
for {
88+
select {
89+
case <-done:
90+
return
91+
case <-ticker.C:
92+
glog.Infof("Generating problem: %q", o.Problem)
93+
problemGenerator()
94+
}
95+
}
96+
}
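One edge case in `main` is the conversion from `--rate` to a ticker period: `time.NewTicker` panics on a non-positive duration, and `int(1000.0 / o.Rate)` truncates to 0 for any rate above 1000 and is not well defined for a rate of 0. A hedged sketch of a validating helper is shown below; `tickerPeriod` is a hypothetical name, and working in nanoseconds rather than milliseconds is a choice of this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"math"
	"time"
)

// tickerPeriod converts a rate in events per second into the interval
// between events, rejecting rates that cannot yield a positive period.
func tickerPeriod(rate float32) (time.Duration, error) {
	// The negated comparison also rejects NaN.
	if !(rate > 0) {
		return 0, errors.New("rate must be a positive number")
	}
	nanos := float64(time.Second) / float64(rate)
	// Reject periods that round down to zero or overflow time.Duration.
	if nanos < 1 || nanos >= float64(math.MaxInt64) {
		return 0, fmt.Errorf("rate %v cannot be represented as a ticker period", rate)
	}
	return time.Duration(nanos), nil
}

func main() {
	for _, rate := range []float32{1, 2, 0} {
		period, err := tickerPeriod(rate)
		if err != nil {
			fmt.Printf("rate %v: %v\n", rate, err)
			continue
		}
		fmt.Printf("rate %v: one problem every %v\n", rate, period)
	}
}
```

With a helper like this, an invalid `--rate` could be reported through `glog.Fatalf` with a clear message before the ticker is created.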
