Commit aea91e3

Merge pull request #335 from xueweiz/sd
Add Stackdriver exporter
2 parents 0fdff95 + 0f0e5ef commit aea91e3

548 files changed: +91722 -1559 lines changed

.travis.yml

Lines changed: 2 additions & 0 deletions
@@ -27,3 +27,5 @@ script:
   - BUILD_TAGS="disable_system_log_monitor" make test
   - make clean && BUILD_TAGS="disable_system_stats_monitor" make
   - BUILD_TAGS="disable_system_stats_monitor" make test
+  - make clean && BUILD_TAGS="disable_stackdriver_exporter" make
+  - BUILD_TAGS="disable_stackdriver_exporter" make test

Makefile

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ PKG:=k8s.io/node-problem-detector
 PKG_SOURCES:=$(shell find pkg cmd -name '*.go')
 
 # TARBALL is the name of release tar. Include binary version by default.
-TARBALL:=node-problem-detector-$(VERSION).tar.gz
+TARBALL?=node-problem-detector-$(VERSION).tar.gz
 
 # IMAGE is the image name of the node problem detector container image.
 IMAGE:=$(REGISTRY)/node-problem-detector:$(TAG)

README.md

Lines changed: 33 additions & 8 deletions
@@ -69,27 +69,44 @@ List of supported problem daemons:
 # Exporter
 
 An exporter is a component of node-problem-detector. It reports node problems and/or metrics to
-certain back end (e.g. Kubernetes API server, or Prometheus scrape endpoint).
+certain back end. Some of them can be disable at compile time using a build tag. List of supported exporters:
+
+| Exporter |Description | Disabling Build Tag |
+|----------|:-----------|:--------------------|
+| Kubernetes exporter | Kubernetes exporter reports node problems to Kubernetes API server: temporary problems get reported as Events, and permanent problems get reported as Node Conditions. |
+| Prometheus exporter | Prometheus exporter reports node problems and metrics locally as Prometheus metrics |
+| [Stackdriver exporter](https://github.com/kubernetes/node-problem-detector/blob/master/config/exporter/stackdriver-exporter.json) | Stackdriver exporter reports node problems and metrics to Stackdriver Monitoring API. | disable_stackdriver_exporter
 
 # Usage
 
 ## Flags
 
 * `--version`: Print current version of node-problem-detector.
-* `--address`: The address to bind the node problem detector server.
-* `--port`: The port to bind the node problem detector server. Use 0 to disable.
+* `--hostname-override`: A customized node name used for node-problem-detector to update conditions and emit events. node-problem-detector gets node name first from `hostname-override`, then `NODE_NAME` environment variable and finally fall back to `os.Hostname`.
+
+#### For System Log Monitor
+
 * `--config.system-log-monitor`: List of paths to system log monitor configuration files, comma separated, e.g.
 [config/kernel-monitor.json](https://github.com/kubernetes/node-problem-detector/blob/master/config/kernel-monitor.json).
 Node problem detector will start a separate log monitor for each configuration. You can
 use different log monitors to monitor different system log.
-* `--config.custom-plugin-monitor`: List of paths to custom plugin monitor config files, comma separated, e.g.
-[config/custom-plugin-monitor.json](https://github.com/kubernetes/node-problem-detector/blob/master/config/custom-plugin-monitor.json).
-Node problem detector will start a separate custom plugin monitor for each configuration. You can
-use different custom plugin monitors to monitor different node problems.
+
+#### For System Stats Monitor
+
 * `--config.system-stats-monitor`: List of paths to system stats monitor config files, comma separated, e.g.
 [config/system-stats-monitor.json](https://github.com/kubernetes/node-problem-detector/blob/master/config/system-stats-monitor.json).
 Node problem detector will start a separate system stats monitor for each configuration. You can
 use different system stats monitors to monitor different problem-related system stats.
+
+#### For Custom Plugin Monitor
+
+* `--config.custom-plugin-monitor`: List of paths to custom plugin monitor config files, comma separated, e.g.
+[config/custom-plugin-monitor.json](https://github.com/kubernetes/node-problem-detector/blob/master/config/custom-plugin-monitor.json).
+Node problem detector will start a separate custom plugin monitor for each configuration. You can
+use different custom plugin monitors to monitor different node problems.
+
+#### For Kubernetes exporter
+
 * `--enable-k8s-exporter`: Enables reporting to Kubernetes API server, default to `true`.
 * `--apiserver-override`: A URI parameter used to customize how node-problem-detector
 connects the apiserver. This is ignored if `--enable-k8s-exporter` is `false`. The format is same as the
@@ -100,10 +117,18 @@ For example, to run without auth, use the following config:
 http://APISERVER_IP:APISERVER_PORT?inClusterConfig=false
 ```
 Refer [heapster docs](https://github.com/kubernetes/heapster/blob/master/docs/source-configuration.md#kubernetes) for a complete list of available options.
-* `--hostname-override`: A customized node name used for node-problem-detector to update conditions and emit events. node-problem-detector gets node name first from `hostname-override`, then `NODE_NAME` environment variable and finally fall back to `os.Hostname`.
+* `--address`: The address to bind the node problem detector server.
+* `--port`: The port to bind the node problem detector server. Use 0 to disable.
+
+#### For Prometheus exporter
+
 * `--prometheus-address`: The address to bind the Prometheus scrape endpoint, default to `127.0.0.1`.
 * `--prometheus-port`: The port to bind the Prometheus scrape endpoint, default to 20257. Use 0 to disable.
 
+#### For Stackdriver exporter
+
+* `--exporter.stackdriver`: Path to a Stackdriver exporter config file, e.g. [config/exporter/stackdriver-exporter.json](https://github.com/kubernetes/node-problem-detector/blob/master/config/exporter/stackdriver-exporter.json), default to empty string. Set to empty string to disable.
+
 ### Deprecated Flags
 
 * `--system-log-monitors`: List of paths to system log monitor config files, comma separated. This option is deprecated, replaced by `--config.system-log-monitor`, and will be removed. NPD will panic if both `--system-log-monitors` and `--config.system-log-monitor` are set.
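
Reviewer note: the `--hostname-override` entry in the README describes a three-step lookup order (flag value, then `NODE_NAME`, then `os.Hostname`). The following is a minimal sketch of that order only. The helper name `resolveNodeName` and its injected lookup functions are hypothetical and are not taken from the node-problem-detector source.

```go
package main

import (
	"fmt"
	"os"
)

// resolveNodeName is a hypothetical helper illustrating the lookup order
// documented for --hostname-override. The environment and hostname lookups
// are passed in so the order can be exercised without touching the process
// environment.
func resolveNodeName(
	override string,
	getenv func(string) string,
	hostname func() (string, error),
) (string, error) {
	// 1. An explicit --hostname-override value wins.
	if override != "" {
		return override, nil
	}
	// 2. Otherwise use the NODE_NAME environment variable, if set.
	if name := getenv("NODE_NAME"); name != "" {
		return name, nil
	}
	// 3. Finally fall back to the hostname reported by the OS.
	return hostname()
}

func main() {
	name, err := resolveNodeName("", os.Getenv, os.Hostname)
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot determine node name:", err)
		os.Exit(1)
	}
	fmt.Println("node name:", name)
}
```

Passing `os.Getenv` and `os.Hostname` as arguments is a design choice of this sketch, made to keep the precedence logic testable in isolation.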
Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+/*
+Copyright 2019 The Kubernetes Authors All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package exporterplugins
+
+// This file is necessary to make sure the exporterplugins package non-empty
+// under any build tags.
Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+// +build !disable_stackdriver_exporter
+
+/*
+Copyright 2019 The Kubernetes Authors All rights reserved.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package exporterplugins
+
+import (
+	_ "k8s.io/node-problem-detector/pkg/exporters/stackdriver"
+)
+
+// The stackdriver plugin takes about 6MB in the NPD binary.
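
Reviewer note: the two `exporterplugins` files above use Go's blank-import idiom. Importing `pkg/exporters/stackdriver` only for its side effects presumably lets that package register itself from an `init` function, and the `// +build !disable_stackdriver_exporter` constraint drops the import (and roughly 6MB of binary size) when the tag is set. The registry itself is not shown in this excerpt, so the sketch below is an assumption about its shape, collapsed into a single file. `GetExporterNames` and `NewExporters` reuse names that appear in the diff; their bodies, the `Register` hook, the `Exporter` interface, and `fakeStackdriver` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Exporter stands in for types.Exporter. The real interface is not shown
// in this diff, so the single method here is an assumption.
type Exporter interface {
	Name() string
}

// handlers maps an exporter name to its factory. In a plugin layout, each
// plugin package would add its entry from its own init function.
var handlers = map[string]func() Exporter{}

// Register is a hypothetical registration hook meant to be called from init.
func Register(name string, factory func() Exporter) {
	if _, dup := handlers[name]; dup {
		panic("exporter registered twice: " + name)
	}
	handlers[name] = factory
}

// GetExporterNames returns the registered names in sorted order.
func GetExporterNames() []string {
	names := make([]string, 0, len(handlers))
	for name := range handlers {
		names = append(names, name)
	}
	sort.Strings(names)
	return names
}

// NewExporters builds every registered exporter, skipping factories that
// return nil (for example, an exporter whose config path is empty).
func NewExporters() []Exporter {
	var out []Exporter
	for _, name := range GetExporterNames() {
		if e := handlers[name](); e != nil {
			out = append(out, e)
		}
	}
	return out
}

// fakeStackdriver is a placeholder exporter used only for this sketch.
type fakeStackdriver struct{}

func (fakeStackdriver) Name() string { return "stackdriver" }

// With the layout in the commit, a registration like this would live in a
// package that is imported only when disable_stackdriver_exporter is unset.
func init() {
	Register("stackdriver", func() Exporter { return fakeStackdriver{} })
}

func main() {
	for _, e := range NewExporters() {
		fmt.Println("exporter enabled:", e.Name())
	}
}
```

Under these assumptions, building with `BUILD_TAGS="disable_stackdriver_exporter"` would leave the registry empty, which is why the always-compiled placeholder file is needed to keep the `exporterplugins` package non-empty.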

cmd/nodeproblemdetector/node_problem_detector.go

Lines changed: 14 additions & 5 deletions
@@ -22,8 +22,10 @@ import (
 	"github.com/golang/glog"
 	"github.com/spf13/pflag"
 
+	_ "k8s.io/node-problem-detector/cmd/nodeproblemdetector/exporterplugins"
 	_ "k8s.io/node-problem-detector/cmd/nodeproblemdetector/problemdaemonplugins"
 	"k8s.io/node-problem-detector/cmd/options"
+	"k8s.io/node-problem-detector/pkg/exporters"
 	"k8s.io/node-problem-detector/pkg/exporters/k8sexporter"
 	"k8s.io/node-problem-detector/pkg/exporters/prometheusexporter"
 	"k8s.io/node-problem-detector/pkg/problemdaemon"
@@ -54,21 +56,28 @@ func main() {
 	}
 
 	// Initialize exporters.
-	exporters := []types.Exporter{}
+	defaultExporters := []types.Exporter{}
 	if ke := k8sexporter.NewExporterOrDie(npdo); ke != nil {
-		exporters = append(exporters, ke)
+		defaultExporters = append(defaultExporters, ke)
 		glog.Info("K8s exporter started.")
 	}
 	if pe := prometheusexporter.NewExporterOrDie(npdo); pe != nil {
-		exporters = append(exporters, pe)
+		defaultExporters = append(defaultExporters, pe)
 		glog.Info("Prometheus exporter started.")
 	}
-	if len(exporters) == 0 {
+
+	plugableExporters := exporters.NewExporters()
+
+	npdExporters := []types.Exporter{}
+	npdExporters = append(npdExporters, defaultExporters...)
+	npdExporters = append(npdExporters, plugableExporters...)
+
+	if len(npdExporters) == 0 {
 		glog.Fatalf("No exporter is successfully setup")
 	}
 
 	// Initialize NPD core.
-	p := problemdetector.NewProblemDetector(problemDaemons, exporters)
+	p := problemdetector.NewProblemDetector(problemDaemons, npdExporters)
 	if err := p.Run(); err != nil {
 		glog.Fatalf("Problem detector failed with error: %v", err)
 	}

cmd/options/options.go

Lines changed: 6 additions & 0 deletions
@@ -26,6 +26,7 @@ import (
 
 	"github.com/spf13/pflag"
 
+	"k8s.io/node-problem-detector/pkg/exporters"
 	"k8s.io/node-problem-detector/pkg/problemdaemon"
 	"k8s.io/node-problem-detector/pkg/types"
 )
@@ -86,6 +87,7 @@ type NodeProblemDetectorOptions struct {
 
 func NewNodeProblemDetectorOptions() *NodeProblemDetectorOptions {
 	npdo := &NodeProblemDetectorOptions{MonitorConfigPaths: types.ProblemDaemonConfigPathMap{}}
+
 	for _, problemDaemonName := range problemdaemon.GetProblemDaemonNames() {
 		npdo.MonitorConfigPaths[problemDaemonName] = &[]string{}
 	}
@@ -118,6 +120,10 @@ func (npdo *NodeProblemDetectorOptions) AddFlags(fs *pflag.FlagSet) {
 	fs.StringVar(&npdo.PrometheusServerAddress, "prometheus-address",
 		"127.0.0.1", "The address to bind the Prometheus scrape endpoint.")
 
+	for _, exporterName := range exporters.GetExporterNames() {
+		exporterHandler := exporters.GetExporterHandlerOrDie(exporterName)
+		exporterHandler.Options.SetFlags(fs)
+	}
 	for _, problemDaemonName := range problemdaemon.GetProblemDaemonNames() {
 		fs.StringSliceVar(
 			npdo.MonitorConfigPaths[problemDaemonName],
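
Reviewer note: the loop added to `AddFlags` lets each registered exporter contribute its own command-line flags through `Options.SetFlags(fs)`. A minimal sketch of that delegation follows. It uses the standard library `flag` package instead of `pflag` to stay dependency-free, and the `commandLineOptions` interface, `stackdriverOptions` struct, and `parseArgs` helper are hypothetical. Only the flag name `exporter.stackdriver` comes from the README change.

```go
package main

import (
	"flag"
	"fmt"
)

// commandLineOptions is an assumed shape for the Options value on which the
// diff calls SetFlags; the real definition is not shown in this excerpt.
type commandLineOptions interface {
	SetFlags(fs *flag.FlagSet)
}

// stackdriverOptions is a hypothetical options struct for illustration.
type stackdriverOptions struct {
	ConfigPath string
}

// SetFlags registers the exporter's own flag on the shared flag set.
func (o *stackdriverOptions) SetFlags(fs *flag.FlagSet) {
	fs.StringVar(&o.ConfigPath, "exporter.stackdriver", "",
		"Path to a Stackdriver exporter config file. Empty disables it.")
}

// addExporterFlags mirrors the loop in the diff: every options value
// registers its flags on the same flag set.
func addExporterFlags(fs *flag.FlagSet, all []commandLineOptions) {
	for _, o := range all {
		o.SetFlags(fs)
	}
}

// parseArgs wires one exporter's options into a fresh flag set and parses
// the given arguments.
func parseArgs(args []string) (*stackdriverOptions, error) {
	sd := &stackdriverOptions{}
	fs := flag.NewFlagSet("npd-sketch", flag.ContinueOnError)
	addExporterFlags(fs, []commandLineOptions{sd})
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return sd, nil
}

func main() {
	sd, err := parseArgs([]string{"--exporter.stackdriver=/tmp/stackdriver-exporter.json"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("stackdriver config:", sd.ConfigPath)
}
```

The point of the pattern is that the core options code never needs to know which exporters were compiled in; a plugin excluded by a build tag simply contributes no flags.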
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+{
+  "apiEndpoint": "monitoring.googleapis.com:443",
+  "exportPeriod": "60s",
+  "metadataFetchTimeout": "600s",
+  "metadataFetchInterval": "10s",
+  "panicOnMetadataFetchFailure": false,
+  "customMetricPrefix": ""
+}
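
Reviewer note: the config file above stores its periods and timeouts as duration strings such as `"60s"`. The sketch below shows one way such a file could be decoded and validated in Go. The struct `stackdriverConfig` and the functions `parseConfig` and `exportPeriod` are hypothetical; the exporter's real config type and parsing code are not part of this excerpt. Only the JSON keys and sample values are taken from the diff.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// stackdriverConfig mirrors the JSON keys in the config file. The struct
// itself is an assumption made for this sketch.
type stackdriverConfig struct {
	APIEndpoint                 string `json:"apiEndpoint"`
	ExportPeriod                string `json:"exportPeriod"`
	MetadataFetchTimeout        string `json:"metadataFetchTimeout"`
	MetadataFetchInterval       string `json:"metadataFetchInterval"`
	PanicOnMetadataFetchFailure bool   `json:"panicOnMetadataFetchFailure"`
	CustomMetricPrefix          string `json:"customMetricPrefix"`
}

// sample reproduces the values from the config file in the commit.
const sample = `{
  "apiEndpoint": "monitoring.googleapis.com:443",
  "exportPeriod": "60s",
  "metadataFetchTimeout": "600s",
  "metadataFetchInterval": "10s",
  "panicOnMetadataFetchFailure": false,
  "customMetricPrefix": ""
}`

// parseConfig decodes the JSON and checks that every duration string is
// accepted by time.ParseDuration.
func parseConfig(data []byte) (*stackdriverConfig, error) {
	var cfg stackdriverConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, fmt.Errorf("decode config: %v", err)
	}
	durations := map[string]string{
		"exportPeriod":          cfg.ExportPeriod,
		"metadataFetchTimeout":  cfg.MetadataFetchTimeout,
		"metadataFetchInterval": cfg.MetadataFetchInterval,
	}
	for key, value := range durations {
		if _, err := time.ParseDuration(value); err != nil {
			return nil, fmt.Errorf("invalid %s %q: %v", key, value, err)
		}
	}
	return &cfg, nil
}

// exportPeriod returns the parsed export period. It assumes the config
// came from parseConfig, which has already validated the string.
func (c *stackdriverConfig) exportPeriod() time.Duration {
	d, _ := time.ParseDuration(c.ExportPeriod)
	return d
}

func main() {
	cfg, err := parseConfig([]byte(sample))
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("endpoint:", cfg.APIEndpoint)
	fmt.Println("export period:", cfg.exportPeriod())
}
```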

config/systemd/node-problem-detector-metric-only.service

Lines changed: 3 additions & 2 deletions
@@ -1,12 +1,13 @@
 [Unit]
 Description=Node problem detector
-Wants=local-fs.target
-After=local-fs.target
+Wants=network-online.target
+After=network-online.target
 
 [Service]
 Restart=always
 RestartSec=10
 ExecStart=/home/kubernetes/bin/node-problem-detector --v=2 --logtostderr --enable-k8s-exporter=false \
+    --exporter.stackdriver=/home/kubernetes/node-problem-detector/config/exporter/stackdriver-exporter.json \
     --config.system-log-monitor=/home/kubernetes/node-problem-detector/config/kernel-monitor.json,/home/kubernetes/node-problem-detector/config/docker-monitor.json,/home/kubernetes/node-problem-detector/config/systemd-monitor.json \
     --config.custom-plugin-monitor=/home/kubernetes/node-problem-detector/config/kernel-monitor-counter.json,/home/kubernetes/node-problem-detector/config/systemd-monitor-counter.json \
     --config.system-stats-monitor=/home/kubernetes/node-problem-detector/config/system-stats-monitor.json

go.mod

Lines changed: 2 additions & 3 deletions
@@ -5,6 +5,7 @@ go 1.11
 require (
 	code.cloudfoundry.org/clock v0.0.0-20180518195852-02e53af36e6c
 	contrib.go.opencensus.io/exporter/prometheus v0.0.0-20190427222117-f6cda26f80a3
+	contrib.go.opencensus.io/exporter/stackdriver v0.12.5
 	github.com/PuerkitoBio/purell v1.0.0 // indirect
 	github.com/PuerkitoBio/urlesc v0.0.0-20160726150825-5bd2802263f2 // indirect
 	github.com/StackExchange/wmi v0.0.0-20181212234831-e0a55b97c705 // indirect
@@ -23,7 +24,6 @@ require (
 	github.com/go-openapi/swag v0.0.0-20160704191624-1d0bd113de87 // indirect
 	github.com/golang/glog v0.0.0-20160126235308-23def4e6c14b
 	github.com/golang/groupcache v0.0.0-20150125180832-604ed5785183 // indirect
-	github.com/google/btree v1.0.0 // indirect
 	github.com/google/cadvisor v0.33.0
 	github.com/google/gofuzz v0.0.0-20161122191042-44d81051d367 // indirect
 	github.com/googleapis/gnostic v0.1.0 // indirect
@@ -47,8 +47,7 @@ require (
 	github.com/spf13/pflag v1.0.3
 	github.com/stretchr/testify v1.3.0
 	github.com/tedsuo/ifrit v0.0.0-20180802180643-bea94bb476cc // indirect
-	go.opencensus.io v0.21.0
-	golang.org/x/net v0.0.0-20190603091049-60506f45cf65 // indirect
+	go.opencensus.io v0.22.0
 	golang.org/x/oauth2 v0.0.0-20190604053449-0f29369cfe45
 	google.golang.org/api v0.7.0
 	gopkg.in/check.v1 v1.0.0-20180628173108-788fd7840127 // indirect
