|
1 | 1 | # Change Log
|
| 2 | + |
2 | 3 | All notable changes to this project will be documented in this file.
|
3 | 4 |
|
4 | 5 | The format is based on [Keep a Changelog](http://keepachangelog.com/)
|
5 | 6 | and this project adheres to [Semantic Versioning](http://semver.org/).
|
6 | 7 |
|
7 | 8 | ## [Unreleased]
|
| 9 | + |
| 10 | +## [0.8.6] - 2021-01-22 |
| 11 | + |
| 12 | +### Added |
| 13 | + |
| 14 | +- Windows build now supported. |
| 15 | +- Added metrics to retrieve stats such as `procs_running` and `procs_blocked`. |
| 16 | +- Added metrics to retrieve network stats. |
| 17 | +- Added metric to retrieve guest OS features such as unknown modules, ktd, |
| 18 | + and kernel integrity. |
| 19 | + |
| 20 | +### Changed |
| 21 | + |
| 22 | +- Print result's message when status is unknown. |
| 23 | + |
| 24 | +### Fixed |
| 25 | + |
| 26 | +- Fixed custom plugin command timeout when the command spawns a long running |
| 27 | + child process. |
| 28 | + |
| 29 | +## [0.8.5] - 2020-11-18 |
| 30 | + |
| 31 | +### Added |
| 32 | + |
| 33 | +- Added problem detection for buffer I/O error. |
| 34 | +- Added CPU load average metrics support. |
| 35 | +- Added kubelet apiserver connection check in health checker. |
| 36 | + |
| 37 | +### Changed |
| 38 | + |
| 39 | +- Will now catch hung task with pattern like `tasks airflow scheduler: *`. |
| 40 | +- Better handling to avoid duplicating disk bytes metrics used on fstype and |
| 41 | + mount types. |
| 42 | + |
| 43 | +### Fixed |
| 44 | + |
| 45 | +- Fixed the deployment yaml to prevent NPD from scheduling onto windows nodes. |
| 46 | +- Fixed memory unit for `/proc/meminfo` metrics. |
| 47 | +- Fixed OOMKilling detection for new linux kernel v5.1+. |
| 48 | + |
| 49 | +## [0.8.4] - 2020-09-01 |
| 50 | + |
| 51 | +### Added |
| 52 | + |
| 53 | +- Added `FSType` and `MountOption` as labels to the metric `disk_usage_bytes`. |
| 54 | +- Added `DockerContainerStartupFailure` event in `docker-monitor.json` to |
| 55 | + detect docker issue |
| 56 | + [docker/for-linux#647](https://github.com/docker/for-linux/issues/647). |
| 57 | + |
| 58 | +### Fixed |
| 59 | + |
| 60 | +- Reduced log spam generated by the custom plugin monitor. |
| 61 | + |
| 62 | +## [0.8.3] - 2020-06-30 |
| 63 | + |
| 64 | +### Added |
| 65 | + |
| 66 | +- `health-checker` binary now included in the docker image. |
| 67 | + |
| 68 | +### Changed |
| 69 | + |
| 70 | +- `--enable-repair=true` is now the default for docker and kubelet health |
| 71 | + checker. |
| 72 | +- Custom plugin will now only generate status update log when the status has |
| 73 | + changed. |
| 74 | +- Limit the size of custom plugin output to 4kb, extra output will be drained |
| 75 | + and discarded. |
| 76 | + |
| 77 | +### Fixed |
| 78 | + |
| 79 | +- Fixed a race condition where services may be killed periodically when |
| 80 | + `--enable-repair=true`, and systemd service restart time equals the health |
| 81 | + check period. |
| 82 | + |
| 83 | +## [0.8.2] - 2020-05-28 |
| 84 | + |
| 85 | +### Added |
| 86 | + |
| 87 | +- Added an `--event-namespace` flag to make event namespace configurable. |
| 88 | +- Added `rhel` support in OS version. |
| 89 | +- Added `health-checker` as a custom plugin. The `health-checker` can be used |
| 90 | + to monitor healthiness of kubelet, docker and CRI container runtimes (e.g. |
| 91 | + `containerd`, `cri-o`) and restart them if they are not healthy if |
| 92 | + `enable-repair` is turned on. |
| 93 | + |
| 94 | +### Fixed |
| 95 | + |
| 96 | +- [#420](https://github.com/kubernetes/node-problem-detector/issues/420) Added |
| 97 | + missing `lsblk` to the container image. |
| 98 | + |
| 99 | +## [0.8.1] - 2020-02-25 |
| 100 | + |
| 101 | +### Added |
| 102 | + |
| 103 | +- Added `host_uptime` metrics for CentOS. |
| 104 | +- Now collecting a lot more useful CPU/disk/memory metrics. |
| 105 | + |
| 106 | +### Changed |
| 107 | + |
| 108 | +- Improved `network_problem.sh` to support `nf_conntrack` and report error when |
| 109 | + conntrack table is 90% full. |
| 110 | + |
| 111 | +### Fixed |
| 112 | + |
| 113 | +- [#366](https://github.com/kubernetes/node-problem-detector/issues/366) Fixed |
| 114 | + building with `ENABLE_JOURNALD=0`. |
| 115 | +- Fixed the first 0 value metrics reported for `disk_avg_queue_len`. |
| 116 | +- Fixed a few metric units for disk metrics and the calculation for |
| 117 | + `disk_avg_queue_len`. |
| 118 | + |
| 119 | +## [0.8.0] - 2019-10-30 |
| 120 | + |
8 | 121 | ### Added
|
9 |
| -- Add travis presubmit test. |
| 122 | + |
| 123 | +- Added Stackdriver exporter. |
| 124 | +- Added a `k8s-exporter-heartbeat-period` flag to make the heart beat period |
| 125 | + of K8s exporter configurable. |
10 | 126 |
|
11 | 127 | ### Changed
|
| 128 | + |
| 129 | +- Changed the default heart beat period of K8s exporter from `1m` to `5m`. |
| 130 | + |
| 131 | +### Fixed |
| 132 | + |
| 133 | +- Addressed an issue with a panic caused by closing an already closed channel. |
| 134 | +- Fixed several potential busy loops. |
| 135 | + |
| 136 | +## [0.7.1] - 2019-08-27 |
| 137 | + |
| 138 | +### Added |
| 139 | + |
| 140 | +- Added validation that permanent problems have a preset default condition. |
| 141 | + |
| 142 | +### Changed |
| 143 | + |
| 144 | +- Empty LogPath will now use journald's default path. |
| 145 | +- Systemd monitor now looks back 5 minutes. |
| 146 | +- Bumped base image to `k8s.gcr.io/debian-base-amd64:1.0.0`. |
| 147 | +- Updated the detection method for docker overlay2 issues. |
| 148 | +- Moved NPD into the kube-system namespace. |
| 149 | + |
| 150 | +### Fixed |
| 151 | + |
| 152 | +- [#202](https://github.com/kubernetes/node-problem-detector/issues/202) Fixed |
| 153 | + an issue where a condition can't switch back to false for custom plugins. |
| 154 | + |
| 155 | +## [0.7.0] - 2019-07-25 |
| 156 | + |
| 157 | +### Added |
| 158 | + |
| 159 | +- Added a system stats monitor to NPD as a new problem daemon. It |
| 160 | + collects useful node problem related system stats with OpenCensus such as |
| 161 | + `disk/io_time`, `disk/weighted_io` and `disk/avg_queue_len`. |
| 162 | +- Besides node condition and events, problems detected by existing problem |
| 163 | + daemons are also collected into OpenCensus as metrics: |
| 164 | + `problem_counter{reason="PROBLEM_REASON"} xxx` for events and |
| 165 | + `problem_gauge{reason="PROBLEM_REASON",type="PROBLEM_TYPE"} 1 or 0` for |
| 166 | + conditions. |
| 167 | +- A Prometheus exporter is added to export all OpenCensus metrics collected by |
| 168 | + NPD through Prometheus. |
| 169 | +- A plugin system for problem daemons is added. Problem daemons can be disabled |
| 170 | + at compile time with build tags, such as `disable_system_stats_monitor`, |
| 171 | + `disable_system_log_monitor` and `disable_custom_plugin_monitor`. |
| 172 | +- A problem exporter interface is added. The original kubernetes problem |
| 173 | + reporting logic was moved into `k8sexporter`. Prometheus support is |
| 174 | + implemented as `prometheusexporter`. |
| 175 | + |
| 176 | +## [0.6.6] - 2019-08-13 |
| 177 | + |
| 178 | +### Changed |
| 179 | + |
| 180 | +- Updated the detection method for docker overlay2 issues. |
| 181 | + |
| 182 | +### Fixed |
| 183 | + |
| 184 | +- [#202](https://github.com/kubernetes/node-problem-detector/issues/202) Fixed |
| 185 | + an issue where a condition can't switch back to false for custom plugins. |
| 186 | + |
| 187 | +## [0.6.5] - 2019-07-24 |
| 188 | + |
| 189 | +### Fixed |
| 190 | + |
| 191 | +- [#295](https://github.com/kubernetes/node-problem-detector/issues/295) Added |
| 192 | + configurable timeout to wait for apiserver to be ready before starting |
| 193 | + problem detection. |
| 194 | + |
| 195 | +## [0.6.4] - 2019-06-13 |
| 196 | + |
| 197 | +### Changed |
| 198 | + |
| 199 | +- Switch from godep to go modules resulting in bumping versions of many |
| 200 | + dependencies. |
| 201 | +- Changed custom plugin handling to run immediately on startup. |
| 202 | + |
| 203 | +### Fixed |
| 204 | + |
| 205 | +- [#269](https://github.com/kubernetes/node-problem-detector/issues/269) Fixed |
| 206 | + issue so that using `--version` should not require monitors to be specified. |
| 207 | + |
| 208 | +## [0.6.3] - 2019-04-05 |
| 209 | + |
| 210 | +### Added |
| 211 | + |
| 212 | +- Added better handling and reporting when required flags are missing. |
| 213 | + |
| 214 | +### Fixed |
| 215 | + |
| 216 | +- Disabled glog writing to files for the log-counter plugin. |
| 217 | + |
| 218 | +## [0.6.2] - 2019-01-07 |
| 219 | + |
| 220 | +### Added |
| 221 | + |
| 222 | +- Added resource limits to NPD deployment. |
| 223 | +- Added log-counter to dockerfile. |
| 224 | +- Added `enable_message_change_based_condition_update` option to enable |
| 225 | + condition update when messages change for custom plugin. |
| 226 | + |
| 227 | +### Fixed |
| 228 | + |
| 229 | +- [#232](https://github.com/kubernetes/node-problem-detector/issues/232) Explicitly |
| 230 | + include libsystemd0 in the image. |
| 231 | + |
| 232 | +## [0.6.1] - 2018-11-28 |
| 233 | + |
| 234 | +### Changed |
| 235 | + |
| 236 | +- Bumped base image to `k8s.gcr.io/debian-base-amd64:0.4.0`. |
| 237 | + |
| 238 | +## [0.6.0] - 2018-11-27 |
| 239 | + |
| 240 | +### Added |
| 241 | + |
| 242 | +- Added ConfigMap for NPD config. |
| 243 | +- Added readonly filesystem detection. |
| 244 | +- Added frequent kubelet/docker restart detection. |
| 245 | +- Added corrupt docker overlay2 issue detection. |
| 246 | + |
| 247 | +### Changed |
| 248 | + |
| 249 | +- Bumped Kubernetes client version to 1.9. |
| 250 | +- Updated OOMKilling pattern to support new kernel. |
| 251 | + |
| 252 | +## [0.5.0] - 2018-06-22 |
| 253 | + |
| 254 | +### Added |
| 255 | + |
| 256 | +- Added custom problem detector plugin interface. |
| 257 | +- Added custom network plugin monitor. |
| 258 | +- Added a kernel log counter custom problem detector to detect problems which |
| 259 | + have the same pattern. |
| 260 | + |
| 261 | +### Changed |
| 262 | + |
| 263 | +- Changed default port from 10256 to 20256 to avoid conflict with kube-proxy. |
| 264 | +- Bumped golang version from 1.8 to 1.9. |
| 265 | +- Bumped base image to `k8s.gcr.io/debian-base-amd64:0.3`. |
| 266 | + |
| 267 | +### Fixed |
| 268 | + |
| 269 | +- Fixed an error in the labels applied to the daemonset label selector. |
| 270 | + |
| 271 | +## [0.4.1] - 2017-06-21 |
| 272 | + |
| 273 | +### Added |
| 274 | + |
| 275 | +- Added docker image pull error detection. |
| 276 | + |
| 277 | +## [0.4.0] - 2017-04-31 |
| 278 | + |
| 279 | +### Added |
| 280 | + |
| 281 | +- Added "kernel log generator" container for test purposes. |
| 282 | +- Added ABRT adaptor config. |
| 283 | + |
| 284 | +## [0.3.0] - 2017-03-15 |
| 285 | + |
| 286 | +### Added |
| 287 | + |
| 288 | +- Added look back support in kernel monitor. Kernel monitor will look back for |
| 289 | + specified amount of time to detect old problems during each start or restart. |
| 290 | +- Added support for running node-problem-detector standalone. |
| 291 | +- Added `-hostname-override` option to provide custom node name. |
| 292 | +- Added `-port` option to provide custom listening port for service. |
| 293 | +- Added `-address` option to define binding address. |
| 294 | +- Added journald support. |
| 295 | +- Added travis presubmit test. |
| 296 | +- Added arbitrary system log support. |
| 297 | + |
| 298 | +### Changed |
| 299 | + |
12 | 300 | - Update kubernetes version to v1.4.0-beta.3
|
13 | 301 |
|
| 302 | +### Fixed |
| 303 | + |
| 304 | +- Only change transition timestamp when condition has changed. |
| 305 | +- [#47](https://github.com/kubernetes/node-problem-detector/issues/47) Don't |
| 306 | + report KernelDeadlock on `unregister_netdevice` event. |
| 307 | +- [#48](https://github.com/kubernetes/node-problem-detector/issues/48) Use |
| 308 | + system boot time instead of "StartPattern". |
| 309 | + |
14 | 310 | ## [0.2.0] - 2016-08-23
|
| 311 | + |
15 | 312 | ### Added
|
16 |
| -- Add look back support in kernel monitor. Kernel monitor will look back for |
17 |
| - specified amount of time to detect old problems during each start or restart. |
| 313 | + |
18 | 314 | - Add support for some kernel oops detection.
|
19 | 315 |
|
20 | 316 | ### Changed
|
| 317 | + |
21 | 318 | - Change NPD to get node name from `NODE_NAME` env first before `os.Hostname`,
|
22 | 319 | and update the example to get node name from downward api and set `NODE_NAME`.
|
23 | 320 |
|
24 | 321 | ## 0.1.0 - 2016-06-09
|
| 322 | + |
25 | 323 | ### Added
|
| 324 | + |
26 | 325 | - Initial version of node problem detector.
|
27 | 326 |
|
28 |
| -[Unreleased]: https://github.com/kubernetes/node-problem-detector/compare/v0.2...HEAD |
29 |
| -[0.2.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.1...v0.2 |
| 327 | +[Unreleased]: https://github.com/kubernetes/node-problem-detector/compare/v0.8.6...HEAD |
| 328 | +[0.8.6]: https://github.com/kubernetes/node-problem-detector/compare/v0.8.5...v0.8.6 |
| 329 | +[0.8.5]: https://github.com/kubernetes/node-problem-detector/compare/v0.8.4...v0.8.5 |
| 330 | +[0.8.4]: https://github.com/kubernetes/node-problem-detector/compare/v0.8.3...v0.8.4 |
| 331 | +[0.8.3]: https://github.com/kubernetes/node-problem-detector/compare/v0.8.2...v0.8.3 |
| 332 | +[0.8.2]: https://github.com/kubernetes/node-problem-detector/compare/v0.8.1...v0.8.2 |
| 333 | +[0.8.1]: https://github.com/kubernetes/node-problem-detector/compare/v0.8.0...v0.8.1 |
| 334 | +[0.8.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.7.1...v0.8.0 |
| 335 | +[0.7.1]: https://github.com/kubernetes/node-problem-detector/compare/v0.7.0...v0.7.1 |
| 336 | +[0.7.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.6.6...v0.7.0 |
| 337 | +[0.6.6]: https://github.com/kubernetes/node-problem-detector/compare/v0.6.5...v0.6.6 |
| 338 | +[0.6.5]: https://github.com/kubernetes/node-problem-detector/compare/v0.6.4...v0.6.5 |
| 339 | +[0.6.4]: https://github.com/kubernetes/node-problem-detector/compare/v0.6.3...v0.6.4 |
| 340 | +[0.6.3]: https://github.com/kubernetes/node-problem-detector/compare/v0.6.2...v0.6.3 |
| 341 | +[0.6.2]: https://github.com/kubernetes/node-problem-detector/compare/v0.6.1...v0.6.2 |
| 342 | +[0.6.1]: https://github.com/kubernetes/node-problem-detector/compare/v0.6.0...v0.6.1 |
| 343 | +[0.6.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.5.0...v0.6.0 |
| 344 | +[0.5.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.4.1...v0.5.0 |
| 345 | +[0.4.1]: https://github.com/kubernetes/node-problem-detector/compare/v0.4.0...v0.4.1 |
| 346 | +[0.4.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.3.0...v0.4.0 |
| 347 | +[0.3.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.2.0...v0.3.0 |
| 348 | +[0.2.0]: https://github.com/kubernetes/node-problem-detector/compare/v0.1.0...v0.2.0 |