## 2016-12-15

### Agenda

* Demo by Datadog (rescheduled)
* Kubernetes Metric Conventions: [https://docs.google.com/document/d/1YVs02Li6QFCg8Th2Wa4z1u2NBlQHDp2dj3EdAt6uskE/edit#](https://docs.google.com/document/d/1YVs02Li6QFCg8Th2Wa4z1u2NBlQHDp2dj3EdAt6uskE/edit#)
* Resource metrics API: looking towards beta
    * [https://docs.google.com/document/d/1t0G7OS6OP9qPndkkNROCu0pF3-vkDmzonmT-6gEWcx0/edit?ts=5852bda8](https://docs.google.com/document/d/1t0G7OS6OP9qPndkkNROCu0pF3-vkDmzonmT-6gEWcx0/edit?ts=5852bda8)

### Notes

* Put the metric conventions document somewhere visible for reference
    * [https://github.com/kubernetes/community/tree/master/contributors/devel](https://github.com/kubernetes/community/tree/master/contributors/devel)
* Resource metrics API should be moved towards beta
    * To be finalized after the holiday break
    * Working towards beta in 1.7
* Custom metrics API (a query sketch for both metrics APIs follows these notes):
    * [https://github.com/kubernetes/community/pull/152/files](https://github.com/kubernetes/community/pull/152/files)

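Since both metrics APIs keep coming up, here is a minimal sketch of how a client could query them once they are served through API aggregation. The API groups (`metrics.k8s.io`, `custom.metrics.k8s.io`), the `v1beta1` version, and the metric name `http_requests` are illustrative assumptions, not decisions recorded in this meeting:

```sh
# Resource metrics API: node/pod CPU and memory, the data behind
# `kubectl top` and the HPA (paths assumed for illustration).
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"

# Custom metrics API: application-level metrics keyed by Kubernetes
# object; `http_requests` is a hypothetical metric name.
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests"
```
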
## 2016-12-08

**Warning: This meeting will be about logging. If you are not interested, please skip.**

### Agenda

* Restart LogDir proposal ([https://github.com/kubernetes/kubernetes/pull/13010](https://github.com/kubernetes/kubernetes/pull/13010))
* Alternative: [https://github.com/kubernetes/kubernetes/pull/33111](https://github.com/kubernetes/kubernetes/pull/33111)

Meeting notes: [https://gist.github.com/leahnp/463501f6dfe39f6f21ea5d3ebcb787d7](https://gist.github.com/leahnp/463501f6dfe39f6f21ea5d3ebcb787d7)

## 2016-12-01

### Agenda

* Heapster needs your help
    * [sross] Need to come up with a map of sinks to maintainers
        * Maybe consider dropping sinks without maintainers
    * [sross] Need a statement of plans for Heapster
    * [sross] Putting Heapster into maintenance mode: what does maintenance mode entail? Should we continue accepting sinks?
    * [piosz] To write something up and send it out
* [mwringe] What is the plan/timeline for the monitoring pipeline work?
    * [piosz] The plan is to start work in Q2 2017, unless anyone else can help
    * [piosz] The major missing component is the discovery summarizer
    * [sross] We (Red Hat) are willing to help out in this area

## [Cancelled] 2016-11-24: Thanksgiving in US

## [Cancelled] 2016-11-17: no meeting week

## [Cancelled] 2016-11-10: KubeCon

## [Cancelled] 2016-11-03

## 2016-10-27

### Agenda

* F2F meeting about monitoring in Seattle during KubeCon (on Monday, Nov 7th)

## 2016-10-20

**Warning: This meeting will be about logging. If you are not interested, please skip.**

### Agenda

* F2F meeting about logging in Seattle during KubeCon (probably on Monday, Nov 7th)
    * There is going to be a Kubernetes dev summit (Nov 10th) meeting for logging
* Group administrivia: frequency? Length? Topics?
* Current state of logging in Kubernetes
* What’s going on with logging?

### Notes

* Developers Summit: 45-minute unconference topic on the future of logging
    * Moderated by Vishnu and Patrick
    * Open to anyone who is attending the Kubernetes Developers Conference
* Discussion of face-to-face meeting: Piotr and Patrick to sync up offline
* Frequency: every three weeks. The next meeting will be pushed back one week, since it would otherwise fall during the KubeCon Developers Summit.
    * There will be an announcement for exactly when the next meeting is
* Logging discussion topics:
    * Logging volumes (proposal started by David Cowden: [https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q/edit#](https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q/edit#))
    * Hot-loop logging and verbosity for scalability issues
        * How to detect spammy instances
        * How to not let this wreck the cluster
    * General dissatisfaction with the logging facility
        * Structured logging Kubernetes-wide for consistent consumption
    * Application log type detection
        * What metadata do we need to carry through a logging pipeline to identify a source system (e.g. MySQL, a user application)?
            * What do logging vendors need supplied to aid in this?
* Current logging pipelines
    * fluentd direct to GCP or ES
    * fluentd to Kafka to fluentd to ES
* Action items
    * Piotr & Patrick to determine f2f details
    * Try to get logging vendors to join the SIG

## [Cancelled] 2016-10-13

## 2016-10-06

### Agenda

* No response from SIG API Machinery (moving to next meeting)
* Continue discussion on monitoring architecture
    * Agreed on a versioned, well-defined API
    * REST API vs. query language
    * A webhook model was suggested for the APIs (like auth in Kube today)
        * [sross] has concerns over discoverability of webhooks
        * Webhook vs. API server is largely an implementation question
        * Will decide on discovery vs. webhook for consumption once we get the API design in place
    * [sross] will propose an API design for the custom metrics API and historical metrics API
* Discuss [roadmap](https://docs.google.com/document/d/1j6uHkU8m6GvElNKCJdBN8KrejkUzVbp2l0zTyeSxrl8/edit)
    * Discussed briefly, please go read afterwards
    * [sross] to lead the push on custom metrics design/implementation for 1.5
    * 1.5 API features will be implemented mainly in terms of Heapster
* Looking forward to a one-click install of 3rd-party monitoring (possibly Prometheus, but as an out-of-the-box, one-command setup; possible choices for deployment: Helm, kpm); see the sketch after this list
* Logging discussion feasibility conversation (i.e. is this a reasonable location for having discussions about logging)
    * This may be a reasonable place for logging discussions, if we explicitly note which meetings will discuss logging (and/or when logging will be discussed)
    * May also just want to create a separate SIG
    * [decarr] mentioned the CRI discussion on logging and metrics
        * Outcome was that we should sync with SIG Node on that, but it should probably stay more in SIG Node

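As a concrete sketch of the one-command setup discussed above, this is roughly what a Helm-based install could look like in the Helm 2.x era. The chart name `stable/prometheus` is an assumption for illustration, not an agreed deliverable:

```sh
# Hypothetical one-command monitoring install via Helm (2.x era).
helm init                        # installs Tiller, Helm's in-cluster component
helm install stable/prometheus   # deploys a Prometheus chart, assuming one exists
```
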
## 2016-09-29

### Agenda

* Discuss [Kubernetes monitoring architecture proposal](https://docs.google.com/document/d/1z7R44MUz_5gRLwsVH0S9rOy8W5naM9XE5NrbeGIqO2k/edit#)

### Notes

* Main metrics pipeline used by Kubernetes components
* Separate operator-defined monitoring pipeline for user-exposed monitoring
    * Generally collects core metrics redundantly/independently
* Should it be possible to implement the core metrics pipeline on top of the custom monitoring system?
    * As long as one implements the core metrics API, one could swap it out for the scheduler etc.
* Upstream Kubernetes would test against the stable core pipeline
* Replaceable != pluggable: the entire thing gets replaced in a custom scenario
* Master Metrics API is part of the main Kubernetes API
    * Should further APIs, like one for historic metrics, also be in that group?
    * Discussion for sig-apimachinery
* Should Infrastore be part of core Kubernetes?
    * Provides historic time series data about the system
    * Would require implementing a subset of a TSDB
    * Not an implemented component, just an API
* What are core metrics exactly?
    * CPU, memory, disk
    * What about network and ingress?
    * The resource estimator would not read from the Master Metrics API but collect information itself (e.g. from the kubelet; a sketch of such a query follows these notes)

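For illustration, a resource estimator collecting directly from the kubelet could hit the summary endpoint on the kubelet's read-only port. 10255 was the default read-only port at the time; treat the port and endpoint as a sketch of the approach, not a committed interface:

```sh
# Fetch node-level and per-pod CPU/memory/filesystem stats straight
# from a kubelet, bypassing the Master Metrics API.
curl "http://<node-ip>:10255/stats/summary"
```
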
## 2016-09-22

### Agenda

* Mission statement: [https://docs.google.com/document/d/15Q47xbYTGHEZ-wVULGSgOSD5Kq-OehJj-MEChVH1kqk/edit?usp=sharing](https://docs.google.com/document/d/15Q47xbYTGHEZ-wVULGSgOSD5Kq-OehJj-MEChVH1kqk/edit?usp=sharing)
* Kubesnap demo

### Notes

* Kubesnap demo by Andrzej Kuriata, Intel ([slides](https://docs.google.com/presentation/d/1fgGik1nq-yEN7Y2dRIQWTjb7r5HEWaG9paDCdvzE_IA/edit?usp=sharing)):
    * DaemonSet in k8s
    * Integration with Heapster
* Mission statement:
    * Enough people to coordinate, but small enough to be focused
    * List of people actually doing development/design in the scope of this SIG
    * Scratchpad to set up discussion of features before each meeting
    * SIG Autoscaling discussed and committed to features/metrics in previous meetings
    * A plan for an API for 1.5?

## 2016-09-15

### Agenda

* Presentation by Eric Lemoine (Mirantis): monitoring Kubernetes with [Snap](http://snap-telemetry.io/) and [Hindsight](https://github.com/trink/hindsight). [Slides](https://docs.google.com/presentation/d/1XWM0UmuYdcP_VsbKg6yiSDb6TR1JmouHdZAnLelBWXg/edit?usp=sharing)
* Meeting frequency
* Ownership: SIG Instrumentation vs. SIG Autoscaling
* Discuss how to export pod labels for cAdvisor metrics (see [kubernetes/kubernetes#32326](https://github.com/kubernetes/kubernetes/issues/32326))

### Notes

* Meeting frequency: defer until ownership is clarified
* Ownership: SIG Autoscaling vs. Instrumentation
    * Triggering issue: [https://github.com/kubernetes/kubernetes/issues/31784](https://github.com/kubernetes/kubernetes/issues/31784)
    * HPA is a consumer of the Master Metrics API (as are kubectl top, the scheduler, and the UI)
        * Could potentially be relevant to monitoring as well
    * Make a distinction between metrics used by the cluster and metrics about the cluster
    * One SIG lead cares about system-level metrics, one about the external/monitoring side. Good setup for the SIG to handle both areas?
    * Follow up with a mission statement on the mailing list taking these things into account
* kube-state-metrics v0.2.0 was released with many more metrics:
    * [https://github.com/kubernetes/kube-state-metrics#metrics](https://github.com/kubernetes/kube-state-metrics#metrics)

## 2016-09-08

### Agenda

* Sylvain Boily to show their monitoring solution

### Notes

* Demo by Sylvain of their monitoring setup using InfluxDB + Grafana + Kapacitor
    * Scraping metrics from Heapster, Eventer, and the apiserver
* Separation: apiserver vs. kube-state-metrics (illustrative metric samples follow these notes)
    * The apiserver exposes metrics on /metrics about the running state of the apiserver process
        * How many requests came in from clients? What was their latency?
        * What was the outbound latency to the etcd cluster?
    * kube-state-metrics aims to provide metrics on the logical state of the entire Kubernetes cluster
        * How many deployments exist?
        * How many restarts did pod X have?
        * How many available/desired pods does a deployment have?
        * How much capacity does node X have?
* Separation: Heapster vs. [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics/commits/master)
    * Heapster holds metrics about the characteristics of things running on Kubernetes, used by other system components.
    * Currently Heapster asks the kubelet for cAdvisor metrics, whereas kube-state-metrics collects information from the apiserver
* Should Eventer information be consolidated with kube-state-metrics?
* Should we look into the creation of a monitoring namespace/service for all other namespaces to use?
* Should monitoring be available out of the box with a k8s installation when done in a private datacenter?

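To make the separation concrete, a few samples in the Prometheus exposition format. The metric names and label sets are assumptions based on what each component exposed around this time, shown only to illustrate the process-state vs. cluster-state split:

```
# apiserver /metrics: request-level state of the apiserver process itself
apiserver_request_count{verb="GET",resource="pods",code="200"} 12345

# kube-state-metrics: logical state of cluster objects
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
kube_pod_container_status_restarts{namespace="default",pod="web-0"} 2
kube_node_status_capacity_cpu_cores{node="node-1"} 8
```
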
## 2016-09-01

### Agenda

* State of [Kubernetes monitoring at SoundCloud](https://drive.google.com/file/d/0B_br6xk3Iws3aGZ5NkFMMDRqRjhvM1p1RWZXbVF2aVhiWGZz/view?usp=sharing) (Matthias Rampke)
* Future of [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
* Application metric separation in cAdvisor ([https://github.com/google/cadvisor/issues/1420](https://github.com/google/cadvisor/issues/1420))
* ...

### Notes

* Matthias Rampke gave an intro to their Kubernetes monitoring setup
    * Currently running Prometheus generally outside of Kubernetes
        * Easy migration path from the previous infrastructure
    * Still using DNS for service discovery instead of the Kubernetes API (a Kubernetes-native discovery sketch follows these notes)
    * Sharded Prometheus servers by team for application monitoring
    * Severe lack of metrics around Kubernetes cluster state itself
    * Long-term vision (1 year): all services and their dependencies running inside of Kubernetes
        * Prometheus part of that via a standard configuration
        * Easy to spin up monitoring for new components
* People use Heapster as it gives them all metrics in one component
* Something as easy to deploy as Heapster would be useful
* Three sets of metrics:
    * Those useful only for monitoring (e.g. number of pods)
    * Metrics for auto-scaling (CPU, custom app metrics)
    * Those that fit both
* Make Prometheus a first-class citizen/best practice for exposing custom auto-scaling metrics?
* Overlap between auto-scaling and monitoring metrics seems generally fine
    * Storing them twice is okay; auto-scaling metrics are far fewer
* kube-state-metrics
    * Keep it as a playground or fold it into the controller manager?

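A minimal sketch of the Kubernetes-native service discovery mentioned above, as it could appear in a Prometheus scrape config. It assumes Prometheus runs in-cluster with a service account token mounted and a Prometheus version that supports role-based `kubernetes_sd_configs`:

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node   # discover one target per kubelet via the Kubernetes API
```
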
## 2016-08-25

### Notes

* CoreOS would like to see:
    * More instrumentation as insight into the cluster
    * Removal of orthogonal features in, for example, cAdvisor
* Red Hat:
    * A good out-of-the-box solution for cluster observability and component interaction
    * Collaboration with SIG Autoscaling
* SoundCloud:
    * Prometheus originated at SoundCloud
    * Bare-metal Kubernetes setup: separation of monitoring
    * Separation of Heapster and overall Kubernetes architecture
    * How are people instrumenting around Kubernetes?
* Mirantis:
    * Scalability of monitoring solutions
    * More metadata from the kubelet “stats” API: labels are missing, for example
    * Also interested in “separation of Heapster and overall Kubernetes architecture” (from SoundCloud)
    * Extended insight into OpenStack & Kubernetes
    * During our scalability tests we want to measure k8s behaviour with a defined set of metrics
* Intel:
    * Integration of Snap into Kubernetes
    * Help deliver monitoring goals

Where should guides for flavors of monitoring live?

→ Currently ad hoc, not all the same

→ Best practices should live in the community

Where are we and where do we want to go? → A Google doc will be set up.

Next meeting: discuss the Google doc, and Matthias from SoundCloud will give insight into how they are using Prometheus to monitor Kubernetes and its pain points.

Next time we will use Zoom, as the Hangouts limit is 10 participants.

Kubernetes monitoring architecture (~~requires joining [https://groups.google.com/forum/#!forum/kubernetes-sig-node](https://groups.google.com/forum/#!forum/kubernetes-sig-node)~~): [https://docs.google.com/document/d/1HMvhhtV3Xow85iZdowJ7GMsryU6pvjOzruqcJYY9MMI/edit?ts=57b0eec1#heading=h.gav7ymlujqys](https://docs.google.com/document/d/1HMvhhtV3Xow85iZdowJ7GMsryU6pvjOzruqcJYY9MMI/edit?ts=57b0eec1#heading=h.gav7ymlujqys)