## 2016-12-15

### Agenda

* Demo by Datadog (rescheduled)
* Kubernetes Metric Conventions: [https://docs.google.com/document/d/1YVs02Li6QFCg8Th2Wa4z1u2NBlQHDp2dj3EdAt6uskE/edit#](https://docs.google.com/document/d/1YVs02Li6QFCg8Th2Wa4z1u2NBlQHDp2dj3EdAt6uskE/edit#)
* Resource metrics API: looking towards beta
    * [https://docs.google.com/document/d/1t0G7OS6OP9qPndkkNROCu0pF3-vkDmzonmT-6gEWcx0/edit?ts=5852bda8](https://docs.google.com/document/d/1t0G7OS6OP9qPndkkNROCu0pF3-vkDmzonmT-6gEWcx0/edit?ts=5852bda8)

### Notes

* Put the metric conventions document somewhere visible for reference
    * [https://github.com/kubernetes/community/tree/master/contributors/devel](https://github.com/kubernetes/community/tree/master/contributors/devel)
* Resource metrics API should be moved towards beta
    * To be finalized after the holiday break
    * Working towards beta in 1.7
* Custom metrics API (a query sketch for both metrics APIs follows these notes):
    * [https://github.com/kubernetes/community/pull/152/files](https://github.com/kubernetes/community/pull/152/files)

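Since both metrics APIs keep coming up, here is a minimal sketch of how a client could query them once they are served through API aggregation. The API groups (`metrics.k8s.io`, `custom.metrics.k8s.io`), the `v1beta1` version, and the metric name `http_requests` are illustrative assumptions, not decisions recorded in this meeting:

```sh
# Resource metrics API: node/pod CPU and memory, the data behind
# `kubectl top` and the HPA (paths assumed for illustration).
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/namespaces/default/pods"

# Custom metrics API: application-level metrics keyed by Kubernetes
# object; `http_requests` is a hypothetical metric name.
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests"
```
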
## 2016-12-08

**Warning: This meeting will be about logging. If you are not interested, please skip.**

### Agenda

* Restart LogDir proposal ([https://github.com/kubernetes/kubernetes/pull/13010](https://github.com/kubernetes/kubernetes/pull/13010))
* Alternative: [https://github.com/kubernetes/kubernetes/pull/33111](https://github.com/kubernetes/kubernetes/pull/33111)

Meeting notes: [https://gist.github.com/leahnp/463501f6dfe39f6f21ea5d3ebcb787d7](https://gist.github.com/leahnp/463501f6dfe39f6f21ea5d3ebcb787d7)

## 2016-12-01

### Agenda

* Heapster needs your help
    * [sross] Need to come up with a map of sinks to maintainers
        * Maybe consider dropping sinks without maintainers
    * [sross] Need a statement of plans for Heapster
    * [sross] Putting Heapster into maintenance mode: what does maintenance mode entail? Should we continue accepting sinks?
    * [piosz] To write something up and send it out
* [mwringe] What is the plan/timeline for the monitoring pipeline work?
    * [piosz] The plan is to start work in Q2 2017, unless anyone else can help
    * [piosz] The major missing component is the discovery summarizer
    * [sross] We (Red Hat) are willing to help out in this area

## [Cancelled] 2016-11-24: Thanksgiving in US

## [Cancelled] 2016-11-17: no meeting week

## [Cancelled] 2016-11-10: KubeCon

## [Cancelled] 2016-11-03

## 2016-10-27

### Agenda

* F2F meeting about monitoring in Seattle during KubeCon (on Monday, Nov 7th)

## 2016-10-20

**Warning: This meeting will be about logging. If you are not interested, please skip.**

### Agenda

* F2F meeting about logging in Seattle during KubeCon (probably on Monday, Nov 7th)
    * There is going to be a Kubernetes dev summit (Nov 10th) meeting for logging
* Group administrivia: frequency? Length? Topics?
* Current state of logging in Kubernetes
* What’s going on with logging?

### Notes

* Developers Summit: 45-minute unconference topic on the future of logging
    * Moderated by Vishnu and Patrick
    * Open to anyone who is attending the Kubernetes Developers Conference
* Discussion of face-to-face meeting: Piotr and Patrick to sync up offline
* Frequency: every three weeks. The next meeting will be pushed back one week, since it would otherwise fall during the KubeCon Developers Summit.
    * There will be an announcement for exactly when the next meeting is
* Logging discussion topics:
    * Logging volumes (proposal started by David Cowden: [https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q/edit#](https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q/edit#))
    * Hot-loop logging and verbosity for scalability issues
        * How to detect spammy instances
        * How to not let this wreck the cluster
    * General dissatisfaction with the logging facility
        * Structured logging Kubernetes-wide for consistent consumption
    * Application log type detection
        * What metadata do we need to carry through a logging pipeline to identify a source system (e.g. MySQL, a user application)?
            * What do logging vendors need supplied to aid in this?
* Current logging pipelines
    * fluentd direct to GCP or ES
    * fluentd to Kafka to fluentd to ES
* Action items
    * Piotr & Patrick to determine f2f details
    * Try to get logging vendors to join the SIG

## [Cancelled] 2016-10-13

## 2016-10-06

### Agenda

* No response from SIG API Machinery (moving to next meeting)
* Continue discussion on monitoring architecture
    * Agreed on a versioned, well-defined API
    * REST API vs. query language
    * A webhook model was suggested for the APIs (like auth in Kube today)
        * [sross] has concerns over discoverability of webhooks
        * Webhook vs. API server is largely an implementation question
        * Will decide on discovery vs. webhook for consumption once we get the API design in place
    * [sross] will propose an API design for the custom metrics API and historical metrics API
* Discuss [roadmap](https://docs.google.com/document/d/1j6uHkU8m6GvElNKCJdBN8KrejkUzVbp2l0zTyeSxrl8/edit)
    * Discussed briefly, please go read afterwards
    * [sross] to lead the push on custom metrics design/implementation for 1.5
    * 1.5 API features will be implemented mainly in terms of Heapster
* Looking forward to a one-click install of 3rd-party monitoring (possibly Prometheus, but as an out-of-the-box, one-command setup; possible choices for deployment: Helm, kpm); see the sketch after this list
* Logging discussion feasibility conversation (i.e. is this a reasonable location for having discussions about logging)
    * This may be a reasonable place for logging discussions, if we explicitly note which meetings will discuss logging (and/or when logging will be discussed)
    * May also just want to create a separate SIG
    * [decarr] mentioned the CRI discussion on logging and metrics
        * Outcome was that we should sync with SIG Node on that, but it should probably stay more in SIG Node

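As a concrete sketch of the one-command setup discussed above, this is roughly what a Helm-based install could look like in the Helm 2.x era. The chart name `stable/prometheus` is an assumption for illustration, not an agreed deliverable:

```sh
# Hypothetical one-command monitoring install via Helm (2.x era).
helm init                        # installs Tiller, Helm's in-cluster component
helm install stable/prometheus   # deploys a Prometheus chart, assuming one exists
```
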
## 2016-09-29

### Agenda

* Discuss [Kubernetes monitoring architecture proposal](https://docs.google.com/document/d/1z7R44MUz_5gRLwsVH0S9rOy8W5naM9XE5NrbeGIqO2k/edit#)

### Notes

* Main metrics pipeline used by Kubernetes components
* Separate operator-defined monitoring pipeline for user-exposed monitoring
    * Generally collects core metrics redundantly/independently
* Should it be possible to implement the core metrics pipeline on top of the custom monitoring system?
    * As long as one implements the core metrics API, one could swap it out for the scheduler etc.
* Upstream Kubernetes would test against the stable core pipeline
* Replaceable != pluggable: the entire thing gets replaced in a custom scenario
* Master Metrics API is part of the main Kubernetes API
    * Should further APIs, like one for historic metrics, also be in that group?
    * Discussion for sig-apimachinery
* Should Infrastore be part of core Kubernetes?
    * Provides historic time series data about the system
    * Would require implementing a subset of a TSDB
    * Not an implemented component, just an API
* What are core metrics exactly?
    * CPU, memory, disk
    * What about network and ingress?
    * The resource estimator would not read from the Master Metrics API but collect information itself (e.g. from the kubelet; a sketch of such a query follows these notes)

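For illustration, a resource estimator collecting directly from the kubelet could hit the summary endpoint on the kubelet's read-only port. 10255 was the default read-only port at the time; treat the port and endpoint as a sketch of the approach, not a committed interface:

```sh
# Fetch node-level and per-pod CPU/memory/filesystem stats straight
# from a kubelet, bypassing the Master Metrics API.
curl "http://<node-ip>:10255/stats/summary"
```
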
## 2016-09-22

### Agenda

* Mission statement: [https://docs.google.com/document/d/15Q47xbYTGHEZ-wVULGSgOSD5Kq-OehJj-MEChVH1kqk/edit?usp=sharing](https://docs.google.com/document/d/15Q47xbYTGHEZ-wVULGSgOSD5Kq-OehJj-MEChVH1kqk/edit?usp=sharing)
* Kubesnap demo

### Notes

* Kubesnap demo by Andrzej Kuriata, Intel ([slides](https://docs.google.com/presentation/d/1fgGik1nq-yEN7Y2dRIQWTjb7r5HEWaG9paDCdvzE_IA/edit?usp=sharing)):
    * DaemonSet in k8s
    * Integration with Heapster
* Mission statement:
    * Enough people to coordinate, but small enough to be focused
    * List of people actually doing development/design in the scope of this SIG
    * Scratchpad to set up discussion of features before each meeting
    * SIG Autoscaling discussed and committed to features/metrics in previous meetings
    * A plan for an API for 1.5?

## 2016-09-15

### Agenda

* Presentation by Eric Lemoine (Mirantis): monitoring Kubernetes with [Snap](http://snap-telemetry.io/) and [Hindsight](https://github.com/trink/hindsight). [Slides](https://docs.google.com/presentation/d/1XWM0UmuYdcP_VsbKg6yiSDb6TR1JmouHdZAnLelBWXg/edit?usp=sharing)
* Meeting frequency
* Ownership: SIG Instrumentation vs. SIG Autoscaling
* Discuss how to export pod labels for cAdvisor metrics (see [kubernetes/kubernetes#32326](https://github.com/kubernetes/kubernetes/issues/32326))

### Notes

* Meeting frequency: defer until ownership is clarified
* Ownership: SIG Autoscaling vs. Instrumentation
    * Triggering issue: [https://github.com/kubernetes/kubernetes/issues/31784](https://github.com/kubernetes/kubernetes/issues/31784)
    * HPA is a consumer of the Master Metrics API (as are kubectl top, the scheduler, and the UI)
        * Could potentially be relevant to monitoring as well
    * Make a distinction between metrics used by the cluster and metrics about the cluster
    * One SIG lead cares about system-level metrics, one about the external/monitoring side. Good setup for the SIG to handle both areas?
    * Follow up with a mission statement on the mailing list taking these things into account
* kube-state-metrics v0.2.0 was released with many more metrics:
    * [https://github.com/kubernetes/kube-state-metrics#metrics](https://github.com/kubernetes/kube-state-metrics#metrics)

## 2016-09-08

### Agenda

* Sylvain Boily to show their monitoring solution

### Notes

* Demo by Sylvain of their monitoring setup using InfluxDB + Grafana + Kapacitor
    * Scraping metrics from Heapster, Eventer, and the apiserver
* Separation: apiserver vs. kube-state-metrics (illustrative metric samples follow these notes)
    * The apiserver exposes metrics on /metrics about the running state of the apiserver process
        * How many requests came in from clients? What was their latency?
        * What was the outbound latency to the etcd cluster?
    * kube-state-metrics aims to provide metrics on the logical state of the entire Kubernetes cluster
        * How many deployments exist?
        * How many restarts did pod X have?
        * How many available/desired pods does a deployment have?
        * How much capacity does node X have?
* Separation: Heapster vs. [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics/commits/master)
    * Heapster holds metrics about the characteristics of things running on Kubernetes, used by other system components.
    * Currently Heapster asks the kubelet for cAdvisor metrics, whereas kube-state-metrics collects information from the apiserver
* Should Eventer information be consolidated with kube-state-metrics?
* Should we look into the creation of a monitoring namespace/service for all other namespaces to use?
* Should monitoring be available out of the box with a k8s installation when done in a private datacenter?

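To make the separation concrete, a few samples in the Prometheus exposition format. The metric names and label sets are assumptions based on what each component exposed around this time, shown only to illustrate the process-state vs. cluster-state split:

```
# apiserver /metrics: request-level state of the apiserver process itself
apiserver_request_count{verb="GET",resource="pods",code="200"} 12345

# kube-state-metrics: logical state of cluster objects
kube_deployment_status_replicas_available{namespace="default",deployment="web"} 3
kube_pod_container_status_restarts{namespace="default",pod="web-0"} 2
kube_node_status_capacity_cpu_cores{node="node-1"} 8
```
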
## 2016-09-01

### Agenda

* State of [Kubernetes monitoring at SoundCloud](https://drive.google.com/file/d/0B_br6xk3Iws3aGZ5NkFMMDRqRjhvM1p1RWZXbVF2aVhiWGZz/view?usp=sharing) (Matthias Rampke)
* Future of [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
* Application metric separation in cAdvisor ([https://github.com/google/cadvisor/issues/1420](https://github.com/google/cadvisor/issues/1420))
* ...

### Notes

* Matthias Rampke gave an intro to their Kubernetes monitoring setup
    * Currently running Prometheus generally outside of Kubernetes
        * Easy migration path from the previous infrastructure
    * Still using DNS for service discovery instead of the Kubernetes API (a Kubernetes-native discovery sketch follows these notes)
    * Sharded Prometheus servers by team for application monitoring
    * Severe lack of metrics around Kubernetes cluster state itself
    * Long-term vision (1 year): all services and their dependencies running inside of Kubernetes
        * Prometheus part of that via a standard configuration
        * Easy to spin up monitoring for new components
* People use Heapster as it gives them all metrics in one component
* Something as easy to deploy as Heapster would be useful
* Three sets of metrics:
    * Those useful only for monitoring (e.g. number of pods)
    * Metrics for auto-scaling (CPU, custom app metrics)
    * Those that fit both
* Make Prometheus a first-class citizen/best practice for exposing custom auto-scaling metrics?
* Overlap between auto-scaling and monitoring metrics seems generally fine
    * Storing them twice is okay; auto-scaling metrics are far fewer
* kube-state-metrics
    * Keep it as a playground or fold it into the controller manager?

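A minimal sketch of the Kubernetes-native service discovery mentioned above, as it could appear in a Prometheus scrape config. It assumes Prometheus runs in-cluster with a service account token mounted and a Prometheus version that supports role-based `kubernetes_sd_configs`:

```yaml
scrape_configs:
  - job_name: 'kubernetes-nodes'
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node   # discover one target per kubelet via the Kubernetes API
```
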
## 2016-08-25

### Notes

* CoreOS would like to see:
    * More instrumentation as insight into the cluster
    * Removal of orthogonal features in, for example, cAdvisor
* Red Hat:
    * A good out-of-the-box solution for cluster observability and component interaction
    * Collaboration with SIG Autoscaling
* SoundCloud:
    * Prometheus originated at SoundCloud
    * Bare-metal Kubernetes setup: separation of monitoring
    * Separation of Heapster and overall Kubernetes architecture
    * How are people instrumenting around Kubernetes?
* Mirantis:
    * Scalability of monitoring solutions
    * More metadata from the kubelet “stats” API: labels are missing, for example
    * Also interested in “separation of Heapster and overall Kubernetes architecture” (from SoundCloud)
    * Extended insight into OpenStack & Kubernetes
    * During our scalability tests we want to measure k8s behaviour with a defined set of metrics
* Intel:
    * Integration of Snap into Kubernetes
    * Help deliver monitoring goals

Where should guides for flavors of monitoring live?

→ Currently ad hoc, not all the same

→ Best practices should live in the community

Where are we and where do we want to go? → A Google doc will be set up.

Next meeting: discuss the Google doc, and Matthias from SoundCloud will give insight into how they are using Prometheus to monitor Kubernetes and its pain points.

Next time we will use Zoom, as the Hangouts limit is 10 participants.

Kubernetes monitoring architecture (~~requires joining [https://groups.google.com/forum/#!forum/kubernetes-sig-node](https://groups.google.com/forum/#!forum/kubernetes-sig-node)~~): [https://docs.google.com/document/d/1HMvhhtV3Xow85iZdowJ7GMsryU6pvjOzruqcJYY9MMI/edit?ts=57b0eec1#heading=h.gav7ymlujqys](https://docs.google.com/document/d/1HMvhhtV3Xow85iZdowJ7GMsryU6pvjOzruqcJYY9MMI/edit?ts=57b0eec1#heading=h.gav7ymlujqys)