Commit de2b397

Merge pull request #5313 from ehashman/sig-inst-notes

Add sig-instrumentation meeting note archive

2 parents dcfbd53 + af9c863

4 files changed: +1495 -0 lines changed

Lines changed: 380 additions & 0 deletions
## 2016-12-15

Agenda:

* Demo by Datadog (rescheduled)
* Kubernetes Metric Conventions: [https://docs.google.com/document/d/1YVs02Li6QFCg8Th2Wa4z1u2NBlQHDp2dj3EdAt6uskE/edit#](https://docs.google.com/document/d/1YVs02Li6QFCg8Th2Wa4z1u2NBlQHDp2dj3EdAt6uskE/edit#)
* Resource metrics API: looking towards beta
    * [https://docs.google.com/document/d/1t0G7OS6OP9qPndkkNROCu0pF3-vkDmzonmT-6gEWcx0/edit?ts=5852bda8](https://docs.google.com/document/d/1t0G7OS6OP9qPndkkNROCu0pF3-vkDmzonmT-6gEWcx0/edit?ts=5852bda8)

Notes:

* Put metric convention document somewhere visible for reference (see the naming sketch after these notes)
    * [https://github.com/kubernetes/community/tree/master/contributors/devel](https://github.com/kubernetes/community/tree/master/contributors/devel)
* Resource metrics API should be moved towards beta
    * To be finalized after holiday break
    * Working towards beta in 1.7
* Custom metrics API:
    * [https://github.com/kubernetes/community/pull/152/files](https://github.com/kubernetes/community/pull/152/files)
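A minimal sketch of the kind of naming rules the conventions document discusses, using the Python prometheus_client; the component and metric names here are hypothetical examples, not taken from the document:

```python
# Hypothetical illustration of Prometheus-style metric naming conventions:
# counters end in _total, durations use the base unit (seconds).
from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "myserver_http_requests_total",  # counter names end in _total
    "Number of HTTP requests handled.",
    ["verb", "code"],
)
LATENCY = Histogram(
    "myserver_http_request_duration_seconds",  # base unit: seconds
    "HTTP request latency in seconds.",
    ["verb"],
)

if __name__ == "__main__":
    start_http_server(8080)  # exposes /metrics for scraping
    with LATENCY.labels(verb="GET").time():
        REQUESTS.labels(verb="GET", code="200").inc()
```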
## 2016-12-08

**Warning: This meeting will be about logging. If you are not interested please skip.**

Agenda

* Restart LogDir proposal ([https://github.com/kubernetes/kubernetes/pull/13010](https://github.com/kubernetes/kubernetes/pull/13010))
    * Alternative [https://github.com/kubernetes/kubernetes/pull/33111](https://github.com/kubernetes/kubernetes/pull/33111)

Meeting notes: [https://gist.github.com/leahnp/463501f6dfe39f6f21ea5d3ebcb787d7](https://gist.github.com/leahnp/463501f6dfe39f6f21ea5d3ebcb787d7)
## 2016-12-01

### Agenda

* Heapster needs your help
    * [sross] Need to come up with map of sinks to maintainers
        * Maybe consider dropping sinks without maintainers
    * [sross] need statement of plans for Heapster
        * [sross] putting into maintenance mode, what does maintenance mode entail, should we continue accepting sinks?
        * [piosz] to write something up and send out
* [mwringe] what is the plan/timeline for the monitoring pipeline work?
    * [piosz] plan is to start work in Q2 2017, unless anyone else can help
    * [piosz] major missing component is the discovery summarizer
    * [sross] we (Red Hat) are willing to help out in this area
## [Cancelled] 2016-11-24: Thanksgiving in US

## [Cancelled] 2016-11-17: no meeting week

## [Cancelled] 2016-11-10: KubeCon

## [Cancelled] 2016-11-03

## 2016-10-27

### Agenda

* F2F meeting about monitoring in Seattle during KubeCon (on Monday Nov 7th)
## 2016-10-20

**Warning: This meeting will be about logging. If you are not interested please skip.**

### Agenda

* F2F meeting about logging in Seattle during KubeCon (probably on Monday Nov 7th)
    * There is going to be a Kubernetes dev summit (Nov 10th) meeting for logging
* Group administrivia: frequency? Length? Topics?
* Current state of logging in Kubernetes
* What’s going on with logging?

Notes

Developers Summit - 45 minute unconference topic on the future of logging

- moderated by Vishnu and Patrick
- open to anyone who is attending the Kubernetes Developers Conference

Discussion of Face to Face meeting - Piotr and Patrick to sync up offline

Frequency: every three weeks; we will skip next week and push back one week, since the next meeting falls during the KubeCon Developers Summit.

- There will be an announcement for exactly when the next meeting is

Logging Discussion Topics:

- logging volumes (proposal started by David Cowden - [https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q/edit#](https://docs.google.com/document/d/1K2hh7nQ9glYzGE-5J7oKBB7oK3S_MKqwCISXZK-sB2Q/edit#))
- hot loop logging and verbosity for scalability issues
    - how to detect spammy instances
    - how to not let this wreck the cluster
- general dissatisfaction with the logging facility
- structured logging Kubernetes-wide for consistent consumption (see the sketch after this list)
- application log type detection
    - what metadata do we need to carry through a logging pipeline to id a source system (e.g. mysql, user application)
    - what do logging vendors need supplied to aid in this
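A minimal sketch of what structured logging for consistent consumption could look like; this is illustrative only, with hypothetical field and component names, not an agreed Kubernetes format:

```python
# Illustrative only: emitting JSON-structured logs so a logging pipeline
# can parse every component's output the same way. The field names and
# component name are hypothetical, not an agreed convention.
import json
import logging
import sys

class JSONFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "severity": record.levelname,
            "component": record.name,
            "message": record.getMessage(),
        })

handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JSONFormatter())
log = logging.getLogger("kubelet")  # hypothetical component name
log.addHandler(handler)
log.setLevel(logging.INFO)

log.info("pod synced")  # -> {"severity": "INFO", "component": "kubelet", "message": "pod synced"}
```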
Current logging pipelines

- fluentd direct to GCP or ES
- fluentd to kafka to fluentd to ES

Action Items

- Piotr & Patrick to determine f2f details
- Try and get logging vendors to join the SIG
## [Cancelled] 2016-10-13

## 2016-10-06

### Agenda

* No response from sig api machinery (moving to next meeting)
* Continue discussion on monitoring architecture
    * Agreed to a versioned, well-defined API
    * REST API vs. query language
    * A webhook model was suggested for the APIs (like Auth in Kube today)
        * [sross] has concerns over discoverability of webhooks
        * Webhook vs API server is largely an implementation question
        * will decide on discovery vs webhook for consumption once we get the API design in place
    * [sross] will propose an API design for the custom metrics API and historical metrics API (a consumer-side sketch follows these notes)
* Discuss [roadmap](https://docs.google.com/document/d/1j6uHkU8m6GvElNKCJdBN8KrejkUzVbp2l0zTyeSxrl8/edit)
    * Discussed briefly, please go read afterwards
    * [sross] to lead push on custom metrics design/implementation for 1.5
    * 1.5 API features will be mainly implemented in terms of Heapster
    * looking forward to one-click install of 3rd party monitoring (possibly Prometheus, but as an out of the box, one command setup; possible choices for deployment: helm, kpm)
* Logging discussion feasibility conversation (i.e. is this a reasonable location for having discussions about logging)
    * This may be a reasonable place for logging discussions, if we explicitly note which meetings will discuss logging (and/or when logging will be discussed)
    * May also just want to create a separate SIG
* [decarr] mentioned CRI discussion on logging and metrics
    * Outcome was that we should sync with SIG node on that, but it should probably stay more in SIG node
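Purely as a sketch of what a consumer of such an API might look like; the group/version and path below are assumptions based on the proposal under discussion in kubernetes/community#152, not a settled interface:

```python
# Hypothetical sketch of a custom metrics API consumer. The group/version
# and path shape are assumptions; the design was still under review.
import requests

API = "http://127.0.0.1:8001"  # e.g. reached via `kubectl proxy`
path = "/apis/custom-metrics/v1alpha1/namespaces/default/pods/frontend-0/http_requests"

resp = requests.get(API + path)
resp.raise_for_status()
for item in resp.json().get("items", []):
    print(item.get("metricName"), item.get("value"))
```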
## 2016-09-29

### Agenda

* Discuss [Kubernetes monitoring architecture proposal](https://docs.google.com/document/d/1z7R44MUz_5gRLwsVH0S9rOy8W5naM9XE5NrbeGIqO2k/edit#)

### Notes

* Main metrics pipeline used by Kubernetes components
* Separate operator-defined monitoring pipeline for user-exposed monitoring
    * Generally collects core metrics redundantly/independently
* Should it be possible to implement the core metrics pipeline on top of the custom monitoring system?
    * As long as one implements the core metrics API, one could swap it out for the scheduler etc.
    * Upstream Kubernetes would test against the stable core pipeline
    * Replaceable != Pluggable – the entire thing gets replaced in a custom scenario
* Master Metrics API part of main Kubernetes API
    * Should further APIs, like the one for historic metrics, also be in that group?
    * Discussion for sig-apimachinery
* Should Infrastore be part of core Kubernetes?
    * Provides historic time series data about the system
    * Would require implementing a subset of a TSDB
    * Not an implemented component, just an API
* What are core metrics exactly?
    * CPU, memory, disk
    * What about network and ingress?
* Resource estimator would not read from master metrics API but collect information itself (e.g. from kubelet)
## 2016-09-22

### Agenda

* Mission statement: [https://docs.google.com/document/d/15Q47xbYTGHEZ-wVULGSgOSD5Kq-OehJj-MEChVH1kqk/edit?usp=sharing](https://docs.google.com/document/d/15Q47xbYTGHEZ-wVULGSgOSD5Kq-OehJj-MEChVH1kqk/edit?usp=sharing)
* Kubesnap demo

### Notes

* Kubesnap demo by Andrzej Kuriata, Intel ([slides](https://docs.google.com/presentation/d/1fgGik1nq-yEN7Y2dRIQWTjb7r5HEWaG9paDCdvzE_IA/edit?usp=sharing)):
    * DaemonSet in k8s
    * Integration with Heapster
* Mission Statement:
    * Enough people to coordinate, but small enough to be focused
    * List of people actually doing development/design in the scope of this SIG
    * Scratchpad before a meeting to queue up feature discussions ahead of time
    * SIG autoscaling discussed and committed to features/metrics in previous meetings
    * A plan for an API for 1.5?
## 2016-09-15

### Agenda

* Presentation by Eric Lemoine (Mirantis): monitoring Kubernetes with [Snap](http://snap-telemetry.io/) and [Hindsight](https://github.com/trink/hindsight). [Slides](https://docs.google.com/presentation/d/1XWM0UmuYdcP_VsbKg6yiSDb6TR1JmouHdZAnLelBWXg/edit?usp=sharing)
* Meeting frequency
* Ownership SIG instrumentation vs SIG autoscaling
* [Discuss how to export pod labels for cAdvisor metrics (see kubernetes/kubernetes#32326)](https://github.com/kubernetes/kubernetes/issues/32326)

### Notes

* Meeting frequency - defer until ownership clarified
* Ownership SIG autoscaling vs instrumentation
    * Triggering issue: [https://github.com/kubernetes/kubernetes/issues/31784](https://github.com/kubernetes/kubernetes/issues/31784)
    * HPA is a consumer of the Master Metrics API (also kubectl top, scheduler, UI); see the sketch after these notes
    * Could potentially be relevant to monitoring as well
    * Make distinction between metrics used by the cluster and metrics about the cluster
    * One SIG lead cares about system level metrics, one about the external/monitoring side. Good setup for the SIG to handle both areas?
    * Follow up with mission statement on the mailing list taking these things into account
* Kube-state-metrics v0.2.0 was released with many more metrics:
    * [https://github.com/kubernetes/kube-state-metrics#metrics](https://github.com/kubernetes/kube-state-metrics#metrics)
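As a rough illustration of what those consumers do when reading node usage; the group/version below (metrics/v1alpha1, served by Heapster at the time) and the proxy address are assumptions for the sketch:

```python
# Rough sketch of a Master Metrics API consumer such as `kubectl top`.
# The group/version (metrics/v1alpha1) and proxy address are assumptions
# made for illustration, not a documented endpoint.
import requests

API = "http://127.0.0.1:8001"  # e.g. reached via `kubectl proxy`

resp = requests.get(API + "/apis/metrics/v1alpha1/nodes")
resp.raise_for_status()
for node in resp.json().get("items", []):
    usage = node.get("usage", {})
    print(node["metadata"]["name"], usage.get("cpu"), usage.get("memory"))
```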
## 2016-09-08

### Agenda

* Sylvain Boily showing their monitoring solution

### Notes

* Demo by Sylvain on their monitoring setup using InfluxDB+Grafana+Kapacitor
    * Scraping metrics from Heapster, Eventer, and apiserver
* Separation apiserver vs kube-state-metrics (see the sketch after these notes)
    * The apiserver exposes metrics on /metrics about the running state of the apiserver process
        * How many requests came in from clients? What was their latency?
        * Outbound latency to the etcd cluster?
    * Kube-state-metrics aims to provide metrics on the logical state of the entire Kubernetes cluster
        * How many deployments exist?
        * How many restarts did pod X have?
        * How many available/desired pods does a deployment have?
        * How much capacity does node X have?
* Separation Heapster vs [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics/commits/master)
    * Heapster holds metrics about the characteristics of things running on Kubernetes, used by other system components.
    * Currently Heapster asks the Kubelet for cAdvisor metrics vs. kube-state-metrics collecting information from the apiserver
* Should eventer information be consolidated with kube-state-metrics?
* Should we look into the creation of a monitoring namespace / service for all other namespaces to use?
* Should monitoring be available out of the box with a k8s installation when done in a private datacenter?
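A small sketch of that separation: both endpoints expose the Prometheus text format but describe different things. The URLs here are hypothetical deployment details, not fixed addresses:

```python
# Sketch of the apiserver vs kube-state-metrics separation. Both expose
# Prometheus text-format metrics; the URLs below are hypothetical.
import requests

def metric_names(url):
    """Return the set of metric names exposed at a /metrics endpoint."""
    lines = requests.get(url).text.splitlines()
    return {
        line.split("{")[0].split(" ")[0]
        for line in lines
        if line and not line.startswith("#")
    }

# Process-level metrics about the apiserver itself (request counts, latency).
apiserver = metric_names("http://localhost:8080/metrics")
# Cluster-state metrics (deployments, pod restarts, node capacity).
ksm = metric_names("http://kube-state-metrics:8080/metrics")

print(sorted(ksm)[:10])  # e.g. kube_deployment_..., kube_node_..., kube_pod_...
```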
## 2016-09-01

### Agenda

* State of [Kubernetes monitoring at SoundCloud](https://drive.google.com/file/d/0B_br6xk3Iws3aGZ5NkFMMDRqRjhvM1p1RWZXbVF2aVhiWGZz/view?usp=sharing) (Matthias Rampke)
* Future of [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics)
* Application metric separation in cAdvisor ([https://github.com/google/cadvisor/issues/1420](https://github.com/google/cadvisor/issues/1420))
* ...

### Notes

* Matthias Rampke giving an intro to their Kubernetes monitoring setup
    * Currently running Prometheus generally outside of Kubernetes
        * Easy migration path from previous infrastructure
    * Still using DNS as service discovery instead of Kubernetes API
    * Sharded Prometheus servers by team for application monitoring
    * Severe lack of metrics around Kubernetes cluster state itself
    * Long-term vision (1yr): all services and their dependencies running inside of Kubernetes
        * Prometheus part of that via a standard configuration
        * Easy to spin up monitoring for new components
* People using Heapster as it gives them all metrics in one component
    * Something as easy to deploy as Heapster would be useful
* Three sets of metrics
    * Those useful only for monitoring (e.g. number of pods)
    * Metrics for auto-scaling (CPU, custom app metrics)
    * Those that fit both
* Make Prometheus a first-class citizen/best practice for exposing custom auto-scaling metrics? (see the sketch after these notes)
* Overlap between auto-scaling and monitoring metrics seems generally fine
    * storing them twice is okay, auto-scaling metrics are way fewer
* Kube-state-metrics
    * Keep it as a playground or fold it into controller manager?
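A minimal sketch of what that could mean for an application: exposing a metric in the Prometheus format that an autoscaling pipeline could consume. The metric name and port are hypothetical:

```python
# Sketch only: exposing an application metric in the Prometheus text
# format so it could feed custom-metric autoscaling. The metric name
# and port are hypothetical, not an agreed convention.
import time
from prometheus_client import Gauge, start_http_server

QUEUE_DEPTH = Gauge("myapp_queue_depth", "Items waiting in the work queue.")

if __name__ == "__main__":
    start_http_server(9102)  # serves /metrics for a Prometheus scrape
    while True:
        QUEUE_DEPTH.set(42)  # in a real app, measured from the live queue
        time.sleep(15)
```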
## 2016-08-25

### Notes

* CoreOS would like to see
    * more instrumentation as insight into the cluster
    * removal of orthogonal features from, for example, cAdvisor
* Red Hat:
    * Good out-of-the-box solution for cluster observability, component interaction
    * Collaboration with sig-autoscaling
* SoundCloud:
    * Prometheus originated at SoundCloud
    * Bare metal Kubernetes setup: separation of monitoring
    * Separation of Heapster and overall Kubernetes architecture
    * How are people instrumenting around Kubernetes?
* Mirantis:
    * Scalability of monitoring solutions
    * More metadata from the kubelet “stats” API: labels are missing, for example
    * Also interested in “Separation of Heapster and overall Kubernetes architecture” (from SoundCloud)
    * Extended insight into OpenStack & Kubernetes
    * During our scalability tests we want to measure k8s behaviour in some set of defined metrics
* Intel:
    * Integration of Snap into Kubernetes
    * Help deliver monitoring goals

Where should guides for flavors of monitoring live?

→ ad hoc currently, not all the same

→ best practices in the community

Where are we and where do we want to go? → Google doc will be set up

Next meeting: Discuss the Google doc & Matthias from SoundCloud will give insight into how they are using Prometheus to monitor Kubernetes and its pain points.

Next time we will use Zoom, as the Hangouts limit is 10 participants.

Kubernetes monitoring architecture (~~requires joining [https://groups.google.com/forum/#!forum/kubernetes-sig-node](https://groups.google.com/forum/#!forum/kubernetes-sig-node)~~): [https://docs.google.com/document/d/1HMvhhtV3Xow85iZdowJ7GMsryU6pvjOzruqcJYY9MMI/edit?ts=57b0eec1#heading=h.gav7ymlujqys](https://docs.google.com/document/d/1HMvhhtV3Xow85iZdowJ7GMsryU6pvjOzruqcJYY9MMI/edit?ts=57b0eec1#heading=h.gav7ymlujqys)