Commit beaebac

Merge pull request kubernetes#3439 from jsturtevant/add-windows-podstats-details
Add Windows pod sandbox stats information to KEP 2371 - cri pod container stats
2 parents e634c15 + 4dbc00a commit beaebac

1 file changed

+149
-1
lines changed
  • keps/sig-node/2371-cri-pod-container-stats

keps/sig-node/2371-cri-pod-container-stats/README.md

Lines changed: 149 additions & 1 deletion
@@ -28,6 +28,7 @@
 - [cAdvisor Metrics Endpoint](#cadvisor-metrics-endpoint)
 - [CRI implementations](#cri-implementations)
 - [cAdvisor](#cadvisor-1)
+- [Windows](#windows)
 - [Test Plan](#test-plan)
 - [Prerequisite testing updates](#prerequisite-testing-updates)
 - [Unit tests](#unit-tests)
@@ -273,6 +274,32 @@ Thus, this KEP largely the plan described [here](#plan), with some changes:

 #### Open Questions
 1. For the newly introduced CRI API fields, should there be Windows and Linux specific fields?
+For the alpha implementation, [PR #102789](https://github.com/kubernetes/kubernetes/pull/102789) added a Linux-specific field,
+`LinuxPodSandboxStats`. It also added a Windows-specific field, `WindowsPodSandboxStats`, but left it empty.
+
+Using the `WindowsPodSandboxStats` struct, we will create new Windows-specific fields that make sense for Windows stats. The
+motivation is that Windows stats differ in OS-specific ways, and Windows currently does not fill certain fields (in some
+cases it cannot, such as `rss_bytes`). Adopting a Windows-specific set of stats allows for flexibility and customization in the
+future.
+
+A challenge with a new `WindowsPodSandboxStats` with custom fields is that the current calls to the CRI endpoint `ListContainerStats` use the
+existing `ContainerStats` object, which is not Windows/Linux specific. The CRI call `ListPodSandboxStats`, by contrast, would
+return the new Windows-specific stats proposed here. There will be a mismatch in field names, but the underlying
+values will be the same. The biggest difference initially will be that fields that were left blank in `ContainerStats` will not be present on
+the new Windows-specific structs.
+
+Making changes to `ListContainerStats` in a backwards-compatible way to use the Windows-specific stat structs is not in scope for this KEP.
+
+Alternatives include:
+
+- Add fields to `PodSandboxStats` which will be used by both Linux and Windows. Windows and
+Linux would both use this field, filling in the stats that make the most sense for them. Windows is currently missing several stats but can
+reasonably fill in many of the missing fields (see the [Windows](#windows) section below). This would be similar to the approach used
+today for the current CRI `ListContainerStats` call, and it would make the logic in the kubelet more straightforward. This might have made sense
+if `LinuxPodSandboxStats` hadn't already been created, but it would require rework in the kubelet and doesn't provide the flexibility for future customization based on OS.
+- Leave the two fields `LinuxPodSandboxStats` and `WindowsPodSandboxStats` but use the same fields in both. This might be the easiest option, but it is counter-intuitive and still adds complexity to the kubelet implementation. We would also end up with Linux-specific fields in `WindowsPodSandboxStats`.
+
+See the [ContainerStats](#containerstats-additions) and [Windows](#windows) sections below for details on the proposal and on Windows stats differences.

 ### Risks and Mitigations

@@ -433,7 +460,16 @@ message LinuxPodSandboxStats {

 // WindowsPodSandboxStats provides the resource usage statistics for a pod sandbox on windows
 message WindowsPodSandboxStats {
-    // TODO: Add stats relevant to windows.
+    // CPU usage gathered for the pod sandbox.
+    WindowsCpuUsage cpu = 1;
+    // Memory usage gathered for the pod sandbox.
+    WindowsMemoryUsage memory = 2;
+    // Network usage gathered for the pod sandbox.
+    WindowsNetworkUsage network = 3;
+    // Stats pertaining to processes in the pod sandbox.
+    WindowsProcessUsage process = 4;
+    // Stats of containers in the measured pod sandbox.
+    repeated WindowsContainerStats containers = 5;
 }

 // NetworkUsage contains data about network resources.
@@ -467,6 +503,58 @@ message ProcessUsage {
     // Number of processes.
     UInt64Value process_count = 2;
 }
+
+// Windows-specific fields. Many of these look the same as the Linux ones initially;
+// this leaves room to customize between Linux and Windows in the future.
+// Only fields we currently populate are added.
+message WindowsCpuUsage {
+    int64 timestamp = 1;
+    UInt64Value usage_core_nano_seconds = 2;
+    UInt64Value usage_nano_cores = 3;
+}
+
+// WindowsMemoryUsage provides the memory usage information.
+message WindowsMemoryUsage {
+    int64 timestamp = 1;
+    // The amount of working set memory in bytes.
+    UInt64Value working_set_bytes = 2;
+    UInt64Value available_bytes = 3;
+    UInt64Value page_faults = 6;
+}
+
+message WindowsNetworkUsage {
+    // The time at which these stats were updated.
+    int64 timestamp = 1;
+    WindowsNetworkInterfaceUsage default_interface = 2;
+    repeated WindowsNetworkInterfaceUsage interfaces = 3;
+}
+
+message WindowsNetworkInterfaceUsage {
+    string name = 1;
+    UInt64Value rx_bytes = 2;
+    UInt64Value rx_packets_dropped = 3;
+    UInt64Value tx_bytes = 4;
+    UInt64Value tx_packets_dropped = 5;
+}
+
+message WindowsProcessUsage {
+    int64 timestamp = 1;
+    UInt64Value process_count = 2;
+}
+
+message WindowsContainerStats {
+    ContainerAttributes attributes = 1;
+    WindowsCpuUsage cpu = 2;
+    WindowsMemoryUsage memory = 3;
+    WindowsFilesystemUsage writable_layer = 4;
+}
+
+message WindowsFilesystemUsage {
+    int64 timestamp = 1;
+    FilesystemIdentifier fs_id = 2;
+    UInt64Value used_bytes = 3;
+}
+
 ```

 ##### ContainerMetrics additions
574662

575663
As a requirement for the Beta stage, cAdvisor must support optionally collecting and broadcasting these metrics, similarly to the changes needed for summary API.
576664

665+
#### Windows
666+
667+
Windows currently does a best effort at filling out the stats in `/stats/summary` and misses some stats either because those are not exposed or they are not supported.
668+
669+
Another aspect for Windows to consider is that work is being done to create [Hyper-v containers](https://github.com/containerd/containerd/issues/6862). We will want to make sure we have an intersection of stats that support Hyper-v as well.
670+
It was [discussed](https://github.com/kubernetes/kubernetes/pull/110754#issuecomment-1176531055) if we want a separate set of stats specific for Window Hyper-v vs process isolated but decided that
671+
these stats should be generic and not expose implementation details of the pod sandbox.
672+
More detailed stats could be collected by external tools if required.
673+
674+
The current set of stats that are used by windows in the `ListContainerStats` API:
675+
676+
**cpu usage** - https://github.com/microsoft/hcsshim/blob/master/cmd/containerd-shim-runhcs-v1/stats/stats.proto
677+
678+
| field | type | process isolated field | hyperv filed | notes |
679+
| -------------------------- | ------ | ---------------------- | ---------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
680+
| timestamp | int64 ||| |
681+
| usage\_core\_nano\_seconds | uint64 | TotalRuntimeNS | TotalRuntimeNS | // Cumulative CPU usage (sum across all cores) since object creation. |
682+
| usage\_nano\_cores | uint64 | calculated value | calculated value | calculated value done runtime (containerd or kubelet)<br><br> // Total CPU usage (sum of all cores) averaged over the sample window.  <br> // The "core" unit can be interpreted as CPU core-nanoseconds per second. |
683+
+**Memory usage** - https://github.com/microsoft/hcsshim/blob/master/cmd/containerd-shim-runhcs-v1/stats/stats.proto
+
+| field | type | process isolated field | hyperv field | notes |
+| ------------------- | ------ | ------------------------------------------- | -------------------- | ----- |
+| timestamp | int64 | | | |
+| working\_set\_bytes | uint64 | memory\_usage\_private\_working\_set\_bytes | working\_set\_bytes? | // The amount of working set memory in bytes.<br><br>Is the Hyper-V working\_set\_bytes the same as private\_working\_set\_bytes? |
+| available\_bytes | uint64 | | available\_memory | We should be able to return this. It is the limit set on the job object. https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_extended_limit_information <br><br> // Available memory for use. This is defined as the memory limit - workingSetBytes. |
+| usage\_bytes | uint64 | ? | ? | // Total memory in use. This includes all memory regardless of when it was accessed. Is this cumulative memory usage? |
+| rss\_bytes | uint64 | n/a | n/a | Windows doesn't have RSS, so it cannot be reported. |
+| page\_faults | uint64 | | | Not reported. It may be possible to use `TotalPageFaultCount` from https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-jobobject_basic_accounting_information |
+| major\_page\_faults | uint64 | | | Not reported. Windows does not make a distinction here. |
+
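Review note: the `available_bytes` row defines the value as the memory limit minus `workingSetBytes`. A small sketch of that arithmetic is below. The function name, the convention that a zero limit means "no limit configured", and the clamping to zero are all assumptions made for illustration, not behavior confirmed by the KEP.

```go
package main

// availableBytes computes "memory limit - workingSetBytes" as described in
// the available_bytes row above.
//
// Assumptions (hypothetical, not from the KEP):
//   - limitBytes == 0 means no limit is configured, so the value is unknown
//     and ok is false.
//   - If the working set exceeds the limit, the result is clamped to 0
//     instead of wrapping around as unsigned subtraction would.
func availableBytes(limitBytes, workingSetBytes uint64) (available uint64, ok bool) {
	if limitBytes == 0 {
		return 0, false
	}
	if workingSetBytes >= limitBytes {
		return 0, true
	}
	return limitBytes - workingSetBytes, true
}
```

For a 1 GiB limit and a 256 MiB working set this yields 805,306,368 bytes (768 MiB).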
+**Process isolated** also has
+
+- memory_usage_commit_bytes
+- memory_usage_commit_peak_bytes
+
+**Hyperv** also has
+
+- virtual_node_count
+- available_memory_buffer
+- reserved_memory
+- assigned_memory
+- slp_active
+- balancing_enabled
+- dm_operation_in_progress
+
+These are not used currently; they are very specific to each implementation and could be collected by specialized tools on an as-needed basis.
+
+**Network stats**
+| field | type | process isolated field | hyperv field | notes |
+| ---------- | ------ | ---------------------- | ------------ | ----- |
+| timestamp | int64 | | | |
+| name | string | EndpointID | | same values can be used for hyperv |
+| rx\_bytes | uint64 | BytesReceived | | same values can be used for hyperv |
+| rx\_errors | uint64 | | | Can we use DroppedPacketsIncoming? https://github.com/microsoft/hcsshim/blob/949e46a1260a6aca39c1b813a1ead2344ffe6199/internal/hns/hnsendpoint.go#L65 |
+| tx\_bytes | uint64 | BytesSent | | same values can be used for hyperv |
+| tx\_errors | uint64 | | | should use DroppedPacketsOutgoing |
+
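Review note: the network table pairs endpoint counters with the proposed `WindowsNetworkInterfaceUsage` fields. The sketch below shows that mapping under the assumption that the dropped-packet counters are used as the table suggests (the table still poses this as a question for the receive side). Both struct types are hypothetical stand-ins declared locally; they are not the real hcsshim or CRI Go types.

```go
package main

// hnsEndpointStats is a hypothetical stand-in holding the per-endpoint
// counters named in the table above. It is not the real hcsshim type.
type hnsEndpointStats struct {
	EndpointID             string
	BytesReceived          uint64
	BytesSent              uint64
	DroppedPacketsIncoming uint64
	DroppedPacketsOutgoing uint64
}

// windowsNetworkInterfaceUsage is a hypothetical Go mirror of the proposed
// WindowsNetworkInterfaceUsage proto message.
type windowsNetworkInterfaceUsage struct {
	Name             string
	RxBytes          uint64
	RxPacketsDropped uint64
	TxBytes          uint64
	TxPacketsDropped uint64
}

// toInterfaceUsage copies each endpoint counter into the matching field of
// the proposed message, using the endpoint ID as the interface name.
func toInterfaceUsage(s hnsEndpointStats) windowsNetworkInterfaceUsage {
	return windowsNetworkInterfaceUsage{
		Name:             s.EndpointID,
		RxBytes:          s.BytesReceived,
		RxPacketsDropped: s.DroppedPacketsIncoming,
		TxBytes:          s.BytesSent,
		TxPacketsDropped: s.DroppedPacketsOutgoing,
	}
}
```

Keeping the mapping in one small function means that a later decision about which counters back the dropped-packet fields only touches one place.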
+Based on the above, we should stay conservative and expose the existing set of working, overlapping stats. This is what is proposed in the [CRI changes](#cri-implementation).
+
 ### Test Plan

 <!--
