srikalyan commented Dec 24, 2025

Summary

  • Adds FreePages *uint64 field to HugePagesInfo struct, populated from /sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/free_hugepages
  • Uses pointer type with omitempty to distinguish between "0 free pages" and "data unavailable"
  • Adds machine_node_hugepages_free Prometheus metric to expose free hugepage count per NUMA node
  • Enables consumers like the Kubernetes Memory Manager to verify actual hugepage availability during pod admission

Motivation

The Kubernetes Static Memory Manager currently only tracks hugepage allocations for Guaranteed QoS pods. However, Burstable and BestEffort pods can consume hugepages (via hugetlbfs mounts or mmap with MAP_HUGETLB) without being tracked. This causes Guaranteed pods to be admitted based on stale allocation data, only to fail at runtime when hugepages are exhausted.

By exposing free_hugepages from sysfs, consumers can verify actual OS-reported availability before making admission decisions.

Design

The field uses *uint64 with omitempty (following v2 convention) to distinguish:

  • nil: free_hugepages data unavailable (file missing or unreadable)
  • 0: zero free hugepages available
  • N: N free hugepages available

This allows consumers to detect when the data isn't available and fall back appropriately.

Note: Since GetMachineInfo() is cached at startup, the FreePages value represents point-in-time data. Consumers requiring real-time availability may need to read sysfs directly or use a dedicated fresh-read method (pending KEP outcome).

Prometheus Metric

New metric machine_node_hugepages_free exposes free hugepage count:

# HELP machine_node_hugepages_free Number of free hugepages on NUMA node.
# TYPE machine_node_hugepages_free gauge
machine_node_hugepages_free{node_id="0",page_size="2048",...} 512
machine_node_hugepages_free{node_id="1",page_size="1048576",...} 2

Labels match machine_node_hugepages_count for easy correlation. The metric is only emitted when FreePages data is available (nil-safe).
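The nil-safe emission behavior can be sketched as follows; the types and helper are simplified stand-ins for cadvisor's MachineInfo and collector, using stdlib formatting rather than the Prometheus client library:

```go
package main

import "fmt"

// hugePages and node are simplified stand-ins for cadvisor's
// MachineInfo structures, not the real types.
type hugePages struct {
	pageSizeKB uint64
	freePages  *uint64 // nil means free_hugepages was unavailable
}

type node struct {
	id        int
	hugePages []hugePages
}

// emitFree renders machine_node_hugepages_free samples in Prometheus
// exposition format, skipping entries whose freePages is nil.
func emitFree(nodes []node) []string {
	var out []string
	for _, n := range nodes {
		for _, hp := range n.hugePages {
			if hp.freePages == nil {
				continue // data unavailable: emit nothing rather than a fake 0
			}
			out = append(out, fmt.Sprintf(
				`machine_node_hugepages_free{node_id="%d",page_size="%d"} %d`,
				n.id, hp.pageSizeKB, *hp.freePages))
		}
	}
	return out
}

func main() {
	free := uint64(512)
	nodes := []node{
		{id: 0, hugePages: []hugePages{{pageSizeKB: 2048, freePages: &free}}},
		{id: 1, hugePages: []hugePages{{pageSizeKB: 1048576}}}, // nil: skipped
	}
	for _, line := range emitFree(nodes) {
		fmt.Println(line)
	}
}
```

Skipping the sample entirely when the pointer is nil is what keeps "unavailable" distinguishable from "zero free pages" on the metrics endpoint as well.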

Test Plan

  • Added unit tests for GetHugePagesFree() in sysfs
  • Updated TestGetHugePagesInfo to verify FreePages is correctly populated
  • Verified JSON serialization with omitempty behavior
  • Added TestGetHugePagesFree() for Prometheus metric extraction
  • Updated TestPrometheusMachineCollector expected output
  • All existing tests pass

Related

This change adds a FreePages field to HugePagesInfo, populated from
/sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/free_hugepages

This enables consumers like the Kubernetes Memory Manager to verify
actual hugepage availability during pod admission, rather than only
tracking allocations which can miss consumption by untracked workloads.

The field uses *uint64 with omitempty to distinguish between:
- nil: free_hugepages data unavailable (file missing or unreadable)
- 0: zero free hugepages available
- N: N free hugepages available

Related: kubernetes/kubernetes#134395
srikalyan added a commit to srikalyan/enhancements that referenced this pull request Dec 24, 2025
This KEP proposes enhancing the Memory Manager's Static policy to
verify OS-reported free hugepages availability during pod admission.

Problem:
The Memory Manager only tracks hugepage allocations for Guaranteed QoS
pods. Burstable/BestEffort pods can consume hugepages without being
tracked, causing subsequent Guaranteed pods to be admitted but fail
at runtime when hugepages are exhausted.

Solution:
- Add FreePages field to cadvisor's HugePagesInfo (PR google/cadvisor#3804)
- Verify OS-reported free hugepages during Allocate() in Static policy
- Reject pods when insufficient free hugepages are available

Related: kubernetes/kubernetes#134395
@srikalyan (Author)
Based on KEP review feedback, I'm considering changing FreePages from *uint64 to uint64.

Rationale: On Linux systems with hugepages configured, the sysfs interface (/sys/devices/system/node/node<N>/hugepages/hugepages-<size>kB/free_hugepages) is always available. We don't need to distinguish between "0 free hugepages" and "data unavailable" since sysfs won't be unavailable.

Current implementation: Uses *uint64 with omitempty to distinguish nil (unavailable) from 0 (zero free).

Proposed change: Use plain uint64. A value of 0 simply means zero free hugepages.

What are your thoughts on this? I'm happy to update the PR either way based on cadvisor's conventions and your preference.

cc @iwankgb

@srikalyan srikalyan marked this pull request as draft December 27, 2025 17:35
iwankgb (Collaborator) commented Jan 16, 2026

@srikalyan, are you planning to expose this metric via Prometheus endpoint?

@srikalyan (Author)

> @srikalyan, are you planning to expose this metric via Prometheus endpoint?

I would love to; I've been waiting on the KEP. If the KEP is not a blocker, I'm happy to do it soon.

Exposes free hugepage count per NUMA node via Prometheus endpoint:
- Adds machine_node_hugepages_free gauge metric with node_id and page_size labels
- Only emits metrics when FreePages data is available (nil-safe)
- Follows same pattern as existing machine_node_hugepages_count metric

This enables monitoring and alerting on hugepage availability across
NUMA nodes, complementing the HugePagesInfo.FreePages field added in
the previous commit.
@srikalyan (Author)
@iwankgb Done! I've added the machine_node_hugepages_free Prometheus metric in the latest commit.

The metric follows the same pattern as machine_node_hugepages_count:

  • Name: machine_node_hugepages_free
  • Type: Gauge
  • Labels: node_id, page_size (same as machine_node_hugepages_count)
  • Behavior: Only emits when FreePages data is available (nil-safe)

Example output:

# HELP machine_node_hugepages_free Number of free hugepages on NUMA node.
# TYPE machine_node_hugepages_free gauge
machine_node_hugepages_free{node_id="0",page_size="2048",...} 512
machine_node_hugepages_free{node_id="1",page_size="1048576",...} 2

This enables monitoring/alerting on hugepage availability alongside the existing total count metric.
