Conversation
|
Thanks for the PR. We had not realized that the file cache could grow as large as you have described. After some discussion, we believe that removing the filesystem cache from reported memory usage is the preferred way to account for memory, as it is unpredictable and outside the user's direct control. Because of this, we should modify the collectors for both cgroup v1 and v2 to use a summed PSS for the reported memory usage. This will not need to be configured, so we can remove the extra configuration variable you have added. We can also streamline the summation of process PSS by reusing the already collected process information, i.e.:

var totalPSS float64 // new
for name, p := range procs {
totalPSS += p.memoryPSSTotal // new
ch <- prometheus.MustNewConstMetric(c.procCPU, prometheus.CounterValue, float64(p.cpuSecondsTotal), cg, info.Username, name)
ch <- prometheus.MustNewConstMetric(c.procMemory, prometheus.GaugeValue, float64(p.memoryBytesTotal), cg, info.Username, name)
ch <- prometheus.MustNewConstMetric(c.procPSS, prometheus.GaugeValue, float64(p.memoryPSSTotal), cg, info.Username, name)
ch <- prometheus.MustNewConstMetric(c.procCount, prometheus.GaugeValue, float64(p.count), cg, info.Username, name)
}
ch <- prometheus.MustNewConstMetric(c.memoryUsage, prometheus.GaugeValue, totalPSS, cg, info.Username) // new; note: the per-process `name` label is dropped here, since this is the cgroup-level total and `name` is out of scope after the loop

We can make these changes as well, if you would prefer. |
|
I reverted the previous changes and adjusted the collector to calculate the total PSS. I still need to test it, but this includes the changes you requested. |
|
I have just tested it on one of our export nodes. Everything seems to be working, and the cache is no longer included. |
|
Changes look good to me. I will do some testing on some of our systems as well to double check, and then we can merge this in. |
This PR introduces a new configuration option, CGROUP_WARDEN_IGNORE_CACHE, which changes how memory usage is calculated for cgroups. When enabled, the warden will calculate memory usage based on the sum of PSS (Proportional Set Size) from all processes within the cgroup, instead of relying on the default cgroup memory statistics which include the filesystem cache.
On systems with high I/O (e.g. export nodes for file transfers), the Linux kernel's page cache can grow significantly. Since cgroup v2 includes this cache in the memory.current statistics, the cgroup-warden might report high memory usage and trigger limits, even if the actual application memory (RSS/PSS) is well within limits.
Previously, users often received violation emails for memory usage that included the filesystem cache. However, since the cache was not explicitly shown or broken down in the attached usage diagrams, it was confusing for users to understand why they were flagged for a policy violation. By switching to PSS-based reporting, we ensure that the metrics and alerts align with the actual memory pressure caused by the user's processes.