Skip to content

Conversation

eldering
Copy link
Member

@eldering eldering commented Nov 4, 2024

This replaces the memory.memsw.max_usage_in_bytes value from cgroup v1. We always assume it's there so we should fail if it is not. Otherwise things silently degrade in a way they would not have before.

Addresses #2763

@eldering eldering requested a review from meisterT November 4, 2024 21:24
@nickygerritsen
Copy link
Member

But it did 'work' without this thing, it just reported it as 0? I think it's better to document that than to not allow cgroupv2 at all (which is basically what this PR does) on older kernels (like the one in Ubuntu 22.04).

Or did it fully break for you?

@eldering
Copy link
Member Author

eldering commented Nov 4, 2024

But it did 'work' without this thing, it just reported it as 0? I think it's better to document that than to not allow cgroupv2 at all (which is basically what this PR does) on older kernels (like the one in Ubuntu 22.04).

Or did it fully break for you?

Yes, it was working because we allowed it to. But we didn't allow it in the cgroup v1 implementation either, so there you were guaranteed to always get real memory usage value. I think silently reporting zero when the kernel does not support it, creates a confusing situation (which happened to us at the BAPC). So better to explicitly bail out and let the user switch to cgroup v1. You're on an older kernel anyways.

@nickygerritsen
Copy link
Member

If others think the same I think we need to re update the docs to tell them how to switch to v1. I think we removed that now

@meisterT
Copy link
Member

meisterT commented Nov 5, 2024

If we do this (I don't feel strongly), then we should change the script that I touched in 8f8f4c0 from warning to error to get this error before you start judging.

@vmcj
Copy link
Member

vmcj commented Nov 5, 2024

Is this only a problem for people upgrading DOMjudge on their current OS or also for people installing a new server (Debian stable, current Ubuntu LTS)?
In case you upgrade you already have cgroupV1, keeping this in the docs that this still works seems fine for me (and crashing if we can't guarantee the same information is fine by me)

If we also do this for new installations on newer/current versions seems a bit hard for informational level of info (assuming that runguard still does its job and still terminates submissions taking too much memory) I'm not sure if the IMO minor decrease in memory is worth the discussion people need to have to have cgroupV1 enabled again with their sysadmin so I would at least give an override flag for this behaviour or an explicit question when the judgedaemon boots?

@nickygerritsen
Copy link
Member

Ubuntu 22.04 at least doesn't support it. It's not the latest LTS, but I think it's still widely used, especially since there is no ICPC 24.04 image yet (although it is coming).

@vmcj
Copy link
Member

vmcj commented Nov 9, 2024

Ubuntu 22.04 at least doesn't support it. It's not the latest LTS, but I think it's still widely used, especially since there is no ICPC 24.04 image yet (although it is coming).

In that case I think terminating like @eldering suggests is the better option but also adding an override flag for when you accept this (minor) decrease in performance if its only informational.

@eldering did the submissions with memory-limit version still get detected?

@eldering eldering force-pushed the error-missing-memory.peak branch from 8cf1538 to b3a8dda Compare November 22, 2024 14:25
``GRUB_CMDLINE_LINUX_DEFAULT="quiet cgroup_enable=memory swapaccount=1 isolcpus=2"``

On modern systems where cgroup v2 is available, DOMjudge will try to
use that. This requires kernel versions 5.20 or 6.0 or later to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
use that. This requires kernel versions 5.20 or 6.0 or later to
use that. This requires kernel versions 5.19 or 6.0 or later to

minor=$(echo "$kernel_version" | cut -d '.' -f 2)
if [ "$major" -lt 5 ] || { [ "$major" -eq 5 ] && [ "$minor" -lt 19 ]; }; then
echo "WARNING: Kernel ($kernel_version) is too old to record peak RAM usage" >&2
echo "Kernel ($kernel_version) is too old to record peak RAM usage with cgroup V2, trying cgroup V1" >&2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you not failing here with the cgroup_error_and_usage?

@eldering eldering force-pushed the error-missing-memory.peak branch from b3a8dda to 266cf7e Compare November 22, 2024 16:09
This replaces the `memory.memsw.max_usage_in_bytes` value from
cgroup v1. We always assume it's there so we should fail if it is
not. Otherwise things silently degrade in a way they would not have
before.

Also document this in the create_cgroups script and manual.

Addresses #2763
@eldering eldering force-pushed the error-missing-memory.peak branch from 266cf7e to 1224ad3 Compare November 22, 2024 16:39
@eldering eldering enabled auto-merge November 22, 2024 17:15
@eldering eldering added this pull request to the merge queue Nov 22, 2024
Merged via the queue into main with commit a0728be Nov 22, 2024
32 checks passed
@eldering eldering deleted the error-missing-memory.peak branch November 22, 2024 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants