-
Notifications
You must be signed in to change notification settings - Fork 445
perf: replace platform-specific CPU/memory probing with gopsutil #2306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 1 commit
2eaee9d
eeeb4ef
e0c0b3e
7d87306
663c71c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| package cpu | ||
|
|
||
| import ( | ||
| "os" | ||
| "runtime" | ||
|
|
||
| "github.com/shirou/gopsutil/v4/process" | ||
| ) | ||
|
|
||
| // Use existing cpuCount from the package | ||
| var cpuCount = runtime.GOMAXPROCS(0) | ||
|
|
||
| // ProbeLoad checks if the current process CPU usage is below a threshold relative to available cores | ||
| // Returns true if process CPU usage is NOT too high | ||
| // Returns false if process is already under heavy CPU load | ||
| func ProbeLoad(maxLoadFactor float64) bool { | ||
| // Get the current process | ||
| pid := os.Getpid() | ||
| proc, err := process.NewProcess(int32(pid)) | ||
| if err != nil { | ||
| return false // Error getting process, don't scale | ||
| } | ||
|
|
||
| // Get CPU percent for this process | ||
| // Note: this returns percent across all cores, so 100% per core * number of cores is the max | ||
| cpuPercent, err := proc.CPUPercent() | ||
| if err != nil { | ||
| return false // Error getting CPU usage, don't scale | ||
| } | ||
|
|
||
| // Calculate maximum CPU percentage allowed based on the factor and available cores | ||
| // For example, with 4 cores and maxLoadFactor of 0.7, maxAllowedPercent would be 280% | ||
| // (representing 70% utilization of all cores) | ||
| maxAllowedPercent := float64(cpuCount) * 100.0 * maxLoadFactor | ||
|
|
||
| // Return true if CPU usage is below threshold (scaling is recommended) | ||
| // Return false if CPU usage is already high (avoid scaling) | ||
| return cpuPercent < maxAllowedPercent | ||
| } | ||
This file was deleted.
This file was deleted.
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,15 @@ | ||
| package memory | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Maybe could we totally remove this package and inline the call to the lib?
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For sure. The memory package is gone, now and calls have been inlined. |
||
|
|
||
| import ( | ||
| "github.com/shirou/gopsutil/v4/mem" | ||
| ) | ||
|
|
||
| // TotalSysMemory returns available system memory in bytes | ||
| func TotalSysMemory() (uint64, error) { | ||
| vmStat, err := mem.VirtualMemory() | ||
| if err != nil { | ||
| return 0, err | ||
| } | ||
|
|
||
| return vmStat.Available, nil | ||
| } | ||
This file was deleted.
This file was deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does proc.CPUPercent() acutally work? Doesn't it take the total CPU time since the process was created? So you'd still have to wait for some time between
NewProcessandCPUPercent.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, my bad, not sure how I missed this,
CPUPercent()returns the lifetime average since process start, which is useless.Switched to
Percent(0)which returns the CPU delta since the last call (non-blocking): https://pkg.go.dev/github.com/shirou/gopsutil/v4/process#Process.PercentThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't you still have to call
cpuProc.Percent(100 * time.Millisecond)to get the CPU time in the last 100ms? Otherwise you're getting the CPU time since the last call, which might have been several minutes ago.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, its a tradeoff, but the impact is minimal.
Percent(100ms) blocks the scaling goroutine for 100ms per scale decision. This is what I am trying to avoid, in the FrankenAsync POC I am dispatching PHP scripts internally to separate threads within a single request. In this scenario I like threads to scale up instantly, not wait 100ms each.
Percent(0) returns the delta since last call. Under scaling pressure, calls come every ~5ms (minStallTime), so the window stays tight. The only potential gap would the first call after idle returns 0, which could wrongly allow or prevents one scale-up. After that it's accurate.
Its a tradeoff between instant scaling, with potentially being off 1 thread, or delayed scaling, which adds 100msec unwanted latency to internal sub-requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I get it now, ideally we should also be able to somehow differentiate between scaling a http worker and something regarding async/parallelism, with parallelism you probably would not want to wait at all.
Can you share a link to your POC again? Might also make sense to somehow combine your approach with #2287, which tries to do something similar with 'background workers'.
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can find the info and links to the POC I did here: #2223
I do feel this is unrelated though from that work, it just emerged while working on that. Also why I submitted it as separate PR. Reasoning:
The 120ms in the current ProbeCPUs isn't intentional scaling delay; it is a side effect of how CPU is measured (sleep, then compare). The probe's only job is "is CPU overloaded?": yes or no.
The scaling loop already has a hardcoded
minStallTime(5ms) as the gate for when to consider scaling. If that needs tuning, it could be made configurable, which would allow to properly control the behavior.Percent(0) is non-blocking, cross-platform, and self-regulates under load since the scaling goroutine calls it every ~5ms when threads are stalled.
This PR also matters for the Windows support shipped in v1.12.0, the current code has no CPU probing on Windows at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I'm in favor of making this cross-platform 👍 . The stalling is still kind of intentional to keep scaling from consuming too many resources at once.
It's a bit unfortunate that this will take past CPU consumption as metric instead of current one. Maybe we could also repeatedly measure the metric in the downscaling loop. There are also some k6 tests in the repo to simulate hanging while scaling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I added a
probeCPULoad()call indeactivateThreads()to keep thePercent(0)baseline fresh. Delta window is now at most 5s instead of potentially stale.