@ubergarm Good info! Thanks for pointing that out. Am I correctly understanding that by setting the P-state offsets to some positive delta and locking the clocks (plus the power limit), you were able to force your NVIDIA GPUs to clock higher than the default boost algorithm does? I have compiled LACT, but it seems to use DRM to enumerate the GPUs available for overclocking, which the p2p-enabled driver doesn't expose. :) lactd:
at the same time:
The software (LACT) seems pretty bulky (400+ Rust packages). Maybe I should just write C code that uses NVML directly?
-
Heya @magikRUKKOLA Yes, I have my GPU tuned up on both Windows and Linux and believe it offers better performance and/or energy efficiency depending on exactly how you tune it. Keeping my 3090 Ti FE in P2 (performance state 2) seems to work well [0 is the most "performance" out of 16 possible states, of which only about 8 are used on this model, with 8 being more or less idle]. For my specific silicon, locking the GPU clock at 1950MHz at 890mV in P-state 2 seems about the sweet spot given my cooling/fan noise/max power envelope. No need to set

Oh interesting, your P2P driver might not be quite compatible with LACT.

I did some more benchmarks with ik_llama.cpp, and for fully GPU-offloaded dense models it is quite nice, though I've heard it doesn't help much for big MoEs given the CPU/RAM bottleneck. It also helps for ComfyUI stuff in my experience. More benchmarks and links in this comment thread: https://www.reddit.com/r/LocalLLaMA/comments/1nkycpq/comment/nf3c08a/ Sorry my stuff is all spread out, hard to do research in 2025 xD

Curious about your experience with this technique once you get it going! (I also had to increase the GPU fan profile to keep it under 80 degC, as it will temperature-throttle at 83C... I have some more notes in the GitHub thread for LACT about some other infrequent, almost random-seeming throttling I see using this method, which I don't understand yet.)
-
Basically I made a tool that checks whether GPU utilization is above a certain threshold, and if so it overclocks the GPU (P-state offset, clock locking, and a power limit). If for a number of consecutive checks the daemon detects that the system is idle, it picks the lowest supported frequencies and sets those. I also added a temperature check: if the temperature goes above a certain threshold (80C by default), it turns the fans of all GPUs up to the max (mine are stacked one on top of another, so that is beneficial). Works great! Thanks for the info!

Code snippet:

void apply_settings(nvmlDevice_t device, int index) {
    nvmlReturn_t result;

    if (config.reset) {
        if (config.graphics) {
            char buf[MESSAGE_LEN];
            snprintf(buf, sizeof(buf), "Resetting settings for GPU %d", index);
            enqueue_message(buf);
        } else {
            printf("Resetting settings for GPU %d\n", index);
        }
        reset_settings(device);
        return;
    }

    // Set power limit
    if (config.power_limit > 0) {
        unsigned int current_power;
        result = nvmlDeviceGetPowerManagementLimit(device, &current_power);
        if (result == NVML_SUCCESS) {
            unsigned int new_power = config.power_limit * 1000; // Convert W to mW
            if (current_power != new_power) {
                result = nvmlDeviceSetPowerManagementLimit(device, new_power);
                if (result == NVML_SUCCESS) {
                    if (config.graphics) {
                        char buf[MESSAGE_LEN];
                        snprintf(buf, sizeof(buf), "GPU %d: Power set to %dW", index, config.power_limit);
                        enqueue_message(buf);
                    } else {
                        printf("GPU %d: Power limit set to %dW\n", index, config.power_limit);
                    }
                } else if (result != NVML_ERROR_NOT_SUPPORTED) {
                    fprintf(stderr, "GPU %d: Failed to set power limit: %s\n",
                            index, nvmlErrorString(result));
                }
            }
        }
    }

    // Set clock offsets
    if (config.gpu_offset != 0) {
        result = nvmlDeviceSetGpcClkVfOffset(device, config.gpu_offset);
        if (result == NVML_SUCCESS) {
            if (config.graphics) {
                char buf[MESSAGE_LEN];
                snprintf(buf, sizeof(buf), "GPU %d: GPU offset %dMHz", index, config.gpu_offset);
                enqueue_message(buf);
            } else {
                printf("GPU %d: GPU offset set to %dMHz\n", index, config.gpu_offset);
            }
        } else if (result != NVML_ERROR_NOT_SUPPORTED) {
            fprintf(stderr, "GPU %d: Failed to set GPU offset: %s\n",
                    index, nvmlErrorString(result));
        }
    }
    if (config.vram_offset != 0) {
        result = nvmlDeviceSetMemClkVfOffset(device, config.vram_offset);
        if (result == NVML_SUCCESS) {
            if (config.graphics) {
                char buf[MESSAGE_LEN];
                snprintf(buf, sizeof(buf), "GPU %d: VRAM offset %dMHz", index, config.vram_offset);
                enqueue_message(buf);
            } else {
                printf("GPU %d: VRAM offset set to %dMHz\n", index, config.vram_offset);
            }
        } else if (result != NVML_ERROR_NOT_SUPPORTED) {
            fprintf(stderr, "GPU %d: Failed to set VRAM offset: %s\n",
                    index, nvmlErrorString(result));
        }
    }

    // Set locked clocks
    if (config.gpu_lock) {
        unsigned int min_clock, max_clock;
        result = nvmlDeviceGetMinMaxClockOfPState(device, NVML_CLOCK_GRAPHICS,
                                                  NVML_PSTATE_0, &min_clock, &max_clock);
        if (result == NVML_SUCCESS) {
            unsigned int target = max_clock + config.gpu_offset;
            result = nvmlDeviceSetGpuLockedClocks(device, target, target);
            if (result == NVML_SUCCESS) {
                printf("GPU %d: GPU locked at %dMHz\n", index, target);
                overclock_applied = 1;
            } else if (result != NVML_ERROR_NOT_SUPPORTED) {
                fprintf(stderr, "GPU %d: Failed to lock GPU clock: %s\n",
                        index, nvmlErrorString(result));
            }
        } else if (result != NVML_ERROR_NOT_SUPPORTED) {
            fprintf(stderr, "GPU %d: Failed to get base GPU clock: %s\n",
                    index, nvmlErrorString(result));
        }
    }
    if (config.vram_lock) {
        unsigned int min_clock, max_clock;
        result = nvmlDeviceGetMinMaxClockOfPState(device, NVML_CLOCK_MEM,
                                                  NVML_PSTATE_0, &min_clock, &max_clock);
        if (result == NVML_SUCCESS) {
            unsigned int target = max_clock + config.vram_offset;
            result = nvmlDeviceSetMemoryLockedClocks(device, target, target);
            if (result == NVML_SUCCESS) {
                printf("GPU %d: VRAM locked at %dMHz\n", index, target);
                overclock_applied = 1;
            } else if (result != NVML_ERROR_NOT_SUPPORTED) {
                fprintf(stderr, "GPU %d: Failed to lock VRAM clock: %s\n",
                        index, nvmlErrorString(result));
            }
        } else if (result != NVML_ERROR_NOT_SUPPORTED) {
            fprintf(stderr, "GPU %d: Failed to get base VRAM clock: %s\n",
                    index, nvmlErrorString(result));
        }
    }
    config.apply_count++;
}

The tool itself is quite long (~1400 lines). If anyone wants the full code, let me know.
-
Dropping a PSA here for folks who haven't tried this out yet: with both NVIDIA and AMD GPUs you have access to some clock/power-state configuration on Linux with LACT.
There is a recent method specific to NVIDIA GPUs, and I just did a benchmark comparing a LACT overclock profile vs. a naive
nvidia-smi -pl 400
power cap, showing the LACT method gives better performance with lower energy usage. Writeup and details here:
https://forum.level1techs.com/t/some-gpu-5090-4090-3090-a600-idle-power-consumption-headless-on-linux-fedora-42-and-some-undervolt-overclock-info/237064/6
Cheers!