Affinity CPU thread affinity parameter to gvadetect/gvaclassify#719
Affinity CPU thread affinity parameter to gvadetect/gvaclassify#719
Conversation
| } | ||
|
|
||
| // if string parsed without errrors, update core pinning | ||
| base_inference->core_pinning_mask = core_mask; |
There was a problem hiding this comment.
This line is outside the try/catch block. When parsing fails, core_mask remains 0 (from line 631 initialization) and gets assigned to core_pinning_mask, which silently disables all cores and overwrites any previously valid configuration.
Consider moving the assignment inside the try block after successful parsing to preserve the existing value on error:
| base_inference->core_pinning_mask = core_mask; | |
| } catch (const std::exception &e) { | |
| GST_ELEMENT_ERROR(base_inference, RESOURCE, SETTINGS, ("Invalid core-pinning format"), | |
| ("Failed to parse core-pinning property: %s", e.what())); | |
| return; // Preserve existing core_pinning_mask on error | |
| } | |
| // If string parsed without errors, update core pinning | |
| base_inference->core_pinning_mask = core_mask; |
| int end = std::stoi(part.substr(dash_pos + 1)); | ||
|
|
||
| // Set bits in the range | ||
| for (int i = start; i <= end; i++) { |
There was a problem hiding this comment.
Missing bounds validation causes undefined behavior when core index >= 64. Shifting a 64-bit value by >= 64 bits is UB per C++ standard. Consider adding validation:
| for (int i = start; i <= end; i++) { | |
| // Set bits in the range | |
| for (int i = start; i <= end; i++) { | |
| if (i >= 0 && i < 64) { | |
| core_mask |= (1ULL << i); | |
| } else { | |
| GST_WARNING_OBJECT(base_inference, "Core index %d out of range [0-63], skipping", i); | |
| } | |
| } |
| } | ||
| } else { | ||
| // Set single bit | ||
| core_mask |= (1ULL << std::stoi(part)); |
There was a problem hiding this comment.
Same issue as the range loop: missing bounds check before bit shift. Consider adding validation:
| core_mask |= (1ULL << std::stoi(part)); | |
| // Set single bit | |
| int core_id = std::stoi(part); | |
| if (core_id >= 0 && core_id < 64) { | |
| core_mask |= (1ULL << core_id); | |
| } else { | |
| GST_WARNING_OBJECT(base_inference, "Core index %d out of range [0-63], skipping", core_id); | |
| } |
| #define DEFAULT_SHARE_VADISPLAY_CTX TRUE | ||
|
|
||
| #define DEFAULT_CORE_PINNING nullptr | ||
|
|
There was a problem hiding this comment.
The magic number 64 appears throughout the code. Consider defining it as a constant for maintainability:
| #define DEFAULT_CORE_PINNING nullptr | |
| #define MAX_CPU_CORES 64 |
Then use MAX_CPU_CORES in all the bounds checks and getter loop instead of hardcoded 64.
| case PROP_SCHEDULING_POLICY: | ||
| g_value_set_string(value, base_inference->scheduling_policy); | ||
| break; | ||
| case PROP_CORE_PINNING: { |
There was a problem hiding this comment.
The getter outputs individual cores ("0,1,2,3") while the setter accepts range notation ("0-3"). This asymmetry means:
- Setting "0-3" then getting returns "0,1,2,3"
- Copy-pasting the getter output works, but loses the compact representation
Consider implementing range compaction in the getter to match the setter's format. For example:
- Consecutive cores 0,1,2,3 → "0-3"
- Non-consecutive 0,2,5,6,7 → "0,2,5-7"
This makes the property symmetric and more user-friendly for large core counts.
| // parse input string and set bits in core_pinning accordingly | ||
| while (std::getline(iss, part, ',')) { | ||
| // Trim whitespace | ||
| part.erase(0, part.find_first_not_of(" \t")); |
There was a problem hiding this comment.
Whitespace trimming only handles spaces and tabs. Consider using a more robust approach that handles all whitespace characters (including newlines, carriage returns, etc.):
| part.erase(0, part.find_first_not_of(" \t")); | |
| // Trim whitespace | |
| auto is_space = [](unsigned char c) { return std::isspace(c); }; | |
| part.erase(0, std::find_if_not(part.begin(), part.end(), is_space) - part.begin()); | |
| part.erase(std::find_if_not(part.rbegin(), part.rend(), is_space).base(), part.end()); |
This handles all standard whitespace characters and is more maintainable.
|
|
||
| try { | ||
| // Split by comma to get individual ranges or numbers | ||
| std::string str(range_str); |
There was a problem hiding this comment.
This creates a full copy of the input string, then std::istringstream on line 636 copies it again. For a read-only parameter, this can be wasteful.
Since this is C++17 code (using std::from_chars would be better anyway), consider using std::string_view for the parameter:
void gva_base_inference_set_core_pinning(GvaBaseInference *base_inference,
const gchar *range_str) {
std::string_view sv(range_str);
// Then parse sv directly without copying
}Or better yet, pass through to a helper function that takes std::string_view to avoid C string → string_view conversion overhead in the common path. 📎
| if (part.find('-') != std::string::npos) { | ||
| // Parse range like "1-5" | ||
| size_t dash_pos = part.find('-'); | ||
| int start = std::stoi(part.substr(0, dash_pos)); |
There was a problem hiding this comment.
std::stoi throws exceptions on invalid input, which is expensive for control flow. For performance-critical parsing, use C++17's std::from_chars which is:
- Exception-free (returns error code)
- Significantly faster (~10x in benchmarks)
- Doesn't allocate or use locale
#include <charconv>
int start, end;
auto [ptr1, ec1] = std::from_chars(part.data(), part.data() + dash_pos, start);
auto [ptr2, ec2] = std::from_chars(part.data() + dash_pos + 1, part.data() + part.size(), end);
if (ec1 == std::errc{} && ec2 == std::errc{}) {
// Valid parse
} else {
// Handle error without exception overhead
}Applies to lines 648, 649, and 657.
| CPU_ZERO(&cpuset); // Initialize to zero | ||
| int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores | ||
| for (int core_id = 0; core_id < num_cores; ++core_id) { | ||
| if (mask & (1ULL << core_id)) { |
There was a problem hiding this comment.
Here mask can have bits set beyond num_cores
The input mask parameter (64-bit) can have bits set for cores that don't exist on the system. For example:
- System has 16 cores (
num_cores = 16) - User passes mask
0xFFFFFFFFFFFFFFFF(all 64 bits set) - Loop iterates 16 times, but bits 16-63 in mask are never validated
CPU_SETgets called only for existing cores, BUT the mask is accepted silently
This creates inconsistency: the user thinks they pinned to cores 0-63, but actually only pinned to 0-15.
Please consider validating mask against actual core count:
// After line 1016
if (mask >= (1ULL << num_cores)) {
GVA_WARNING("Affinity mask 0x%lx has bits set beyond available cores (%d), truncating\n", mask, num_cores);
mask &= ((1ULL << num_cores) - 1); // Clear invalid bits
}| // Convert range of integer IDs to bitset representing cores for pinning. | ||
| // Example input: "1-5, 8, 10-12" | ||
| void gva_base_inference_set_core_pinning(GvaBaseInference *base_inference, const gchar *range_str) { | ||
| guint64 core_mask = 0; |
There was a problem hiding this comment.
What about a possibility that we have a CPU with more than 64 cores?
|
please check why windows step fails |
… to check for Intel Core Ultra Series 3 CPUs.
There was a problem hiding this comment.
Pull request overview
Adds CPU core pinning support to GStreamer inference elements to stabilize performance, including automatic pinning to P-cores on Intel Core Ultra 3xxH (PTL-H) when no affinity is otherwise configured.
Changes:
- Introduces a new
core-pinningproperty (string) on base inference elements and stores it as acore_pinning_mask. - Detects PTL-H (Ultra 3xxH, 16-core) CPUs and defaults affinity to the first 4 cores when process affinity is unrestricted.
- Applies the resulting affinity mask by pinning the current thread during model creation.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.
Show a summary per file
| File | Description |
|---|---|
| src/utils/utils.h | Adds PTL-H core-count constant and declares PTL-H CPU detection helper (non-Windows). |
| src/utils/utils.cpp | Implements PTL-H CPU detection via /proc/cpuinfo model name regex (non-Windows). |
| src/monolithic/gst/inference_elements/base/inference_impl.h | Declares InferenceImpl::SetAffinityMask. |
| src/monolithic/gst/inference_elements/base/inference_impl.cpp | Applies thread affinity mask during model creation and implements mask-to-cpuset pinning. |
| src/monolithic/gst/inference_elements/base/gva_base_inference.h | Adds core_pinning_mask field to element state. |
| src/monolithic/gst/inference_elements/base/gva_base_inference.cpp | Adds core-pinning property, default mask derivation from process affinity, and string-to-mask parsing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| DWORD_PTR mask = static_cast<DWORD_PTR>(&mask); | ||
|
|
||
| HANDLE thread = GetCurrentThread(); // Get handle for the current thread | ||
| DWORD_PTR result = SetThreadAffinityMask(thread, mask); | ||
|
|
There was a problem hiding this comment.
Windows implementation of affinity pinning is incorrect: it redeclares mask (shadowing the parameter) and casts &mask (address) to DWORD_PTR, which will pass a pointer value instead of the intended bitmask to SetThreadAffinityMask and likely fail or pin unpredictably. Use the incoming uint64_t mask value (converted to DWORD_PTR) and avoid shadowing; also ensure required Windows headers/types are included under _WIN32.
| int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores | ||
| for (int core_id = 0; core_id < num_cores; ++core_id) { | ||
| if (mask & (1ULL << core_id)) { | ||
| CPU_SET(core_id, &cpuset); // Add the specific core | ||
| } | ||
| } |
There was a problem hiding this comment.
Linux affinity code builds a cpu_set_t by iterating core_id < num_cores but shifts 1ULL << core_id. If num_cores >= 64, shifting by >=64 is undefined behavior (and the uint64_t mask cannot represent cores beyond 63 anyway). Consider capping the loop at 64 (or std::min(num_cores, 64)) and/or rejecting masks that reference unsupported core IDs.
| void InferenceImpl::SetAffinityMask(uint64_t mask) { | ||
| GVA_INFO("Setting CPU affinity mask to 0x%lx\n", mask); | ||
| #ifndef _WIN32 | ||
| cpu_set_t cpuset; | ||
| CPU_ZERO(&cpuset); // Initialize to zero | ||
| int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores | ||
| for (int core_id = 0; core_id < num_cores; ++core_id) { |
There was a problem hiding this comment.
This file now uses cpu_set_t/CPU_SET + pthread_setaffinity_np + sysconf, but it doesn’t include the required platform headers (typically <sched.h>, <pthread.h>, <unistd.h> on Linux, and <windows.h> on Windows). Relying on transitive includes is fragile and may break builds; add the proper includes under the existing _WIN32 guards.
| //if(Utils::isCPUPTLHSeries()) { | ||
| // base_inference->core_pinning_mask = 0xF; // Pin to first 4 cores , the P-Cores | ||
| //} | ||
| } |
There was a problem hiding this comment.
The _WIN32 branch of set_core_pinning_mask has an unterminated if (GetProcessAffinityMask(...)) { block: there’s no closing } before #endif, which will break Windows builds. Ensure all opened braces in the Windows branch are properly closed before the #endif and the function’s closing brace.
| } | |
| } | |
| } |
| if (part.find('-') != std::string::npos) { | ||
| // Parse range like "1-5" | ||
| size_t dash_pos = part.find('-'); | ||
| int start = std::stoi(part.substr(0, dash_pos)); | ||
| int end = std::stoi(part.substr(dash_pos + 1)); | ||
|
|
||
| // Set bits in the range | ||
| for (int i = start; i <= end; i++) { | ||
| core_mask |= (1ULL << i); | ||
| } | ||
| } else { | ||
| // Set single bit | ||
| core_mask |= (1ULL << std::stoi(part)); | ||
| } |
There was a problem hiding this comment.
gva_base_inference_set_core_pinning builds a uint64_t mask using 1ULL << i where i comes from user input/ranges. Negative core IDs (e.g. "-1") or IDs >= 64 will cause undefined behavior when shifting. Validate parsed core IDs (and ranges) are within [0, 63] (and ideally within available CPU cores) before shifting/setting bits.
| g_object_class_install_property( | ||
| gobject_class, PROP_CORE_PINNING, | ||
| g_param_spec_string("core-pinning", "Core Pinning", | ||
| "List or range of CPU cores to pin this inference element to (e.g., '0-3' or '0,2,3')", | ||
| nullptr, param_flags)); |
There was a problem hiding this comment.
New core-pinning property and parsing logic are introduced here, but there are existing GStreamer pipeline unit tests under tests/unit_tests/tests_gstgva/ that exercise gvadetect/gvaclassify properties. Please add coverage for core-pinning (valid list/range, invalid format, and default behavior) to prevent regressions (especially around parse errors and mask application).
| auto cpu_set_to_bitmask = [&num_cores](const cpu_set_t *cpu_set) -> guint64 { | ||
| guint64 bitmask = 0; | ||
| for (int i = 0; i < num_cores; ++i) { | ||
| if (CPU_ISSET(i, cpu_set)) { | ||
| bitmask |= (1ULL << i); | ||
| } | ||
| } |
There was a problem hiding this comment.
cpu_set_to_bitmask iterates i < num_cores and does bitmask |= (1ULL << i). If the system has 64+ online cores, shifting by >=64 is undefined behavior and the 64-bit mask can’t represent those cores anyway. Cap the loop at 64 (or at least guard if (i < 64)), and document/handle truncation explicitly.
| void InferenceImpl::SetAffinityMask(uint64_t mask) { | ||
| GVA_INFO("Setting CPU affinity mask to 0x%lx\n", mask); | ||
| #ifndef _WIN32 | ||
| cpu_set_t cpuset; | ||
| CPU_ZERO(&cpuset); // Initialize to zero | ||
| int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores | ||
| for (int core_id = 0; core_id < num_cores; ++core_id) { | ||
| if (mask & (1ULL << core_id)) { | ||
| CPU_SET(core_id, &cpuset); // Add the specific core | ||
| } | ||
| } | ||
|
|
||
| pthread_t current_thread = pthread_self(); // Get current thread handle | ||
| int result = pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset); | ||
|
|
||
| if (result != 0) { | ||
| GVA_ERROR("Error pinning thread: (%d)\n", result); | ||
| } |
There was a problem hiding this comment.
SetAffinityMask will call pthread_setaffinity_np even when mask == 0, producing an empty cpu_set_t and typically failing with EINVAL. Consider treating mask == 0 as “no pinning requested” (skip setting affinity) or returning/logging a clear error without attempting to set an empty CPU set.
| #define DEFAULT_CORE_PINNING nullptr | ||
|
|
There was a problem hiding this comment.
DEFAULT_CORE_PINNING is defined but never used. Please remove it or wire it into the core-pinning property default to avoid dead code and confusion about the intended default behavior.
| #define DEFAULT_CORE_PINNING nullptr |
| case PROP_SHARE_VADISPLAY_CTX: | ||
| base_inference->share_va_display_ctx = g_value_get_boolean(value); | ||
| break; | ||
| case PROP_CORE_PINNING: |
There was a problem hiding this comment.
Unlike several other property setters in this file (e.g., model, model-proc, labels) that check check_gva_base_inference_stopped() and warn/error when changed at runtime, core-pinning is applied unconditionally. Since affinity is only set during model creation, changing core-pinning while the element is running likely won’t take effect. Consider enforcing the same “only mutable while stopped” rule (or re-applying affinity when changed) to avoid a misleading runtime-configurable property.
| case PROP_CORE_PINNING: | |
| case PROP_CORE_PINNING: | |
| if (GST_STATE(base_inference) > GST_STATE_READY) { | |
| GST_WARNING_OBJECT(base_inference, | |
| "core-pinning property can only be changed while the element is stopped"); | |
| break; | |
| } |
Description
Add new parameter in gvadetect/gvaclassify elements - set CPU affinity mask for stable performance results.
In addition if no mask is set by a user (either through new parameter or process-wide taskset) and Intel Core Ultra 3xxH is detected, then set affinity mask to p-cores for inference elements.
Fixes # (issue)
No specific bug filed, but performance results are fluctuating if core pinning is not used with 3xxH series.
Any Newly Introduced Dependencies
No new dependencies.
How Has This Been Tested?
Tested locally, to be verified in CI.
Checklist: