Skip to content

Affinity CPU thread affinity parameter to gvadetect/gvaclassify#719

Draft
tjanczak wants to merge 6 commits intomainfrom
affinity_mask_param
Draft

Affinity CPU thread affinity parameter to gvadetect/gvaclassify#719
tjanczak wants to merge 6 commits intomainfrom
affinity_mask_param

Conversation

@tjanczak
Copy link
Copy Markdown
Contributor

Description

Add new parameter in gvadetect/gvaclassify elements - set CPU affinity mask for stable performance results.
In addition if no mask is set by a user (either through new parameter or process-wide taskset) and Intel Core Ultra 3xxH is detected, then set affinity mask to p-cores for inference elements.

Fixes # (issue)

No specific bug filed, but performance results are fluctuating if core pinning is not used with 3xxH series.

Any Newly Introduced Dependencies

No new dependencies.

How Has This Been Tested?

Tested locally, to be verified in CI.

Checklist:

  • I agree to use the MIT license for my code changes.
  • I have not introduced any 3rd party components incompatible with MIT.
  • I have not included any company confidential information, trade secret, password or security token.
  • I have performed a self-review of my code.

}

// if string parsed without errrors, update core pinning
base_inference->core_pinning_mask = core_mask;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is outside the try/catch block. When parsing fails, core_mask remains 0 (from line 631 initialization) and gets assigned to core_pinning_mask, which silently disables all cores and overwrites any previously valid configuration.

Consider moving the assignment inside the try block after successful parsing to preserve the existing value on error:

Suggested change
base_inference->core_pinning_mask = core_mask;
} catch (const std::exception &e) {
GST_ELEMENT_ERROR(base_inference, RESOURCE, SETTINGS, ("Invalid core-pinning format"),
("Failed to parse core-pinning property: %s", e.what()));
return; // Preserve existing core_pinning_mask on error
}
// If string parsed without errors, update core pinning
base_inference->core_pinning_mask = core_mask;

int end = std::stoi(part.substr(dash_pos + 1));

// Set bits in the range
for (int i = start; i <= end; i++) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing bounds validation causes undefined behavior when core index >= 64. Shifting a 64-bit value by >= 64 bits is UB per C++ standard. Consider adding validation:

Suggested change
for (int i = start; i <= end; i++) {
// Set bits in the range
for (int i = start; i <= end; i++) {
if (i >= 0 && i < 64) {
core_mask |= (1ULL << i);
} else {
GST_WARNING_OBJECT(base_inference, "Core index %d out of range [0-63], skipping", i);
}
}

}
} else {
// Set single bit
core_mask |= (1ULL << std::stoi(part));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue as the range loop: missing bounds check before bit shift. Consider adding validation:

Suggested change
core_mask |= (1ULL << std::stoi(part));
// Set single bit
int core_id = std::stoi(part);
if (core_id >= 0 && core_id < 64) {
core_mask |= (1ULL << core_id);
} else {
GST_WARNING_OBJECT(base_inference, "Core index %d out of range [0-63], skipping", core_id);
}

#define DEFAULT_SHARE_VADISPLAY_CTX TRUE

#define DEFAULT_CORE_PINNING nullptr

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic number 64 appears throughout the code. Consider defining it as a constant for maintainability:

Suggested change
#define DEFAULT_CORE_PINNING nullptr
#define MAX_CPU_CORES 64

Then use MAX_CPU_CORES in all the bounds checks and getter loop instead of hardcoded 64.

case PROP_SCHEDULING_POLICY:
g_value_set_string(value, base_inference->scheduling_policy);
break;
case PROP_CORE_PINNING: {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The getter outputs individual cores ("0,1,2,3") while the setter accepts range notation ("0-3"). This asymmetry means:

  • Setting "0-3" then getting returns "0,1,2,3"
  • Copy-pasting the getter output works, but loses the compact representation

Consider implementing range compaction in the getter to match the setter's format. For example:

  • Consecutive cores 0,1,2,3 → "0-3"
  • Non-consecutive 0,2,5,6,7 → "0,2,5-7"

This makes the property symmetric and more user-friendly for large core counts.

// parse input string and set bits in core_pinning accordingly
while (std::getline(iss, part, ',')) {
// Trim whitespace
part.erase(0, part.find_first_not_of(" \t"));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace trimming only handles spaces and tabs. Consider using a more robust approach that handles all whitespace characters (including newlines, carriage returns, etc.):

Suggested change
part.erase(0, part.find_first_not_of(" \t"));
// Trim whitespace
auto is_space = [](unsigned char c) { return std::isspace(c); };
part.erase(0, std::find_if_not(part.begin(), part.end(), is_space) - part.begin());
part.erase(std::find_if_not(part.rbegin(), part.rend(), is_space).base(), part.end());

This handles all standard whitespace characters and is more maintainable.


try {
// Split by comma to get individual ranges or numbers
std::string str(range_str);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This creates a full copy of the input string, then std::istringstream on line 636 copies it again. For a read-only parameter, this can be wasteful.

Since this is C++17 code (using std::from_chars would be better anyway), consider using std::string_view for the parameter:

void gva_base_inference_set_core_pinning(GvaBaseInference *base_inference, 
                                          const gchar *range_str) {
    std::string_view sv(range_str);
    // Then parse sv directly without copying
}

Or better yet, pass through to a helper function that takes std::string_view to avoid C string → string_view conversion overhead in the common path. 📎

if (part.find('-') != std::string::npos) {
// Parse range like "1-5"
size_t dash_pos = part.find('-');
int start = std::stoi(part.substr(0, dash_pos));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::stoi throws exceptions on invalid input, which is expensive for control flow. For performance-critical parsing, use C++17's std::from_chars which is:

  • Exception-free (returns error code)
  • Significantly faster (~10x in benchmarks)
  • Doesn't allocate or use locale
#include <charconv>

int start, end;
auto [ptr1, ec1] = std::from_chars(part.data(), part.data() + dash_pos, start);
auto [ptr2, ec2] = std::from_chars(part.data() + dash_pos + 1, part.data() + part.size(), end);
if (ec1 == std::errc{} && ec2 == std::errc{}) {
    // Valid parse
} else {
    // Handle error without exception overhead
}

Applies to lines 648, 649, and 657.

CPU_ZERO(&cpuset); // Initialize to zero
int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
for (int core_id = 0; core_id < num_cores; ++core_id) {
if (mask & (1ULL << core_id)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here mask can have bits set beyond num_cores

The input mask parameter (64-bit) can have bits set for cores that don't exist on the system. For example:

  • System has 16 cores (num_cores = 16)
  • User passes mask 0xFFFFFFFFFFFFFFFF (all 64 bits set)
  • Loop iterates 16 times, but bits 16-63 in mask are never validated
  • CPU_SET gets called only for existing cores, BUT the mask is accepted silently

This creates inconsistency: the user thinks they pinned to cores 0-63, but actually only pinned to 0-15.

Please consider validating mask against actual core count:

// After line 1016
if (mask >= (1ULL << num_cores)) {
    GVA_WARNING("Affinity mask 0x%lx has bits set beyond available cores (%d), truncating\n", mask, num_cores);
    mask &= ((1ULL << num_cores) - 1);  // Clear invalid bits
}

// Convert range of integer IDs to bitset representing cores for pinning.
// Example input: "1-5, 8, 10-12"
void gva_base_inference_set_core_pinning(GvaBaseInference *base_inference, const gchar *range_str) {
guint64 core_mask = 0;
Copy link
Copy Markdown
Contributor

@mholowni mholowni Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about a possibility that we have a CPU with more than 64 cores?

@nszczygl9
Copy link
Copy Markdown
Contributor

please check why windows step fails

… to check for Intel Core Ultra Series 3 CPUs.
@tjanczak tjanczak marked this pull request as draft March 30, 2026 08:56
Copilot AI review requested due to automatic review settings April 9, 2026 15:31
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds CPU core pinning support to GStreamer inference elements to stabilize performance, including automatic pinning to P-cores on Intel Core Ultra 3xxH (PTL-H) when no affinity is otherwise configured.

Changes:

  • Introduces a new core-pinning property (string) on base inference elements and stores it as a core_pinning_mask.
  • Detects PTL-H (Ultra 3xxH, 16-core) CPUs and defaults affinity to the first 4 cores when process affinity is unrestricted.
  • Applies the resulting affinity mask by pinning the current thread during model creation.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

Show a summary per file
File Description
src/utils/utils.h Adds PTL-H core-count constant and declares PTL-H CPU detection helper (non-Windows).
src/utils/utils.cpp Implements PTL-H CPU detection via /proc/cpuinfo model name regex (non-Windows).
src/monolithic/gst/inference_elements/base/inference_impl.h Declares InferenceImpl::SetAffinityMask.
src/monolithic/gst/inference_elements/base/inference_impl.cpp Applies thread affinity mask during model creation and implements mask-to-cpuset pinning.
src/monolithic/gst/inference_elements/base/gva_base_inference.h Adds core_pinning_mask field to element state.
src/monolithic/gst/inference_elements/base/gva_base_inference.cpp Adds core-pinning property, default mask derivation from process affinity, and string-to-mask parsing.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1006 to +1010
DWORD_PTR mask = static_cast<DWORD_PTR>(&mask);

HANDLE thread = GetCurrentThread(); // Get handle for the current thread
DWORD_PTR result = SetThreadAffinityMask(thread, mask);

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Windows implementation of affinity pinning is incorrect: it redeclares mask (shadowing the parameter) and casts &mask (address) to DWORD_PTR, which will pass a pointer value instead of the intended bitmask to SetThreadAffinityMask and likely fail or pin unpredictably. Use the incoming uint64_t mask value (converted to DWORD_PTR) and avoid shadowing; also ensure required Windows headers/types are included under _WIN32.

Copilot uses AI. Check for mistakes.
Comment on lines +992 to +997
int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
for (int core_id = 0; core_id < num_cores; ++core_id) {
if (mask & (1ULL << core_id)) {
CPU_SET(core_id, &cpuset); // Add the specific core
}
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linux affinity code builds a cpu_set_t by iterating core_id < num_cores but shifts 1ULL << core_id. If num_cores >= 64, shifting by >=64 is undefined behavior (and the uint64_t mask cannot represent cores beyond 63 anyway). Consider capping the loop at 64 (or std::min(num_cores, 64)) and/or rejecting masks that reference unsupported core IDs.

Copilot uses AI. Check for mistakes.
Comment on lines +987 to +993
void InferenceImpl::SetAffinityMask(uint64_t mask) {
GVA_INFO("Setting CPU affinity mask to 0x%lx\n", mask);
#ifndef _WIN32
cpu_set_t cpuset;
CPU_ZERO(&cpuset); // Initialize to zero
int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
for (int core_id = 0; core_id < num_cores; ++core_id) {
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file now uses cpu_set_t/CPU_SET + pthread_setaffinity_np + sysconf, but it doesn’t include the required platform headers (typically <sched.h>, <pthread.h>, <unistd.h> on Linux, and <windows.h> on Windows). Relying on transitive includes is fragile and may break builds; add the proper includes under the existing _WIN32 guards.

Copilot uses AI. Check for mistakes.
//if(Utils::isCPUPTLHSeries()) {
// base_inference->core_pinning_mask = 0xF; // Pin to first 4 cores , the P-Cores
//}
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _WIN32 branch of set_core_pinning_mask has an unterminated if (GetProcessAffinityMask(...)) { block: there’s no closing } before #endif, which will break Windows builds. Ensure all opened braces in the Windows branch are properly closed before the #endif and the function’s closing brace.

Suggested change
}
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +700 to +713
if (part.find('-') != std::string::npos) {
// Parse range like "1-5"
size_t dash_pos = part.find('-');
int start = std::stoi(part.substr(0, dash_pos));
int end = std::stoi(part.substr(dash_pos + 1));

// Set bits in the range
for (int i = start; i <= end; i++) {
core_mask |= (1ULL << i);
}
} else {
// Set single bit
core_mask |= (1ULL << std::stoi(part));
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gva_base_inference_set_core_pinning builds a uint64_t mask using 1ULL << i where i comes from user input/ranges. Negative core IDs (e.g. "-1") or IDs >= 64 will cause undefined behavior when shifting. Validate parsed core IDs (and ranges) are within [0, 63] (and ideally within available CPU cores) before shifting/setting bits.

Copilot uses AI. Check for mistakes.
Comment on lines +420 to +424
g_object_class_install_property(
gobject_class, PROP_CORE_PINNING,
g_param_spec_string("core-pinning", "Core Pinning",
"List or range of CPU cores to pin this inference element to (e.g., '0-3' or '0,2,3')",
nullptr, param_flags));
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New core-pinning property and parsing logic are introduced here, but there are existing GStreamer pipeline unit tests under tests/unit_tests/tests_gstgva/ that exercise gvadetect/gvaclassify properties. Please add coverage for core-pinning (valid list/range, invalid format, and default behavior) to prevent regressions (especially around parse errors and mask application).

Copilot uses AI. Check for mistakes.
Comment on lines +513 to +519
auto cpu_set_to_bitmask = [&num_cores](const cpu_set_t *cpu_set) -> guint64 {
guint64 bitmask = 0;
for (int i = 0; i < num_cores; ++i) {
if (CPU_ISSET(i, cpu_set)) {
bitmask |= (1ULL << i);
}
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cpu_set_to_bitmask iterates i < num_cores and does bitmask |= (1ULL << i). If the system has 64+ online cores, shifting by >=64 is undefined behavior and the 64-bit mask can’t represent those cores anyway. Cap the loop at 64 (or at least guard if (i < 64)), and document/handle truncation explicitly.

Copilot uses AI. Check for mistakes.
Comment on lines +987 to +1004
void InferenceImpl::SetAffinityMask(uint64_t mask) {
GVA_INFO("Setting CPU affinity mask to 0x%lx\n", mask);
#ifndef _WIN32
cpu_set_t cpuset;
CPU_ZERO(&cpuset); // Initialize to zero
int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
for (int core_id = 0; core_id < num_cores; ++core_id) {
if (mask & (1ULL << core_id)) {
CPU_SET(core_id, &cpuset); // Add the specific core
}
}

pthread_t current_thread = pthread_self(); // Get current thread handle
int result = pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);

if (result != 0) {
GVA_ERROR("Error pinning thread: (%d)\n", result);
}
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SetAffinityMask will call pthread_setaffinity_np even when mask == 0, producing an empty cpu_set_t and typically failing with EINVAL. Consider treating mask == 0 as “no pinning requested” (skip setting affinity) or returning/logging a clear error without attempting to set an empty CPU set.

Copilot uses AI. Check for mistakes.
Comment on lines +83 to +84
#define DEFAULT_CORE_PINNING nullptr

Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DEFAULT_CORE_PINNING is defined but never used. Please remove it or wire it into the core-pinning property default to avoid dead code and confusion about the intended default behavior.

Suggested change
#define DEFAULT_CORE_PINNING nullptr

Copilot uses AI. Check for mistakes.
case PROP_SHARE_VADISPLAY_CTX:
base_inference->share_va_display_ctx = g_value_get_boolean(value);
break;
case PROP_CORE_PINNING:
Copy link

Copilot AI Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike several other property setters in this file (e.g., model, model-proc, labels) that check check_gva_base_inference_stopped() and warn/error when changed at runtime, core-pinning is applied unconditionally. Since affinity is only set during model creation, changing core-pinning while the element is running likely won’t take effect. Consider enforcing the same “only mutable while stopped” rule (or re-applying affinity when changed) to avoid a misleading runtime-configurable property.

Suggested change
case PROP_CORE_PINNING:
case PROP_CORE_PINNING:
if (GST_STATE(base_inference) > GST_STATE_READY) {
GST_WARNING_OBJECT(base_inference,
"core-pinning property can only be changed while the element is stopped");
break;
}

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants