Affinity CPU thread affinity parameter to gvadetect/gvaclassify by tjanczak · Pull Request #719 · open-edge-platform/dlstreamer

tjanczak · 2026-03-24T08:44:39Z

Description

Add a new parameter to the gvadetect/gvaclassify elements that sets a CPU affinity mask, for stable performance results.
In addition, if the user sets no mask (either through the new parameter or a process-wide taskset) and an Intel Core Ultra 3xxH CPU is detected, the affinity mask for inference elements is set to the P-cores.

Fixes # (issue)

No specific bug was filed, but performance results fluctuate on the 3xxH series when core pinning is not used.

Any Newly Introduced Dependencies

No new dependencies.

How Has This Been Tested?

Tested locally, to be verified in CI.

Checklist:

I agree to use the MIT license for my code changes.
I have not introduced any 3rd party components incompatible with MIT.
I have not included any company confidential information, trade secret, password or security token.
I have performed a self-review of my code.

oonyshch · 2026-03-24T10:41:11Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+    }
+
+    // if string parsed without errrors, update core pinning
+    base_inference->core_pinning_mask = core_mask;


This line is outside the try/catch block. When parsing fails, core_mask remains 0 (from line 631 initialization) and gets assigned to core_pinning_mask, which silently disables all cores and overwrites any previously valid configuration.

Consider moving the assignment inside the try block after successful parsing to preserve the existing value on error:

Suggested change

-    base_inference->core_pinning_mask = core_mask;
+    } catch (const std::exception &e) {
+        GST_ELEMENT_ERROR(base_inference, RESOURCE, SETTINGS, ("Invalid core-pinning format"),
+                          ("Failed to parse core-pinning property: %s", e.what()));
+        return; // Preserve existing core_pinning_mask on error
+    }
+    // If string parsed without errors, update core pinning
+    base_inference->core_pinning_mask = core_mask;

oonyshch · 2026-03-24T11:17:16Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+                int end = std::stoi(part.substr(dash_pos + 1));
+
+                // Set bits in the range
+                for (int i = start; i <= end; i++) {


Missing bounds validation causes undefined behavior when core index >= 64. Shifting a 64-bit value by >= 64 bits is UB per C++ standard. Consider adding validation:

Suggested change

-                for (int i = start; i <= end; i++) {
+                // Set bits in the range
+                for (int i = start; i <= end; i++) {
+                    if (i >= 0 && i < 64) {
+                        core_mask |= (1ULL << i);
+                    } else {
+                        GST_WARNING_OBJECT(base_inference, "Core index %d out of range [0-63], skipping", i);
+                    }
+                }

oonyshch · 2026-03-24T11:18:17Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+                }
+            } else {
+                // Set single bit
+                core_mask |= (1ULL << std::stoi(part));


Same issue as the range loop: missing bounds check before bit shift. Consider adding validation:

Suggested change

-                core_mask |= (1ULL << std::stoi(part));
+                // Set single bit
+                int core_id = std::stoi(part);
+                if (core_id >= 0 && core_id < 64) {
+                    core_mask |= (1ULL << core_id);
+                } else {
+                    GST_WARNING_OBJECT(base_inference, "Core index %d out of range [0-63], skipping", core_id);
+                }

oonyshch · 2026-03-24T11:19:15Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

 #define DEFAULT_SHARE_VADISPLAY_CTX TRUE

+#define DEFAULT_CORE_PINNING nullptr
+


The magic number 64 appears throughout the code. Consider defining it as a constant for maintainability:

Suggested change

-#define DEFAULT_CORE_PINNING nullptr
+#define DEFAULT_CORE_PINNING nullptr
+#define MAX_CPU_CORES 64

Then use MAX_CPU_CORES in all the bounds checks and getter loop instead of hardcoded 64.

oonyshch · 2026-03-24T11:20:26Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

    case PROP_SCHEDULING_POLICY:
        g_value_set_string(value, base_inference->scheduling_policy);
        break;
+    case PROP_CORE_PINNING: {


The getter outputs individual cores ("0,1,2,3") while the setter accepts range notation ("0-3"). This asymmetry means:

Setting "0-3" then getting returns "0,1,2,3"

Copy-pasting the getter output works, but loses the compact representation

Consider implementing range compaction in the getter to match the setter's format. For example:

Consecutive cores 0,1,2,3 → "0-3"

Non-consecutive 0,2,5,6,7 → "0,2,5-7"

This makes the property symmetric and more user-friendly for large core counts.
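
The range compaction described above could look roughly like the following. This is a minimal sketch, not code from the PR: the function name `mask_to_core_list` is hypothetical, and it assumes the mask is a plain 64-bit value where bit N stands for core N.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: render a 64-bit core mask as a compact list,
// e.g. bits {0,2,5,6,7} become "0,2,5-7". An empty mask gives "".
inline std::string mask_to_core_list(std::uint64_t mask) {
    std::string out;
    int i = 0;
    while (i < 64) {
        if (!((mask >> i) & 1u)) {
            ++i;
            continue;
        }
        // Bit i starts a run; extend it while the following bit is also set.
        int run_end = i;
        while (run_end + 1 < 64 && ((mask >> (run_end + 1)) & 1u))
            ++run_end;
        if (!out.empty())
            out += ',';
        out += std::to_string(i);
        if (run_end > i) {
            out += '-';
            out += std::to_string(run_end);
        }
        i = run_end + 1;
    }
    return out;
}
```

With this shape, a value set as "0-3" would read back as "0-3", and the output stays valid input for a setter that accepts both single indices and ranges.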

oonyshch · 2026-03-24T11:21:43Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+        // parse input string and set bits in core_pinning accordingly
+        while (std::getline(iss, part, ',')) {
+            // Trim whitespace
+            part.erase(0, part.find_first_not_of(" \t"));


Whitespace trimming only handles spaces and tabs. Consider using a more robust approach that handles all whitespace characters (including newlines, carriage returns, etc.):

Suggested change

part.erase(0, part.find_first_not_of(" \t"));

// Trim whitespace

auto is_space = [](unsigned char c) { return std::isspace(c); };

part.erase(0, std::find_if_not(part.begin(), part.end(), is_space) - part.begin());

part.erase(std::find_if_not(part.rbegin(), part.rend(), is_space).base(), part.end());

This handles all standard whitespace characters and is more maintainable.

oonyshch · 2026-03-24T11:31:22Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+
+    try {
+        // Split by comma to get individual ranges or numbers
+        std::string str(range_str);


This creates a full copy of the input string, then std::istringstream on line 636 copies it again. For a read-only parameter, this can be wasteful.

Since this is C++17 code (using std::from_chars would be better anyway), consider using std::string_view for the parameter:

    void gva_base_inference_set_core_pinning(GvaBaseInference *base_inference, const gchar *range_str) {
        std::string_view sv(range_str);
        // Then parse sv directly without copying
    }

Or better yet, pass through to a helper function that takes std::string_view to avoid C string → string_view conversion overhead in the common path. 📎

oonyshch · 2026-03-24T11:32:28Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+            if (part.find('-') != std::string::npos) {
+                // Parse range like "1-5"
+                size_t dash_pos = part.find('-');
+                int start = std::stoi(part.substr(0, dash_pos));


std::stoi throws exceptions on invalid input, which is expensive for control flow. For performance-critical parsing, use C++17's std::from_chars which is:

Exception-free (returns error code)

Significantly faster (~10x in benchmarks)

Doesn't allocate or use locale

    #include <charconv>

    int start, end;
    auto [ptr1, ec1] = std::from_chars(part.data(), part.data() + dash_pos, start);
    auto [ptr2, ec2] = std::from_chars(part.data() + dash_pos + 1, part.data() + part.size(), end);
    if (ec1 == std::errc{} && ec2 == std::errc{}) {
        // Valid parse
    } else {
        // Handle error without exception overhead
    }

Applies to lines 648, 649, and 657.
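
Several of the parsing comments in this review (no copies, no exceptions, bounds checks before shifting, keeping the old mask on failure) can be combined in one standalone helper. The following is only a sketch under assumptions: `parse_core_list`, `parse_core_id`, `trim` and `kMaxCpuCores` are hypothetical names, not part of the PR, and the helper returns `std::nullopt` on any malformed or out-of-range token so that a caller can leave `core_pinning_mask` untouched.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Assumption: a 64-bit mask can only describe cores 0..63.
constexpr int kMaxCpuCores = 64;

// Strip leading and trailing whitespace without copying the string.
inline std::string_view trim(std::string_view s) {
    const auto first = s.find_first_not_of(" \t\r\n");
    if (first == std::string_view::npos)
        return {};
    const auto last = s.find_last_not_of(" \t\r\n");
    return s.substr(first, last - first + 1);
}

// Parse one core index in [0, kMaxCpuCores); the whole token must be digits.
inline std::optional<int> parse_core_id(std::string_view token) {
    token = trim(token);
    if (token.empty())
        return std::nullopt;
    int value = 0;
    const char *begin = token.data();
    const char *end = begin + token.size();
    const auto [ptr, ec] = std::from_chars(begin, end, value);
    if (ec != std::errc{} || ptr != end || value < 0 || value >= kMaxCpuCores)
        return std::nullopt;
    return value;
}

// Parse a list such as "1-5, 8, 10-12" into a bitmask, or nullopt on error.
inline std::optional<std::uint64_t> parse_core_list(std::string_view text) {
    std::uint64_t mask = 0;
    while (true) {
        const auto comma = text.find(',');
        const std::string_view part = trim(text.substr(0, comma));
        const auto dash = part.find('-');
        const auto first = parse_core_id(part.substr(0, dash));
        const auto last =
            (dash == std::string_view::npos) ? first : parse_core_id(part.substr(dash + 1));
        if (!first || !last || *first > *last)
            return std::nullopt;
        for (int i = *first; i <= *last; ++i)
            mask |= (std::uint64_t{1} << i); // i is already known to be in [0, 63]
        if (comma == std::string_view::npos)
            break;
        text.remove_prefix(comma + 1);
    }
    return mask;
}
```

A setter built on this would assign the mask only when the returned optional holds a value. Note that this sketch rejects the whole string on a bad token rather than skipping it with a warning, which is a design choice different from the suggested changes above.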

oonyshch · 2026-03-24T11:38:44Z

src/monolithic/gst/inference_elements/base/inference_impl.cpp

+    CPU_ZERO(&cpuset);                             // Initialize to zero
+    int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
+    for (int core_id = 0; core_id < num_cores; ++core_id) {
+        if (mask & (1ULL << core_id)) {


Here mask can have bits set beyond num_cores

The input mask parameter (64-bit) can have bits set for cores that don't exist on the system. For example:

System has 16 cores (num_cores = 16)

User passes mask 0xFFFFFFFFFFFFFFFF (all 64 bits set)

Loop iterates 16 times, but bits 16-63 in mask are never validated

CPU_SET gets called only for existing cores, BUT the mask is accepted silently

This creates inconsistency: the user thinks they pinned to cores 0-63, but actually only pinned to 0-15.

Please consider validating mask against actual core count:

    // After line 1016
    if (mask >= (1ULL << num_cores)) {
        GVA_WARNING("Affinity mask 0x%lx has bits set beyond available cores (%d), truncating\n", mask, num_cores);
        mask &= ((1ULL << num_cores) - 1); // Clear invalid bits
    }
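
One caveat with a check of this form is that `1ULL << num_cores` is itself undefined when `num_cores` is 64 or more. A variant that avoids the shift in that case might look like the sketch below; `clamp_mask_to_cores` is a hypothetical name, and reporting the dropped bits through an optional out-parameter is an assumption made here so a caller can decide whether to log a warning.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: keep only the bits for cores [0, num_cores).
// If `dropped` is non-null it receives the bits that were cleared.
inline std::uint64_t clamp_mask_to_cores(std::uint64_t mask, int num_cores,
                                         std::uint64_t *dropped = nullptr) {
    std::uint64_t valid = 0;
    if (num_cores >= 64) {
        valid = ~std::uint64_t{0}; // every bit of the mask maps to a real core
    } else if (num_cores > 0) {
        valid = (std::uint64_t{1} << num_cores) - 1; // shift count is in [1, 63]
    }
    if (dropped != nullptr)
        *dropped = mask & ~valid;
    return mask & valid;
}
```

On a 16-core system, an all-ones mask would be reduced to 0xFFFF and the dropped bits would be non-zero, which is the condition under which a truncation warning makes sense.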

mholowni · 2026-03-24T14:48:23Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+// Convert range of integer IDs to bitset representing cores for pinning.
+// Example input: "1-5, 8, 10-12"
+void gva_base_inference_set_core_pinning(GvaBaseInference *base_inference, const gchar *range_str) {
+    guint64 core_mask = 0;


What about the possibility that the CPU has more than 64 cores?
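
One possible direction for systems with more than 64 cores, sketched here purely as an illustration, is to store the selection in a `std::bitset` instead of a `guint64`. The names `CoreSet`, `kMaxCores` and `add_core_range` are hypothetical, and the bound of 1024 is an assumption chosen to match glibc's default `CPU_SETSIZE`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Assumed upper bound on core indices (glibc's default CPU_SETSIZE is 1024).
constexpr std::size_t kMaxCores = 1024;
using CoreSet = std::bitset<kMaxCores>;

// Hypothetical helper: mark cores [first, last] in the set.
// Returns false and leaves the set unchanged if the range is invalid.
inline bool add_core_range(CoreSet &cores, int first, int last) {
    if (first < 0 || last < first || static_cast<std::size_t>(last) >= kMaxCores)
        return false;
    for (int i = first; i <= last; ++i)
        cores.set(static_cast<std::size_t>(i));
    return true;
}
```

Because `std::bitset::set` takes an index rather than a shift count, a range such as 60-70 is handled without any 64-bit shift, and `cores.test(i)` can later drive `CPU_SET(i, &cpuset)` on Linux.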

nszczygl9 · 2026-03-27T12:11:55Z

Please check why the Windows step fails.

Copilot

Pull request overview

Adds CPU core pinning support to GStreamer inference elements to stabilize performance, including automatic pinning to P-cores on Intel Core Ultra 3xxH (PTL-H) when no affinity is otherwise configured.

Changes:

Introduces a new core-pinning property (string) on base inference elements and stores it as a core_pinning_mask.
Detects PTL-H (Ultra 3xxH, 16-core) CPUs and defaults affinity to the first 4 cores when process affinity is unrestricted.
Applies the resulting affinity mask by pinning the current thread during model creation.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 13 comments.

Show a summary per file

| File | Description |
| --- | --- |
| src/utils/utils.h | Adds PTL-H core-count constant and declares PTL-H CPU detection helper (non-Windows). |
| src/utils/utils.cpp | Implements PTL-H CPU detection via `/proc/cpuinfo` model name regex (non-Windows). |
| src/monolithic/gst/inference_elements/base/inference_impl.h | Declares `InferenceImpl::SetAffinityMask`. |
| src/monolithic/gst/inference_elements/base/inference_impl.cpp | Applies thread affinity mask during model creation and implements mask-to-cpuset pinning. |
| src/monolithic/gst/inference_elements/base/gva_base_inference.h | Adds `core_pinning_mask` field to element state. |
| src/monolithic/gst/inference_elements/base/gva_base_inference.cpp | Adds `core-pinning` property, default mask derivation from process affinity, and string-to-mask parsing. |

Copilot · 2026-04-09T15:41:52Z

src/monolithic/gst/inference_elements/base/inference_impl.cpp

+    DWORD_PTR mask = static_cast<DWORD_PTR>(&mask);
+
+    HANDLE thread = GetCurrentThread(); // Get handle for the current thread
+    DWORD_PTR result = SetThreadAffinityMask(thread, mask);
+


Windows implementation of affinity pinning is incorrect: it redeclares mask (shadowing the parameter) and casts &mask (address) to DWORD_PTR, which will pass a pointer value instead of the intended bitmask to SetThreadAffinityMask and likely fail or pin unpredictably. Use the incoming uint64_t mask value (converted to DWORD_PTR) and avoid shadowing; also ensure required Windows headers/types are included under _WIN32.

Copilot · 2026-04-09T15:41:52Z

src/monolithic/gst/inference_elements/base/inference_impl.cpp

+    int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
+    for (int core_id = 0; core_id < num_cores; ++core_id) {
+        if (mask & (1ULL << core_id)) {
+            CPU_SET(core_id, &cpuset); // Add the specific core
+        }
+    }


Linux affinity code builds a cpu_set_t by iterating core_id < num_cores but shifts 1ULL << core_id. If num_cores >= 64, shifting by >=64 is undefined behavior (and the uint64_t mask cannot represent cores beyond 63 anyway). Consider capping the loop at 64 (or std::min(num_cores, 64)) and/or rejecting masks that reference unsupported core IDs.

Copilot · 2026-04-09T15:41:52Z

src/monolithic/gst/inference_elements/base/inference_impl.cpp

+void InferenceImpl::SetAffinityMask(uint64_t mask) {
+    GVA_INFO("Setting CPU affinity mask to 0x%lx\n", mask);
+#ifndef _WIN32
+    cpu_set_t cpuset;
+    CPU_ZERO(&cpuset);                             // Initialize to zero
+    int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
+    for (int core_id = 0; core_id < num_cores; ++core_id) {


This file now uses cpu_set_t/CPU_SET + pthread_setaffinity_np + sysconf, but it doesn’t include the required platform headers (typically <sched.h>, <pthread.h>, <unistd.h> on Linux, and <windows.h> on Windows). Relying on transitive includes is fragile and may break builds; add the proper includes under the existing _WIN32 guards.

Copilot · 2026-04-09T15:41:53Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+            //if(Utils::isCPUPTLHSeries()) {
+            //    base_inference->core_pinning_mask = 0xF; // Pin to first 4 cores , the P-Cores
+            //}
+        }  


The _WIN32 branch of set_core_pinning_mask has an unterminated if (GetProcessAffinityMask(...)) { block: there’s no closing } before #endif, which will break Windows builds. Ensure all opened braces in the Windows branch are properly closed before the #endif and the function’s closing brace.

Suggested change

}

}

}

Copilot · 2026-04-09T15:41:53Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+            if (part.find('-') != std::string::npos) {
+                // Parse range like "1-5"
+                size_t dash_pos = part.find('-');
+                int start = std::stoi(part.substr(0, dash_pos));
+                int end = std::stoi(part.substr(dash_pos + 1));
+
+                // Set bits in the range
+                for (int i = start; i <= end; i++) {
+                    core_mask |= (1ULL << i);
+                }
+            } else {
+                // Set single bit
+                core_mask |= (1ULL << std::stoi(part));
+            }


gva_base_inference_set_core_pinning builds a uint64_t mask using 1ULL << i where i comes from user input/ranges. Negative core IDs (e.g. "-1") or IDs >= 64 will cause undefined behavior when shifting. Validate parsed core IDs (and ranges) are within [0, 63] (and ideally within available CPU cores) before shifting/setting bits.

Copilot · 2026-04-09T15:41:54Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+    g_object_class_install_property(
+        gobject_class, PROP_CORE_PINNING,
+        g_param_spec_string("core-pinning", "Core Pinning",
+                            "List or range of CPU cores to pin this inference element to (e.g., '0-3' or '0,2,3')",
+                            nullptr, param_flags));


New core-pinning property and parsing logic are introduced here, but there are existing GStreamer pipeline unit tests under tests/unit_tests/tests_gstgva/ that exercise gvadetect/gvaclassify properties. Please add coverage for core-pinning (valid list/range, invalid format, and default behavior) to prevent regressions (especially around parse errors and mask application).

Copilot · 2026-04-09T15:41:54Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+    auto cpu_set_to_bitmask = [&num_cores](const cpu_set_t *cpu_set) -> guint64 {
+        guint64 bitmask = 0;
+        for (int i = 0; i < num_cores; ++i) {
+            if (CPU_ISSET(i, cpu_set)) {
+                bitmask |= (1ULL << i);
+            }
+        }


cpu_set_to_bitmask iterates i < num_cores and does bitmask |= (1ULL << i). If the system has 64+ online cores, shifting by >=64 is undefined behavior and the 64-bit mask can’t represent those cores anyway. Cap the loop at 64 (or at least guard if (i < 64)), and document/handle truncation explicitly.

Copilot · 2026-04-09T15:41:55Z

src/monolithic/gst/inference_elements/base/inference_impl.cpp

+void InferenceImpl::SetAffinityMask(uint64_t mask) {
+    GVA_INFO("Setting CPU affinity mask to 0x%lx\n", mask);
+#ifndef _WIN32
+    cpu_set_t cpuset;
+    CPU_ZERO(&cpuset);                             // Initialize to zero
+    int num_cores = sysconf(_SC_NPROCESSORS_ONLN); // Get number of available CPU cores
+    for (int core_id = 0; core_id < num_cores; ++core_id) {
+        if (mask & (1ULL << core_id)) {
+            CPU_SET(core_id, &cpuset); // Add the specific core
+        }
+    }
+
+    pthread_t current_thread = pthread_self(); // Get current thread handle
+    int result = pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
+
+    if (result != 0) {
+        GVA_ERROR("Error pinning thread: (%d)\n", result);
+    }


SetAffinityMask will call pthread_setaffinity_np even when mask == 0, producing an empty cpu_set_t and typically failing with EINVAL. Consider treating mask == 0 as “no pinning requested” (skip setting affinity) or returning/logging a clear error without attempting to set an empty CPU set.
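
The two Linux-side concerns raised in this review (capping the loop at 64 and not applying an empty set) could be handled together when building the `cpu_set_t`. The sketch below is an assumption-laden illustration, not the PR's code: `mask_to_cpuset` is a hypothetical helper, it targets Linux with glibc only, and it returns the number of selected cores so the caller can skip `pthread_setaffinity_np` when that number is 0.

```cpp
#include <sched.h> // cpu_set_t, CPU_ZERO, CPU_SET (glibc; g++ defines _GNU_SOURCE by default)

#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: fill `cpuset` from a 64-bit mask, looking at no more
// than min(num_cores, 64) bits so the shift count never reaches 64.
// Returns how many cores were selected; 0 means "nothing to pin to".
inline int mask_to_cpuset(std::uint64_t mask, int num_cores, cpu_set_t *cpuset) {
    CPU_ZERO(cpuset);
    const int limit = std::min(std::max(num_cores, 0), 64);
    int selected = 0;
    for (int core_id = 0; core_id < limit; ++core_id) {
        if (mask & (std::uint64_t{1} << core_id)) {
            CPU_SET(core_id, cpuset);
            ++selected;
        }
    }
    return selected;
}
```

A caller in the spirit of `SetAffinityMask` would invoke `pthread_setaffinity_np` only when the return value is greater than zero, and otherwise log that no pinning was requested or possible.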

Copilot · 2026-04-09T15:41:55Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

+#define DEFAULT_CORE_PINNING nullptr
+


DEFAULT_CORE_PINNING is defined but never used. Please remove it or wire it into the core-pinning property default to avoid dead code and confusion about the intended default behavior.

Suggested change

-#define DEFAULT_CORE_PINNING nullptr

Copilot · 2026-04-09T15:41:55Z

src/monolithic/gst/inference_elements/base/gva_base_inference.cpp

    case PROP_SHARE_VADISPLAY_CTX:
        base_inference->share_va_display_ctx = g_value_get_boolean(value);
        break;
+    case PROP_CORE_PINNING:


Unlike several other property setters in this file (e.g., model, model-proc, labels) that check check_gva_base_inference_stopped() and warn/error when changed at runtime, core-pinning is applied unconditionally. Since affinity is only set during model creation, changing core-pinning while the element is running likely won’t take effect. Consider enforcing the same “only mutable while stopped” rule (or re-applying affinity when changed) to avoid a misleading runtime-configurable property.

Suggested change

-    case PROP_CORE_PINNING:
+    case PROP_CORE_PINNING:
+        if (GST_STATE(base_inference) > GST_STATE_READY) {
+            GST_WARNING_OBJECT(base_inference,
+                               "core-pinning property can only be changed while the element is stopped");
+            break;
+        }

tjanczak and others added 3 commits March 6, 2026 13:57

Experimental feature: core pining for inference elements.

125a9b8

Update SetAffinityMask to use the number of available CPU cores

9dac795

Implement P-cores pinning if no affinitty mask is set and add utility to check for Intel Core Ultra Series 3 CPUs.

23579ac

tjanczak requested review from BaoHuiling, OskarFiedot, ZiningLi, dmichalo, jmotow, marcin-wadolkowski, mholowni, msmiatac, nszczygl9, oonyshch, pbartosik, qianlongding, tbujewsk, walidbarakat, yangjianfeng1208 and yunowo as code owners March 24, 2026 08:44

oonyshch reviewed Mar 24, 2026

oonyshch requested changes Mar 24, 2026

oonyshch reviewed Mar 24, 2026

mholowni reviewed Mar 24, 2026

Implement P-cores pinning if no affinitty mask is set and add utility to check for Intel Core Ultra Series 3 CPUs.

3397bdc

tjanczak marked this pull request as draft March 30, 2026 08:56

tbujewsk added 2 commits April 7, 2026 11:24

Merge branch 'main' into affinity_mask_param

551037d

Merge branch 'main' into affinity_mask_param

8a0b709

Copilot AI review requested due to automatic review settings April 9, 2026 15:31

Copilot started reviewing on behalf of tbujewsk April 9, 2026 15:32

Copilot AI reviewed Apr 9, 2026

6 participants