-
Notifications
You must be signed in to change notification settings - Fork 225
Fix incorrect CPU topology initialization #3304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
// Call to `getCpuId` changes global settings, in particular, | ||
// changes default number of threads in the threading layer | ||
const int cpuid = env->getCpuId(); | ||
sys_info_["top_enabled_cpu_extension"] = std::make_any<cpu_extension>(from_daal_cpu_type(cpuid)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be converted into a struct instead? Looks like the fields are always the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it could. It also might help to fix the other part of the issue.
@Vika-F I can confirm that something in this PR is indeed fixing issues with undefined memory access, at the very least on ice lake, as confirmed by memsan no longer reporting incorrect accesses, and by asan no longer reporting leaked memory from a corrupted glktsn object. It still segfaults if sklearnex is imported but not used, and there might still be other issues causing segfaults upon import in other scenarios, but this at least solves the main problem. I still see issues like this reported from tysan when running the C++ examples though:
.. but guess there's some chance that they might be due to some unrelated memory corruption that coincidentally happens to affect that object at that point. |
… std::array usage to check for out of bound access
/intelci: run |
/intelci: run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves error handling and fixes bugs in CPU topology initialization in oneDAL/DAAL. The changes focus on making the global CPU topology detection more robust and preventing undefined behavior when the Environment object is uninitialized.
- Fixed null pointer safety by adding checks for uninitialized Environment instances
- Improved error handling in CPU topology detection functions
- Fixed memory safety issues in CPU topology analysis functions
Reviewed Changes
Copilot reviewed 12 out of 13 changed files in this pull request and generated 2 comments.
Show a summary per file
File | Description |
---|---|
cpp/oneapi/dal/detail/parameters/system_parameters_impl.hpp | Changed class to struct and updated map key type from string to const char* |
cpp/oneapi/dal/detail/parameters/system_parameters_impl.cpp | Added null pointer checks for Environment and improved parameter initialization |
cpp/oneapi/dal/detail/parameters/system_parameters.cpp | Simplified smart pointer construction |
cpp/oneapi/dal/detail/cpu.hpp | Updated CPU feature map key type and moved architecture defines |
cpp/oneapi/dal/common.hpp | Added architecture detection defines moved from cpu.hpp |
cpp/daal/src/services/service_topo.h | Removed large amounts of platform-specific code and type definitions |
cpp/daal/src/services/library_version_info.cpp | Added null pointer check for Environment instance |
cpp/daal/src/services/env_detect.cpp | Enhanced Environment initialization with error handling and null checks |
cpp/daal/src/externals/service_service_mkl.h | Added error checking for CPU topology status |
cpp/daal/src/data_management/data_conversion.cpp | Added null pointer check for Environment instance |
cpp/daal/src/algorithms/k_nearest_neighbors/kdtree_knn_classification_train_dense_default_impl.i | Added null pointer check for Environment instance |
cpp/daal/include/algorithms/algorithm_base_common.h | Added null pointer check for Environment instance |
|
||
private: | ||
unsigned dim[2] = { 0, 0 }; // xdim and ydim | ||
unsigned * data = nullptr; // data array to be malloc'd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could it use std::unique_ptr
or std::vector
instead?
Thanks. I can confirm that this fixed at least one memory corruption. But it seems there's one more file that might need to be revisited - see trace from msan:
Besides that one, I still see issues like these when executing the C++ oneAPI examples which suggest that the global object is getting corrupted:
And I still see a segfault now happening on the hyperparameters initialization, but it looks like this is solving some of the uninitialized memory issues at least. |
Also since these are being revisited - much more minor thing, but bit-shifting negative integers is technically undefined behavior:
Could it perhaps be done with a full unsigned integer instead? |
In case it helps, here's a different usage of what I guess is the same detection logic: And this is where memsan reports an undefined memory usage:
Also that file is converting pointers |
And looks like clang provides a built-in to extract this info:
|
@david-cortes-intel I'll try to replace it with __cpuidex for all compilers. |
…ifting in service_topo.cpp
Thanks, can confirm that the changes here fixed the undefined access on the cpuid call, and the undefined negative bit shifting. I still see some hints of memory corruption with tysan like these:
.. but they might be coming from somewhere else. |
|
||
private: | ||
unsigned dim[2] = { 0, 0 }; // xdim and ydim | ||
std::unique_ptr<unsigned[], void (*)(unsigned *)> data; // data array to be malloc'd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a benefit in using oneDAL's custom allocators and deallocators? I understand the advantage is that they return aligned memory and can thus use SIMD instructions on them, but does it matter here? Aren't the dispatchers for SIMD-enabled things meant to work only after all of the info here has been collected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is done mostly not to have an aligned memory here, but because we believe that custom malloc is faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a feeling that we might find out it actually isn't faster if it's behind 3 function calls, for a very small size, and requires returning aligned memory. But maybe you could add it as a comment then that it is due to being faster than standard malloc.
}; | ||
|
||
// Global CPU topology object | ||
static glktsn globalCPUTopology; | ||
|
||
static char scratch[BLOCKSIZE_4K]; // scratch space large enough for OS to write SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static char scratch[BLOCKSIZE_4K]; // scratch space large enough for OS to write SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX | |
static std::array<char, BLOCKSIZE_4K> scratch; // scratch space large enough for OS to write SYSTEM_LOGICAL_PROCESSOR_INFORMATION_EX |
For more debuggability.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR is in a mergeable state. Remainder of the unsolved issues will be left for follow-up PRs. Actually there's now some issues popping up in riscv:
malloc_consolidate(): unaligned fastbin chunk detected
I just realized I had been using sanitizers incorrectly. I can actually see a lot more issues now popping up about uninitialized memory. For example, there appears to be an issue in this specific line: oneDAL/cpp/daal/src/services/service_topo.cpp Line 1609 in a934f8a
Here's the first two things that pop out of MSan as of the latest commit here:
|
Merge changes from main
Looks like the CPU detection is now broken on AMD runners. The investigation is in progress. |
Description
The error handling in the global object that defines CPU topology in oneDAL/DAAL and in global
daal::services::Environment
object was improved.Several related bugs were fixed.
setChkProcessAffinityConsistency()
was changed to allow affinity masks that contain zeros. Previously having the process affinity mask with zeros lead to undefined behavior because the global object that defines CPU topology ended up in non-initialized state in that case.analyzeEachCHierarchy()
function. PreviouslypDetectedEachCIDs
andpDetectThreadIDsperEachC
were invalidly writing the memory after the end of arrays in the branchif (!CacheMarked)
.daal::services::Environment
class was updated to always callgetCpuId()
method which initializes theEnvironment
instance. Without this call an uninitialized version ofEnvironment
might be used.daal::services::Environment::getInstance()
function that returns the pointer to a global DAAL'sEnvironment
object was changed. Now the function returnsnullptr
in case the globalEnvironment
object is uninitialized. The respective null pointer checks were added into the calling code.PR completeness and readability
Testing
Performance
not applicable