feature: add noise_variance computation on oneDAL side #3101
Conversation
/intelci: run
Hopefully this review is useful. I see some basic for loops that could use pragmas, and some locations where the work is done on the CPU where a GPU solution could probably also be done without too much work.
DAAL_INT iu = static_cast<DAAL_INT>(nFeatures);
DAAL_INT m;
DAAL_INT info;
// Could be modified to be a function parameter
I'd add a little snippet referring to the application notes for values < 0 (https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-c/2025-0/syevx.html)
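A hedged sketch of what such a snippet might look like (syevx reports info == -i for an illegal i-th argument and info > 0 for eigenvectors that failed to converge; the status constant used for the info < 0 case here is an assumption):

```cpp
// Hypothetical sketch: interpret the LAPACK info code per the application notes.
if (info < 0)
{
    // info == -i: the i-th argument had an illegal value (a programming error on our side).
    return services::Status(services::ErrorIncorrectParameter); // assumed status code
}
if (info > 0)
{
    // info > 0: some eigenvectors failed to converge.
    return services::Status(services::ErrorPCAFailedToComputeCorrelationEigenvalues);
}
```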
return services::Status(services::ErrorPCAFailedToComputeCorrelationEigenvalues);
}

for (size_t i = 0; i < nComponents; ++i)
Any way to add vectorization pragmas here? (thoughts @Vika-F ?)
I haven't merged the PR with the pragmas change yet :-\ I still need to collect the perf data.
But I think the inner loop should be easily vectorizable, maybe even without pragmas. Pragmas can help to guide the compiler, though.
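For illustration, a minimal sketch of the kind of hint meant here (the PRAGMA_IVDEP / PRAGMA_VECTOR_ALWAYS wrappers appear elsewhere in DAAL kernels; their availability in this translation unit is an assumption):

```cpp
// Hypothetical sketch: tell the compiler the loop carries no dependencies
// and should always be vectorized.
PRAGMA_IVDEP
PRAGMA_VECTOR_ALWAYS
for (size_t i = 0; i < nComponents; ++i)
{
    // ... loop body as in the PR ...
}
```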
My opinion is that vectorization pragmas could be added later.
Make/share a ticket for that. I don't think it would be too bad if @Vika-F's PR is merged soon.
auto eigvals = arr_eigval.get_data();
auto total_variance = arr_vars.get_data();

double noiseVariance = 0.0;
pragmas here too?
I don't think so.
@@ -126,6 +126,21 @@ static result_t call_daal_kernel(const context_cpu& ctx,
result.set_singular_values(homogen_table::wrap(arr_singular_values, 1, component_count));
}

if (desc.get_result_options().test(result_options::noise_variance)) {
Is this done twice, once for batch and once for incremental?
No, in general it should be done once for each method:
batch: cov, svd, precomputed
incremental: svd, cov
cpp/oneapi/dal/algo/pca/backend/gpu/finalize_train_kernel_cov_impl_dpc.cpp
std::int64_t rows_count_ = rows_count_global;
auto range = std::min(rows_count_, column_count) - component_count;
auto noise_variance =
    compute_noise_variance_on_host(q, eigvals_host_tmp, range, { syevd_event });
If I am reading this right, this would mean that compute_noise_variance is only done on the host? If so, please note this in the code and in the description (possibly as a follow-up, or explain why it is done this way).
auto eigvals_ptr = eigenvalues.get_data();

Float sum = 0;
for (std::int64_t i = 0; i < range; ++i) {
Is there a reason not to have an equivalent add-reduce call on GPU?
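For reference, a minimal sketch of such an add-reduce using a SYCL 2020 reduction (the names q, eigvals_dev, range, and Float are assumptions, and eigvals_dev is assumed to be USM device-accessible):

```cpp
#include <sycl/sycl.hpp>

// Hypothetical sketch: sum `range` eigenvalues on the device instead of the host.
Float* sum_buf = sycl::malloc_shared<Float>(1, q);
*sum_buf = Float(0);
q.submit([&](sycl::handler& h) {
     auto sum_reduction = sycl::reduction(sum_buf, sycl::plus<Float>());
     h.parallel_for(sycl::range<1>(range), sum_reduction,
                    [=](sycl::id<1> i, auto& sum) { sum += eigvals_dev[i]; });
 }).wait();
const Float noise_variance = *sum_buf / static_cast<Float>(range);
sycl::free(sum_buf, q);
```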
I guess it's a performance question: usually the range is quite small, and a reduce (parallel_for kernel initialization) takes more time than moving the data to the CPU and computing it there.
I'd leave a comment explaining that design choice then, as it should help guide developers who come across this in the future.
Float sum = 0;
for (std::int64_t i = 0; i < column_count; ++i) {
    sum += vars_ptr[i];
}

ONEDAL_ASSERT(sum > 0);
const Float inverse_sum = 1.0 / sum;
Is there a reason why this was done on CPU?
copyArray(nFeatures * nComponents, fullEigenvectorsArray, eigenvectorsArray);
copyArray(nComponents, fullEigenvaluesArray, eigenvaluesArray);
// SYEVR branch
// In this case, we compute only nComponents eigenvectors and then sort them in descending order
Sorting is mentioned in the comment, but there is no sorting in the code. Is that OK?
The sorting for syevr is already included in computeEigenVectorsInplaceSyevr, but I can split the functions
In my opinion, it would be better to align the behavior of computeEigenvectorsInplace and computeEigenvectorsInplaceSyevr: either include sorting in both, or exclude sorting from both. Otherwise, it is harder to understand what's happening in the code.
All LAPACK functions for symmetric eigenvalue problems produce their results in sorted order by design.
I mean the descending vs. ascending sort order.
/intelci: run
Pull Request Overview
This PR introduces noise variance computation into oneDAL's PCA implementation and adds a new syevr-based eigen decomposition function to enhance performance and improve variance estimation.
- Added noise variance getter/setter and corresponding computation in both CPU and GPU backends.
- Updated tests and common option identifiers to support noise variance.
- Removed legacy sign flip code where replaced with GPU-specific implementations.
Reviewed Changes
Copilot reviewed 28 out of 28 changed files in this pull request and generated 1 comment.
File | Description
---|---
examples/oneapi/dpc/source/pca/pca_cor_dense_batch.cpp | Added console output for noise variance in PCA train result.
cpp/oneapi/dal/algo/pca/train_types.{hpp,cpp} | Introduced noise variance getter/setter and implementation functions.
cpp/oneapi/dal/algo/pca/common.{hpp,cpp} | Added noise_variance result option and updated indices consistently.
cpp/oneapi/dal/algo/pca/backend/* (multiple files) | Integrated noise variance computation and replaced sign_flip usage.
cpp/daal/src/algorithms/pca/* | Updated DAAL integration to compute noise variance and added syevr branch.
@@ -91,6 +91,60 @@ auto syevd_computation(sycl::queue& queue,
return std::make_tuple(eigenvalues, syevd_event);
}

template <typename Float>
auto prepare_eigenvectors_svd(sycl::queue& queue,
Please add a description.
// SYEVR branch
// In this case, we compute only nComponents eigenvectors and then sort them in descending order
// inside the 'computeEigenvectorsInplaceSyevr' function
if (nComponents < nFeatures)
Could it also limit the components by the number of rows in the data by this point? Is that info available here?
eigenvalues[i] = temp_eigenvalues[idx];
for (size_t j = 0; j < nFeatures; ++j)
{
    eigenvectors[j + i * nFeatures] = temp_eigenvectors[j + idx * nFeatures];
There's a dedicated LAPACK function to reorder columns:
https://www.netlib.org/lapack/explore-3.2.1-html/dlapmt.f.html
Plus there's C++ std::reverse for vectors:
https://en.cppreference.com/w/cpp/algorithm/reverse.html
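A minimal sketch of the standard-library alternative (assuming LAPACK's ascending output and each eigenvector stored as a contiguous block of nFeatures values, as in the indexing above):

```cpp
#include <algorithm>

// Hypothetical sketch: reverse the eigenvalue order with one call...
std::reverse_copy(temp_eigenvalues, temp_eigenvalues + nComponents, eigenvalues);
// ...and copy whole eigenvector blocks in reverse order instead of element-wise.
for (size_t i = 0; i < nComponents; ++i)
{
    const size_t idx = nComponents - 1 - i;
    std::copy(temp_eigenvectors + idx * nFeatures,
              temp_eigenvectors + (idx + 1) * nFeatures,
              eigenvectors + i * nFeatures);
}
```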
for (size_t i = 0; i < nComponents; ++i)
{
    size_t idx = nComponents - 1 - i;
    eigenvalues[i] = temp_eigenvalues[idx];
Shouldn't this be discarding components with too small eigenvalues?
Float max_val = row[0];
Float abs_max = std::abs(row[0]);
for (std::int64_t j = 1; j < column_count; j++) {
Would it perhaps be faster to use idamax from BLAS?
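For illustration, a hedged sketch of the BLAS variant (double precision shown; the single-precision counterpart is cblas_isamax, and the exact CBLAS header may differ per vendor):

```cpp
#include <cblas.h>
#include <cmath>

// Hypothetical sketch: idamax returns the index of the element with the
// largest absolute value, replacing the manual scan.
const auto j_max = cblas_idamax(static_cast<int>(column_count), row, /*incx=*/1);
const double max_val = row[j_max];
const double abs_max = std::fabs(max_val);
```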
I will add a TODO for investigation. For now it looks like it would make the PR bigger.
auto explained_variances_ratio_ptr = explained_variances_ratio.get_mutable_data();

Float sum = 0;
for (std::int64_t i = 0; i < column_count; ++i) {
Could oneDPL be used for this kind of thing?
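For reference, a minimal sketch of the oneDPL variant being asked about (the queue name is an assumption, and vars_ptr is assumed to be USM device-accessible):

```cpp
#include <oneapi/dpl/execution>
#include <oneapi/dpl/numeric>

// Hypothetical sketch: device-side sum via oneDPL instead of the host loop.
const Float sum = oneapi::dpl::reduce(oneapi::dpl::execution::make_device_policy(queue),
                                      vars_ptr,
                                      vars_ptr + column_count,
                                      Float(0));
```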
I will add a TODO mark, but it may be done in the next PR.
auto eigvals_ptr = eigenvalues.get_data();
auto singular_values_ptr = singular_values.get_mutable_data();

const Float factor = row_count - 1;
Why '-1' here? Wouldn't this make the result not meet the necessary property that
auto compute_event = queue.submit([&](sycl::handler& h) {
    h.depends_on(deps);
    h.parallel_for(sycl::range<1>(component_count), [=](sycl::id<1> i) {
        singular_values_ptr[i] = sycl::sqrt(factor * eigvals_ptr[i]);
syevr only generates results that are valid up to numerical tolerance. So a very small eigenvalue that should in theory be positive or zero could still end up as a very small negative number.
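A minimal sketch of one possible guard (clamping negative eigenvalues to zero here is an assumption, not the merged fix):

```cpp
// Hypothetical sketch: clamp tiny negative eigenvalues before the sqrt.
h.parallel_for(sycl::range<1>(component_count), [=](sycl::id<1> i) {
    const Float ev = sycl::fmax(eigvals_ptr[i], Float(0));
    singular_values_ptr[i] = sycl::sqrt(factor * ev);
});
```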
@@ -45,6 +45,8 @@ class PCADenseBase : public Kernel
services::Status computeExplainedVariancesRatio(const data_management::NumericTable & eigenvalues,
                                                const data_management::NumericTable & variances,
                                                data_management::NumericTable & explained_variances_ratio);
services::Status computeNoiseVariances(const data_management::NumericTable & eigenvalues, const data_management::NumericTable & variances,
                                       double & noise_variance);
@Vika-F, does it make sense to name the variable noiseVariance to match the DAAL-like style?
algorithmFPType tmp = eigenvalues[i];
algorithmFPType tmp_rev = eigenvalues[nComponents - 1 - i];

if (tmp < 0) tmp = algorithmFPType(0);
When a singular value should in theory be zero when calculated to infinite precision, lapack will very likely output a number (either positive or negative) in the neighborhood of zero. The approach of discarding values below machine epsilon is more sensible given the way lapack works.
Can you please clarify, in code terms, what your suggestion is?
Suggestion is to compare the (signed) values against a (positive) threshold, e.g.:
if (tmp < thr) {
    // discard
} else {
    // take
}
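One possible concrete choice of the threshold, purely as an illustration (scaling machine epsilon by the largest eigenvalue is an assumption about intent, not the reviewer's exact proposal):

```cpp
#include <limits>

// Hypothetical sketch: treat values below eps * largest_eigenvalue as zero.
const algorithmFPType thr =
    std::numeric_limits<algorithmFPType>::epsilon() * eigenvalues[0]; // eigenvalues assumed sorted descending
```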
Changes Summary

Added noise_variance computation
Implemented calculation of noise variance as part of the PCA result options. This allows better estimation of the unexplained variance in the dataset.
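For context, a minimal sketch of the usual PPCA-style estimate (averaging the discarded eigenvalues follows scikit-learn's convention; variable names are illustrative, not the PR's exact code):

```cpp
// Hypothetical sketch: noise variance = mean eigenvalue of the discarded
// components (zero when all components are kept).
double noiseVariance = 0.0;
const size_t nDiscarded = nFeatures - nComponents;
if (nDiscarded > 0)
{
    for (size_t i = nComponents; i < nFeatures; ++i)
    {
        noiseVariance += eigenvalues[i];
    }
    noiseVariance /= static_cast<double>(nDiscarded);
}
```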
Added syevr-based eigen decomposition function
Introduced a new function utilizing LAPACK's syevr routine to improve the performance of eigenvector and eigenvalue computations for symmetric matrices.

PR should start as a draft, then move to the ready-for-review state after CI has passed and all applicable checkboxes are closed. This approach ensures that reviewers don't spend extra time asking for regular requirements. You can remove a checkbox as not applicable only if it doesn't relate to this PR in any way. For example, a PR with a docs update doesn't require checkboxes for performance, while a PR with any change to actual code should have checkboxes and justify how the code change is expected to affect performance (or the justification should be self-evident).
Checklist to comply with before moving the PR from draft:
- PR completeness and readability
- Testing
- Performance