Skip to content

Polar decomposition QDWH#2

Open
dsukkari wants to merge 50 commits intoicl-utk-edu:masterfrom
dsukkari:polar
Open

Polar decomposition QDWH#2
dsukkari wants to merge 50 commits intoicl-utk-edu:masterfrom
dsukkari:polar

Conversation

@dsukkari
Copy link
Contributor

The polar decomposition QDWH of a general matrix A = U * H, where U is orthogonal polar factor and H is hermitian polar factor.

QDWH iterations rely on Cholesky based and QR based iterations to compute the orthogonal polar factor U.

For the QR based iterations, new customized geqrf_qdwh_full and unmqr_qdwh_full are included to take advantage of the identity structure of the matrix involved during the QR based iterations.

The 2-norm estimate (norm2est) of the original matrix is required, the norm2est using power iteration is implemented and called in QDWH.

The following figure present the performance of SLATE_QDWH on Summit using various number of nodes.
image

@mgates3
Copy link
Collaborator

mgates3 commented Jan 27, 2023

Check for warnings, i.e., add -Werror to CXXFLAGS in make.inc.

Copy link
Collaborator

@mgates3 mgates3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass through. Probably more changes later on.

@dsukkari
Copy link
Contributor Author

dsukkari commented Feb 8, 2023

All test passed, except one failure for gels using cholqr.

lapack::Gflop<scalar_t>::potrf(m) +
blas::Gflop<scalar_t>::trsm(slate::Side::Left, m, n) );

double gflop_compute_H = blas::Gflop<scalar_t>::her2k(n, m);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, this is really gemm, but eventually it should be herk (i.e., herkx) instead of her2k.

(I'll fix.)

Copy link
Collaborator

@mgates3 mgates3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack! Pending comments from a long time ago. I didn't confirm if these make sense.

slate_error("Failed to converge.");
}
itconv++;

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see a previous commit about using double to avoid overflow. Can we just use double for everything instead of casting everything to real_t?

We can simplify by using some constants, e.g., const real_t r2 = 2.0. Or in double, just use constants 1.0, 2.0, etc. in the formulas.

printf("\nConverged after %d. Check what is the issue, "
"because QDWH needs <= 6 iterations.\n",
itqr+itpo);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved these outputs to the tester, to maintain xSDK compatibility.

//her2k(one, A, W10, rzero, H, opts);
//auto AL = HermitianMatrix<scalar_t>(
// slate::Uplo::Lower, H );
//slate::copy(AL, H, opts);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

herkx or gemmtr, not her2k.

sqd = sqrt( r_one + real_t(dd) );
a1 = sqd + sqrt( real_t(8.0) - real_t(4.0) * real_t(dd) +
real_t(8.0) * ( real_t(2.0) - L2 ) / ( L2 * sqd ) ) / real_t(2.0);
a = real(a1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a1 and a are both double, so what does this real( ) call do?

auto R = TriangularMatrix<scalar_t>(
Uplo::Upper, slate::Diag::NonUnit, R1 );
normR = norm(slate::Norm::One, R, opts);
slate::trcondest(slate::Norm::One, R, &Li, opts);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If trcondest takes Rnorm as does gecondest, then we can actually set Rnorm = 1.0 and avoid computing it entirely, since it just gets cancelled:
smin_est = Rnorm * rcond = Rnorm * 1 / (Rnorm * || R^{-1} ||_1) = 1 / || R^{-1} ||_1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants