diff --git a/docs/src/manual/images/projection.svg b/docs/src/manual/images/projection.svg
new file mode 100644
index 00000000..b3ac0a3a
--- /dev/null
+++ b/docs/src/manual/images/projection.svg
@@ -0,0 +1,150 @@
diff --git a/docs/src/manual/images/projection.tex b/docs/src/manual/images/projection.tex
new file mode 100644
index 00000000..bf286b03
--- /dev/null
+++ b/docs/src/manual/images/projection.tex
@@ -0,0 +1,29 @@
+\documentclass[tikz,convert={outfile=projection.svg}]{standalone}
+\usepackage{graphicx} % Required for inserting images
+\usepackage{tikz}
+\title{projection drawings}
+\author{Nathaniel Pritchard}
+\date{October 2025}
+
+\begin{document}
+
+\begin{tikzpicture}[scale = 2]
+    \draw[->] (0,0) -- +(0,1.5);
+    \draw[->] (0,0) -- +(2,0);
+    \fill[color = red] (1.5,1) circle(2pt);
+    \draw[->, red, dashed, shorten >= 1mm] (1.5,1) -- +(0,-1);
+    \node at (1,-.3) {\textbf{Orthogonal Projection}};
+    \node at (0, 1.7) {$\mathcal{N}$};
+    \node at (2.2, 0) {$\mathcal{M}$};
+
+    \draw[->] (3,0) -- +(1.5,1.5);
+    \draw[->] (3,0) -- +(2,0);
+    \fill[color = red] (4.5,1) circle(2pt);
+    \draw[->,red, dashed, shorten >= 1mm] (4.5,1) -- +(-1,-1);
+    \node at (4,-.3) {\textbf{Oblique Projection}};
+    \node at (5.2, 0) {$\mathcal{M}$};
+    \node at (4.6, 1.6) {$\mathcal{Z}$};
+\end{tikzpicture}
+
+\end{document}
+
diff --git a/docs/src/manual/low_rank_approximators.md b/docs/src/manual/low_rank_approximators.md
index bac2ac07..ea3e5b6f 100644
--- a/docs/src/manual/low_rank_approximators.md
+++ b/docs/src/manual/low_rank_approximators.md
@@ -1,58 +1,50 @@
 # Low-Rank Approximations of Matrices
-Often large matrices contain a lot of redundant information.
This means that it is often
-possible to form representations of large matrices with far fewer vectors than what the
-original matrix contains. Representing a matrix with a small number of vectors
-is known as low-rank approximation. Generally, low-rank approximations of
-a matrix ``A \in \mathbb{R}^{m \times n}`` take two forms either a two matrix form where
+Large matrices often contain redundant information. This means that it is
+possible to form representations of large matrices with less data than what the
+original matrix contains.
+One way of representing a matrix with less data is through
+low-rank approximations.
+Generally, low-rank approximations of
+a matrix ``A \in \mathbb{R}^{m \times n}`` take two forms:
``
A \approx MN,
``
where ``M \in \mathbb{R}^{m \times r}`` and ``N \in \mathbb{R}^{r \times n}``,
-or the three matrix form where
+or
``
-A \approx MBN
+A \approx MBN,
``
-and ``M \in \mathbb{R}^{m \times r}``, ``N \in \mathbb{R}^{s \times n}``, and
+where ``M \in \mathbb{R}^{m \times r}``, ``N \in \mathbb{R}^{s \times n}``, and
``B \in \mathbb{R}^{r \times s}``.
-Once one of the above representations has been obtained they can then be used to speed up:
+Once one of the above representations has been obtained, it can then be used
+to speed up a number of computations including
matrix multiplication, clustering, or approximate eigenvalue decompositions
[halko2011finding, eckart1936approximation, udell2019why, park2025curing](@cite).
-Low rank approximations can take two different forms one being the orthogonal projection
-form where coordinates are projected perpendicularly to onto a plane and the second being
-the oblique forms where points are projected along another plane (see the below figure
-for a visualization).
-```@raw html
-
-```
-
-We also can consider low-rank approximations for symmetric matrices and general matrices.
-For symmetric and general matrices, the RandomizedSVD can be used as the orthogonal
-projection method [halko2011finding](@cite).
-
-As far as oblique methods go, the difference between symmetric and asymmetric decompositions
-becomes more complicated. For symmetric matrices, the go to approximation is the Nystrom
-approximation. For the non-symmetric matrices, we can have a generalization of Nystrom
-known as Generalized Nystrom or we can interpolative approaches, which select subsets of
-the rows and/or columns to a matrix. If it these interpolative decompositions are performed
-to select only columns or only rows then they are known as one sided IDs, if they are used
+The type of approximation depends on the symmetry of the matrix.
+For symmetric matrices, we can use the Nystrom
+approximation. For non-symmetric matrices, we can use
+the Generalized Nystrom decomposition or interpolative decompositions (IDs),
+which select subsets of the rows and/or columns of a matrix. If these interpolative
+decompositions are performed to select only columns or only
+rows then they are known as one-sided IDs; if they are used
to select both columns and rows then they are known as a CUR decomposition.

Below, we present a summary of the decompositions in a table.
-|Approximation Name| General Matrices| Interpolative| Type| Form of Approximation|
-|:-----------------|:----------------|:-------------|:----|:---------------------|
-|RandRangeFinder| Yes| No| Orthogonal| ``A \approx QQ^\top A``|
-|RandSVD|Yes|No|Orthogonal|``A \approx U \Sigma V^\top``|
-|Nystrom| Symmetric| Can be| Oblique| ``(AS)((SA)^\top AS)^\dagger(AS)^\top``|
-|Generalizedd Nystrom| Yes| Can be| Oblique| ``(AS_1)(S_2A AS_1)^\dagger S_2 A``|
-|CUR| Yes| Yes| Oblique| ``(A[:,J])U(A[I,:])``|
-|One-Sided-ID| Yes| Yes| Oblique| ``A[:,J]U_c`` or ``U_r A[I,:]``|
-
-In RLinearAlgebra,
-once you have obtained a low-rank approximation `Recipe` you can then use it to perform
-multiplications in all cases or in some specific areas use it to precondition a linear
-system through the `ldiv!` function. Below we have the table of approximation recipes
+|Approximation Name|Matrix Type| Form of Approximation|
+|:-----------------|:----------------|:---------------------|
+|RandRangeFinder| General| ``A \approx QQ^\top A``|
+|RandSVD|General|``A \approx U \Sigma V^\top``|
+|Nystrom| Symmetric| ``(AS)(S^\top A S)^\dagger(AS)^\top``|
+|Generalized Nystrom| General| ``(AS_1)(S_2 A S_1)^\dagger S_2 A``|
+|CUR| General| ``(A[:,J])U(A[I,:])``|
+|One-Sided ID| General| ``A[:,J]U_c`` or ``U_r A[I,:]``|
+
+In `RLinearAlgebra`,
+once you have obtained a low-rank approximation `Recipe`, you can use it
+to do multiplications (via `mul!`) and/or left inversion (via `ldiv!`).
+Below we have the table of approximation recipes
and indicate how they can be used.

|Approximation Name| `mul!`| `ldiv!`|
@@ -61,55 +53,55 @@ and indicate how they can be used.
|RandSVDRecipe|Yes| No|
|NystromRecipe|Yes| No|
|CURRecipe|Yes| No|
-|IDRecipe(One-Sided-ID)|Yes|No|
+|IDRecipe (One-Sided ID)|Yes|No|
+
# The Randomized Rangefinder
-The idea behind the randomized range finder is to find an orthogonal matrix ``Q`` such that
-``A \approx QQ^\top A``.
-In their seminal work [halko2011finding](@cite) showed that
-forming ``Q`` was as simple as compressing ``A`` from the right and storing the Q from the
-resulting QR factorization. Despite the simplicity of this procedure they were able to show
- if the compression dimension, ``k>2``, then
+The idea behind the randomized range finder is to find
+an orthogonal matrix ``Q`` such that
+``A \approx QQ^\top A``. [halko2011finding](@citet) showed that
+forming ``Q`` was as simple as compressing ``A`` from the right
+and storing the ``Q`` from the
+resulting QR factorization.
+Despite the simplicity of this procedure, it comes with a guarantee:
+if the compression dimension, ``k``, exceeds 2, then
``\|A - QQ^\top A\|_F \leq \sqrt{k+1} (\sum_{i=k+1}^{\min(m,n)}\sigma_{i}^2)^{1/2}``,
- where ``\sigma_{k+1}`` is the ``k+1^\text{th}`` singular value of A (see Theorem 10.5
-of [halko2011finding](@cite)). This is very close to the error from the truncated SVD, which
+where ``\sigma_{i}`` is the ``i^\text{th}`` singular value of ``A``
+[halko2011finding; Theorem 10.5](@cite).
+This is very close to the error from the truncated SVD, which
is known to be the lowest achievable error.
-
-
-For many matrices that singular values that decay quickly, this bound can be far more
-conservative than the observed performance. However, for some matrices whose singular values
-decay slowly this bound is fairly tight. Luckily, using power iterations we can
-still improve the quality of the approximation. Power Iterations basically involve
-multiplying the matrix with itself, which results in raising each singular value to a
-higher power. This powering of the singular values increases the gap between the singular
-values making them easier to accurately capture.
-In `RLinearAlgebra`, you can control the number of power iterations using the `power_its`
-keyword in the constructor.
+For matrices whose singular values decay quickly, this bound can be
+conservative in comparison to observed performance.
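To make the procedure concrete, here is a minimal sketch of the rangefinder idea in plain Julia, using only the standard `LinearAlgebra` library; `rangefinder_sketch` is a hypothetical helper written for illustration and is not part of the `RLinearAlgebra` API.

```julia
using LinearAlgebra, Random

Random.seed!(1234)

# Hypothetical helper: Gaussian compression from the right, optional power iterations.
function rangefinder_sketch(A::AbstractMatrix, k::Integer; power_its::Integer = 0)
    n = size(A, 2)
    Ω = randn(n, k)          # Gaussian test matrix (the right compressor)
    Y = A * Ω                # sample the range of A
    for _ in 1:power_its
        Q = Matrix(qr(Y).Q)  # orthogonalize between iterations for stability
        Y = A * (A' * Q)     # raise each singular value to a higher power
    end
    return Matrix(qr(Y).Q)   # orthonormal basis with A ≈ Q * (Q' * A)
end

A = randn(100, 5) * randn(5, 100)            # a matrix of rank 5
Q = rangefinder_sketch(A, 8; power_its = 2)  # compression dimension 8 exceeds the rank
rel_err = norm(A - Q * (Q' * A)) / norm(A)
```

Because the compression dimension exceeds the rank of `A`, the basis `Q` captures the range of `A` essentially exactly; for matrices with slowly decaying singular values the error instead behaves as the bound above suggests.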
+However, for some matrices whose singular values
+decay slowly this bound is fairly tight.
+`RLinearAlgebra` implements the randomized rangefinder using
+[`RangeFinder`](@ref).
+
+Power iterations can be added to improve the quality of the approximation.
+Power iterations involve repeatedly multiplying by ``A A^\top``,
+which results in raising each singular value to a
+higher power. This powering of the singular values increases the gap between the
+singular values, making them easier to resolve.
+In `RLinearAlgebra`, you can control the number of power iterations
+using the `power_its` keyword in [`RangeFinder`](@ref).

One issue with power iterations is that they can sometimes be
-unstable. We can also improve the stability of these iterations by orthogonalizing between
-power iterations. Meaning that instead of computing ``A A^\top A`` as is done in the power
-iterations we compute ``A^\top A`` and take a QR factorization of this matrix to obtain a
-``Q`` then compute ``A Q``. In RLinearAlgebra you can control whether or not the
+unstable.
+We can improve the stability of these iterations by orthogonalizing between
+power iterations (akin to subspace iteration).
+In `RLinearAlgebra`, you can control whether or not the
orthogonalization is performed using the `orthogonalize` keyword argument in the
-constructor.
+[`RangeFinder`](@ref) constructor.

!!! info
    If the cardinality of the compressor in the `RangeFinder` is not `Right()` a warning
-    will be returned and the approximation may be incorrect.
+    will be returned and the approximation may be inefficient or less accurate.

## A RangeFinder Example
-Lets say that we wish to obtain a rank-5 RandomizedSVD to matrix with 1000 rows and columns.
-In RLinearAlgebra.jl we can do this by first generating the `RandomizedSVD` `Approximator`.
-This will require us to specify a `Compressor` with the desired rank of approximation as the
-`compression_dim` and the `cardinality=Right()`, the number of power iterations we want
-to be performed, and whether we want to orthogonalize the power iterations.
-
-
-To begin, we consider forming an approximation to a rank 5 matrix with 1000 rows and
-columns. Then will define a `RangeFinder` structure with a `FJLT` compressor with a
-`compression_dim = 5` and a
-`cardinality = Right()`. After defining this structure we will then use `rapproximate` to
-generate a `RangeFinderRecipe`, which we will then compute the error relying on the ability
-to multiply `RangeFinderRecipe`s.
+Let's say that we wish to obtain a five-dimensional Randomized Rangefinder approximation
+to a matrix with 1000 rows and columns and a rank of five.
+In `RLinearAlgebra`, we can easily do this with the Fast Johnson-Lindenstrauss
+Compressor (see [`FJLT`](@ref)) as follows.
+

```julia
using RLinearAlgebra, LinearAlgebra
@@ -127,11 +119,17 @@ range_A = rapproximate(approx, A)

# Check the error of the approximation
norm(A - range_A * (range_A' * A))
```
-To see the benefits of power iterations we consider the same example but now with
-`compression_dim = 3`, then we consider the truncated SVD error,
-the error of an approximation with no power iterations,
-the error of an approximation with 10 power iterations but no
-orthogonalization, and the error of an approximation with 10 power iterations and
+
+### Adding Power Iterations with and without Orthogonalization
+
+To see the benefits of power iterations, consider the same example but now with
+`compression_dim = 3`.
+
+Below, we first compute the best possible truncation error using
+a singular value decomposition of rank 3.
+Then, we compute the Randomized Rangefinder _without_ power iterations.
+Next, we compute the Randomized Rangefinder _with_ power iterations.
+Finally, we compute the Randomized Rangefinder _with_ power iterations _and_
+orthogonalization.

```julia
@@ -184,7 +182,7 @@ printstyled("Error with 10 Power its and Orthogonalization:",
```
# The RandSVD
The RandomizedSVD is a form of low-rank approximation that returns the approximate
-singular values and vectors to the truncated SVD. Algorithmically, it is implemented as
+singular values and vectors of the truncated SVD. Algorithmically, it is implemented as
three additional steps to the Randomized Rangefinder in [halko2011finding](@cite).
Specifically these steps are:
1. Take the ``Q`` matrix from the Randomized Rangefinder and compute ``Q^\top A``.
@@ -238,3 +236,6 @@ x = rand(1000);

norm(A * x - randsvd_A * x)
```
+
+!!! info
+    As with the Randomized Rangefinder, if the cardinality of the compressor is not
+    `Right()`, a warning will be returned and the approximation may be incorrect.
diff --git a/docs/src/refs.bib b/docs/src/refs.bib
index 43727bfe..022340c6 100644
--- a/docs/src/refs.bib
+++ b/docs/src/refs.bib
@@ -36,36 +36,13 @@ @article{halko2011finding
bdsk-url-1 = {https://doi.org/10.1137/090771806}}

@misc{martinsson2020randomized,
- title = {Randomized {{Numerical Linear Algebra}}: {{Foundations}} \& {{Algorithms}}},
- shorttitle = {Randomized {{Numerical Linear Algebra}}},
- author = {Martinsson, Per-Gunnar and Tropp, Joel},
- year = {2020},
- publisher = {arXiv},
- doi = {10.48550/ARXIV.2002.01387},
-}
-
-@article{needell2014paved,
- title = {Paved with Good Intentions: {{Analysis}} of a Randomized Block {{Kaczmarz}} Method},
- shorttitle = {Paved with Good Intentions},
- author = {Needell, Deanna and Tropp, Joel A.},
- year = {2014},
- month = jan,
- journal = {Linear Algebra and its Applications},
- volume = {441},
- pages = {199--221},
- issn = {00243795},
- doi = {10.1016/j.laa.2012.12.022},
- langid = {english},
-}
-
-@article{pilanci2014iterative,
- title = {Iterative {{Hessian}} Sketch: {{Fast}} and Accurate Solution Approximation for Constrained
Least-Squares},
- shorttitle = {Iterative {{Hessian}} Sketch},
- author = {Pilanci, Mert and Wainwright, Martin J.},
- year = {2016},
- Journal = {Journal of Machine Learning Research},
- volume = {17}
-}
+ author = {Martinsson, Per-Gunnar and Tropp, Joel},
+ doi = {10.48550/ARXIV.2002.01387},
+ publisher = {arXiv},
+ shorttitle = {Randomized {{Numerical Linear Algebra}}},
+ title = {Randomized {{Numerical Linear Algebra}}: {{Foundations}} \& {{Algorithms}}},
+ year = {2020},
+ }

@article{motzkin1954relaxation,
author = {Motzkin, T. S. and Schoenberg, I. J.},