Algebraic-Programming
diff --git a/‎ALP_Transition_Path_Tutorial.tex‎
Lines changed: 258 additions & 0 deletions b/‎ALP_Transition_Path_Tutorial.tex‎
Lines changed: 258 additions & 0 deletions
@@ -0,0 +1,258 @@
+
+\section*{Introduction to ALP and Transition Paths}
+
+\textbf{Algebraic Programming (ALP)} is a C++ framework for high-performance linear algebra that can auto-parallelize and auto-optimize your code. A key feature of ALP is its transition path APIs, which let you use ALP through standard interfaces without changing your existing code. In practice, ALP generates drop-in replacements for established linear algebra APIs. You simply re-link your program against ALP’s libraries to get optimized performance (no code modifications needed). ALP v0.8 provides transition-path libraries for several standards, including the NIST Sparse BLAS and a CRS-based iterative solver interface (ALP/Solver). This means you can take an existing C/C++ program that uses a supported API and benefit from ALP’s optimizations (such as vectorization and parallelism) just by linking with ALP’s libraries [1].
+
+One of these transition paths, and the focus of this tutorial, is ALP’s \textbf{sparse Conjugate Gradient (CG) solver}. This CG solver accepts matrices in Compressed Row Storage (CRS) format (also known as CSR) and solves $Ax=b$ using an iterative method. Under the hood it leverages ALP’s non-blocking execution model, which overlaps computations and memory operations for efficiency. From the user’s perspective, however, the solver is accessed via a simple C-style API that feels synchronous. In this workshop, we’ll learn how to use this CG solver interface step by step: from setting up ALP, to coding a solution for a small linear system, to building and running the program.
+
+
+\section*{Setup: Installing ALP and Preparing to Use the Solver}
+This section explains how to install ALP on a Linux system and compile a simple example. ALP (Algebraic Programming) provides a C++17 library implementing the GraphBLAS interface for linear-algebra-based computations.
+
+\subsection*{Installation on Linux}
+
+\begin{enumerate}
+\item Install prerequisites: Ensure you have a C++11 compatible compiler (e.g. \texttt{g++} 4.8.2 or later) with OpenMP support, CMake (>= 3.13) and GNU Make, plus development headers for libNUMA and POSIX threads. 
+For example, on Debian/Ubuntu:
+\begin{verbatim}
+sudo apt-get install build-essential libnuma-dev libpthread-stubs0-dev cmake
+\end{verbatim}
+
+\item Obtain ALP: Download or clone the ALP repository (from the official GitHub). For instance:
+\begin{verbatim}
+git clone https://github.com/Algebraic-Programming/ALP.git
+\end{verbatim}
+Then enter the repository directory.
+
+\item Build ALP: Create a build directory and invoke the provided bootstrap script to configure the project with CMake, then compile and install:
+\begin{lstlisting}[language=bash]
+$ cd ALP && mkdir build && cd build
+$ ../bootstrap.sh --prefix=../install # configure the build
+$ make -j # compile the ALP library
+$ make -j install # install to ../install
+$ source ../install/bin/setenv
+
+\end{lstlisting}
+(You can choose a different installation prefix as needed.)
+
+\item Set up environment: After installation, activate the ALP environment by sourcing the script setenv in the install directory:
+\begin{lstlisting}[language=bash]
+$ source ../install/bin/setenv
+\end{lstlisting}
+This script updates paths to make ALP’s compiler wrapper and libraries available.
+
+\item Compile an example: ALP provides a compiler wrapper \texttt{grbcxx} to compile programs that use the ALP/GraphBLAS API. This wrapper automatically adds the correct include paths and links against the ALP library and its dependencies. For example, to compile the provided sp.cpp sample:
+\begin{lstlisting}[language=bash]
+$ grbcxx ../examples/sp.cpp -o sp_example
+\end{lstlisting}
+By default this produces a sequential program; you can add the option \texttt{-b reference\_omp} to use the OpenMP parallel backend (shared-memory parallelism). The wrapper \texttt{grbcxx} accepts other backends as well (e.g.\ \texttt{-b hybrid} for distributed memory).
+
+\item Run the program: Use the provided runner \texttt{grbrun} to execute the compiled binary. For a simple shared-memory program, running with \texttt{grbrun} is similar to using \texttt{./program} directly. For example:
+\begin{lstlisting}[language=bash]
+$ grbrun ./sp_example
+\end{lstlisting}
+(The \texttt{grbrun} tool is more relevant when using distributed backends or controlling the execution environment; for basic usage, the program can also be run directly.)
+\end{enumerate}
+
+You can also specify a backend with the -b flag. For instance, -b reference builds a sequential version, while -b reference\_omp enables ALP’s shared-memory (OpenMP) parallel backend . If you built ALP with distributed-memory support, you might use -b hybrid or -b bsp1d for hybrid or MPI- nstyle backends. In those cases, you would run the program via grbrun (which handles launching multiple processes) – but for this tutorial, we will use a single-process, multi-threaded backend, so running the program normally is fine.
+\\
+
+\textbf{Direct linking option}: If you prefer to compile with your usual compiler, you need to include ALP’s headers and link against the ALP libraries manually. For the CG solver transition path, that typically means linking against the sparse solver library (e.g. libspsolver\_shmem\_parallel for the parallel version) and any core ALP libraries it depends on. For example, if ALP is installed in /opt/alp , you might compile with:
+
+\begin{lstlisting}[language=bash]
+gcc -I/opt/alp/include -L/opt/alp/lib \
+-lspsolver_shmem_parallel -lalp_cspblas_shmem_parallel \
+my_program.c -o my_program
+\end{lstlisting}
+
+
+(ALP’s documentation provides details on which libraries to link for each backend [3].) Using grbcxx is recommended for simplicity, but it’s good to know what happens under the hood. Now that our environment is set up, let’s look at the CG solver API.
+
+
+\section*{Overview of ALP’s Non-Blocking Sparse CG API}
+
+The ALP/Solver transition path provides a C-style interface for initializing and running a Conjugate Gradient solver. All functions are exposed via a header (e.g. solver.h in ALP’s include directory) and use simple types like pointers and handles. The main functions in this API are:
+
+\begin{itemize}
+\item \textbf{sparse\_cg\_init(\&handle, n, vals, cols, offs):} Initializes a CG solver instance. It allocates/assigns a solver context to the user-provided sparse\_cg\_handle\_t (an opaque handle type defined by ALP). The matrix $A$ is provided in CRS format by three arrays: vals (the nonzero values), cols (the column indices for each value), and offs (offsets in the vals array where each row begins). The dimension n (number of rows, which should equal number of columns for $A$) is also given. After this call, the handle represents an internal solver state with matrix $A$ stored. Return: Typically returns 0 on success (and a non-zero error code on failure) [4].
+
+\item \textbf{sparse\_cg\_set\_preconditioner(handle, func, data):} (Optional) Sets a preconditioner for the iterative solve. The user provides a function func that applies the preconditioner $M^{-1}$ to a vector (i.e. solves $Mz = r$ for a given residual $r$), along with a user data pointer. ALP will call this func(z, r, data) in each CG iteration to precondition the residual. If you don’t call this, the solver will default to no preconditioning (i.e. $M = I$). You can use this to plug in simple preconditioners (like Jacobi, with data holding the diagonal of $A$) or even advanced ones, without modifying the solver code. Return: 0 on success, or error code if the handle is invalid, etc.
+
+\item \textbf{sparse\_cg\_solve(handle, x, b):} Runs the CG iteration to solve $Ax = b$. Here b is the right-hand side vector (input), and x is the solution vector (output). You should allocate both of these arrays of length n beforehand. The solver will iterate until it converges to a solution within some default tolerance or until it reaches an iteration limit. On input, you may put an initial guess in x. If not, it’s safe to initialize x to zero (the solver will start from $x_0 = 0$ by default in that case). Upon return, x will contain the approximate solution. Return: 0 if the solution converged (or still 0 if it ran the maximum iterations – specific error codes might indicate divergence or other issues in future versions).
+
+\item \textbf{sparse\_cg\_destroy(handle):} Destroys the solver instance and releases any resources associated with the given handle. After this, the handle becomes invalid. Always call this when you are done solving to avoid memory leaks. Return: 0 on success (and the handle pointer may be set to NULL or invalid after). This API is non-blocking in the sense that internally ALP may overlap operations (like sparse matrix-vector multiplications and vector updates) and use asynchronous execution for performance. However, the above functions themselves appear synchronous. For example, sparse\_cg\_solve will only return after the solve is complete (there’s no separate “wait” call exposed in this C interface). The benefit of ALP’s approach is that you, the developer, don’t need to manage threads or message passing at all ALP’s GraphBLAS engine handles parallelism behind the scenes. You just call these routines as you would any standard library. Now, let’s put these functions into practice with a concrete example.
+\end{itemize}
+
+This API is non-blocking in the sense that internally ALP may overlap operations (like sparse matrix-vector multiplications and vector updates) and use asynchronous execution for performance. However, the above functions themselves appear synchronous. For example, sparse\_cg\_solve will only return after the solve is complete (there’s no separate “wait” call exposed in this C interface). The benefit of ALP’s approach is that you, the developer, don’t need to manage threads or message passing at all. ALP’s GraphBLAS engine handles parallelism behind the scenes. You just call these routines as you would any standard library. Now, let’s put these functions into practice with a concrete example.
+
+  
+\section*{Example: Solving a Linear System with ALP’s CG Solver}
+
+Suppose we want to solve a small system $Ax = b$ to familiarize ourselves with the CG interface. We will use the following $3 \times 3$ symmetric positive-definite matrix $A$: $$ A = \begin{pmatrix} 4 & 1 & 0\\ 
+        1 & 3 & -1\\ 
+        0 & -1 & 2 \end{pmatrix}, $$ 
+        and we choose a right-hand side vector $b$ such that the true solution is easy to verify. If we take the solution to be $x = (1,\;2,\;3)$, then $b = A x$ can be calculated as: $$ b = \begin{pmatrix}6 \ 4 \ 4 \end{pmatrix}, $$ since $4\cdot1 + 1\cdot2 + 0\cdot3 = 6$, $1\cdot1 + 3\cdot2 + (-1)\cdot3 = 4$, and $0\cdot1 + (-1)\cdot2 + 2\cdot3 = 4$. Our goal is to see if the CG solver recovers $x = (1,2,3)$ from $A$ and $b$. 
+        
+We will hard-code $A$ in CRS format (also called CSR: Compressed Sparse Row) for the solver. In CRS, the matrix is stored by rows, using parallel arrays for values and column indices, plus an offset index for where each row starts. For matrix $A$ above:
+
+
+
+\begin{itemize}
+\item Nonzero values in row 0 are [4, 1] (at columns 0 and 1), in row 1 are [1, 3, -1] (at cols 0,1,2), and in row 2 are [-1, 2] (at cols 1,2). So the vals array will be \{4, 1, 1, 3, -1, -1, 2\} (concatenated by row).
+\item Corresponding column indices cols would be \{0, 1, 0, 1, 2, 1, 2\} aligned with each value.
+\item The offsets offs marking the start of each row’s data in the vals array would be: row0 starts at index 0, row1 at index 2, row2 at index 5, and an extra final entry = 7 (total number of nonzeros). So offs = \{0, 2, 5, 7\}.
+\end{itemize}
+
+
+Below is a complete program using ALP’s CG solver to solve for $x$. We include the necessary ALP header
+for the solver API, set up the matrix and vectors, call the API functions in order, and then print the result.
+
+\begin{lstlisting}[ label=lst:example, showstringspaces=false]
+
+#include <stdio.h>
+#include <stdlib.h>
+
+// Include ALPs solver API header
+#include <transition/solver.h> // (path may vary based on installation)
+
+int main(){
+    // Define the 3x3 test matrix in CRS format
+    const size_t n = 3;
+    
+    double A_vals[] = {
+        4.0, 1.0, // row 0 values
+        1.0, 3.0, -1.0, // row 1 values
+        -1.0, 2.0 // row 2 values
+    };
+    
+    int A_cols[] = {
+        0, 1, // row 0 column indices
+        0, 1, 2, // row 1 column indices
+        1, 2 // row 2 column indices
+    };
+    
+    int A_offs[] = { 0, 2, 5, 7 }; // row start offsets: 0,2,5 and total nnz=7
+    
+    // Right-hand side b and solution vector x
+    double b[] = { 6.0, 4.0, 4.0 }; // b = A * [1,2,3]^T
+    double x[] = { 0.0, 0.0, 0.0 }; // initial guess x=0 (will hold the solution)
+    
+    // Solver handle
+    sparse_cg_handle_t handle;
+    
+    int err = sparse_cg_init_dii(&handle, n, A_vals, A_cols, A_offs);
+    if (err != 0) {
+        fprintf(stderr, "CG init failed with error %d\n", err);
+        return EXIT_FAILURE;
+    }
+    
+    // (Optional) set a preconditioner here if needed, e.g. Jacobi or others.
+    // We skip this, so no preconditioner (effectively M = Identity).
+    err = sparse_cg_solve_dii(handle, x, b);
+    if (err != 0) {
+        fprintf(stderr, "CG solve failed with error %d\n", err);
+        // Destroy handle before exiting
+        sparse_cg_destroy_dii(handle);
+        return EXIT_FAILURE;
+    }
+    
+    // Print the solution vector x
+    printf("Solution x = [%.2f, %.2f, %.2f]\n", x[0], x[1], x[2]);
+    
+    // Clean up
+    sparse_cg_destroy_dii(handle);
+    
+    return 0;
+}
+
+
+\end{lstlisting}
+
+Let’s break down what happens here:
+
+\begin{itemize}
+
+    \item We included <graphblas/solver.h> (the exact include path might be alp/solver.h or similar depending on ALP’s install, but typically it resides in the GraphBLAS include directory of ALP). This header defines the sparse\_cg\_* functions and the \textbf{sparse\_cg\_handle\_t} type.
+    
+    \item We set up the matrix data in CRS format. For clarity, the values and indices are grouped by row in the code. The offsets array \{0,2,5,7\} indicates: row0 uses vals[0..1], row1 uses vals[2..4] , row2 uses vals[5..6]. The matrix dimension n is 3.
+    
+    \item We prepare the vectors b and x. b is initialized to \{6,4,4\} as computed above. We initialize x to all zeros (as a starting guess). In a real scenario, you could start from a different guess, but zero is a common default.
+    
+    \item We create a \textbf{sparse\_cg\_handle\_t} and call {sparse\_cg\_init}. This hands the matrix to ALP’s solver. Under the hood, ALP will likely copy or reference this data and possibly analyze $A$ for the CG algorithm. We check the return code err, if non-zero, we print an error and exit. (For example, an error might occur if n or the offsets are inconsistent. In our case, it should succeed with err == 0.)
+    
+    \item We do not call \textbf{sparse\_cg\_set\_preconditioner} in this example, which means the CG will run unpreconditioned. If we wanted to, we could implement a simple preconditioner. For instance, a Jacobi preconditioner would use the diagonal of $A$: we’d create an array with $\text{diag}(A) = [4,3,2]$ and a function to divide the residual by this diagonal. We would then call \textbf{sparse\_cg\_set\_preconditioner(handle, my\_prec\_func, diag\_data)}. For brevity, we skip this. ALP will just use the identity preconditioner by default (no acceleration).
+    
+    \item Next, we call \textbf{sparse\_cg\_solve(handle, x, b)}. ALP will iterate internally to solve $Ax=b$. When this function returns, x should contain the solution. We again check err. A non-zero code could indicate the solver failed to converge (though typically it would still return 0 and one would check convergence via a status or residual, ALP’s API may evolve to provide more info). In our small case, it should converge in at most 3 iterations since $A$ is $3\times3$.
+    
+    \item We print the resulting x. We expect to see something close to [1.00, 2.00, 3.00]. Because our matrix and $b$ were consistent with an exact solution of $(1,2,3)$, the CG method should find that exactly (within floating-point rounding). You can compare this output with the known true solution to verify the solver worked correctly.
+    
+    \item Finally, we call \textbf{sparse\_cg\_destroy(handle)} to free ALP’s resources for the solver. This is important especially for larger problems to avoid memory leaks. After this, we return from main.
+    
+\end{itemize}
+
+
+
+\section*{Building and Running the Example}
+To compile the above code with ALP, we will use the direct linking option as discussed. 
+\begin{lstlisting}[language=bash]
+g++ example.cpp -o cg_demo \
+  -I install/include \
+  -L install/lib \
+  -L install/lib/sequential \
+  -Wl,-rpath,$PWD/install/lib/sequential \
+  -lspsolver_shmem_parallel \
+  -lalp_cspblas_shmem_parallel \
+  -lgraphblas \
+  -lalp_utils \
+  -lksolver \
+  -fopenmp \
+  -lnuma \
+  -no-pie
+\end{lstlisting}
+After a successful compile, you can run the program:
+    \begin{lstlisting}[language=bash]
+    ./cg_demo
+    \end{lstlisting}
+It should output something like:
+    \begin{lstlisting}[language=bash]
+    Solution x = [1.00, 2.00, 3.00]
+    \end{lstlisting}
+
+
+
+Congratulations! You’ve now written and executed a Conjugate Gradient solver using ALP’s transition path
+API! By using this C-style interface, we got the benefits of ALP’s high-performance GraphBLAS engine
+without having to dive into template programming or parallelization details. From here, you can explore
+other parts of ALP’s API: for instance, ALP/GraphBLAS provides a full GraphBLAS-compatible API for linear
+algebra operations on vectors and matrices, and ALP also offers a Pregel-like graph processing interface. All
+of these can be integrated with existing code in a similar fashion (just link against ALP’s libraries) [2][3].
+
+
+In summary, ALP’s transition paths enable a smooth adoption of advanced HPC algorithms – you write
+standard code, and ALP handles the rest. We focused on the CG solver, but the same idea applies to other
+supported interfaces. Feel free to experiment with different matrices, add a custom preconditioner function
+to see its effect, or try other solvers if ALP introduces them in future releases. Happy coding!
+
+    
+
+
+\bibliographystyle{plain}
+\begin{thebibliography}{9}
+
+\bibitem{alp-transition}
+ALP v0.8 Transition Path Overview. \\
+\url{http://albert-jan.yzelman.net/alp/v0.8-preview/group__TRANS.html}
+
+\bibitem{alp-docs}
+ALP Documentation and Examples. GitHub. \\
+\url{https://github.com/Algebraic-Programming/ALP}
+
+\bibitem{alp-graphblas-use}
+Use ALP GraphBLAS in Your Project. GitHub Docs. \\
+\url{https://github.com/Algebraic-Programming/ALP/blob/develop/docs/Use_ALPGraphBLAS_in_your_own_project.md}
+
+\bibitem{alp-transition-use}
+Using ALP Transition Interfaces. GitHub Docs. \\
+\url{https://github.com/Algebraic-Programming/ALP/blob/develop/docs/Transition_use.md}
+
+\end{thebibliography}