Commit ea1a9b0

Update Libraries/oneMKL/sparse_conjugate_gradient/README.md
Co-authored-by: Nicolas Offermans <[email protected]>
1 parent 1753a4e


Libraries/oneMKL/sparse_conjugate_gradient/README.md

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@ Sparse Conjugate Gradient uses oneMKL sparse linear algebra routines to solve a
 This sample performs its computations on the default SYCL* device. You can set the `SYCL_DEVICE_TYPE` environment variable to `cpu` or `gpu` to select the device to use.
 
 ## Key Implementation Details
-oneMKL sparse routines use a two-stage method where the sparse matrix is analyzed to prepare subsequent calculations (the _optimize_ step). Sparse matrix-vector multiplication and triangular solves (`gemv` and `trsv`) are used to implement the main loop, along with vector routines from BLAS. Two implementations are provided: The first implementation, in `sparse_cg.cpp`, has several places where a device to host copy and wait are initiated to allow the alpha and beta coefficients to be initiated in the BLAS vector routines as host scalars. The second implementation, in `sparse_cg2.cpp`, keeps the coefficients for alpha and beta on the device, which require that custom axpby2 and axpy3 fucntions are written to handle the construction of alpha and beta coefficients on-the-fly from the device. This removes some of the synchronization points that are seen in the first implementation.
+oneMKL sparse routines use a two-stage method where the sparse matrix is analyzed to prepare subsequent calculations (the _optimize_ step). Sparse matrix-vector multiplication and triangular solves (`gemv` and `trsv`) are used to implement the main loop, along with vector routines from BLAS. Two implementations are provided: The first implementation, in `sparse_cg.cpp`, has several places where a device to host copy and wait are initiated to allow the alpha and beta coefficients to be initiated in the BLAS vector routines as host scalars. The second implementation, in `sparse_cg2.cpp`, keeps the coefficients for alpha and beta on the device, which require that custom axpby2 and axpy3 functions are written to handle the construction of alpha and beta coefficients on-the-fly from the device. This removes some of the synchronization points that are seen in the first implementation.
 
 ## Using Visual Studio Code* (Optional)
 You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations,

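For context on the paragraph changed above: keeping alpha and beta on the device means the vector-update kernels dereference device pointers instead of taking host scalars. Below is a minimal SYCL sketch of that idea; the function name `axpby_device` and its signature are assumptions for illustration, not the sample's actual `axpby2`/`axpy3` code.

```cpp
// Hypothetical sketch (not the sample's implementation): computes
// y = (*alpha) * x + (*beta) * y with alpha and beta resident in device
// memory, so the CG loop needs no device-to-host copy or wait to read them.
#include <sycl/sycl.hpp>
#include <cstddef>
#include <cstdint>

sycl::event axpby_device(sycl::queue &q, std::int64_t n,
                         const double *alpha,  // device pointer to scalar
                         const double *x,      // device vector, length n
                         const double *beta,   // device pointer to scalar
                         double *y) {          // device vector, length n
  return q.parallel_for(sycl::range<1>(static_cast<std::size_t>(n)),
                        [=](sycl::id<1> i) {
    y[i] = (*alpha) * x[i] + (*beta) * y[i];
  });
}
```

The host-scalar variant in `sparse_cg.cpp`, by contrast, copies each coefficient back to the host and waits before invoking the BLAS vector routine, which is where the synchronization points mentioned in the paragraph come from.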