Commit fa27e44

Merge pull request oneapi-src#2543 from spencerpatty/dev/spatty/update_pcg_usm
[oneMKL][Sparse BLAS] Add preconditioned conjugate gradient in USM with two examples
2 parents 5be6adc + 12748ba commit fa27e44

7 files changed, +1254 -225 lines changed

Libraries/oneMKL/sparse_conjugate_gradient/GNUmakefile

Lines changed: 6 additions & 2 deletions
@@ -4,8 +4,9 @@ default: run
 
 all: run
 
-run: sparse_cg
+run: sparse_cg sparse_cg2
 	./sparse_cg
+	./sparse_cg2
 
 MKL_COPTS = -DMKL_ILP64 -qmkl -qmkl-sycl-impl="blas,sparse"
 
@@ -14,7 +15,10 @@ DPCPP_OPTS = $(MKL_COPTS) -fsycl-device-code-split=per_kernel
 sparse_cg: sparse_cg.cpp
 	icpx $< -fsycl -o $@ $(DPCPP_OPTS)
 
+sparse_cg2: sparse_cg2.cpp
+	icpx $< -fsycl -o $@ $(DPCPP_OPTS)
+
 clean:
-	-rm -f sparse_cg genxir
+	-rm -f sparse_cg sparse_cg2 genxir
 
 .PHONY: clean run all

Libraries/oneMKL/sparse_conjugate_gradient/README.md

Lines changed: 111 additions & 28 deletions
@@ -17,7 +17,7 @@ Sparse Conjugate Gradient uses oneMKL sparse linear algebra routines to solve a
 This sample performs its computations on the default SYCL* device. You can set the `SYCL_DEVICE_TYPE` environment variable to `cpu` or `gpu` to select the device to use.
 
 ## Key Implementation Details
-oneMKL sparse routines use a two-stage method where the sparse matrix is analyzed to prepare subsequent calculations (the _optimize_ step). Sparse matrix-vector multiplication and triangular solves (`gemv` and `trsv`) are used to implement the main loop, along with vector routines from BLAS.
+oneMKL sparse routines use a two-stage method where the sparse matrix is analyzed to prepare subsequent calculations (the _optimize_ step). Sparse matrix-vector multiplication and triangular solves (`gemv` and `trsv`) are used to implement the main loop, along with vector routines from BLAS. Two implementations are provided. The first, in `sparse_cg.cpp`, performs a device-to-host copy and wait in several places so that the alpha and beta coefficients can be passed to the BLAS vector routines as host scalars. The second, in `sparse_cg2.cpp`, keeps the alpha and beta coefficients on the device, which requires custom `axpby2` and `axpy3` functions that construct the coefficients on the fly on the device. This removes some of the synchronization points present in the first implementation.
 
 ## Using Visual Studio Code* (Optional)
 You can use Visual Studio Code (VS Code) extensions to set your environment, create launch configurations,
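As a rough illustration of the loop structure that the Key Implementation Details paragraph describes, here is a minimal host-only C++ sketch of preconditioned CG with a symmetric Gauss-Seidel preconditioner. It is not the code from `sparse_cg.cpp` and uses neither SYCL nor oneMKL: `Csr`, `make_laplacian_1d`, `spmv`, `apply_sgs`, `dot`, and `pcg` are hypothetical helpers, the 1D Laplacian test matrix is an assumption, and only a relative stopping test is used. `spmv` stands in for the sparse `gemv` call, the two triangular sweeps in `apply_sgs` stand in for the `trsv` calls (assuming the usual factored form M = (D+L) D^{-1} (D+U)), and alpha and beta are plain host scalars, as in the first implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Minimal CSR matrix (hypothetical helper, not a oneMKL matrix handle).
struct Csr {
    int n = 0;
    std::vector<int> row_ptr, col;
    Vec val;
};

// Assumed test matrix: 1D Laplacian, 2 on the diagonal, -1 next to it.
Csr make_laplacian_1d(int n) {
    Csr a;
    a.n = n;
    a.row_ptr.push_back(0);
    for (int i = 0; i < n; ++i) {
        if (i > 0)     { a.col.push_back(i - 1); a.val.push_back(-1.0); }
        a.col.push_back(i); a.val.push_back(2.0);
        if (i + 1 < n) { a.col.push_back(i + 1); a.val.push_back(-1.0); }
        a.row_ptr.push_back(static_cast<int>(a.col.size()));
    }
    return a;
}

// y = A * x (plays the role of sparse gemv).
Vec spmv(const Csr& a, const Vec& x) {
    Vec y(a.n, 0.0);
    for (int i = 0; i < a.n; ++i)
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            y[i] += a.val[k] * x[a.col[k]];
    return y;
}

double dot(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double s = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) s += x[i] * y[i];
    return s;
}

// z = M^{-1} r with M = (D+L) D^{-1} (D+U): forward solve, scale, backward solve.
Vec apply_sgs(const Csr& a, const Vec& r) {
    Vec z(r), diag(a.n, 0.0);
    for (int i = 0; i < a.n; ++i) {                 // (D+L) t = r
        double s = r[i];
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            if (a.col[k] < i) s -= a.val[k] * z[a.col[k]];
            else if (a.col[k] == i) diag[i] = a.val[k];
        }
        z[i] = s / diag[i];
    }
    for (int i = 0; i < a.n; ++i) z[i] *= diag[i];  // t = D t
    for (int i = a.n - 1; i >= 0; --i) {            // (D+U) z = t
        double s = z[i];
        for (int k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k)
            if (a.col[k] > i) s -= a.val[k] * z[a.col[k]];
        z[i] = s / diag[i];
    }
    return z;
}

// Returns the iteration count on convergence, 0 if r_0 == 0, -1 otherwise.
int pcg(const Csr& a, const Vec& b, Vec& x, double rel_tol, int max_iter) {
    assert(b.size() == x.size());
    Vec r = b;
    const Vec ax = spmv(a, x);
    for (int i = 0; i < a.n; ++i) r[i] -= ax[i];
    const double r0 = std::sqrt(dot(r, r));
    if (r0 == 0.0) return 0;
    Vec z = apply_sgs(a, r);
    Vec p = z;
    double rz = dot(r, z);
    for (int it = 1; it <= max_iter; ++it) {
        const Vec ap = spmv(a, p);
        const double alpha = rz / dot(p, ap);       // host-side scalar
        for (int i = 0; i < a.n; ++i) { x[i] += alpha * p[i]; r[i] -= alpha * ap[i]; }
        if (std::sqrt(dot(r, r)) <= rel_tol * r0) return it;
        z = apply_sgs(a, r);
        const double rz_new = dot(r, z);
        const double beta = rz_new / rz;            // host-side scalar
        for (int i = 0; i < a.n; ++i) p[i] = z[i] + beta * p[i];
        rz = rz_new;
    }
    return -1;
}
```

To try it, build `b` as `spmv(a, x_true)` for a known `x_true`, start `x` at zero, and call `pcg(a, b, x, 1e-10, 500)`. In a device implementation each `dot` result that feeds alpha or beta is where a device-to-host copy and wait would be needed, which is the synchronization the second implementation is described as avoiding.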
@@ -62,12 +62,13 @@ Run `nmake` to build and run the sample. `nmake clean` removes temporary files.
 ## Running the Sparse Conjugate Gradient Sample
 
 ### Example of Output
-If everything is working correctly, the example program will rapidly converge to a solution and display the solution vector's first few entries. The test will run in both single and double precision (if available on the selected device).
+If everything is working correctly, the example programs will rapidly converge to a solution. Each test will run in both single and double precision (if available on the selected device).
 
+The first PCG implementation with host side coefficients:
 ```
 ./sparse_cg
 ########################################################################
-# Sparse Conjugate Gradient Solver
+# Sparse Preconditioned Conjugate Gradient Solver with USM
 #
 # Uses the preconditioned conjugate gradient algorithm to
 # iteratively solve the symmetric linear system
@@ -79,36 +80,118 @@ If everything is working correctly, the example program will rapidly converge to
 #
 # Uses the symmetric Gauss-Seidel preconditioner.
 #
+# alpha and beta constants in PCG algorithm are host side.
+#
 ########################################################################
 
-Running tests on Intel(R) Gen9 HD Graphics NEO.
+Running tests on Intel(R) Data Center GPU Max 1550.
 Running with single precision real data type:
-relative norm of residual on 1 iteration: 0.0856119
-relative norm of residual on 2 iteration: 0.00204826
-relative norm of residual on 3 iterations: 6.68015e-05
-
-Preconditioned CG process has successfully converged, and
-the following solution has been obtained:
-
-x[0] = 0.0666633
-x[1] = 0.0835483
-x[2] = 0.0835491
-x[3] = 0.0666627
-...
+
+sparse PCG parameters:
+A size: (4096, 4096)
+Preconditioner = Symmetric Gauss-Seidel
+max iterations = 500
+relative tolerance limit = 1e-05
+absolute tolerance limit = 0.0005
+relative norm of residual on 1 iteration: 0.178532
+relative norm of residual on 2 iteration: 0.0280123
+relative norm of residual on 3 iteration: 0.0048948
+relative norm of residual on 4 iteration: 0.000796108
+relative norm of residual on 5 iteration: 0.000119025
+relative norm of residual on 6 iteration: 1.86945e-05
+absolute norm of residual on 6 iteration: 0.000149556
+
+Preconditioned CG process has successfully converged in absolute error in 6 steps with
+relative error ||r||_2 / ||r_0||_2 = 1.86945e-05 > 1e-05
+absolute error ||r||_2 = 0.000149556 < 0.0005
+
 Running with double precision real data type:
-relative norm of residual on 1 iteration: 0.0856119
-relative norm of residual on 2 iteration: 0.00204827
-relative norm of residual on 3 iteration: 6.68017e-05
-
-Preconditioned CG process has successfully converged, and
-the following solution has been obtained:
-
-x[0] = 0.0666633
-x[1] = 0.0835483
-x[2] = 0.0835491
-x[3] = 0.0666627
-...
+
+sparse PCG parameters:
+A size: (4096, 4096)
+Preconditioner = Symmetric Gauss-Seidel
+max iterations = 500
+relative tolerance limit = 1e-05
+absolute tolerance limit = 0.0005
+relative norm of residual on 1 iteration: 0.178532
+relative norm of residual on 2 iteration: 0.0280123
+relative norm of residual on 3 iteration: 0.0048948
+relative norm of residual on 4 iteration: 0.000796108
+relative norm of residual on 5 iteration: 0.000119025
+relative norm of residual on 6 iteration: 1.86945e-05
+absolute norm of residual on 6 iteration: 0.000149556
+
+Preconditioned CG process has successfully converged in absolute error in 6 steps with
+relative error ||r||_2 / ||r_0||_2 = 1.86945e-05 > 1e-05
+absolute error ||r||_2 = 0.000149556 < 0.0005
+
+```
+
+and the second PCG implementation with device side coefficients:
 ```
+./sparse_cg2
+########################################################################
+# Sparse Preconditioned Conjugate Gradient Solver with USM 2
+#
+# Uses the preconditioned conjugate gradient algorithm to
+# iteratively solve the symmetric linear system
+#
+# A * x = b
+#
+# where A is a symmetric sparse matrix in CSR format, and
+# x and b are dense vectors.
+#
+# Uses the symmetric Gauss-Seidel preconditioner.
+#
+# alpha and beta constants in PCG algorithm are kept
+# device side.
+#
+########################################################################
+
+Running tests on Intel(R) Data Center GPU Max 1550.
+Running with single precision real data type:
+
+sparse PCG parameters:
+A size: (4096, 4096)
+Preconditioner = Symmetric Gauss-Seidel
+max iterations = 500
+relative tolerance limit = 1e-05
+absolute tolerance limit = 0.0005
+relative norm of residual on 1 iteration: 0.178532
+relative norm of residual on 2 iteration: 0.0280123
+relative norm of residual on 3 iteration: 0.0048948
+relative norm of residual on 4 iteration: 0.000796109
+relative norm of residual on 5 iteration: 0.000119025
+relative norm of residual on 6 iteration: 1.86945e-05
+absolute norm of residual on 6 iteration: 0.000149556
+
+Preconditioned CG process has successfully converged in absolute error in 6 steps with
+relative error ||r||_2 / ||r_0||_2 = 1.86945e-05 > 1e-05
+absolute error ||r||_2 = 0.000149556 < 0.0005
+
+Running with double precision real data type:
+
+sparse PCG parameters:
+A size: (4096, 4096)
+Preconditioner = Symmetric Gauss-Seidel
+max iterations = 500
+relative tolerance limit = 1e-05
+absolute tolerance limit = 0.0005
+relative norm of residual on 1 iteration: 0.178532
+relative norm of residual on 2 iteration: 0.0280123
+relative norm of residual on 3 iteration: 0.0048948
+relative norm of residual on 4 iteration: 0.000796108
+relative norm of residual on 5 iteration: 0.000119025
+relative norm of residual on 6 iteration: 1.86945e-05
+absolute norm of residual on 6 iteration: 0.000149556
+
+Preconditioned CG process has successfully converged in absolute error in 6 steps with
+relative error ||r||_2 / ||r_0||_2 = 1.86945e-05 > 1e-05
+absolute error ||r||_2 = 0.000149556 < 0.0005
+
+```
+
+
 
 ### Troubleshooting
 If an error occurs, troubleshoot the problem using the Diagnostics Utility for Intel® oneAPI Toolkits.
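The new sample output in the README reports convergence "in absolute error" at iteration 6 even though the relative residual (1.86945e-05) is still above the relative tolerance (1e-05). One plausible reading, which is an assumption because the solver source is not part of this excerpt, is that the iteration stops as soon as either the relative or the absolute test passes. The sketch below encodes that reading; `Stop` and `check_convergence` are hypothetical names, and the order in which the two tests are tried is a guess.

```cpp
#include <cassert>

// Which stopping test fired (hypothetical enum for illustration only).
enum class Stop { None, Relative, Absolute };

// Assumed rule: stop when ||r|| / ||r_0|| < rel_tol or ||r|| < abs_tol.
// res_norm is ||r||_2 at the current iteration, res0_norm is ||r_0||_2.
Stop check_convergence(double res_norm, double res0_norm,
                       double rel_tol, double abs_tol) {
    assert(res0_norm > 0.0);
    if (res_norm / res0_norm < rel_tol) return Stop::Relative;
    if (res_norm < abs_tol) return Stop::Absolute;
    return Stop::None;
}
```

With the iteration 6 values from the log (`res_norm = 0.000149556`, relative residual 1.86945e-05, tolerances 1e-05 and 0.0005), this rule reports `Stop::Absolute`, which matches the printed message. The iteration 5 relative residual of 0.000119025 with the same initial norm gives an absolute residual of roughly 0.00095, so neither test passes and the loop would continue.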

Libraries/oneMKL/sparse_conjugate_gradient/makefile

Lines changed: 8 additions & 3 deletions
@@ -4,15 +4,20 @@ default: run
 
 all: run
 
-run: sparse_cg.exe
+run: sparse_cg.exe sparse_cg2.exe
 	.\sparse_cg
+	.\sparse_cg2
 
-DPCPP_OPTS=/I"$(MKLROOT)\include" /Qmkl /Qmkl-sycl-impl="blas,sparse" /EHsc -fsycl-device-code-split=per_kernel OpenCL.lib
+SYCL_OPTS=/I"$(MKLROOT)\include" /Qmkl /Qmkl-sycl-impl="blas,sparse" /EHsc -fsycl-device-code-split=per_kernel OpenCL.lib
 
 sparse_cg.exe: sparse_cg.cpp
-	icx-cl -fsycl sparse_cg.cpp /Fesparse_cg.exe $(DPCPP_OPTS)
+	icx-cl -fsycl sparse_cg.cpp /Fesparse_cg.exe $(SYCL_OPTS)
+
+sparse_cg2.exe: sparse_cg2.cpp
+	icx-cl -fsycl sparse_cg2.cpp /Fesparse_cg2.exe $(SYCL_OPTS)
 
 clean:
 	del /q sparse_cg.exe sparse_cg.exp sparse_cg.lib
+	del /q sparse_cg2.exe sparse_cg2.exp sparse_cg2.lib
 
 pseudo: clean run all

Libraries/oneMKL/sparse_conjugate_gradient/sample.json

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 "guid": "3814C3C8-6CD1-40C9-A94B-CB4D4F58E2B9",
 "name": "Sparse Conjugate Gradient",
 "categories": ["Toolkit/oneAPI Libraries/oneMKL"],
-"description": "Solve Sparse linear systems with the Conjugate Gradient method using Intel® oneMKL sparse BLAS",
+"description": "Solve Sparse linear systems with the Conjugate Gradient method using Intel® oneMKL Sparse BLAS",
 "toolchain": [ "dpcpp" ],
 "dependencies": [ "mkl" ],
 "languages": [ { "cpp": { "properties": { "projectOptions": [ { "projectType": "makefile" } ] } } } ],
