Skip to content

Commit 342bb32

Browse files
committed
Add design section to tridiagonal solver design doc
1 parent 3157a4a commit 342bb32

File tree

1 file changed

+110
-10
lines changed

1 file changed

+110
-10
lines changed

components/omega/doc/design/TridiagonalSolver.md

Lines changed: 110 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -22,26 +22,27 @@ This requires specialized algorithms that can handle this situation stably.
2222

2323
### 2.1 Requirement: Modularity
2424

25-
There should be one implementation of batched tridiagonal solver that can be called every time a solution to such a system is needed.
25+
There should be one module that provides tridiagonal solvers that are sufficiently general to handle every need in Omega.
2626

2727
### 2.2 Requirement: Stability for vertical mixing problems
2828

29-
The solver must be stable for vertical mixing problems with large variations in mixing coefficients.
29+
A specialized solver that is stable for vertical mixing problems with large variations in mixing coefficients must be
30+
provided.
3031

3132
### 2.3 Requirement: Bottom boundary conditions
3233

33-
When applied to the implicit vertical mixing of momentum, the solver should be
34+
When applied to the implicit vertical mixing of momentum, the specialized diffusion solver should be
3435
sufficiently general to be able to incorporate various bottom drag formulations.
3536

3637
### 2.4 Requirement: Performance
3738

38-
The solver must be performant on CPU and GPU architectures.
39+
All solvers must be performant on CPU and GPU architectures. On CPUs this means supporting vectorization.
3940

4041
### 2.5 Desired: Ability to fuse kernels
4142

4243
Implicit vertical mixing will require some pre-processing (e.g. setup of the system) and post-processing work.
4344
It is desirable to handle all of that in one computational kernel. This requires the ability to call the
44-
solver inside a `parallelFor`.
45+
solvers inside a `parallelFor`.
4546

4647
## 3 Algorithmic Formulation
4748
A general tridiagonal system has the form:
@@ -173,15 +174,114 @@ These two equations form the basis of stable (parallel) cyclic reduction for dif
173174

174175
## 4 Design
175176

176-
TODO
177+
Four different algorithms will be implemented:
178+
- Thomas algorithm for general tridiagonal systems on CPUs
179+
- PCR algorithm for general tridiagonal systems on GPUs
180+
- Thomas algorithm for diffusion systems with improved stability properties on CPUs
181+
- PCR algorithm for diffusion systems with improved stability properties on GPUs
182+
183+
The algorithms will be designed to work within Kokkos team policies, with each team of threads
184+
solving one column system on GPUs and `VecLength` column systems on CPUs.
185+
The user interface for CPU and GPU solvers will be the same. There will be
186+
two type aliases `TriDiagSolver` and `TriDiagDiffSolver` that will resolve to the
187+
optimal solver class based on the chosen architecture.
188+
189+
### 4.1 Data types and parameters
190+
191+
#### 4.1.1 Parameters
192+
193+
No parameters are required.
194+
195+
#### 4.1.2 Class/structs/data types
196+
197+
There will be a solver struct for each algorithm. The solvers inputs and outputs will be encapsulated in
198+
two scratch data structs, one for the general solver and one for the diffusion solver.
199+
200+
#### 4.1.2.1 Scratch data structs
201+
202+
To facilitate constructing the systems on the fly and for performance reasons the solver inputs and outputs will be using the Kokkos scratch memory space. The scratch data for the general tridiagonal solver will be encapsulated in a struct called `TriDiagScratch`
203+
```c++
204+
struct TriDiagScratch {
205+
ScratchArray2DReal DL;
206+
ScratchArray2DReal D;
207+
ScratchArray2DReal DU;
208+
ScratchArray2DReal X;
209+
};
210+
```
211+
where `DL`, `D`, `DU`, `X` are views of size (`NRow`, `VecLength`) in the scratch memory space. The views
212+
`DL`, `D`, `DU` are inputs denoting the lower, main, and upper diagonal, respectively. The view `X` should contain
213+
the rhs at input and will be overwritten with the solution after `solve` is called.
214+
215+
The scratch data for the specialized diffusion tridiagonal solver will be encapsulated in a struct called `TriDiagDiffScratch`
216+
```c++
217+
struct TriDiagDiffScratch {
218+
ScratchArray2DReal G;
219+
ScratchArray2DReal H;
220+
ScratchArray2DReal X;
221+
ScratchArray2DReal Alpha;
222+
};
223+
```
224+
where `G`, `H`, `X`, `Alpha` are views of size (`NRow`, `VecLength`) in the scratch memory space. The views
225+
`G` and `H` are inputs corresponding to the variables `g` and `h` introduced in [Section 3](#3-algorithmic-formulation).
226+
The view `X` has the same meaning as in the general case.
227+
The view `Alpha` is an internal workspace used by the algorithm.
228+
229+
#### 4.1.2.2 Solver structs
230+
231+
The four solver algorithms will be implemented as four structs `ThomasSolver`, `PCRSolver`, `ThomasDiffusionSolver`, and `PCRDiffusionSolver`. Currently, there is no plan for those structs to have any data members and they will only provide static methods, acting essentially as namespaces.
232+
233+
### 4.2 Methods
234+
235+
#### 4.2.1 Scratch Constructors
236+
The constructors of scratch spaces take a team member and system size (`NRow`)
237+
```c++
238+
TriDiagScratch(const TeamMember &Member, int NRow);
239+
TriDiagDiffScratch(const TeamMember &Member, int NRow);
240+
```
241+
242+
#### 4.2.2 Policy creation
243+
Every solver will provide a static method
244+
```c++
245+
static TeamPolicy makeTeamPolicy(int NBatch, int NRow);
246+
```
247+
that creates an appropriate team policy for solving `NBatch` systems of size `NRow`.
248+
249+
#### 4.2.3 Solve Methods
250+
The general solvers `ThomasSolver` and `PCRSolver` will have a static solve method
251+
```c++
252+
static void solve(const TeamMember &Member, const TriDiagScratch &Scratch);
253+
```
254+
that takes a team member and an initialized general scratch space. After calling this method `Scratch.X`
255+
will contain the solution. There will also be a convenience method
256+
```c++
257+
static void solve(const TeamMember &Member,
258+
const Array2DReal &DL, const Array2DReal &D, const Array2DReal &DU, const Array2DReal &X);
259+
```
260+
that loads the inputs from global arrays.
261+
262+
The diffusion solvers `ThomasDiffusionSolver` and `PCRDiffusionSolver` will provide a similar method
263+
```c++
264+
static void solve(const TeamMember &Member, const TriDiagDiffScratch &Scratch);
265+
```
266+
differing only in the type of scratch space. Similarly, there will be a convenience method
267+
```c++
268+
static void solve(const TeamMember &Member,
269+
const Array2DReal &G, const Array2DReal &H, const Array2DReal &X);
270+
```
271+
which loads the inputs from global arrays.
177272

178273
## 5 Verification and Testing
179274

180-
### 5.1 Test correctness
275+
### 5.1 Test solvers correctness using prescribed matrix
181276

182-
The solver will be compared against exact solution for a variety of (batch size, system size) combinations.
277+
Given analytically prescribed matrix `A` and vector `y` the solution to the problem `A x = z` with `z = A y` will be
278+
checked to see if the resulting `x` is equal to the prescribed vector `y`. This will be done for all solvers for a variety of (batch size, system size) combinations.
183279

184-
### 5.2 Test stability
280+
### 5.2 Test diffusion solvers convergence using manufactured solution
185281

186-
The solver stability will be tested on an idealized vertical mixing problem with abrupt changes in
282+
The convergence of diffusion solvers will be tested using a manufactured solution.
283+
284+
### 5.3 Test stability
285+
286+
The diffusion solvers stability will be tested on an idealized vertical mixing problem with abrupt changes in
187287
the diffusion coefficient.

0 commit comments

Comments
 (0)