Design

Aims

separate data layout from interface
make it easier to use smaller pieces of code
make the code / physics easier to understand
separate vertical and horizontal where possible:
- horizontal:
  - will be spectral element
  - use quad elements with LGL quadrature
  - duplicate values at colocated nodes (supporting both DG and CG)
  - either regular grid (LES box), cubed sphere (GCM) or potentially arbitrary conformal mesh (e.g. for ocean and land)
    - support for loading an arbitrary 2D grid from standard mesh formats (GMSH)
  - can be distributed
- vertical
  - nodes will be vertical (i.e. radial on a sphere) aligned
  - should support a variety of discretizations (e.g. spectral element, FV, staggered grids)
  - support both duplicated or shared nodes for CG
  - implicit solves in the vertical
Support for 2D horizontal surfaces (e.g. boundary conditions, barotropic modes in ocean models, etc.)
Make interface more functional:
- i.e. users should write functions that receive and return immutable objects, rather than in-place modification

First steps plan

Data backends for 3D/2D/column/pancake

StructArray or roll our own?
Unit tests + GPU

Horizontal grid structure

topology
metrics
DSS

Horizontal operators

horiz gradient
horiz divergence
broadcasting

Basic examples

Poisson equation / Heat equation
- regular 2D mesh
- irregular 2D mesh
Wave equation

Column grid structure

1D tests of column
3D tests of both combined

Basic considerations

We only need to consider 2D topologies
- i.e. connectivity is the same for all elements in a stack
Aim for deep atmosphere
- optional support for shallow atmosphere shouldn't be too difficult: modify metric terms and Coriolis terms
Remainder model
- becomes a splitting of the vertical
- needs to only be vertical-only

Interfaces

column(values, i, j, h)
- view that can be indexed by [k, v], returning a struct
- used for everything vertical (divergence, gradient, integration, implicit solves, search up/down/bisect, interpolated view)
- for 3d fields, this will act like a function
pancake(values, k, v, h)
view that can be indexed by [i,j], returning a struct
face(pancake, face#) => view indexed by [i] (for horz num fluxes)
opposing(face) will give the face opposing the element
elements (read only)
- vizualization, VTK export
key assumptions:
- columns never access horizontal data other than gradients, etc, which can be computed before being passed in
- horizontal terms only act via derivatives (divergence / gradients / curl)
broadcasting (incl. 2D horz -> 3D fields), lazy variant
- reduction (integrate 3D -> 2D)

potential layouts

(some we will want, flexibility to add more)

GPU: thread model 1 thread per (i,j) within an element
CPU: ?
- 1 thread per pancake
- 1 thread per column
i,j horizontal node indices
k vertical node index
field: field index (to be figured out)
v: vertical element index
h: horizontal element index (i.e. uniquely identifies an element stack) (a) integer for arbitrary mesh - how to handle real vs ghost elements? (b) (hx,hy) for regular grid (LES) (c) (hx,hy,hface) for cubed sphere

TODO

How to handle (expensive) column physics on duplicated columns?
primary -> secondary sharing mechanism:
duplicate work
duplicate across processes
ultimately depends on threading model
How to handle columns that use different vertical staggering?
Interpolation:
1. what is h?
- regular grid/cubed sphere we can compute a direct lookup
- arbitrary mesh build some quad tree
1. what is v?
- linear lookup (ideally)
1. compute coordinate in reference element
2. barycentric interpolation
  - weighted average of all nodes in an element
  - if horizontal-only, can do it via a pancake

Communication

only certain fields require communication (e.g. tendencies, grads in DG)
can reuse communication buffers for efficiency
load data directly from recv buffers
data buffer

DG

opposing face 4 possible faces * 2 orientations

for a ghost element has only one face store whole faces contiguously (duplicate shared nodes if necessary)

elemtoelem[f,h] => if > 0, real element, if <0 then its the ghost face #
elemtoface[g,h] => if real element, 1:4 positive orienataion, -4:-1 flipped orientation; ghost face 1/-1

CG

store columns contiguous, based on the logical order for the sender

don't duplicate nodes unique numbering of columns on a given rank
if positive, real column, use linear indexing c = NqNq(h-1) + Nq*(j-1) + i
if < 0, ghost column, look up in the receive buffer

Stuff

Data: a value stored at each node
Grid: geometric information
- physical location
- metric terms
- face normals (need for DG/boundaries); can be computed from metrics if required
- stores its information using a Data object
Topology (move out of Grid)
- connection between horizontal
  - DG opposing horizontal faces and orientation
  - CG which horizontal nodes are collacated (0 for internal, 1 for non-vertex faces, 2+ for vertices)
- communication / rank partitioning
Field: Data + Grid
- map
- vertical reduce (integral)
- broadcast + lazy broadcast (e.g. combine 3D field, 2D field, scalar)
- grad, div, curl (split into horizontal/vertical)
- use a StructArray (or something that acts like an array of structs)
  - main limitation: can only be something take spatial derivatives of
  - requires scalar multiplication + addition

Fields

idea field element type can be one of:

real scalars (floating point numbers)
vector coefficient types (basically coeffients of a vector space)
- CartesianVector(x1,x2,x3)
- SphericalCartesian(u,v,w)
- Contravariant/Covariant wrt the reference element
- Contravariant1/Contravariant2/Contravariant3 projections onto contravariant
- to convert between these, you need geometry information SphericalCartesian(CartesianVector(1,2,3), geom)
tensors?
composite object (tuples or named tuples of the above)
- support for Tuples/NamedTuples
- add support for custom structs?
- should not support vector coefficients to avoid e.g. adding two SphericalCartesian objects at different locations

Composable operations

Can we compose kernels by combining multiple operations, such as broadcasting and tensor product operations.

gradient:
1. extract/ compute X to shared memory
2. synchronize
3. tensor product
```
A1 = (D ⊗ I) * X
A2 = (I ⊗ D) * X
```
4. combine
```
∂ξ∂x .* Covariant(A1,A2)
```

strong divergence

map to shared memory

C1 = J .* contravariant1(V)
C2 = J .* contravariant1(V)

synchronize

tensor on each matrix

D1 = (D ⊗ I) * C1
D2 = (I ⊗ D) * C2

combine
```
(D1 .+ D2) ./ J
```

filtering / interpolation
1. compute/load to shared memory
2. synchronize
3. tensor dimension 1 to shared meory
```
A = (F ⊗ I) * X
```
4. synchronize
5. tensor dimension 2
```
Y = (I ⊗ F) * A
```

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Design

Design

Aims

First steps plan

Basic considerations

Interfaces

potential layouts

TODO

Communication

DG

CG

Stuff

Fields

Composable operations

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Clone this wiki locally