Commit 98f2dee

Merge pull request #1 from 2decomp-fft/API_decomposition
Api decomposition
2 parents 31fc0c5 + 93881f5 commit 98f2dee

File tree

2 files changed

+234
-10
lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
build*

docs/source/pages/api_domain.rst

Lines changed: 233 additions & 10 deletions
@@ -4,24 +4,247 @@
44

55
This page explains the key public interfaces of the 2D decomposition library. After reading this section, users should be able to easily build applications using this domain decomposition strategy. The library interface is designed to be very simple. One can refer to the sample applications for a quick start.
66

7-
The 2D Pencil Decomposition API is defined in one Fortran module which should be used by applications:
7+
The 2D Pencil Decomposition API is defined in three Fortran modules, which applications should use as follows:
88

9-
``use decomp_2d``
9+
::
10+
11+
use decomp_2d_constants
12+
use decomp_2d_mpi
13+
use decomp_2d
14+
15+
The ``use decomp_2d_constants`` defines all the parameters, ``use decomp_2d_mpi`` introduces all the MPI
16+
related interfaces and ``use decomp_2d`` contains the main decomposition and transposition APIs.
1017

11-
**Global Variables**
18+
**Module decomp_2d_constants: Global Variables**
19+
20+
The ``decomp_2d_constants`` module contains global parameters that are used to define the KIND of floating
21+
point data (e.g. single or double precision).
22+
These are used to consistently define the precision of the data type for the variables
23+
and for MPI operations.
24+
25+
* ``mytype`` - Use this variable to define the KIND of floating-point data,
26+
e.g. ``real(mytype) :: var`` or ``complex(mytype) :: cvar``.
27+
Depending on the configuration options, this kind resolves to single or double precision.
28+
29+
* ``real_type, complex_type`` - These are the proper MPI datatypes to be used
30+
(for real and complex numbers, respectively) if applications need to call MPI library routines directly.
31+
These types resolve to single or double precision depending on the configuration options.
32+
33+
* ``real2_type`` - This type doubles the precision of the baseline ``real_type``.
34+
35+
* ``mytype_single, real_type_single`` - These two types are used to define the data type for the IO operations.
36+
37+
The module contains additional parameters to control:
38+
39+
* the log on output,
40+
41+
* the log on debug,
42+
43+
* activate the debugger (caliper)
1244

13-
Following is the list of global variables defined by the library that can be used in applications. Obviously these names should not be redefined in applications to avoid conflict. Also note that some variables contain duplicate or redundant information just to simplify the programming.
45+
* activate the different FFT backends (generic, FFTW, MKL, cuFFT)
1446

15-
* ``mytype`` - Use this variable to define the KIND of floating-point data, e.g. real(mytype) :: var or complex(mytype) :: cvar. This makes it easy to switch between single precision and double precision (more details).
47+
* define the release major and minor version
1648

17-
* ``real_type, complex_type`` - These are the proper MPI datatypes to be used (for real and complex numbers, respectively) if applications need to call MPI library routines directly.
49+
**Module decomp_2d_mpi: MPI communication**
50+
51+
The ``decomp_2d_mpi`` module contains global parameters that are used for MPI operations:
52+
53+
* ``nproc`` - the total number of MPI processes. [INT].
54+
55+
* ``nrank`` - the rank of the current MPI process. [INT].
56+
57+
* ``decomp_2d_comm`` - global MPI communicator [INT].
58+
59+
* ``decomp_2d_abort`` - interface to display an error message and call the MPI_ABORT function.
60+
61+
* ``decomp_2d_warning`` - interface to display a warning message together with the line number and function.
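
As a minimal sketch of how the kind parameters and the MPI variables fit together, the following program sums the rank numbers over all processes. It assumes that the application initialises and finalises MPI itself and that ``decomp_2d_init`` (described later on this page) is called before ``nrank``, ``nproc`` and ``decomp_2d_comm`` are used. The program name and the grid size are arbitrary choices made for illustration.

::

    program rank_sum
       use MPI
       use decomp_2d_constants
       use decomp_2d_mpi
       use decomp_2d
       implicit none

       real(mytype) :: local_val, global_sum
       integer :: ierror

       call MPI_INIT(ierror)
       ! p_row = p_col = 0 requests the automatic processor grid
       call decomp_2d_init(16, 16, 16, 0, 0)

       ! mytype fixes the Fortran KIND, real_type is the matching MPI datatype
       local_val = real(nrank, mytype)
       call MPI_ALLREDUCE(local_val, global_sum, 1, real_type, MPI_SUM, &
                          decomp_2d_comm, ierror)

       if (nrank == 0) print *, 'processes:', nproc, ' sum of ranks:', global_sum

       call decomp_2d_finalize
       call MPI_FINALIZE(ierror)
    end program rank_sum

Because ``real_type`` follows the same configuration option as ``mytype``, the reduction should not need any change when the library is rebuilt in a different precision.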
62+
63+
**Module decomp_2d: decomposition module**
64+
65+
The ``decomp_2d`` module contains the variables and the routines to perform the global transposition operations.
66+
The important variables are:
1867

1968
* ``nx_global, ny_global, nz_global`` - size of the global data.
2069

21-
* ``nproc`` - the total number of MPI processes.
70+
* ``xsize(i), ysize(i), zsize(i), i=1,2,3`` - sizes of the sub-domains held by the current process.
71+
The first letter refers to the pencil orientation and the three 1D array elements contain
72+
the sub-domain sizes in X, Y and Z directions, respectively.
73+
In a 2D pencil decomposition, there is always one dimension which completely resides in local memory.
74+
So by definition ``xsize(1)==nx_global``, ``ysize(2)==ny_global`` and ``zsize(3)==nz_global``.
75+
76+
* ``xstart(i), ystart(i), zstart(i), xend(i), yend(i), zend(i), i=1,2,3`` - the starting and ending indices
77+
for each sub-domain, as in the global coordinate system.
78+
It follows that ``xsize(i)=xend(i)-xstart(i)+1``.
79+
It may be convenient for certain applications to use global coordinates
80+
(for example when extracting a 2D plane from a 3D domain,
81+
it is easier to know which process owns the plane if global indices are used).
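
The plane-extraction case mentioned above can be sketched as follows. This is an illustration only: the program name and the plane index ``kplane`` are hypothetical, MPI is assumed to be initialised by the application, and ``decomp_2d_init`` (described later on this page) is called with the automatic processor grid.

::

    program plane_owner
       use MPI
       use decomp_2d_mpi
       use decomp_2d
       implicit none

       integer, parameter :: nx = 32, ny = 32, nz = 32
       integer, parameter :: kplane = 10   ! hypothetical global z index
       integer :: ierror, i

       call MPI_INIT(ierror)
       call decomp_2d_init(nx, ny, nz, 0, 0)

       ! sizes and index bounds of the x-pencil must be consistent
       do i = 1, 3
          if (xsize(i) /= xend(i) - xstart(i) + 1) then
             call MPI_ABORT(MPI_COMM_WORLD, 1, ierror)
          end if
       end do

       ! in x-pencil orientation the full x extent is local
       if (xsize(1) /= nx_global) call MPI_ABORT(MPI_COMM_WORLD, 1, ierror)

       ! a process holds part of the plane if kplane lies in its z range
       if (kplane >= xstart(3) .and. kplane <= xend(3)) then
          print *, 'rank', nrank, 'holds j =', xstart(2), 'to', xend(2), &
             'of plane k =', kplane
       end if

       call decomp_2d_finalize
       call MPI_FINALIZE(ierror)
    end program plane_owner

Only the processes whose x-pencil covers ``kplane`` in the Z direction print a line, and together their Y ranges cover the whole plane.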
82+
83+
Decomposition information is also available through the derived type ``DECOMP_INFO``, which provides the
84+
following components:
85+
86+
* ``xst(i), yst(i), zst(i), i=1,2,3`` - the starting indices for each sub-domain, as in the global coordinate system.
87+
88+
* ``xen(i), yen(i), zen(i), i=1,2,3`` - the end indices for each sub-domain, as in the global coordinate system.
89+
90+
* ``xsz(i), ysz(i), zsz(i), i=1,2,3`` - the size of each sub-domain.
91+
92+
93+
Arrays can also be stored on smaller mesh sizes where points are skipped:
94+
95+
* ``iskipS, jskipS, kskipS`` - points skipped in the x, y and z direction for a generic scalar field
96+
97+
* ``iskipV, jskipV, kskipV`` - points skipped in the x, y and z direction for the velocity field
98+
99+
* ``iskipP, jskipP, kskipP`` - points skipped in the x, y and z direction for the pressure field
100+
101+
* ``xszS, yszS, zszS, xstS, ystS, zstS, xenS, yenS, zenS`` - size, starting and final indexes for the
102+
reduced-size mesh for a generic scalar
103+
104+
* ``xszV, yszV, zszV, xstV, ystV, zstV, xenV, yenV, zenV`` - size, starting and final indexes for the
105+
reduced-size mesh for the velocity field
106+
107+
* ``xszP, yszP, zszP, xstP, ystP, zstP, xenP, yenP, zenP`` - size, starting and final indexes for the
108+
reduced-size mesh for the pressure field
109+
110+
The module provides a memory allocation API that is recommended for correctly defining the major data structures
111+
within the main program. It is recommended that all major arrays are declared as ``allocatable``.
112+
113+
* ``alloc_x(var, decomp, global)`` - allocation using a x-pencil decomposition
114+
115+
* ``alloc_y(var, decomp, global)`` - allocation using a y-pencil decomposition
116+
117+
* ``alloc_z(var, decomp, global)`` - allocation using a z-pencil decomposition
118+
119+
where ``var`` is the allocatable array name,
120+
``decomp[optional]`` is the associated ``DECOMP_INFO`` object and
121+
``global[optional]`` is a logical [True/False] flag to indicate if the array is allocated in the global coordinate system.
122+
The allocation for an x-pencil decomposition would be equivalent to the statement:
123+
124+
::
125+
126+
allocate(var(decomp%xsz(1), decomp%xsz(2), decomp%xsz(3))) ! if global==.false.
127+
allocate(var(decomp%xst(1):decomp%xen(1), decomp%xst(2):decomp%xen(2), &
128+
decomp%xst(3):decomp%xen(3))) ! if global==.true.
129+
130+
Allocated arrays can be simply released with a ``deallocate(var)`` statement.
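
A minimal sketch of the allocation interface is shown below. It assumes that omitting both optional arguments selects the default decomposition and local indexing (equivalent to ``global==.false.``), and that MPI is initialised by the application. The program name and the values stored in the array are arbitrary.

::

    program alloc_demo
       use MPI
       use decomp_2d_constants
       use decomp_2d
       implicit none

       real(mytype), allocatable, dimension(:, :, :) :: u
       integer :: ierror, i, j, k

       call MPI_INIT(ierror)
       call decomp_2d_init(16, 16, 16, 0, 0)

       ! assumption: no optional arguments means default decomposition
       ! and local indices running from 1 to xsize(i)
       call alloc_x(u)

       ! local index + xstart - 1 gives the global index
       do k = 1, xsize(3)
          do j = 1, xsize(2)
             do i = 1, xsize(1)
                u(i, j, k) = real((xstart(1) + i - 1) + (xstart(2) + j - 1) &
                                  + (xstart(3) + k - 1), mytype)
             end do
          end do
       end do

       deallocate (u)
       call decomp_2d_finalize
       call MPI_FINALIZE(ierror)
    end program alloc_demo

With local indexing the loops run over ``xsize``, and the global position is recovered from ``xstart``. With ``global==.true.`` the same loops would instead run from ``xstart(i)`` to ``xend(i)``.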
131+
132+
**Basic 2D Decomposition API**
133+
134+
All the global variables described above, the default common type ``decomp`` and the MPI setup are initialized
135+
using the following call:
136+
137+
::
138+
139+
call decomp_2d_init(nx, ny, nz, p_row, p_col)
140+
141+
where ``nx``, ``ny`` and ``nz`` are the sizes of the 3D global data to be distributed over
142+
a 2D processor grid :math:`p\_row \times p\_col`.
143+
Note that none of the dimensions need to be divisible by ``p_row`` or ``p_col``, i.e. the library can handle non-evenly distributed data.
144+
If ``p_row=p_col=0``, an automatic decomposition is selected among all the available combinations.
145+
The algorithm chooses the combination closest to
146+
147+
.. math::
148+
149+
p\_row=p\_col=\sqrt{nproc}
150+
151+
If the square root is not an integer, the closest combination with :math:`p\_row \approx p\_col` and
152+
:math:`p\_row < p\_col` is used.
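
The selection rule can be illustrated with the stand-alone sketch below. It is not the library's implementation: the routine name ``closest_square_grid`` is hypothetical, and the sketch only looks for the factor pair of ``nproc`` closest to a square with the smaller factor as the row count, ignoring any further constraints the library may apply (for example limits imposed by the mesh sizes).

::

    program grid_sketch
       implicit none

       integer :: nproc, p_row, p_col

       nproc = 12
       call closest_square_grid(nproc, p_row, p_col)
       print *, 'nproc =', nproc, ' p_row =', p_row, ' p_col =', p_col

    contains

       ! hypothetical helper: largest divisor of n not exceeding sqrt(n)
       subroutine closest_square_grid(n, rows, cols)
          integer, intent(in) :: n
          integer, intent(out) :: rows, cols
          integer :: r

          rows = 1
          do r = 1, n
             if (r*r > n) exit
             if (mod(n, r) == 0) rows = r
          end do
          cols = n/rows
       end subroutine closest_square_grid

    end program grid_sketch

For ``nproc = 12`` the sketch selects a 3 by 4 grid, and for a perfect square such as 16 it selects 4 by 4.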
153+
154+
An optional parameter may be passed to this initialisation routine:
155+
156+
::
157+
158+
call decomp_2d_init(nx, ny, nz, p_row, p_col, periodic_bc)
159+
160+
Here ``periodic_bc`` is a 1D array containing 3 logical values that specify whether periodic boundary conditions
161+
should apply in certain dimensions. Note this is only applicable if halo-cell communication is to be used.
162+
163+
A key element of this library is a set of communication routines that actually perform the data transpositions.
164+
As mentioned, one needs to perform 4 global transpositions to go through all 3 pencil orientations.
165+
Correspondingly, the library provides 4 communication subroutines:
166+
167+
::
168+
169+
call transpose_x_to_y(var_in,var_out)
170+
call transpose_y_to_z(var_in,var_out)
171+
call transpose_z_to_y(var_in,var_out)
172+
call transpose_y_to_x(var_in,var_out)
173+
174+
The input array ``var_in`` and the output array ``var_out`` should have been defined
175+
and contain distributed data for the correct pencil orientations.
176+
177+
Note that the library is written using Fortran's generic interface so different data types are supported
178+
without user intervention. That means ``var_in`` and ``var_out`` above can be either real arrays or complex arrays,
179+
the latter being useful for FFT-type applications.
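
The four transpositions can be chained into a complete cycle, as in the sketch below. It assumes that MPI is initialised by the application and that the allocation routines are called without optional arguments; the program and array names are arbitrary. Since the transpositions only move data, the array recovered at the end of the cycle is expected to match the original one.

::

    program transpose_cycle
       use MPI
       use decomp_2d_constants
       use decomp_2d_mpi
       use decomp_2d
       implicit none

       integer, parameter :: nx = 16, ny = 16, nz = 16
       real(mytype), allocatable, dimension(:, :, :) :: ux, uy, uz, ux_back
       integer :: ierror

       call MPI_INIT(ierror)
       call decomp_2d_init(nx, ny, nz, 0, 0)

       ! one array per pencil orientation, plus a copy for the check
       call alloc_x(ux)
       call alloc_x(ux_back)
       call alloc_y(uy)
       call alloc_z(uz)

       call random_number(ux)

       ! x -> y -> z -> y -> x
       call transpose_x_to_y(ux, uy)
       call transpose_y_to_z(uy, uz)
       call transpose_z_to_y(uz, uy)
       call transpose_y_to_x(uy, ux_back)

       if (any(ux_back /= ux)) then
          print *, 'rank', nrank, ': data changed during the round trip'
       end if

       deallocate (ux, uy, uz, ux_back)
       call decomp_2d_finalize
       call MPI_FINALIZE(ierror)
    end program transpose_cycle

Declaring the arrays as ``complex(mytype)`` instead of ``real(mytype)`` should require no other change to the transposition calls, since the generic interface resolves the data type.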
180+
181+
As seen, the communication details are packed within a black box. From a user's perspective,
182+
it is not necessary to understand the internal logic of these transposition routines.
183+
Developers, in turn, are free to change the implementation without breaking user codes.
184+
185+
It is however noted that the communication routines are expensive,
186+
especially when running on a large number of processors.
187+
So applications should try to minimize the number of calls to them by adjusting the algorithms in use,
188+
even sometimes by duplicating computations.
189+
190+
Finally, before exit, applications should clean up the memory by:
191+
192+
::
193+
194+
call decomp_2d_finalize
195+
196+
**Advanced 2D Decomposition API**
197+
198+
While the basic decomposition API is very user-friendly, there may be situations in which
199+
applications need to handle more complex data structures. There are quite a few examples:
200+
201+
* While using real-to-complex FFTs, applications need to store both the real input
202+
(say, of global size nx*ny*nz)
203+
and the corresponding complex output (of smaller global size - such as (nx/2+1)*ny*nz -
204+
where roughly half the output is dropped due to conjugate symmetry).
205+
206+
* Many CFD applications use a staggered mesh system which requires different storage for global quantities
207+
(e.g. cell-centred vs. cell-interface storage).
208+
209+
* In applications using spectral methods, for anti-aliasing purposes,
210+
it is a common practice to enlarge the spatial domain before applying the Fourier transforms.
211+
212+
In all these examples, there are multiple global sizes and applications need to be able to distribute
213+
different data sets as 2D pencils.
214+
``2decomp&FFT`` provides a powerful and flexible programming interface to handle this:
215+
216+
::
217+
218+
TYPE(DECOMP_INFO) :: new_decomp
219+
call decomp_info_init(n1, n2, n3, new_decomp)
220+
221+
Here ``new_decomp`` is an instance of the Fortran derived data type ``DECOMP_INFO`` encapsulating
222+
the 2D decomposition information associated with one particular global size :math:`n1\times n2 \times n3`.
223+
The decomposition object can be initialised using the ``decomp_info_init`` routine as:
224+
225+
::
226+
227+
call decomp_info_init(n1,n2,n3, new_decomp)
228+
229+
This object can then be passed to the communication routines defined in the basic interface as a third parameter.
230+
For example:
231+
232+
::
233+
234+
call transpose_x_to_y(var_in, var_out, new_decomp)
235+
236+
The input and output arrays can be allocated as:
237+
238+
::
239+
240+
call alloc_x(var_in, new_decomp, .true.)
241+
call alloc_y(var_out, new_decomp, .true.)
242+
243+
Finally, the decomposition object must also be released using:
244+
245+
::
246+
247+
call decomp_info_finalize(new_decomp)
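
The real-to-complex case listed above can be put together as in the following sketch, where the real data uses the default decomposition and the complex data uses a second one of global size ``(nx/2+1)*ny*nz``. The program name and the names ``sp``, ``u``, ``c_x`` and ``c_y`` are hypothetical, and MPI is assumed to be initialised by the application.

::

    program two_decomps
       use MPI
       use decomp_2d_constants
       use decomp_2d
       implicit none

       integer, parameter :: nx = 16, ny = 16, nz = 16
       type(DECOMP_INFO) :: sp   ! decomposition of the complex data
       real(mytype), allocatable, dimension(:, :, :) :: u
       complex(mytype), allocatable, dimension(:, :, :) :: c_x, c_y
       integer :: ierror

       call MPI_INIT(ierror)
       call decomp_2d_init(nx, ny, nz, 0, 0)

       ! real data: default decomposition of global size nx*ny*nz
       call alloc_x(u)

       ! complex data: second decomposition of global size (nx/2+1)*ny*nz
       call decomp_info_init(nx/2 + 1, ny, nz, sp)
       call alloc_x(c_x, sp, .true.)
       call alloc_y(c_y, sp, .true.)

       ! the decomposition object is passed as the third argument
       c_x = cmplx(1.0_mytype, 0.0_mytype, kind=mytype)
       call transpose_x_to_y(c_x, c_y, sp)

       deallocate (u, c_x, c_y)
       call decomp_info_finalize(sp)
       call decomp_2d_finalize
       call MPI_FINALIZE(ierror)
    end program two_decomps

Each additional global size needs its own ``DECOMP_INFO`` object, which is released with ``decomp_info_finalize`` before ``decomp_2d_finalize`` is called.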
22248

23-
* ``nrank`` - the rank of the current MPI process.
24249

25-
* ``xsize(i), ysize(i), zsize(i), i=1,2,3`` - sizes of the sub-domains held by the current process. The first letter refers to the pencil orientation and the three 1D array elements contain the sub-domain sizes in X, Y and Z directions, respectively. In a 2D pencil decomposition, there is always one dimension which completely resides in local memory. So by definition xsize(1)==nx_global, ysize(2)==ny_global and zsize(3)==nz_global.
26250

27-
* ``xstart(i), ystart(i), zstart(i), xend(i), yend(i), zend(i), i=1,2,3`` - the starting and ending indices for each sub-domain, as in the global coordinate system. Obviously, it can be seen that xsize(i)=xend(i)-xstart(i)+1. It may be convenient for certain applications to use global coordinate (for example when extracting a 2D plane from a 3D domain, it is easier to know which process owns the plane if global index is used).
