
Commit 19fdbed

Merge pull request #1669 from stfc/450_mv_set_dirty_take_2
(Closes #450) remove set_dirty/clean from ACC regions.
2 parents 8b95c95 + c81d539 commit 19fdbed

27 files changed with 468 additions and 432 deletions.

changelog

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 1) PR #1747 for #1720. Adds support for If blocks to PSyAD.
+2) PR #1669 for #450. Remove set_dirty/clean from ACC regions

 release 2.3.0 9th June 2022

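The changelog entry, like the PR title, names the fix: calls that update a field's halo status (``set_dirty`` and ``set_clean``) must no longer end up inside an OpenACC region, where, according to the README text this commit removes, they stopped the generated code from compiling. The Python sketch below is a toy model only and not PSyclone's implementation (which works on its PSyIR tree): an invoke is a list of statement strings, ``ParallelRegion`` is a hypothetical container, and any halo-status call found inside a region is moved to just after that region.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class ParallelRegion:
    """Hypothetical stand-in for an OpenACC Parallel region."""
    body: List[str] = field(default_factory=list)


# A schedule entry is either a plain statement or a parallel region.
Node = Union[str, ParallelRegion]

# Host-side halo bookkeeping calls that must not be placed in a device region.
HALO_STATUS_CALLS = ("set_dirty", "set_clean")


def is_halo_status_call(stmt: str) -> bool:
    """True if the statement updates a field's halo status."""
    return any(f"%{name}(" in stmt for name in HALO_STATUS_CALLS)


def hoist_halo_status_calls(schedule: List[Node]) -> List[Node]:
    """Move every set_dirty/set_clean call found inside a ParallelRegion
    to just after that region, keeping the order of the moved calls."""
    result: List[Node] = []
    for node in schedule:
        if isinstance(node, ParallelRegion):
            kept = [s for s in node.body if not is_halo_status_call(s)]
            moved = [s for s in node.body if is_halo_status_call(s)]
            result.append(ParallelRegion(kept))
            result.extend(moved)
        else:
            result.append(node)
    return result


# Illustrative statements only; these are not PSyclone-generated lines.
before = [
    "call f1_proxy%halo_exchange(depth=1)",
    ParallelRegion(["do cell = 1, ncells ... end do",
                    "call f1_proxy%set_dirty()"]),
]
for node in hoist_halo_status_calls(before):
    print(node)
```

Moving each call to immediately after its region keeps it after the loop that modified the field, while taking it out of code that is compiled for the accelerator.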
doc/user_guide/examples.rst

Lines changed: 28 additions & 29 deletions
@@ -481,35 +481,34 @@ better job when optimising the code.
 Example 14: OpenACC
 ^^^^^^^^^^^^^^^^^^^

-Example of adding OpenACC directives in the dynamo0.3 API. This is a
-work in progress so the generated code may not work as
-expected. However it is never-the-less useful as a starting
-point. Three scripts are provided.
-
-The first script (``acc_kernels.py``) shows how to add OpenACC Kernels
-directives to the PSy-layer. This example only works with distributed
-memory switched off as the OpenACC Kernels transformation does not yet
-support halo exchanges within an OpenACC Kernels region.
-
-The second script (``acc_parallel.py``)shows how to add OpenACC Loop,
-Parallel and Enter Data directives to the PSy-layer. Again this
-example only works with distributed memory switched off as the OpenACC
-Parallel transformation does not support halo exchanges within an
-OpenACC Parallel region.
-
-The third script (``acc_parallel_dm.py``) is the same as the second
-except that it does support distributed memory being switched on by
-placing an OpenACC Parallel directive around each OpenACC Loop
-directive, rather than having one for the whole invoke. This approach
-avoids having halo exchanges within an OpenACC Parallel region.
-
-The generated code has a number of problems including 1) it does not
-modify the kernels to include the OpenACC Routine directive, 2) a
-loop's upper bound is computed via a derived type (this should be
-computed beforehand) 3) set_dirty and set_clean calls are placed
-within an OpenACC Parallel directive and 4) there are no checks on
-whether loops are parallel or not, it is just assumed they are -
-i.e. support for colouring or locking is not yet implemented.
+Example of adding OpenACC directives in the dynamo0.3 API.
+A single transformation script (``acc_parallel_dm.py``) is provided
+which demonstrates how to add OpenACC Loop, Parallel and Enter Data
+directives to the PSy-layer. It supports distributed memory being
+switched on by placing an OpenACC Parallel directive around each
+OpenACC Loop directive, rather than having one for the whole invoke.
+This approach avoids having halo exchanges within an OpenACC Parallel
+region. The script also uses :ref:`ACCRoutineTrans <available_kernel_trans>`
+to transform the one user-supplied kernel through
+the addition of an ``!$acc routine`` directive. This ensures that the
+compiler builds a version suitable for execution on the accelerator (GPU).
+
+The generated code has two problems:
+
+1. There are no checks on whether loops are safe to parallelise or not,
+   it is just assumed they are - i.e. support for colouring or locking
+   is not yet implemented.
+2. Although the user-supplied kernel is transformed so as to have the
+   necessary ``!$acc routine`` directive, the associated (but unnecessary)
+   ``use`` statement in the transformed Algorithym layer still uses the
+   name of the original, untransformed kernel (issue #1724).
+
+Since no colouring is required in this case, the generated Alg layer
+may be fixed by hand (by simply deleting the offending ``use`` statement)
+and the resulting code compiled and run on GPU. However, performance will
+be very poor as, with the limited optimisations and directives currently
+applied, the NVIDIA compiler refuses to run the user-supplied kernel in
+parallel.

 Example 15: CPU Optimisation of Matvec
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
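The first of the two problems listed in the new text is that loops are simply assumed safe to parallelise. Colouring is the usual remedy when neighbouring cells update shared degrees of freedom (dofs): cells are grouped so that no two cells of one colour share a dof, and each colour can then be processed in parallel, one colour after another. The greedy colouring below is a generic, minimal sketch on a hypothetical cell-to-dof map; it is not the colouring scheme used by PSyclone or LFRic.

```python
from typing import Dict, List, Set


def greedy_colour(cell_dofs: Dict[int, List[int]]) -> Dict[int, int]:
    """Assign a colour to each cell so that no two cells with the same
    colour share a dof. `cell_dofs` maps a cell id to the dofs it updates."""
    # For every dof, record the set of cells that touch it.
    dof_cells: Dict[int, Set[int]] = {}
    for cell, dofs in cell_dofs.items():
        for dof in dofs:
            dof_cells.setdefault(dof, set()).add(cell)

    colour: Dict[int, int] = {}
    for cell in sorted(cell_dofs):
        # Colours already taken by cells sharing a dof with this one.
        taken = {
            colour[other]
            for dof in cell_dofs[cell]
            for other in dof_cells[dof]
            if other in colour
        }
        # Use the smallest colour not taken by any neighbour.
        c = 0
        while c in taken:
            c += 1
        colour[cell] = c
    return colour


# A strip of four cells where neighbouring cells share one vertex dof.
mesh = {0: [0, 1], 1: [1, 2], 2: [2, 3], 3: [3, 4]}
print(greedy_colour(mesh))  # {0: 0, 1: 1, 2: 0, 3: 1}
```

Cells 0 and 2 never share a dof, so they can be updated concurrently; the same holds for cells 1 and 3.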

doc/user_guide/transformations.rst

Lines changed: 6 additions & 3 deletions
@@ -522,6 +522,8 @@ variable that is available to it from the enclosing module scope.
 .. note:: these rules *only* apply to kernels that are the target of
           PSyclone kernel transformations.

+.. _available_kernel_trans:
+
 Available Kernel Transformations
 ++++++++++++++++++++++++++++++++

@@ -1011,9 +1013,10 @@ user-supplied kernel routines are called from within
 PSyclone-generated loops in the PSy layer. PSyclone therefore provides
 the ``ACCRoutineTrans`` transformation which, given a Kernel node in
 the PSyIR, creates a new version of that kernel with the ``routine``
-directive added. Again, please see PSyclone/examples/gocean/eg2 for an
-example. This transformation is currently not supported for kernels in
-the Dynamo0.3 API.
+directive added. See either PSyclone/examples/gocean/eg2 or
+PSyclone/examples/lfric/eg14 for an example (although please note that
+this transformation is not yet fully working for kernels in
+the LFRic (Dynamo0.3) API - see #1724).

 SIR
 ---
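The effect of ``ACCRoutineTrans`` on a kernel's Fortran can be pictured with a purely text-based sketch. The helper ``add_acc_routine`` below is hypothetical and not part of PSyclone: the real transformation operates on the kernel's PSyIR and, in eg14, writes the result to a renamed module (``testkern_w0_kernel_0_mod.f90``). The sketch just inserts an ``!$acc routine`` line after the (possibly continued) header of the first subroutine, assuming continuation lines end in ``&`` with no trailing comment.

```python
import re
from typing import List

# Start of a subroutine statement, optionally prefixed by "pure",
# "elemental" or "recursive"; "end subroutine" does not match.
_SUB_START = re.compile(
    r"^\s*(?:(?:pure|elemental|recursive)\s+)*subroutine\s+\w+",
    re.IGNORECASE)


def add_acc_routine(source: str, indent: str = "  ") -> str:
    """Return a copy of Fortran `source` with an '!$acc routine' line
    inserted straight after the header of its first subroutine."""
    lines: List[str] = source.splitlines()
    for i, line in enumerate(lines):
        if _SUB_START.match(line):
            # Skip any continuation lines of the header (ending in '&').
            end = i
            while lines[end].rstrip().endswith("&") and end + 1 < len(lines):
                end += 1
            lines.insert(end + 1, indent + "!$acc routine")
            return "\n".join(lines) + "\n"
    raise ValueError("no subroutine statement found")


kernel = """module k_mod
contains
  subroutine k_code(n, &
                    f)
    integer, intent(in) :: n
    real, intent(inout) :: f(n)
    f = f + 1.0
  end subroutine k_code
end module k_mod
"""
print(add_acc_routine(kernel, indent="    "))
```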

examples/lfric/README.md

Lines changed: 11 additions & 26 deletions
@@ -278,34 +278,19 @@ psyclone -s ./kernel_constants.py \
 ## Example 14: OpenACC

 This example shows how OpenACC directives can be added to the LFRic
-PSy-layer. This is work in progress so the resultant code is not
-expected to run correctly but it gives a starting point for
-evaluation.
+PSy-layer. It adds OpenACC enter data, parallel and loop directives in the
+presence of halo exchanges. It also transforms the (one) user-supplied
+kernel with the addition of an `ACC routine` directive.

-1. Adding OpenACC kernels directives. -nodm is used as an exception is
-   raised if Halo Exchange nodes are found within an OpenACC kernels
-   region.
-   ```sh
-   cd eg14/
-   psyclone -s ./acc_kernels.py -nodm main.x90
-   ```
-
-2. Adding OpenACC enter data, parallel and loop directives. -nodm is
-   used as an exception is raised if Halo Exchange nodes are found within
-   an OpenACC parallel region.
-   ```sh
-   cd eg14/
-   psyclone -s ./acc_parallel.py -nodm main.x90
-   ```
+```sh
+cd eg14/
+psyclone -s ./acc_parallel_dm.py main.x90
+```

-3. Adding OpenACC enter data, parallel and loop directives in the
-   presence of halo exchanges. This does not currently produce compilable code
-   because calls to set_clean()/dirty() end up within parallel regions - TODO
-   #450.
-   ```sh
-   cd eg14/
-   psyclone -s ./acc_parallel_dm.py main.x90
-   ```
+The supplied Makefile defines a `compile` target that will build the
+transformed code. Currently the compilation will fail because the
+generated PSy-layer code does not contain the correct name for the
+transformed kernel module (issue #1724).

 ## Example 15: Optimise matvec Kernel for CPU

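The README notes that the script adds parallel and loop directives in the presence of halo exchanges; the user guide attributes this to placing one OpenACC Parallel region around each OpenACC Loop instead of one around the whole invoke. The toy model below uses hypothetical statement strings rather than PSyclone nodes, and counts how many halo exchanges end up inside a region under each placement.

```python
from typing import List, Tuple, Union

# Hypothetical toy model: a region is a tuple ("parallel", [statements]).
Region = Tuple[str, List[str]]
Node = Union[str, Region]


def wrap_each_loop(invoke: List[str]) -> List[Node]:
    """Enclose every loop in its own parallel region, leaving all other
    statements (such as halo exchanges) outside any region."""
    return [("parallel", [stmt]) if stmt.startswith("loop") else stmt
            for stmt in invoke]


def wrap_whole_invoke(invoke: List[str]) -> List[Node]:
    """Enclose the entire invoke in a single parallel region."""
    return [("parallel", list(invoke))]


def halo_exchanges_inside_regions(schedule: List[Node]) -> int:
    """Count halo exchanges that ended up inside a parallel region."""
    return sum(
        1
        for node in schedule
        if isinstance(node, tuple)
        for stmt in node[1]
        if stmt.startswith("halo_exchange")
    )


invoke = ["halo_exchange f1", "loop over cells (kernel 1)",
          "halo_exchange f2", "loop over cells (kernel 2)"]
print(halo_exchanges_inside_regions(wrap_whole_invoke(invoke)))  # 2
print(halo_exchanges_inside_regions(wrap_each_loop(invoke)))     # 0
```

With a single region both halo exchanges would sit inside device code; with one region per loop they stay on the host, between regions.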
examples/lfric/code/dg_matrix_vector_kernel_mod.F90

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ module dg_matrix_vector_kernel_mod
 GH_FIELD, GH_OPERATOR, &
 GH_REAL, GH_READ, GH_WRITE, &
 ANY_DISCONTINUOUS_SPACE_1, &
-ANY_SPACE_1, &
+ANY_SPACE_1, GH_READWRITE, &
 CELL_COLUMN

 use constants_mod, only : r_def, i_def

examples/lfric/eg14/.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,3 +1,4 @@
 example_openacc
 main_alg.f90
 main_psy.f90
+testkern_w0_kernel_0_mod.f90

examples/lfric/eg14/Makefile

Lines changed: 22 additions & 30 deletions
@@ -36,20 +36,22 @@

 # The compiler to use may be specified via the F90 and F90FLAGS
 # environment variables. To use the NVIDIA compiler and enable
-# openacc compilation, use:
+# openacc compilation with managed memory, use:
+#
 # export F90=nvfortran
-# export F90FLAGS="-acc"
+# export F90FLAGS="-acc=gpu -Minfo=all -gpu=managed"

 PSYROOT=../../..

 include $(PSYROOT)/examples/common.mk

-GENERATED_FILES = *.o *.mod $(EXEC) main_alg.f90 main_psy.f90
+GENERATED_FILES = *.o *.mod $(EXEC) main_alg.f90 main_psy.f90 \
+testkern_w0_kernel_0_mod.f90

 F90 ?= gfortran
 F90FLAGS ?= -Wall -g

-OBJ = main_psy.o main_alg.o testkern_w0_kernel_mod.o
+OBJ = main_psy.o main_alg.o testkern_w0_kernel_0_mod.o

 EXEC = example_openacc

@@ -59,33 +61,27 @@ LFRIC_LIB=$(LFRIC_DIR)/lib$(LFRIC_NAME).a

 F90FLAGS += -I$(LFRIC_DIR)

-.PHONY: transformtransform_kernels transform_parallel transform_parallel_dm
-
-transform: transform_kernels transform_parallel transform_parallel_dm
-
-transform_kernels:
-	${PSYCLONE} -nodm -s ./acc_kernels.py \
-	-opsy main_psy.f90 -oalg main_alg.f90 main.x90
-
-transform_parallel:
-	${PSYCLONE} -nodm -s ./acc_parallel.py \
-	-opsy main_psy.f90 -oalg main_alg.f90 main.x90
+.PHONY: transform compile run

-transform_parallel_dm:
+# This makefile assumes that the transformed kernel will be named
+# 'testkern_w0_kernel_0_mod.f90'. However, if it already exists then PSyclone
+# will create 'testkern_..._1_mod.f90' so remove it first.
+transform:
+	rm -f testkern_w0_kernel_0_mod.f90
 	${PSYCLONE} -dm -s ./acc_parallel_dm.py \
 	-opsy main_psy.f90 -oalg main_alg.f90 main.x90

-
-%_psy.f90: %.x90
+%_psy.f90: %.x90
 	${PSYCLONE} -s ./acc_parallel_dm.py \
 	-opsy $*_psy.f90 -oalg $*_alg.f90 $<

-#TODO #1669 - the code currently does not compile
-# set_dirty calls inside openacc region
-#TODO #1694 - the code currently does not compile
-# incorrect variable names and constants
-# when using builtin
-compile: transform_parallel_dm $(EXEC)
+testkern_w0_kernel_0_mod.f90: main_psy.f90
+
+# TODO #1724 - compilation currently fails because module name in use
+# statement needs correcting following ACCRoutineTrans of Kernel.
+compile: transform
+	@echo "No compilation supported for lfric/eg14 due to #1724"
+

 run: compile
 	./$(EXEC)
@@ -97,8 +93,8 @@ $(LFRIC_LIB):
 	$(MAKE) -C $(LFRIC_DIR)

 # Dependencies
-main_psy.o: testkern_w0_kernel_mod.o
-main_alg.o: main_psy.o
+main_psy.o: testkern_w0_kernel_0_mod.o
+main_alg.o: main_psy.o testkern_w0_kernel_0_mod.o

 %.o: %.F90
 	$(F90) $(F90FLAGS) -c $<
@@ -111,9 +107,5 @@ main_alg.o: main_psy.o

 main_alg.f90: main_psy.f90

-%_psy.f90: %.x90
-	${PSYCLONE} -s ./acc_parallel_dm.py \
-	-opsy $*_psy.f90 -oalg $*_alg.f90 $<
-
 allclean: clean
 	$(MAKE) -C $(LFRIC_DIR) allclean
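The new ``transform`` target runs ``rm -f testkern_w0_kernel_0_mod.f90`` before PSyclone because, as the Makefile comment explains, a leftover ``_0`` file would make PSyclone write the transformed kernel under a ``_1`` name that the ``OBJ`` list and dependency rules do not expect. The Python sketch below models that naming under an assumption: the suffix is taken to be the smallest unused integer, which matches the ``_0``/``_1`` case described but is not confirmed beyond it. ``next_kernel_filename`` is a hypothetical helper, not a PSyclone function.

```python
from typing import Set


def next_kernel_filename(base: str, existing: Set[str]) -> str:
    """Return '<base>_<N>_mod.f90' for the smallest N >= 0 whose file
    name is not already in `existing` (assumed naming scheme)."""
    n = 0
    while f"{base}_{n}_mod.f90" in existing:
        n += 1
    return f"{base}_{n}_mod.f90"


# A clean directory yields the name the Makefile expects...
print(next_kernel_filename("testkern_w0_kernel", set()))
# ...but a stale file from an earlier run shifts the suffix to _1.
print(next_kernel_filename("testkern_w0_kernel",
                           {"testkern_w0_kernel_0_mod.f90"}))
```

Deleting the stale file first keeps the generated name fixed, so the hard-coded ``testkern_w0_kernel_0_mod.o`` object and its dependency rules stay valid.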

examples/lfric/eg14/README.md

Lines changed: 7 additions & 4 deletions
@@ -5,13 +5,16 @@ uses OpenACC. The framework for this stand-alone example is explained in
 more details in the directory
 ``<PSYCLONEHOME>/examples/lfric/eg17/full_example``.

-The script ``acc_parallel_dm.py`` applies the OpenACC transformation to all
-kernels. See the [OpenACC](https://psyclone.readthedocs.io/en/stable/transformations.html#openacc)
-section of the PSyclone documentation for details about this transformation.
+The script ``acc_parallel_dm.py`` applies various OpenACC transformations
+to all kernels. See the PSyclone User Guide for [details](https://psyclone.readthedocs.io/en/stable/examples.html#example-14-openacc).

 ## Compilation

-A simple makefile is provided to compile the example. It needs:
+Note that due to #1724 compilation will currently fail. A temporary workaround
+is to edit the generated Alg file (``main_alg.f90``) and remove the
+``use testkern_w0_kernel_mod, only: ...`` line.
+
+A simple Makefile is provided to compile the example. It needs:
 - the infrastructure library ``liblfric.a`` provided in
 ``<PSYCLONEHOME>/src/psyclone/tests/test_files/dynamo0p3/infrastructure``

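The temporary workaround for #1724 (deleting the ``use testkern_w0_kernel_mod, only: ...`` line from the generated ``main_alg.f90``) can be scripted after running ``make transform``. The Python sketch below is a hypothetical helper, not part of PSyclone or its examples; it assumes the offending ``use`` statement fits on one line, so a statement continued with ``&`` would need extra handling.

```python
import re
from pathlib import Path

# A single-line use of the original, untransformed kernel module, such as
# "  use testkern_w0_kernel_mod, only: ...". Fortran is case-insensitive,
# so the match ignores case; "testkern_w0_kernel_0_mod" does not match.
_STALE_USE = re.compile(r"^\s*use\s+testkern_w0_kernel_mod\b", re.IGNORECASE)


def strip_stale_use(text: str) -> str:
    """Return `text` without any line that uses testkern_w0_kernel_mod."""
    return "".join(line for line in text.splitlines(keepends=True)
                   if not _STALE_USE.match(line))


def fix_alg_file(path: str = "main_alg.f90") -> None:
    """Apply strip_stale_use to the generated Alg file in place."""
    alg = Path(path)
    alg.write_text(strip_stale_use(alg.read_text()))
```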
examples/lfric/eg14/acc_kernels.py

Lines changed: 0 additions & 58 deletions
This file was deleted.

examples/lfric/eg14/acc_parallel.py

Lines changed: 0 additions & 66 deletions
This file was deleted.
