An \openshmem program consists of a set of \openshmem processes called
\acp{PE}. While not required by \openshmem, in typical usage, \acp{PE} are
executed using a single program, multiple data (\ac{SPMD}) model. \ac{SPMD}
requires each \ac{PE} to use the same executable; however, \acp{PE} are able to
follow divergent control paths. \acp{PE} are often implemented using \ac{OS}
processes, and \acp{PE} are permitted to create additional
threads when supported by the \openshmem library.
\ac{PE} execution is loosely coupled, relying on \openshmem operations to
communicate and synchronize among executing \acp{PE}. The \openshmem phase in
a program begins with a call to the initialization routine \FUNC{shmem\_init}
or \FUNC{shmem\_init\_thread}, which must be performed before using any of the
other \openshmem library routines.
An \openshmem program concludes its use of the \openshmem library when all \acp{PE} call
\FUNC{shmem\_finalize} or any \ac{PE} calls \FUNC{shmem\_global\_exit}.
During a call to \FUNC{shmem\_finalize}, the \openshmem library must
complete all pending communication and release all the resources associated with
the library using an implicit collective synchronization across \acp{PE}.
Calling any \openshmem routine before initialization or after
\FUNC{shmem\_finalize} leads to undefined behavior. After finalization, a
subsequent initialization call also leads to undefined behavior.
Each \ac{PE} of an \openshmem program is identified by a unique integer. The
identifiers are assigned in a monotonically increasing manner from zero
to one less than the total number of \acp{PE}. \ac{PE} identifiers are used for
\openshmem calls (e.g., to specify \OPR{put} or \OPR{get} routines on symmetric
data objects, collective synchronization calls) or to dictate a control flow for
\acp{PE} using constructs of \Cstd. The identifiers are fixed for
the duration of the \openshmem phase of a program.
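As a minimal sketch of using \ac{PE} identifiers to direct control flow in C,
the following hypothetical helpers (not part of the \openshmem API) compute the
neighbors of a \ac{PE} in a logical ring. They assume that \texttt{me} and
\texttt{npes} hold the values a program would obtain from
\FUNC{shmem\_my\_pe} and \FUNC{shmem\_n\_pes}; plain integers stand in for
those calls here so that the example does not depend on an \openshmem library.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; not OpenSHMEM routines.
 * PE identifiers are assumed to run from 0 to npes - 1. */

/* Identifier of the next PE in a logical ring, wrapping to 0 at the end. */
static int ring_right(int me, int npes)
{
    assert(npes > 0 && me >= 0 && me < npes);
    return (me + 1) % npes;
}

/* Identifier of the previous PE in the ring, wrapping to npes - 1 at 0. */
static int ring_left(int me, int npes)
{
    assert(npes > 0 && me >= 0 && me < npes);
    /* Add npes before taking the remainder so the operand is never negative. */
    return (me + npes - 1) % npes;
}
```

Because the identifiers are fixed for the duration of the \openshmem phase, a
neighbor computed once in this way remains valid until finalization.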
\subsection{Progress of OpenSHMEM Operations}\label{subsec:progress}
The \openshmem model assumes that computation and communication are naturally
overlapped. \openshmem programs are expected to exhibit progression of
communication both with and without \openshmem calls. For point-to-point
operations, consider a \ac{PE} that is
engaged in a computation with no \openshmem calls. Other \acp{PE} should be able
to communicate (e.g., \OPR{put}, \OPR{get}, \OPR{atomic}) and
complete communication operations with that compute-bound \ac{PE}
without that \ac{PE} issuing any explicit \openshmem calls. One-sided \openshmem
communication calls involving that \ac{PE} should progress regardless of when
that \ac{PE} next engages in an \openshmem call. Similarly,
for nonblocking collectives, consider the \acp{PE} that are part of a team
issuing a nonblocking collective and overlapping collective completion with
computation. Once a nonblocking collective operation is initiated by
all of the \acp{PE} in the team of the collective, any \ac{PE} in the team must
eventually observe completion through a call to \FUNC{shmem\_req\_test} or a
call to \FUNC{shmem\_req\_wait}.
\parimpnotes{
An \openshmem implementation for hardware that does not provide
asynchronous communication capabilities may require a software progress
thread in order to process remotely-issued communication requests without
explicit program calls to the \openshmem library.
High performance implementations of \openshmem are expected to leverage
hardware offload capabilities and provide asynchronous one-sided
communication without software assistance.
Implementations should avoid deferring the execution of one-sided
operations until a synchronization point where data is known to be
available. High-quality implementations should attempt asynchronous delivery
whenever possible, for performance reasons. Additionally, the \openshmem
community discourages releasing \openshmem implementations that do not
provide asynchronous one-sided operations, as these have very limited
performance value for \openshmem programs.
}
\subsection{Invoking OpenSHMEM Operations}\label{subsec:invoking_openshmem_operations}
Pointer arguments to \openshmem routines that point to non-\CTYPE{const} data must not
overlap in memory with other arguments to the same \openshmem operation, with
the exception of in-place reductions as described in Section~\ref{subsec:shmem_reductions}.
Otherwise, the behavior is undefined. Two arguments overlap in memory if any
of their data elements are contained in the same physical memory locations.
For example, consider an address $a$ returned by the \FUNC{shmem\_ptr} operation
for symmetric object $A$ on \ac{PE} $i$. Providing the local address $a$ and
the symmetric address of object $A$ to an \openshmem operation targeting
\ac{PE} $i$ results in undefined behavior.
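The overlap rule can be partly illustrated with a debugging aid. The sketch
below is a hypothetical helper, not part of the \openshmem API, that tests
whether two byte ranges intersect within the address space of the calling
process. It compares virtual addresses only, so it cannot detect the
\FUNC{shmem\_ptr} case described above, where two different addresses may
refer to the same physical memory.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper for illustration only; not an OpenSHMEM routine.
 * Returns true if the half-open byte ranges [a, a + a_bytes) and
 * [b, b + b_bytes) share at least one byte. Only virtual addresses of the
 * calling process are compared. */
static bool ranges_overlap(const void *a, size_t a_bytes,
                           const void *b, size_t b_bytes)
{
    uintptr_t a_lo = (uintptr_t)a;
    uintptr_t b_lo = (uintptr_t)b;

    /* An empty range contains no bytes and so cannot overlap anything. */
    if (a_bytes == 0 || b_bytes == 0) {
        return false;
    }

    /* Compare the distance between the start addresses with the length of
     * the lower range; this form avoids overflow in address + length. */
    if (a_lo <= b_lo) {
        return (b_lo - a_lo) < a_bytes;
    }
    return (a_lo - b_lo) < b_bytes;
}
```

A program might call such a check on the source and destination arguments
before invoking a routine, keeping in mind that a negative result does not
prove the arguments are free of overlap in physical memory.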
Buffers provided to \openshmem routines are \emph{in-use} until the
corresponding \openshmem operation has completed at the calling \ac{PE}.
Updates to a buffer that is in-use, including updates performed through locally
and remotely issued \openshmem operations, result in undefined behavior.
Similarly, reads from a buffer that is in-use are allowed only when the buffer
was provided as a \CTYPE{const}-qualified argument to the \openshmem routine for
which it is in-use. Otherwise, the behavior is undefined. Exceptions are made for
buffers that are in-use by \acp{AMO}, as described in
Section~\ref{subsec:amo_guarantees}. For information regarding the completion
of \openshmem operations, see Section~\ref{subsec:memory_order}.
\openshmem routines with multiple symmetric object arguments do not require
these symmetric objects to be located within the same symmetric memory segment.
For example, objects located in the symmetric data segment and objects located
in the symmetric heap can be provided as arguments to the same \openshmem
operation.