Commit 20ce831

Author: Damian Rouson
Commit message: Final submisssion.
1 parent d504e57, commit 20ce831

3 files changed (+6 additions, -9 deletions)

body.tex

Lines changed: 4 additions & 7 deletions
@@ -53,7 +53,7 @@ \subsection{Motivation and Background}
 \end{figure*}
 
 \subsection{Objectives}
-This evaluates alternatives for compiling, linking, and executing one
+This paper evaluates alternatives for compiling, linking, and executing one
 \gls{mini-app} \gls{caf} source code using the following technologies:
 \begin{itemize}
 \item \gls{mpi}~\cite{mpiforum2016mpi} and OpenSHMEM~\cite{openshmem2016} communication layers,
@@ -70,10 +70,7 @@ \subsection{Objectives}
 \item Puts: an image stores data in memory managed by
 another image without the receiving image's involvement.
 \end{enumerate}
-Fortran semantics necessitates that gets block. Puts are non-blocking.
-
-%Our results will inform our future decisions around the choice of compiler, communication
-%layer, platform, and data access patterns.
+We also explore the performance and scalability of multi- versus many-core processors.
 
 \section{Methodology}
 \subsection{Physics and numerics}
@@ -105,7 +102,7 @@ \subsection{Compilers, runtimes, and hardware}
 Cheyenne uses a Mellanox EDR Infiniband interconnect with a partial 9D Enhanced Hypercube single-plane topology.
 We compiled coarray-\gls{icar} on the \gls{nersc} systems using the Cray Fortran compiler version 8.6.0. We compiled
 at \gls{ncar} using the \gls{gcc} version 6.3 Fortran front end, which uses the
-OpenCoarrays \gls{abi}~\cite{fanfarillo2014opencoarrays} to support \gls{caf}. We tested two OpenCoarrays parallel runtime libraries:
+OpenCoarrays \gls{abi}~\cite{fanfarillo2014opencoarrays} to support \gls{caf}. We tested two parallel runtime libraries that implement the OpenCoarrays \gls{abi}:
 \begin{enumerate}
 \item The default \gls{mpi} library using the SGI MPT \gls{mpi},
 \item The recently released OpenSHMEM library.
@@ -138,7 +135,7 @@ \section{Discussion of Results}
 To aid interpretation, the scaling results are reported as a fraction of the ideal scaling for each machine in the top left.
 These results show that the Cray compiler+system scales better than the gfortran+SGI system,
 with 75\% efficiency at >10k cores, while on Cheyenne, only 55\% of ideal was achieved.
-This also shows that the KNL system scales well out to large core counts (60\% ideal with \num{19200} cores),
+This also shows that the KNL system scales well out to large core counts (60\% of the ideal with \num{19200} cores),
 but that the total runtimes are significantly slower than the equivalent runtimes on Xeons (top right).
 The KNL performance might be improved in the future by implementing OpenMP threaded parallelism within a node.
 

main.pdf

94 Bytes
Binary file not shown.

main.tex

Lines changed: 2 additions & 2 deletions
@@ -94,8 +94,8 @@
 We examine the scalability and performance of an open-source, \gls{caf} \gls{mini-app} that solves
 several parallel, numerical algorithms known to dominate the execution of \gls{icar}~\cite{gutmann2016intermediate} model,
 a package developed at the \gls{ncar}.
-The \gls{mini-app} uses standard Fortran 2008, including several Fortran 2008 implementations of the collective
-subroutines defined in the Committee Draft of the upcoming Fortran 2015 standard. The ability of \gls{caf} to run atop various
+The \gls{mini-app} uses standard Fortran 2008, including one Fortran 2008 implementation of a collective
+subroutine defined in the Committee Draft of the upcoming Fortran 2018 standard. The ability of \gls{caf} to run atop various
 communication layers and the increasing \gls{caf} compiler availability facilitated evaluating several compilers,
 runtime libraries and hardware platforms. Results are presented for the GNU and Cray compilers, each of which offers
 different parallel runtime libraries employing one or more communication layers, including \gls{mpi}, OpenSHMEM, and proprietary
