body.tex
4 additions & 7 deletions
@@ -53,7 +53,7 @@ \subsection{Motivation and Background}
 \end{figure*}
 
 \subsection{Objectives}
-This evaluates alternatives for compiling, linking, and executing one
+This paper evaluates alternatives for compiling, linking, and executing one
 \gls{mini-app} \gls{caf} source code using the following technologies:
 \begin{itemize}
 \item\gls{mpi}~\cite{mpiforum2016mpi} and OpenSHMEM~\cite{openshmem2016} communication layers,
@@ -70,10 +70,7 @@ \subsection{Objectives}
 \item Puts: an image stores data in memory managed by
 another image without the receiving image's involvement.
 \end{enumerate}
-Fortran semantics necessitates that gets block. Puts are non-blocking.
-
-%Our results will inform our future decisions around the choice of compiler, communication
-%layer, platform, and data access patterns.
+We also explore the performance and scalability of multi- versus many-core processors.
 
 \section{Methodology}
 \subsection{Physics and numerics}
@@ -105,7 +102,7 @@ \subsection{Compilers, runtimes, and hardware}
 Cheyenne uses a Mellanox EDR Infiniband interconnect with a partial 9D Enhanced Hypercube single-plane topology.
 We compiled coarray-\gls{icar} on the \gls{nersc} systems using the Cray Fortran compiler version 8.6.0. We compiled
 at \gls{ncar} using the \gls{gcc} version 6.3 Fortran front end, which uses the
-OpenCoarrays \gls{abi}~\cite{fanfarillo2014opencoarrays} to support \gls{caf}. We tested two OpenCoarrays parallel runtime libraries:
+OpenCoarrays \gls{abi}~\cite{fanfarillo2014opencoarrays} to support \gls{caf}. We tested two parallel runtime libraries that implement the OpenCoarrays \gls{abi}:
 \begin{enumerate}
 \item The default \gls{mpi} library using the SGI MPT \gls{mpi},
 \item The recently released OpenSHMEM library.
@@ -138,7 +135,7 @@ \section{Discussion of Results}
 To aid interpretation, the scaling results are reported as a fraction of the ideal scaling for each machine in the top left.
 These results show that the Cray compiler+system scales better than the gfortran+SGI system,
 with 75\% efficiency at >10k cores, while on Cheyenne, only 55\% of ideal was achieved.
-This also shows that the KNL system scales well out to large core counts (60\% ideal with \num{19200} cores),
+This also shows that the KNL system scales well out to large core counts (60\% of the ideal with \num{19200} cores),
 but that the total runtimes are significantly slower than the equivalent runtimes on Xeons (top right).
 The KNL performance might be improved in the future by implementing OpenMP threaded parallelism within a node.