[flang-rt] Optimise ShallowCopy and elemental copies in Assign
Using Descriptor.Element<>() when iterating through a rank-1 array is
currently inefficient, because the generic implementation suitable
for arrays of any rank makes the compiler unable to perform
optimisations that would make the rank-1 case considerably faster.
Such iteration currently happens inside ShallowCopy, as well as inside
Assign, where the implementation of elemental copies is equivalent to
ShallowCopyDiscontiguousToDiscontiguous.
To address that, add a DescriptorIterator abstraction specialised both
for the optimised rank-1 case and for the generic case, and use it
throughout ShallowCopy to iterate over the arrays.
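As a rough illustration of the idea (not the code in this patch), the
sketch below contrasts a generic per-dimension iterator with a rank-1
specialisation that only bumps a pointer. MiniDescriptor, MiniIterator
and CopyElements are hypothetical names invented for this example; the
real flang-rt Descriptor API is different.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for a runtime array descriptor.
struct MiniDescriptor {
  char *base;                             // address of the first element
  std::vector<std::ptrdiff_t> extent;     // element count per dimension
  std::vector<std::ptrdiff_t> byteStride; // byte distance per dimension
};

template <bool RANK1> class MiniIterator;

// Generic case: track one subscript per dimension and recompute the
// element address from all of them on every access.
template <> class MiniIterator<false> {
public:
  explicit MiniIterator(const MiniDescriptor &d)
      : d_(d), subscripts_(d.extent.size(), 0) {}
  char *Get() const {
    std::ptrdiff_t offset = 0;
    for (std::size_t j = 0; j < subscripts_.size(); ++j) {
      offset += subscripts_[j] * d_.byteStride[j];
    }
    return d_.base + offset;
  }
  void Advance() {
    // Column-major order: the first dimension varies fastest.
    for (std::size_t j = 0; j < subscripts_.size(); ++j) {
      if (++subscripts_[j] < d_.extent[j]) {
        return;
      }
      subscripts_[j] = 0;
    }
  }

private:
  const MiniDescriptor &d_;
  std::vector<std::ptrdiff_t> subscripts_;
};

// Rank-1 case: no subscript bookkeeping, just add a fixed stride.
template <> class MiniIterator<true> {
public:
  explicit MiniIterator(const MiniDescriptor &d)
      : current_(d.base), stride_(d.byteStride[0]) {}
  char *Get() const { return current_; }
  void Advance() { current_ += stride_; }

private:
  char *current_;
  std::ptrdiff_t stride_;
};

// Element-by-element copy written once against the iterator interface.
template <bool RANK1>
void CopyElements(const MiniDescriptor &to, const MiniDescriptor &from,
    std::size_t elementBytes, std::size_t count) {
  MiniIterator<RANK1> toIt(to);
  MiniIterator<RANK1> fromIt(from);
  for (std::size_t i = 0; i < count; ++i) {
    std::memcpy(toIt.Get(), fromIt.Get(), elementBytes);
    toIt.Advance();
    fromIt.Advance();
  }
}
```

A caller would pick CopyElements<true> when both arrays have rank 1 and
CopyElements<false> otherwise, so the copy loop is written only once.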
Furthermore, depending on the pointer type passed to memcpy, the
optimiser can remove the memcpy calls from ShallowCopy altogether, which
can yield substantial performance improvements on its own. Check the
element size throughout ShallowCopy and use the pointer type that
matches it where applicable to make these optimisations possible.
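One way to picture this (a sketch under assumptions, not the code in
this patch): dispatch once on the runtime element size to a copy loop
templated on a type of that width. Inside each instantiation the memcpy
size is a compile-time constant, which compilers can typically lower to
a plain load and store. CopyStrided and CopyStridedDispatch are
hypothetical names, and the exact pointer types used by the patch may
differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `count` elements of type T between two strided byte ranges.
// sizeof(T) is known at compile time, so the memcpy below has a
// constant size that the optimiser can replace with a load and store.
template <typename T>
void CopyStrided(char *to, std::ptrdiff_t toStride, const char *from,
    std::ptrdiff_t fromStride, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    std::memcpy(to, from, sizeof(T));
    to += toStride;
    from += fromStride;
  }
}

// Select the instantiation that matches the runtime element size and
// fall back to a variable-size memcpy for every other size.
void CopyStridedDispatch(char *to, std::ptrdiff_t toStride, const char *from,
    std::ptrdiff_t fromStride, std::size_t count, std::size_t elementBytes) {
  switch (elementBytes) {
  case 8:
    CopyStrided<std::uint64_t>(to, toStride, from, fromStride, count);
    break;
  case 4:
    CopyStrided<std::uint32_t>(to, toStride, from, fromStride, count);
    break;
  case 2:
    CopyStrided<std::uint16_t>(to, toStride, from, fromStride, count);
    break;
  case 1:
    CopyStrided<std::uint8_t>(to, toStride, from, fromStride, count);
    break;
  default:
    // Size not known at compile time: memcpy stays a real call.
    for (std::size_t i = 0; i < count; ++i) {
      std::memcpy(to, from, elementBytes);
      to += toStride;
      from += fromStride;
    }
    break;
  }
}
```

The sketch keeps char pointers for addressing so that it makes no
alignment assumptions; only the copy width is tied to the type.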
Finally, reimplement elemental copies inside Assign using the
ShallowCopy* family of functions whenever possible.
For the thornado-mini application, this reduces the runtime by 27.7%.
Signed-off-by: Kajetan Puchalski <[email protected]>