After looking at the implementations of dash::copy today, I noticed a major flaw in the current implementation. In a nutshell, this is what dash::copy(T*, T*, GlobOutputIter) does:
- memcpy the local portion of the transfer
- put the preceding elements
- put the succeeding elements
Now, this will work fine for up to 3 units, because each of the two puts then targets a single unit. However, experience from the last couple of years dictates that there may well be more than 3 units in the future, so whenever a user tries to copy a local vector into parts of a dash::Array spanning >=4 units, a unicorn will die a cruel death.
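To make the failure case concrete, here is a minimal sketch of the three-part scheme under a simplified model. It is not DASH code: it assumes a blocked distribution in which every unit owns `block` consecutive elements, and the names `units_spanned` and `three_part_copy_is_safe` are made up for illustration. The copy is only safe if the elements before the local block and the elements after it each land on at most one unit.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical model, not DASH code: every unit owns `block` consecutive
// elements of the global array (blocked distribution).
// Returns how many units the global index range [first, last) touches.
std::size_t units_spanned(std::size_t first, std::size_t last,
                          std::size_t block) {
    assert(block > 0);
    if (first >= last) return 0;  // empty range touches no unit
    return (last - 1) / block - first / block + 1;
}

// The three-part scheme issues one memcpy for the part owned by `my_unit`,
// one put for everything before it and one put for everything after it.
// Each put is only valid if its target range stays within a single unit.
bool three_part_copy_is_safe(std::size_t first, std::size_t last,
                             std::size_t block, std::size_t my_unit) {
    std::size_t local_begin = my_unit * block;
    std::size_t local_end   = local_begin + block;
    std::size_t before_end  = std::min(last, local_begin);
    std::size_t after_begin = std::max(first, local_end);
    return units_spanned(first, before_end, block) <= 1 &&
           units_spanned(after_begin, last, block) <= 1;
}
```

With `block = 4`, the range `[2, 10)` copied from unit 1 touches units 0, 1 and 2 and passes the check, while `[2, 14)` also reaches unit 3, so the second put would run past the end of unit 2.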
My guess is that DASH assumes that DART is aware of the continuous address space, i.e., a gptr is just the start of an arbitrary region that can potentially span multiple units. The bad news is that DART communication operations are completely agnostic of this at the moment, so the higher levels have to make sure not to write beyond the bounds of the single unit referenced in a gptr. In essence, #398 addresses this problem for dash::transform.
We could, however, give DART the notion of a continuous address space and let it handle multi-unit puts and gets rather easily. DART has the size of each allocation on the individual units available and could thus nicely overlap remote transfers and (node-)local memcpys. It would also be a little more efficient, since the meta-data queries would only have to be done once instead of once for every individual target unit.
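As a rough sketch of what such a multi-unit operation would have to do internally, the following splits one transfer into chunks that never cross a unit boundary. None of this is existing DART API: `Chunk`, `split_by_unit` and the `unit_sizes` vector (standing in for the per-unit allocation sizes DART already knows) are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One single-unit transfer: `count` elements starting at `src_offset` in the
// source buffer go to `unit`, beginning at `unit_offset` in its local memory.
struct Chunk {
    std::size_t unit;
    std::size_t unit_offset;
    std::size_t src_offset;
    std::size_t count;
};

// Hypothetical helper, not part of DART: split a transfer of `count`
// elements that starts at (start_unit, start_offset) so that no chunk
// crosses a unit boundary. `unit_sizes[u]` is the number of elements
// allocated on unit u.
std::vector<Chunk> split_by_unit(const std::vector<std::size_t>& unit_sizes,
                                 std::size_t start_unit,
                                 std::size_t start_offset,
                                 std::size_t count) {
    std::vector<Chunk> chunks;
    std::size_t unit = start_unit;
    std::size_t offset = start_offset;
    std::size_t done = 0;
    while (done < count) {
        assert(unit < unit_sizes.size());    // transfer must fit the allocation
        assert(offset <= unit_sizes[unit]);  // start must lie inside the unit
        std::size_t room = unit_sizes[unit] - offset;
        std::size_t n = std::min(room, count - done);
        if (n > 0) chunks.push_back({unit, offset, done, n});
        done += n;
        ++unit;      // continue at the beginning of the next unit
        offset = 0;
    }
    return chunks;
}
```

For four units of 4 elements each, a transfer of 12 elements starting at offset 2 on unit 0 yields four chunks of 2, 4, 4 and 2 elements. An implementation along these lines could then issue one put per remote chunk and a memcpy for a chunk that is local, with the size lookup done once up front.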
While I think this would be a worthwhile addition to DART, it also alters its semantics, shifting it away from being a slim wrapper around MPI. But maybe this is how DASH already expects it to behave? The alternative would be to adapt dash::copy to handle these cases (I am not sure which other DASH features are affected).
Please comment.