PR #139 introduces the masked parameter to asarray, which makes it possible to collect a masked array from all ranks, where the mask is the one given when initializing the DistributedArray.
@tharittk discovered an issue with the NCCL backend, likely caused by self.local_shapes (which calls all GPUs in the world communicator). MPI does not have this problem because it sends the object itself, so no buffer / array unrolling is needed.
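
To illustrate why a buffer-based gather depends on every rank's local shape while an object-based gather does not, here is a minimal pure-Python sketch. It performs no real communication and is not the pylops-mpi or NCCL code: `pad_to`, `simulated_buffer_allgather`, and `unroll` are hypothetical helpers, and the padding-to-max-size scheme is an assumption about how such a gather is typically done.

```python
from math import prod


def pad_to(chunk, size, fill=0.0):
    """Pad a flat chunk so every rank contributes an equal-size buffer."""
    return list(chunk) + [fill] * (size - len(chunk))


def simulated_buffer_allgather(send_bufs):
    """Stand-in for a buffer-based allgather (hypothetical, no communication).

    Equal-size buffers are concatenated in rank order.
    """
    assert len({len(b) for b in send_bufs}) == 1, "buffers must have equal size"
    return [x for buf in send_bufs for x in buf]


def unroll(recv_buf, local_shapes):
    """Strip the padding using the local shape of every rank."""
    max_size = max(prod(shape) for shape in local_shapes)
    chunks = []
    for rank, shape in enumerate(local_shapes):
        start = rank * max_size
        chunks.append(recv_buf[start:start + prod(shape)])
    return chunks


# Three simulated ranks holding 1-D chunks of different lengths
local_arrays = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
local_shapes = [(len(a),) for a in local_arrays]

# Buffer path: pad, gather, then unroll with the shapes of all ranks
max_size = max(prod(shape) for shape in local_shapes)
send_bufs = [pad_to(a, max_size) for a in local_arrays]
recv_buf = simulated_buffer_allgather(send_bufs)
gathered = unroll(recv_buf, local_shapes)

# Object path: each rank's object arrives whole, so no unrolling is needed
gathered_objects = list(local_arrays)
```

In the buffer path, `unroll` cannot run without the shapes of all ranks, which is where a shape lookup spanning the whole communicator would come into play.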