Contributor

kwmsmith commented Sep 1, 2014

Working implementation of block (and no-dist) redistribution.

Two main cases:

  • global shape preserving redistribution, and
  • reshaping redistribution.

Both cases support redistribution on a different set of targets. The source and destination targets can be identical, overlapping, or disjoint.

Some small performance optimizations are implemented, but there is much room for improvement. This version just gets general redistribution working; future work will need to address performance and optimizations.
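
For readers unfamiliar with block redistribution, here is a minimal one-dimensional sketch of the idea. It is not the code from this PR: `block_bounds` and `redist_plan_1d` are hypothetical names, and the even-as-possible block split is an assumption about how blocks are laid out. Each source block is intersected with each destination block, and every non-empty overlap becomes one transfer.

```python
def block_bounds(size, nprocs):
    """Return (start, stop) per process for an even-as-possible block split.

    Hypothetical helper: the real distribution may split blocks differently.
    """
    base, extra = divmod(size, nprocs)
    bounds = []
    start = 0
    for rank in range(nprocs):
        # The first `extra` ranks each take one additional element.
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def redist_plan_1d(size, n_source, n_dest):
    """List (source_rank, dest_rank, start, stop) for every non-empty overlap."""
    plan = []
    for s_rank, (s_start, s_stop) in enumerate(block_bounds(size, n_source)):
        for d_rank, (d_start, d_stop) in enumerate(block_bounds(size, n_dest)):
            # Intersection of two half-open index ranges.
            lo, hi = max(s_start, d_start), min(s_stop, d_stop)
            if lo < hi:
                plan.append((s_rank, d_rank, lo, hi))
    return plan
```

Under these assumptions, `redist_plan_1d(60, 4, 3)` produces six transfers that together cover all 60 global indices. The source and destination rank sets are independent here, which mirrors the identical, overlapping, or disjoint target cases described above.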

Kurt Smith added 27 commits July 8, 2014 14:30
Conflicts:
	distarray/globalapi/context.py
Necessary with redistribution, since we're calling functions on
processes where an array might not be defined.
Conflicts:
	distarray/globalapi/context.py
kwmsmith added this to the 0.6 milestone Sep 1, 2014
Contributor

Needs a docstring addition for default.

Contributor

And actually, what is default for? A default value when something is a Proxy but doesn't have a dereference attribute? Or is it when .dereference() raises an AttributeError over .name or something?

Contributor Author

I refactored things so default is no longer necessary, and added a long explanatory comment.

Contributor

bgrant commented Sep 2, 2014

General notes:

  • Do we want an alias for distribute_as called reshape?
  • It would be nice to have an example of this added to the features notebook. That could be a separate PR.

Contributor

Tests of a couple of error conditions would be nice. I assume you get a "Distributions not compatible" error if the shapes don't match up?

Contributor

bgrant commented Sep 3, 2014

Seems like distribute_as will allow incompatible reshapes without complaining...

Shape too small

In [16]: da
Out[16]: <DistArray(shape=(60,), targets=[0, 1, 2, 3])>

In [17]: da.toarray()
Out[17]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
       34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50,
       51, 52, 53, 54, 55, 56, 57, 58, 59])

In [18]: da2 = da.distribute_as(asdistribution(context, (3, 4)))

In [19]: da2
Out[19]: <DistArray(shape=(3, 4), targets=[0, 1, 2])>

In [20]: da2.toarray()
Out[20]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

Shape too large

In [22]: da3 = da.distribute_as(asdistribution(context, (3, 4000)))

In [23]: da3
Out[23]: <DistArray(shape=(3, 4000), targets=[0, 1, 2])>

In [24]: da3.toarray()
Out[24]:
array([[                   0,                    1,                    2,
        ...,  5908722714605125632,      343399815184695,
         2977442303652352512],
       [-2305843009213693952, -2305843009213693952,           4301330448,
        ...,                    0,                    0,
                           0],
       [-3458764513820540928, -3458764513820540928,    28147497699930880,
        ...,    11258999083454976,            671088640,

Unsupported dist type

Here dist has maps 'cn'. This does throw an error, but not one that a user could easily decipher.

In [9]: da.distribute_as(dist)
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-9-51a3cbe6d1b9> in <module>()
----> 1 da.distribute_as(dist)

/Users/robertgrant/development/distarray/distarray/globalapi/distarray.pyc in distribute_as(self, dist)
    497
    498     def distribute_as(self, dist):
--> 499         plan = self.distribution.get_redist_plan(dist)
    500         ubercomm, all_targets = self.distribution.comm_union(dist)
    501         result = DistArray(dist, dtype=self.dtype)

/Users/robertgrant/development/distarray/distarray/globalapi/maps.pyc in get_redist_plan(self, other_dist)
    940         plan = []
    941         for source_dd, dest_dd in source_dest_pairs:
--> 942             intersections = _intersection(source_dd, dest_dd)
    943             if intersections and all(i for i in intersections):
    944                 source_coords = tuple(dd['proc_grid_rank'] for dd in source_dd)

/Users/robertgrant/development/distarray/distarray/globalapi/maps.pyc in _redist_intersection_reshape(source_dimdata, dest_dimdata)
    909     def _redist_intersection_reshape(source_dimdata, dest_dimdata):
    910         source_flat = global_flat_indices(source_dimdata)
--> 911         dest_flat = global_flat_indices(dest_dimdata)
    912         return _global_flat_indices_intersection(source_flat, dest_flat)
    913

/Users/robertgrant/development/distarray/distarray/globalapi/maps.pyc in global_flat_indices(dim_data)
    971     glb_strides = strides_from_shape(glb_shape)
    972
--> 973     ranges = [range(dd['start'], dd['stop']) for dd in dim_data[:-1]]
    974     start_ranges = ranges + [[dim_data[-1]['start']]]
    975     stop_ranges = ranges + [[dim_data[-1]['stop']]]

KeyError: 'stop'
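
One way to turn the `KeyError` above into a readable message is to validate the distribution types before building the plan. The sketch below is only an illustration, not the fix that was merged: `check_redistributable` and `SUPPORTED_DIST_TYPES` are hypothetical names, and it assumes each dimension dictionary carries a `dist_type` key with `'b'` for block, `'n'` for no-dist, and `'c'` for cyclic.

```python
# Assumption: only block ('b') and no-dist ('n') dimensions can be
# redistributed, as stated in the PR description.
SUPPORTED_DIST_TYPES = {'b', 'n'}


def check_redistributable(dim_data):
    """Raise a descriptive error if any dimension has an unsupported type.

    `dim_data` is assumed to be a sequence of dicts, one per dimension,
    each with a 'dist_type' entry.
    """
    for axis, dd in enumerate(dim_data):
        dist_type = dd.get('dist_type')
        if dist_type not in SUPPORTED_DIST_TYPES:
            raise NotImplementedError(
                "Cannot redistribute: dimension %d has dist_type %r; "
                "only %s are supported."
                % (axis, dist_type, sorted(SUPPORTED_DIST_TYPES))
            )
```

Calling such a check at the top of the planning step would report the offending axis and type instead of failing deep inside the index computation.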

Contributor

bgrant commented Sep 3, 2014

This is great! Summary of issues, as I see them:

  1. Doesn't check for incompatible shapes / dists
  2. Docstrings
  3. Allow shape_or_dist in distribute_as, or add a reshape alias?
  4. A missed rename in the Makefile
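
For item 1, a reshaping redistribution is only meaningful when the source and destination hold the same number of elements. A minimal sketch of such a check follows; `check_same_size` is a hypothetical name and the error wording is illustrative, not what the library emits.

```python
import math


def check_same_size(source_shape, dest_shape):
    """Raise ValueError unless both global shapes have the same element count."""
    source_size = math.prod(source_shape)
    dest_size = math.prod(dest_shape)
    if source_size != dest_size:
        raise ValueError(
            "Distributions not compatible: source shape %r has %d elements, "
            "destination shape %r has %d."
            % (tuple(source_shape), source_size, tuple(dest_shape), dest_size)
        )
```

With this check, both the `(60,)` to `(3, 4)` and the `(60,)` to `(3, 4000)` cases shown earlier would be rejected up front, while `(60,)` to `(3, 20)` would pass.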

Kurt Smith added 5 commits September 5, 2014 11:47
Refactor `arg_kwarg_proxy_converter` into function in metadata_utils,
add explanatory comment.
Also allow `distribute_as` to take `shape_or_dist`.
Contributor Author

kwmsmith commented Sep 5, 2014

@bgrant all comments addressed, except for the reshape one. I'd like to discuss that, but it shouldn't block this PR.

Contributor

bgrant commented Sep 6, 2014

Sounds good. We should consider supporting a single negative dimension size as well, like numpy's reshape does.
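
A rough sketch of how a single negative dimension could be resolved, following NumPy's convention that one `-1` entry is inferred from the total size. `resolve_shape` is a hypothetical helper, not part of distarray.

```python
import math


def resolve_shape(size, shape):
    """Replace a single -1 in `shape` so that the result holds `size` elements."""
    shape = tuple(shape)
    if any(n < -1 for n in shape):
        raise ValueError("only -1 may be used as a placeholder dimension")
    if shape.count(-1) > 1:
        raise ValueError("can only specify one unknown dimension")
    # Product of the explicitly given dimensions (1 if there are none).
    known = math.prod(n for n in shape if n != -1)
    if -1 in shape:
        if known == 0 or size % known != 0:
            raise ValueError(
                "cannot reshape %d elements into shape %r" % (size, shape))
        inferred = size // known
        shape = tuple(inferred if n == -1 else n for n in shape)
    elif known != size:
        raise ValueError(
            "cannot reshape %d elements into shape %r" % (size, shape))
    return shape
```

For the 60-element array used in the earlier examples, `resolve_shape(60, (3, -1))` gives `(3, 20)`, and a shape such as `(7, -1)` is rejected because 60 is not divisible by 7.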

bgrant added a commit that referenced this pull request Sep 6, 2014
bgrant merged commit a3ec96b into master Sep 6, 2014
bgrant deleted the feature/block-redistribution branch September 6, 2014 22:40