Refactor Collective Views #1924

@lightsighter

One of the last remaining pieces of technical debt in Legion's physical analysis is the current implementation of collective views. In retrospect, the current implementation is scalable but doesn't make the right trade-offs in terms of implementation complexity. Currently we create collective views that represent a group of instances that all contain the same data (or the same reductions for reduction collective views). This makes it feasible to store a single collective view in an equivalence set and to reason about how to issue collective copies/reductions, including the construction of broadcast and reduction trees as well as butterfly all-reduce networks. However, the implementation adds two additional forms of complexity:

  1. It requires the construction of a collective view along with a deduplication against any prior constructions of the same collective view in the context, which requires additional communication.
  2. It creates a challenging aliasing problem to solve. The same physical instance can now be represented both by its normal logical view and by potentially multiple different collective views that contain it. Recognizing such aliasing to avoid unnecessary or duplicate copies is hard. Additionally, it further complicates tracing capture and pre/post condition testing, making the tracing code onerous to maintain.

I'd like to refactor the physical analysis to get rid of the collective view data structures. Instead, we'll modify the implementation of equivalence sets to handle collective behavior. The following changes would be required:

  1. Currently equivalence sets have a notion of an owner node where the metadata for the equivalence set is stored. This owner node can migrate around, but there is always exactly one. We would change this so that equivalence sets would effectively behave more like cache lines and be managed using a MOESI protocol. There could be multiple nodes with read-only or reduction instances at the same time (each node tracking the instances local to it). However, writes would then invalidate the equivalence set on all other nodes, so that exactly the node(s) that perform the writes would be valid.
  2. Collective copy patterns would happen based on the current node(s) that have valid copies of data in the equivalence set. Broadcast and reduction trees as well as butterfly networks would all still be supported.
  3. We would still have the concept of collective mapping of tasks in an index space launch. This will make the state changes to the equivalence set MOESI protocol more scalable. It would also enable more efficient construction of broadcast/reduction trees and butterfly networks.
  4. Since there are no more collective views, we can use basic name testing for logical views when doing update analysis and trace pre/post condition testing.
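
To make item 2 above more concrete, here is a minimal sketch of how a broadcast tree might be derived from the set of nodes tracked by an equivalence set. Everything in it is hypothetical and for illustration only: `NodeID`, `TreeEdge`, and `build_broadcast_tree` are not Legion names, and the fixed-radix implicit-heap layout is just one simple way to shape the tree, not necessarily what the runtime would use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node identifier (an assumption, not a Legion type).
using NodeID = unsigned;

// One copy in the broadcast tree: src already has the data, dst receives it.
struct TreeEdge {
  NodeID src;
  NodeID dst;
};

// Build a broadcast tree with fan-out `radix`, rooted at a node that holds a
// valid copy, covering every node in `targets` that needs one.
// Nodes are laid out as an implicit k-ary heap: position 0 is the root and
// position i receives its data from position (i - 1) / radix.
std::vector<TreeEdge> build_broadcast_tree(NodeID root,
                                           std::vector<NodeID> targets,
                                           std::size_t radix) {
  assert(radix > 0);
  // Deduplicate the targets and drop the root if it appears among them.
  std::sort(targets.begin(), targets.end());
  targets.erase(std::unique(targets.begin(), targets.end()), targets.end());
  targets.erase(std::remove(targets.begin(), targets.end(), root),
                targets.end());

  // Heap order: the root first, then the remaining targets in sorted order.
  std::vector<NodeID> order;
  order.push_back(root);
  order.insert(order.end(), targets.begin(), targets.end());

  // Emit one edge per non-root node, from its heap parent.
  std::vector<TreeEdge> edges;
  for (std::size_t i = 1; i < order.size(); ++i)
    edges.push_back({order[(i - 1) / radix], order[i]});
  return edges;
}
```

With root `0`, targets `{1, 2, 3, 4, 5}`, and radix `2`, this yields the edges 0→1, 0→2, 1→3, 1→4, 2→5. A reduction tree could be obtained from the same layout by reversing each edge.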

This refactoring will effectively move the complexity of managing "collective" behavior from the collective view data structures into the equivalence sets, which is a more natural place to handle such complexity. In general, our implementation of the MOESI protocol will be simpler than the traditional one because we know we have mapping dependences to protect the state changes and ensure that there are no races that need to be resolved during the state changes (eliminating the need for any intermediate states in the protocol). While this is true of the expressions that access equivalence sets, we can still have cases where two different (disjoint) subregions analyze an equivalence set in parallel with different privileges (because they are non-interfering in space), so care will need to be taken to handle such concurrent but non-interfering state changes. Another challenge will come from ordering read-only tasks (especially collectively mapped read-only tasks) to ensure that their updates to the equivalence sets do not interfere with each other (especially when the data in the equivalence set are pending reductions). In general, I believe this will significantly improve the quality of the runtime internals and move complexity into a more manageable place so that it doesn't ripple out to unrelated parts of the runtime (e.g. tracing).
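
As a rough illustration of what "no intermediate states" could look like, the sketch below tracks which nodes hold valid data or pending reductions for a single equivalence set and applies each access as one atomic transition. It is only a sketch under stated assumptions: `EqSetValidity`, `Access`, and `record_access` are hypothetical names, it assumes mapping dependences have already ordered interfering accesses, it ignores the concurrent disjoint-subregion case described above, and the policy of folding pending reductions into the reader's copy is just one possible choice.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical node identifier (an assumption, not a Legion type).
using NodeID = unsigned;

// Simplified privileges for the sketch.
enum class Access { ReadOnly, Reduce, ReadWrite };

// Hypothetical per-equivalence-set validity tracking across nodes.
class EqSetValidity {
public:
  explicit EqSetValidity(NodeID initial) { valid_.insert(initial); }

  // Apply the state change for one access. The caller is assumed to have
  // ordered interfering accesses, so no transient states are modeled.
  void record_access(NodeID node, Access access) {
    switch (access) {
      case Access::ReadOnly:
        if (!reducers_.empty()) {
          // Assumed policy: pending reductions are folded into the reader's
          // copy, so older copies become stale and the reductions are consumed.
          valid_.clear();
          reducers_.clear();
        }
        valid_.insert(node);  // the reader now holds a valid copy
        break;
      case Access::Reduce:
        reducers_.insert(node);  // reductions accumulate alongside valid data
        break;
      case Access::ReadWrite:
        // A write invalidates every other node's copies and reductions.
        valid_.clear();
        reducers_.clear();
        valid_.insert(node);
        break;
    }
  }

  bool is_valid(NodeID node) const { return valid_.count(node) != 0; }
  bool has_pending_reductions() const { return !reducers_.empty(); }
  std::size_t num_valid() const { return valid_.size(); }

private:
  std::set<NodeID> valid_;     // nodes holding valid instances
  std::set<NodeID> reducers_;  // nodes holding pending reduction instances
};
```

For example, starting from node 0, read-only accesses on nodes 1 and 2 leave all three nodes valid, while a subsequent write on node 3 leaves node 3 as the only valid node.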
