Improvement: Assign each worker to the corresponding non-overlapping partitioned tasks to minimize network communication #118

@NGA-TRAN

Description

If we assign workers to tasks based on what we know about how the data is partitioned, we can reduce network communication and avoid unnecessary data serialization and deserialization.

Let us look at this input plan:

Ref: See #117 for full context of this query

[Image: input plan]

If the files are partitioned or sorted by the hash key (flag, status), we can split them into non-overlapping groups, so that p0 holds the same keys as p0′, p1 as p1′, and p2 as p2′. With this alignment, assigning workers as shown below eliminates data transfer between workers everywhere except in the final stage.

[Image: proposed worker assignment]
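
To make the idea concrete, here is a minimal sketch in Python, not the project's actual scheduler. The stage names (`scan`, `aggregate`), the worker names, and the helpers `assign_workers` and `cross_worker_edges` are all hypothetical. It pins partition i of every stage to the same worker and counts how many producer-to-consumer edges cross workers, with and without aligned partitions.

```python
def assign_workers(stages, workers):
    """Map every (stage, partition) task to a worker so that partition i
    of every stage lands on the same worker (hypothetical helper)."""
    assignment = {}
    for stage, num_partitions in stages.items():
        for p in range(num_partitions):
            assignment[(stage, p)] = workers[p % len(workers)]
    return assignment


def cross_worker_edges(assignment, edges):
    """Return the edges whose producer and consumer run on different workers."""
    return [(src, dst) for src, dst in edges if assignment[src] != assignment[dst]]


# Assumed two-stage plan with three partitions per stage (p0..p2 and p0'..p2').
stages = {"scan": 3, "aggregate": 3}
workers = ["worker-0", "worker-1", "worker-2"]
assignment = assign_workers(stages, workers)

# Aligned, non-overlapping partitions: p_i feeds only p_i'.
aligned_edges = [(("scan", p), ("aggregate", p)) for p in range(3)]

# No alignment: a hash repartition lets every p_i feed every p_j'.
shuffle_edges = [
    (("scan", i), ("aggregate", j)) for i in range(3) for j in range(3)
]

remote_aligned = cross_worker_edges(assignment, aligned_edges)
remote_shuffle = cross_worker_edges(assignment, shuffle_edges)
print(len(remote_aligned), len(remote_shuffle))  # prints "0 6"
```

In this toy model the aligned plan needs no cross-worker transfers between the two stages, while the unaligned shuffle needs six. A final stage that merges all partitions would still pull data from every worker.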

This ties into a broader topic: understanding data layout and strategically partitioning data to take full advantage of it.
