Make Spark know the partitioning of the read data #153

@EnricoMi

The connector partitions the graph so that Spark can read it in parallel, but Spark knows nothing about that partitioning. Say the connector partitions the graph by predicate and uid range: Spark does not know that, so it repartitions (shuffles) the data it has read if it later joins on those partitioning columns (for example, on uid). If Spark knew the exact partitioning scheme, it could avoid these unnecessary shuffle steps.
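To make the cost concrete, here is a minimal pure-Python sketch (no Spark involved) that counts how many rows would have to change partition in each case. Everything in it is an illustrative assumption: the helper names `uid_range_partition`, `partition_rows` and `rows_moved`, the uid range size of 4, and the `uid % 3` function, which is only a stand-in for Spark's real hash partitioning. It also assumes the other side of the join is partitioned the same way.

```python
from collections import defaultdict


def uid_range_partition(uid, range_size):
    """Partition index of a uid under range partitioning (uids start at 1)."""
    return (uid - 1) // range_size


def partition_rows(rows, partitioner):
    """Group (uid, value) rows into partitions using the given partitioner."""
    partitions = defaultdict(list)
    for uid, value in rows:
        partitions[partitioner(uid)].append((uid, value))
    return dict(partitions)


def rows_moved(partitions, target_partitioner):
    """Count rows that would have to move to a different partition
    (that is, be shuffled) to satisfy the target partitioning."""
    return sum(
        1
        for index, rows in partitions.items()
        for uid, _ in rows
        if target_partitioner(uid) != index
    )


def by_range(uid):
    # Hypothetical connector layout: uid ranges of size 4.
    return uid_range_partition(uid, 4)


def by_hash(uid):
    # Stand-in for the hash partitioning Spark falls back to for a join.
    return uid % 3


# Hypothetical data: uids 1..12, read as three uid-range partitions.
rows = [(uid, f"name-{uid}") for uid in range(1, 13)]
read_partitions = partition_rows(rows, by_range)

# Spark unaware of the layout: it re-partitions by hash before joining on uid.
moved_unaware = rows_moved(read_partitions, by_hash)

# Spark aware of the layout: the existing partitioning is reused as-is.
moved_aware = rows_moved(read_partitions, by_range)

print(f"rows shuffled when unaware: {moved_unaware}")
print(f"rows shuffled when aware:   {moved_aware}")
```

In this toy setup most rows move when the layout is unknown and none move when it is known, which is the saving this issue is after.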

Check to what extent Spark allows data sources to tell it about their partitioning.

Labels: enhancement (New feature or request)
