May I ask why presto on spark needs to perform rdd collect? #23830

@guobj

101e6ce847a6c737767f3b485
I want to know why we need to do an RDD collect here. In Spark, this kind of operation puts a lot of pressure on the driver, and during testing I found that the driver often reports OOM.
Looking at the source code, the Spark driver performs the commit operation for the coordinator_only type stage. My understanding is that only metadata needs to be submitted for the commit operation. Why does the data need to be submitted along with it?
Is the coordinator implemented this way in native Presto as well?
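
To make the memory concern concrete, here is a minimal sketch in plain Python with no Spark involved. It only models the documented behaviour of Spark's `RDD.collect()` (all rows are brought to the driver at once) against `RDD.toLocalIterator()` (roughly one partition is held at a time). The helper names `make_partitions`, `collect_peak_rows` and `local_iterator_peak_rows` are hypothetical and are not part of Presto or Spark. The sketch is not a claim about what Presto on Spark does internally or could switch to.

```python
from typing import List


def make_partitions(num_partitions: int, rows_per_partition: int) -> List[range]:
    """Hypothetical stand-in for an RDD: a list of lazily produced partitions."""
    return [
        range(i * rows_per_partition, (i + 1) * rows_per_partition)
        for i in range(num_partitions)
    ]


def collect_peak_rows(partitions: List[range]) -> int:
    """Models collect(): rows from every partition are held on the driver together."""
    all_rows = [row for part in partitions for row in part]
    return len(all_rows)


def local_iterator_peak_rows(partitions: List[range]) -> int:
    """Models toLocalIterator(): only one partition is buffered at any time."""
    peak = 0
    for part in partitions:
        buffered = list(part)  # one partition materialized on the driver
        peak = max(peak, len(buffered))
        for _ in buffered:  # the driver consumes the rows, then the buffer is dropped
            pass
    return peak


if __name__ == "__main__":
    parts = make_partitions(num_partitions=4, rows_per_partition=1000)
    print("collect-style peak rows on driver:", collect_peak_rows(parts))
    print("iterator-style peak rows on driver:", local_iterator_peak_rows(parts))
```

In this model the collect-style peak grows with the total row count, while the iterator-style peak is bounded by the largest partition. That matches the driver OOMs seen when the collected data is large.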
