
I want to know why we need to do an RDD `collect` here. This kind of operation in Spark puts a lot of pressure on the driver, and during testing I found that the driver frequently fails with OOM (out-of-memory) errors.
When I checked the source code, I saw that the Spark driver performs the commit operation for the coordinator-only stage. My understanding is that only metadata needs to be submitted for the commit. Why does the data need to be submitted along with it?
Is the coordinator implemented the same way in native Presto?