Excessive spilling in JVM shuffle #3198

@andygrove

Describe the bug

Using the PySpark benchmark in the repo, I am comparing logging and metrics for JVM vs native shuffle.

JVM shuffle spills 96 times:

26/01/15 13:48:46 INFO CometShuffleExternalSorter: Thread 98 spilling sort data of 512.0 MiB to disk (1  time so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 82 spilling sort data of 512.0 MiB to disk (2  times so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 95 spilling sort data of 512.0 MiB to disk (2  times so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 104 spilling sort data of 512.0 MiB to disk (2  times so far)
26/01/15 13:48:49 INFO CometShuffleExternalSorter: Thread 106 spilling sort data of 512.0 MiB to disk (2  times so far)
...

Native shuffle spills 32 times:

26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532719016 to disk while inserting (0 time(s) so far)
26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532094760 to disk while inserting (0 time(s) so far)
26/01/15 15:42:36 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532772904 to disk while inserting (0 time(s) so far)
26/01/15 15:42:37 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532772904 to disk while inserting (0 time(s) so far)
26/01/15 15:42:37 INFO core/src/execution/shuffle/shuffle_writer.rs: ShuffleRepartitioner spilling shuffle data of 532719208 to disk while inserting (0 time(s) so far)
...
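For anyone reproducing the comparison, a small script along these lines could tally the spill messages from a captured log. This is only a sketch and not part of the benchmark: the regular expressions are derived solely from the log lines quoted above, the function name `summarize_spills` is hypothetical, and it assumes the number in the native message is a byte count, which the log line itself does not state.

```python
import re
from collections import Counter

# Patterns derived only from the log lines quoted in this issue.
JVM_SPILL = re.compile(
    r"CometShuffleExternalSorter: Thread \d+ spilling sort data of "
    r"([\d.]+) (B|KiB|MiB|GiB|TiB) to disk"
)
NATIVE_SPILL = re.compile(
    r"ShuffleRepartitioner spilling shuffle data of (\d+) to disk"
)

# Binary unit multipliers for the human-readable JVM sizes.
UNIT_BYTES = {
    "B": 1,
    "KiB": 1024,
    "MiB": 1024**2,
    "GiB": 1024**3,
    "TiB": 1024**4,
}


def summarize_spills(lines):
    """Return (spill counts, spilled bytes) keyed by 'jvm' and 'native'."""
    counts = Counter()
    spilled_bytes = Counter()
    for line in lines:
        match = JVM_SPILL.search(line)
        if match:
            size, unit = match.groups()
            counts["jvm"] += 1
            spilled_bytes["jvm"] += int(float(size) * UNIT_BYTES[unit])
            continue
        match = NATIVE_SPILL.search(line)
        if match:
            counts["native"] += 1
            # Assumption: the native writer logs a raw byte count.
            spilled_bytes["native"] += int(match.group(1))
    return counts, spilled_bytes
```

Passing an open log file (for example `summarize_spills(open("executor.log"))`, where the file name is a placeholder) yields per-writer spill counts and volumes. The JVM sizes are rounded in the log, so the byte totals are approximate.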

Steps to reproduce

No response

Expected behavior

No response

Additional context

No response

Labels

bug: Something isn't working
