ballista/core/src/execution_plans/sort_shuffle (1 file changed, +7 -2)
 //! Sort-based shuffle implementation for Ballista.
 //!
-//! This module provides an alternative to the hash-based shuffle that writes
+//! This module provides an alternative to the hash-based shuffle. It writes
 //! a single consolidated file per input partition (sorted by output partition ID)
-//! along with an index file mapping partition IDs to byte offsets.
+//! along with an index file mapping partition IDs to batch ranges.
 //!
 //! This approach reduces file count from `N × M` (N input partitions × M output partitions)
 //! to `2 × N` files (one data + one index per input partition).
+//!
+//! The algorithm follows the approach used by Apache Spark: internally, results from
+//! individual map tasks are kept in memory until they can't fit. Then, these are
+//! sorted based on the target partition and written to a single file. On the reduce
+//! side, tasks read the relevant sorted blocks.
 
 mod buffer;
 mod config;
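The doc comment above describes the write and read sides only in prose, so here is a minimal in-memory sketch of the idea. It is not Ballista's implementation: `Batch`, `SortedShuffleOutput`, `write_sorted`, and `read_partition` are hypothetical names invented for illustration, the "file" is a plain `Vec`, and spilling to disk when memory runs out is omitted entirely.

```rust
use std::ops::Range;

/// Hypothetical stand-in for a record batch: the output partition it is
/// destined for, plus an opaque payload.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    output_partition: usize,
    payload: String,
}

/// Hypothetical consolidated output for ONE input partition: the data
/// (batches sorted by output partition ID) and an index that maps each
/// output partition ID to a range of batches in `data`.
#[derive(Debug)]
struct SortedShuffleOutput {
    data: Vec<Batch>,
    index: Vec<Range<usize>>,
}

/// Map side: sort the buffered batches by target partition and build the
/// partition-ID -> batch-range index.
fn write_sorted(mut buffered: Vec<Batch>, num_output_partitions: usize) -> SortedShuffleOutput {
    assert!(buffered
        .iter()
        .all(|b| b.output_partition < num_output_partitions));

    // `sort_by_key` is stable, so arrival order is kept within a partition.
    buffered.sort_by_key(|b| b.output_partition);

    let mut index = Vec::with_capacity(num_output_partitions);
    let mut start = 0;
    for partition in 0..num_output_partitions {
        // Count the contiguous run of batches belonging to this partition.
        let len = buffered[start..]
            .iter()
            .take_while(|b| b.output_partition == partition)
            .count();
        index.push(start..start + len);
        start += len;
    }

    SortedShuffleOutput { data: buffered, index }
}

/// Reduce side: use the index to read only the batches for one partition.
fn read_partition(output: &SortedShuffleOutput, partition: usize) -> &[Batch] {
    &output.data[output.index[partition].clone()]
}
```

For example, calling `write_sorted` on batches destined for partitions 2, 0, 2, 0 with three output partitions yields the index `[0..2, 2..2, 2..4]`. Partition 1 gets an empty range, and `read_partition(&out, 2)` returns only the last two batches. Because each input partition produces one data file and one index file, 100 input partitions and 200 output partitions would need 200 files rather than 20,000.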