Basic External-Merge-Sort

This repository showcases a basic implementation of external-merge-sort and its performance compared to regular sorting backed up swap-space / page-cache (MMAP).

Note: The code-quality, compilation / execution setup and documentation are exteremely basic (not expected the repository have grown like this, was supposed to be a few files).

Details

The sorting is performed on a file of 8-byte values (LONGs) using different approaches with limited memory allocated to the processes.

Approaches:

Dumb (user-space memory, double-buffering of file contents, sys-calls IO, using file-system cache): Reading all the data from the file into user-space, sorting it and writing it all back. The only reason why it works is due to the swap-space (requires it to be large enough to fit in all the user-space memory).
Mapped (kernel-space memory, no double-buffering, memory-mapped IO): Mapping the entire contents of the file into user-space (the actual physical-pages belong to the page-cache) and operating on them. This method works well enough as a low-effort solution.
External-Merge-Sort (ONLY user-space memory, direct IO, no file-system / page cache): Performing the actual external-merge-sort with minimum assistance of the OS (no page-cache / page-buffer, only using direct IO, allocating all the memory at once into user-space, not using swap-space).

See STATS.md for example stats of each approach under specific conditions.

Important: Unlike the other two approaches, the external-merge-sort can be optimized by A LOT, but only the basic implementation is used to showcase that even the most simple implementation still performs much better (see the "Future" section for the list of optimizations).

Running

Requirements:

Linux operating system (Ubuntu 24.04.03 was used).
C-Groups V2: Configuration of the memory-limit and inspection of the IO / memory stats depends on it.
Support of swap-space: Needed for running the "Dumb" approach.
GCC C++ compiler (with version 20 support) and GLIBC.
Admin-level permissions (performing sudo): Needed for C-Groups, Swap-space and other setup.

Setup:

C-Groups V2: Enabling the use of CPU, memory and IO controllers (./setup-cg.sh).
Swap-Space: Creating a 1GB swap-space for the "Dumb" approach to work (./setup-swap.sh).

Execution:

Populating a data-file with 8-byte values (./populate.sh): Generates LONG values from 1 to the specific limit, shuffles them into a random order and writes them out. Note: Separated from the actual sorting script to have no affect on the memory-usage / IO / file-system / etc.
Running the actual sorting (./run.sh): Used for running sorting using all of the approaches (comment out to select the target one for compilation, the rest of the script is the same).
1. Compiles the target sorting executable
2. Flushes / clears the page-cache and page-buffer (simulating complete cold-start / no data in memory).
3. Initializes a C-Group for running the sorting with a specific memory-limit and joining it (therefore the sorting process is also going to join).
4. Running the sorting and checking that it was successful.
5. Outputting stats about CPU, memory and IO utilization.
Checking that the sorting was done correclty (./check.sh): Confirms that the LONG values are in order with no items missing.

Organization

Top-level:

build: Generated for building executables (no object-files, cause the compilation setup is very basic)
src: Source-files for the utilities and sorting code.
workspace: Directory with the data-file as well as temporary work-files (only generated by external-merge-sort).
Scripts (run.sh / setup-cg.sh / etc.): Setup and execution of sorting (see the details above).

Code:

common.hpp: Common definitions of items, their sizes and various minor helpers.
data: Code for populating the data-file (populate.cpp) and checking it after sorting (check.cpp).
sort: Different sorting approaches implementations (everything related to the external-merge-sort is under external).

See the actual code for implementation details (rough for now, but should be readable).

Future

As mentione, external-merge-sort can be further optimized and improved:

Performing double-buffering: Reading in a new block of data for sorting while performing sorting on a previous block of data (interleaving CPU work and IO). Utilizing just a single CPU by having the IO be performed in non-blocking / asynchronous way.
Parallelizing: Performing sorting and merging in parallel using multiple threads / multiple CPUs.
Tuning work-memory separation / distribution of buffer-pages amongst components of the algorithm.
Optimizing data-structures / using more specialized algorithms.
etc.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Basic External-Merge-Sort

Details

Running

Organization

Future

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Basic External-Merge-Sort

Details

Running

Organization

Future