Commit 95dbc65

Make a difference between measures of disorder and measures of presortedness
1 parent 4e06423 commit 95dbc65

31 files changed, 168 additions and 141 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ of the library:
 * Sorters can be wrapped in [sorter adapters](https://github.com/Morwenn/cpp-sort/wiki/Sorter-adapters) to augment their behaviour
 * The library provides a [sorter facade](https://github.com/Morwenn/cpp-sort/wiki/Sorter-facade) to easily build sorters
 * [Fixed-size sorters](https://github.com/Morwenn/cpp-sort/wiki/Fixed-size-sorters) can be used to efficiently sort tiny fixed-size collections
-* [Measures of presortedness](https://github.com/Morwenn/cpp-sort/wiki/Measures-of-presortedness) can be used to evaluate the disorder in a collection
+* [Measures of disorder](https://github.com/Morwenn/cpp-sort/wiki/Measures-of-disorder) can be used to evaluate the disorder in a collection

 Here is a more complete example of what can be done with the library:

benchmarks/benchmarking-tools/distributions.h

Lines changed: 1 addition & 1 deletion
@@ -374,7 +374,7 @@ namespace dist
     };

     ////////////////////////////////////////////////////////////
-    // Distributions: testing measures of presortedness
+    // Distributions: testing measures of disorder

     struct inv:
         base_distribution<inv>

benchmarks/presortedness/bench-presortedness.cpp renamed to benchmarks/disorder/bench-disorder.cpp

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,5 @@
11
/*
2-
* Copyright (c) 2020-2024 Morwenn
2+
* Copyright (c) 2020-2025 Morwenn
33
* SPDX-License-Identifier: MIT
44
*/
55
#include <algorithm>
@@ -89,10 +89,10 @@ int main(int argc, char* argv[])
8989
double factor = 0.01 * idx;
9090
auto distribution = dist::inv(factor);
9191

92-
// Compute presortedness
92+
// Compute disorder
9393
collection_t collection;
9494
distribution(std::back_inserter(collection), size);
95-
auto presortedness = cppsort::probe::inv(collection);
95+
auto disorder = cppsort::probe::inv(collection);
9696

9797
// Compute the time it took
9898
std::vector<std::uint64_t> cycles;
@@ -109,8 +109,8 @@ int main(int argc, char* argv[])
109109
}
110110

111111
// Compute and display stats & numbers
112-
output_file << presortedness << ",";
113-
std::cout << presortedness << ",";
112+
output_file << disorder << ",";
113+
std::cout << disorder << ",";
114114
auto it = cycles.begin();
115115
output_file << *it;
116116
std::cout << *it;

benchmarks/presortedness/check-distribution.cpp renamed to benchmarks/disorder/check-distribution.cpp

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -17,8 +17,8 @@
1717
// Distribution
1818
using dist_t = dist::runs;
1919

20-
// Measure of presortedness
21-
auto mop = cppsort::probe::runs;
20+
// Measure of disorder
21+
auto measure = cppsort::probe::runs;
2222

2323
// Size of the collections to check
2424
constexpr std::size_t size = 1'000'000;
@@ -28,7 +28,7 @@ constexpr std::size_t size = 1'000'000;
2828
//
2929
// The raison d'être of this script is to be able to visualize
3030
// the aspects of the distributions used to test whether some
31-
// sorter adapt to given measures of presortedness: for example
31+
// sorter adapt to given measures of disorder: for example
3232
// dist::inv, when given a percentage pct, should be able to
3333
// create a random collection X such as:
3434
// prove::inv(X) = pct * probe::inv.max_for_size(|X|)
@@ -45,21 +45,21 @@ int main()
4545
// Print metadata about the check
4646
std::cout << dist_t::name << ','
4747
<< size << ','
48-
<< mop.max_for_size(size) << ','
48+
<< measure.max_for_size(size) << ','
4949
<< seed << std::endl;
5050

5151
for (int idx = 0; idx <= 100; ++idx) {
5252
// Generate data distribution
5353
double factor = 0.01 * idx;
5454
auto distribution = dist_t(factor);
5555

56-
// Compute presortedness
56+
// Compute disorder
5757
std::vector<int> collection;
5858
collection.reserve(size);
5959
distribution(std::back_inserter(collection), size);
60-
auto presortedness = mop(collection);
60+
auto disorder = measure(collection);
6161

6262
// Display results
63-
std::cout << idx << ',' << presortedness << std::endl;
63+
std::cout << idx << ',' << disorder << std::endl;
6464
}
6565
}
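
The comment in this file states the property the check is meant to visualize: for a percentage pct, the generated collection X should satisfy probe::inv(X) = pct * probe::inv.max_for_size(|X|). As a rough standalone illustration of the two quantities involved, the sketch below counts inversions with a bottom-up merge sort and computes the largest possible inversion count for a given size, n(n-1)/2, which is reached by a strictly decreasing sequence. The names `count_inversions` and `max_inversions` are hypothetical helpers written for this note. They are not cpp-sort's implementation, and the assumption that the library's `max_for_size` for *Inv* matches this formula is not confirmed by the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of cpp-sort): counts the pairs (i, j) with
// i < j and values[i] > values[j] in O(n log n) with a bottom-up merge sort.
// The argument is taken by value because the function sorts its own copy.
std::uint64_t count_inversions(std::vector<int> values)
{
    std::vector<int> buffer(values.size());
    std::uint64_t inversions = 0;

    // Merge sorted blocks of width 1, 2, 4, ... until one block remains
    for (std::size_t width = 1; width < values.size(); width *= 2) {
        for (std::size_t lo = 0; lo < values.size(); lo += 2 * width) {
            std::size_t mid = std::min(lo + width, values.size());
            std::size_t hi = std::min(lo + 2 * width, values.size());
            std::size_t i = lo, j = mid, out = lo;
            while (i < mid && j < hi) {
                if (values[j] < values[i]) {
                    // values[j] is smaller than every element left in the
                    // first block, so it forms mid - i inversions at once
                    inversions += mid - i;
                    buffer[out++] = values[j++];
                } else {
                    buffer[out++] = values[i++];
                }
            }
            while (i < mid) buffer[out++] = values[i++];
            while (j < hi) buffer[out++] = values[j++];
        }
        values.swap(buffer);
    }
    return inversions;
}

// Hypothetical helper: largest possible number of inversions for n elements
std::uint64_t max_inversions(std::uint64_t n)
{
    return n < 2 ? 0 : n * (n - 1) / 2;
}
```

With these helpers, the ratio `count_inversions(x) / double(max_inversions(x.size()))` is the fraction that the plotted percentage is expected to track for a distribution targeting inversions.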

benchmarks/presortedness/plot-distribution.py renamed to benchmarks/disorder/plot-distribution.py

Lines changed: 5 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# -*- coding: utf-8 -*-
22

3-
# Copyright (c) 2023 Morwenn
3+
# Copyright (c) 2023-2025 Morwenn
44
# SPDX-License-Identifier: MIT
55

66
import argparse
@@ -22,7 +22,7 @@
2222

2323
with path.open() as fd:
2424
# Read metadata from the first line
25-
mop_name, _, max_disorder, _ = fd.readline().strip().split(',')
25+
mod_name, _, max_disorder, _ = fd.readline().strip().split(',')
2626

2727
# Read the rest of the file
2828
percentages = []
@@ -40,10 +40,10 @@
4040
axes.ticklabel_format(style='plain')
4141
# Add a tick for the maximum disorder possible
4242
axes2.axhline(int(max_disorder), linestyle=':', color='gray')
43-
axes2.set_yticks([int(max_disorder)], [f"$max({mop_name.capitalize()}(X))$"])
43+
axes2.set_yticks([int(max_disorder)], [f"$max({mod_name.capitalize()}(X))$"])
4444

45-
pyplot.title(f"${mop_name.capitalize()}(X)$ generated by dist::{mop_name}")
46-
axes.set_ylabel(mop_name.capitalize())
45+
pyplot.title(f"${mod_name.capitalize()}(X)$ generated by dist::{mod_name}")
46+
axes.set_ylabel(mod_name.capitalize())
4747

4848
pyplot.tight_layout()
4949
pyplot.xlim(left=0, right=100)
Lines changed: 3 additions & 3 deletions
@@ -1,6 +1,6 @@
 # -*- coding: utf-8 -*-

-# Copyright (c) 2020-2024 Morwenn
+# Copyright (c) 2020-2025 Morwenn
 # SPDX-License-Identifier: MIT

 import argparse
@@ -43,7 +43,7 @@
 disorders = []
 with result_file.open() as fd:
     # Read metadata from the first line
-    algo_name, mop_name, size = fd.readline().strip().split(',')
+    algo_name, mod_name, size = fd.readline().strip().split(',')

     # Read the rest of the file
     for line in fd:
@@ -71,7 +71,7 @@
 pyplot.title("Sorting std::vector<int> with $10^{}$ elements".format(
     round(math.log(int(size), 10)))
 )
-pyplot.xlabel(f"${mop_name}$")
+pyplot.xlabel(f"${mod_name}$")
 pyplot.ylabel("Cycles (lower is better)")
 pyplot.legend()

docs/Benchmarks.md

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@
44
* *1.16.0 for slow O(n log n) sorts*
55
* *1.14.0 for small array sorts*
66
* *1.13.1 for unstable random-access sorts, forward sorts, and the expensive move/cheap comparison benchmark*
7-
* *1.12.0 for measures of presortedness*
7+
* *1.12.0 for measures of disorder*
88
* *1.9.0 otherwise*
99

1010
Benchmarking is hard and I might not be doing it right. Moreover, benchmarking sorting algorithms highlights that the time needed to sort a collection of elements depends on several things: the type to sort, the size of the collection, the cost of comparing two values, the cost of moving an element, the patterns formed by the distribution of the values in the collection to sort, the type of the collection itself, etc. The aim of this page is to help you choose a sorting algorithm depending on your needs. You can find two main kinds of benchmarks: the ones that compare algorithms against shuffled collections of different sizes, and the ones that compare algorithms against different data patterns for a given collection size.
@@ -170,23 +170,23 @@ We can see several trends in these benchmarks, rather consistant across `int` an
170170
* [`low_comparisons_sorter`][low-comparisons-sorter] is second-best here but has a very limited range.
171171
* [`selection_sorter`][selection-sorter] and [`low_moves_sorter`][low-moves-sorter] are the worst contenders here. They are both different flavours of selection sorts, and as a results are pretty similar.
172172

173-
# Measures of presortedness
173+
# Measures of disorder
174174

175-
This benchmark for [measures of presortedness][measures-of-presortedness] is small and only intends to show the cost that these tools might incur. It is not meant to be exhaustive in any way.
175+
This benchmark for [measures of disorder][Measures-of-disorder] is small and only intends to show the cost that these tools might incur. It is not meant to be exhaustive in any way.
176176

177-
![Benchmark speed of measures of presortedness for increasing size for std::vector<int>](https://i.imgur.com/pjc7zJF.png)
177+
![Benchmark speed of measures of disorder for increasing size for std::vector<int>](https://i.imgur.com/pjc7zJF.png)
178178

179179
It makes rather easy to see the different groups of complexities:
180180
* *Run(X)* and *Mono(X)* are obvious O(n) algorithms.
181181
* *Dis(X)* is a more involved O(n) algorithm.
182-
* All of the other measures of presortedness run in O(n log n) time.
182+
* All of the other measures of disorder run in O(n log n) time.
183183

184184

185185
[fixed-size-sorters]: Fixed-size-sorters.md
186186
[insertion-sorter]: Sorters.md#insertion_sorter
187187
[low-comparisons-sorter]: Fixed-size-sorters.md#low_comparisons_sorter
188188
[low-moves-sorter]: Fixed-size-sorters.md#low_moves_sorter
189-
[measures-of-presortedness]: Measures-of-presortedness.md
189+
[Measures-of-disorder]: Measures-of-disorder.md
190190
[merge-exchange-network-sorter]: Fixed-size-sorters.md#merge_exchange_network_sorter
191191
[selection-sorter]: Sorters.md#selection_sorter
192192
[sorting-network-sorter]: Fixed-size-sorters.md#sorting_network_sorter
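
The benchmark text in this file groups *Run(X)* among the "obvious O(n) algorithms". As a rough illustration of why a single pass is enough, the sketch below counts step-downs, that is, positions where an element is smaller than its predecessor, which equals the number of non-decreasing runs minus one. The name `count_step_downs` is a hypothetical helper, and the assumption that the library's measure follows exactly this convention is not confirmed by the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cpp-sort): counts the positions i where
// values[i] < values[i - 1]. Each such step-down ends one non-decreasing run,
// so the result is the number of non-decreasing runs minus one.
std::size_t count_step_downs(const std::vector<int>& values)
{
    std::size_t step_downs = 0;
    // One comparison per adjacent pair: a single O(n) pass, no extra memory
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] < values[i - 1]) {
            ++step_downs;
        }
    }
    return step_downs;
}
```

A sorted collection yields 0, and a strictly decreasing collection of n elements yields n - 1.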

docs/Library-nomenclature.md

Lines changed: 6 additions & 4 deletions
@@ -24,9 +24,11 @@

 Note that the *sorters* (and virtually every algorithm) in **cpp-sort** accept iterators that do not implement post-increment and post-decrement operations. The iterator categories accepted by the library are thus less restrictive than the ones mandated for the standard library.

-* *Measure of presortedness*: also known as a *measure of disorder*, it corresponds to an algorithm telling how much a collection is already sorted. There isn't a single way to tell how much a collection is already sorted, one can for example count the number of inversions or the number of elements to remove to get a sorted subsequence. The main advantage of measures of presortedness are that some algorithms, known as *adaptative algorithms*, are known to be optimal for some of these measures, which means that they can take advantage of the order that already exists in the collection in some way. **cpp-sort** provides a number of [measures of presortedness][measures-of-presortedness] in the namespace `cppsort::probe`:
+* *Measure of disorder*: a function used to estimate the amount of disorder in a sequence. There are many different ways to do that, such as counting the number of inversions in the sequence, or the number of elements to remove to get a sorted subsequence. **cpp-sort** provides a number of [measures of disorder][Measures-of-disorder] in the namespace `cppsort::probe`.

-    auto max_inversion = cppsort::probe::dis(collection);
+    auto max_inversions = cppsort::probe::dis(collection);
+
+* *Measure of presortedness*: a special kind of *measure of disorder* that satisfies a specific set of additional properties (see the page on *measures of disorder*). The overarching goal of those measures is to be able to estimate and reason about the number of steps required to sort a sequence of elements. Most notably, they make it possible to reason formally about *adaptive sorting algorithms*: given a measure of presortedness $M$, an $M$-adaptive (or $M$-optimal) sorting algorithm is an algorithm that can sort a sequence in a number of steps that is within a constant factor of the estimated minimal number of steps for the estimated disorder.

 * *Metric*: a special kind of *sorter adapter* that returns information about sorted collections. See [the corresponding page][metrics] for additional information.

@@ -65,7 +67,7 @@

 * *Type-specific sorter*: some non-comparison sorters such as the [`spread_sorter`][spread-sorter] implement specific sorting algorithms which only work with some specific types (for example integers or strings).

-* *Unified sorting interface*: *sorters*, *sorter adapters*, *measures of presortedness* and a few other components of the library accept a range or a pair of iterators, and optionally a comparison function and/or a projection function. Those components typically rely on the library's [`sorter_facade`][sorter-facade] which handles the dispatching to the component's implementation and handles a number of special cases. For simplicity, what is accepted by the `operator()` of such components is referred to as the *unified sorting interface* in the rest of the library.
+* *Unified sorting interface*: *sorters*, *sorter adapters*, *measures of disorder* and a few other components of the library accept a range or a pair of iterators, and optionally a comparison function and/or a projection function. Those components typically rely on the library's [`sorter_facade`][sorter-facade] which handles the dispatching to the component's implementation and handles a number of special cases. For simplicity, what is accepted by the `operator()` of such components is referred to as the *unified sorting interface* in the rest of the library.


 [comparators]: Comparators.md
@@ -75,7 +77,7 @@
 [iterator-categories]: https://en.cppreference.com/w/cpp/iterator
 [iterator-category]: Sorter-traits.md#iterator_category
 [iterator-tags]: https://en.cppreference.com/w/cpp/iterator/iterator_tags
-[measures-of-presortedness]: Measures-of-presortedness.md
+[Measures-of-disorder]: Measures-of-disorder.md
 [metrics]: Metrics.md
 [p0022]: https://wg21.link/P0022
 [radix-sort]: https://en.wikipedia.org/wiki/Radix_sort
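
The nomenclature entry in this file mentions "the number of elements to remove to get a sorted subsequence" as one way of estimating disorder. A minimal sketch of that idea, assuming "sorted" means non-decreasing: the count equals the size of the sequence minus the length of its longest non-decreasing subsequence, which can be found in O(n log n) with a binary search over candidate subsequence tails. The name `count_removals` is a hypothetical helper, not the library's implementation, and the library may treat equal elements differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of cpp-sort): minimum number of elements to
// remove from values so that what remains is in non-decreasing order.
std::size_t count_removals(const std::vector<int>& values)
{
    // tails[k] holds the smallest value that can end a non-decreasing
    // subsequence of length k + 1 among the elements seen so far
    std::vector<int> tails;
    for (int value : values) {
        // upper_bound keeps equal elements in the same subsequence
        auto it = std::upper_bound(tails.begin(), tails.end(), value);
        if (it == tails.end()) {
            tails.push_back(value);  // extends the longest subsequence
        } else {
            *it = value;             // improves an existing tail
        }
    }
    // tails.size() is the length of the longest non-decreasing subsequence
    return values.size() - tails.size();
}
```

For example, `{1, 5, 2, 3, 4}` needs a single removal (the 5), while a strictly decreasing sequence of n elements needs n - 1.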
