Commit dbce271

Reimplement probe::enc as a true measure of presortedness
The old implementation recognizes that Enc(X) as proposed by Skiena does not satisfy Mannila's first criterion for a measure of presortedness, and uses Enc(X) - 1 instead. However, in doing so, it violates Mannila's fourth criterion. The new implementation uses M_Enc as proposed by Estivill-Castro in *Sorting and Measures of Disorder*, which satisfies all of Mannila's criteria.
1 parent 8457373 commit dbce271
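
A small worked example may help show why $Enc(X) - 1$ fails. It assumes the usual statement of Mannila's fourth criterion, $M(XY) \le M(X) + M(Y)$ whenever every element of $X$ is no greater than every element of $Y$ (the commit message does not spell the criterion out). It also assumes that an element no smaller than the largest bound of the last list goes to one end of a list, and an element no greater than its smallest bound goes to the other end:

$$
\begin{aligned}
X &= \langle 1, 0 \rangle, \quad Y = \langle 3, 2 \rangle, \quad XY = \langle 1, 0, 3, 2 \rangle \\
Enc(X) &= Enc(Y) = 1, \quad Enc(XY) = 2 \\
Enc(XY) - 1 &= 1 > 0 = (Enc(X) - 1) + (Enc(Y) - 1) \\
M_{Enc}(XY) &= Enc(\langle 0, 3, 2 \rangle) = 2 \le 1 + 1 = M_{Enc}(X) + M_{Enc}(Y)
\end{aligned}
$$

Both $X$ and $Y$ are descending, so each forms a single encroaching list and $Enc - 1$ scores them as $0$, while their concatenation needs two lists. $M_{Enc}$ drops the leading ascending run ($\langle 1 \rangle$ for $XY$) before counting, which restores the inequality in this case.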

4 files changed, with 51 additions and 21 deletions.

docs/Measures-of-presortedness.md

Lines changed: 19 additions & 2 deletions
@@ -152,13 +152,30 @@ When enough memory is available `probe::dis` runs in O(n) using an algorithm des
 #include <cpp-sort/probes/enc.h>
 ```
 
-Computes the number of encroaching lists that can be extracted from $X$ minus one (see *Encroaching lists as a measure of presortedness* by S. Skiena).
+Computes an approximation of the number of encroaching lists that can be extracted from $X$ (see *Encroaching lists as a measure of presortedness* by S. Skiena). Encroaching lists are best explained by their construction algorithm: given a sequence $X$ of elements, create an empty list of lists $L$, then for each element $E$ of $X$:
+1. If $L$ is empty, create a new list with $E$.
+2. Otherwise, compare $E$ to the head and tail of the rightmost list of $L$.
+2.1 If $E$ is greater than the head, find the leftmost list that has a head smaller than $E$, and prepend $E$ to that list.
+2.2 Otherwise, if $E$ is smaller than the tail, find the leftmost list that has a tail greater than $E$, and append $E$ to that list.
+2.3 Otherwise, append a new list to $L$ with the element $E$.
+
+Those lists are called encroaching because the bounds of a given list "encroach" on those of all lists to its right.
+
+The number of encroaching lists does not satisfy the formal definition of a measure of presortedness because it returns $1$ for non-empty sorted sequences instead of $0$, which violates Mannila's first criterion. Using $Enc(X) - 1$ does not work either because it violates Mannila's fourth criterion. To circumvent these issues, `probe::enc` implements an equivalent measure of disorder $M_{Enc}$ proposed by V. Estivill-Castro in *Sorting and Measures of Disorder*, which satisfies all of Mannila's criteria for what makes a measure of presortedness:
+
+$$
+M_{Enc}(X)=
+\begin{cases}
+0 & \text{if } X \text{ is sorted,}\\
+Enc(X_{tail}) & \text{otherwise, where } X_{tail} \text{ is } X \text{ without its leading ascending run.}
+\end{cases}
+$$
 
 | Complexity  | Memory      | Iterators     |
 | ----------- | ----------- | ------------- |
 | n log n     | n           | Forward       |
 
-`max_for_size`: $\frac{|X| + 1}{2} - 1$ when the values already extracted from $X$ constitute stronger bounds than the values yet to be extracted (for example the sequence $\langle 0, 9, 1, 8, 2, 7, 3, 6, 4, 5 \rangle$ will trigger the worst case).
+`max_for_size`: $\lfloor \frac{|X|}{2} \rfloor$ when all values extracted from $X$ are within the bounds of already extracted encroaching lists (for example the sequence $\langle 10, 0, 9, 1, 8, 2, 7, 3, 6, 4, 5 \rangle$ triggers the worst case).
 
 ### *Exc*
 
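
The construction steps and the $M_{Enc}$ definition added to the documentation can be sketched in a few lines of standalone C++. This is only an illustration, not the library's code: `enc_lists` and `m_enc` are hypothetical names, the lists are stored in full and searched linearly for readability, and elements equal to a head or tail are assumed to join that list (the documentation text uses strict comparisons).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Number of encroaching lists built from `seq`, following the documented
// steps: each list is kept in descending order, so its head (front) is its
// largest element and its tail (back) its smallest.
// Assumption: an element equal to a head or tail joins that list.
std::size_t enc_lists(const std::vector<int>& seq)
{
    std::vector<std::deque<int>> lists;
    for (int e : seq) {
        if (lists.empty()) {
            // Step 1: first element, first list
            lists.push_back(std::deque<int>{e});
        } else if (e >= lists.back().front()) {
            // Step 2.1: prepend to the leftmost list whose head is not greater
            auto it = std::find_if(lists.begin(), lists.end(),
                [e](const std::deque<int>& l) { return l.front() <= e; });
            assert(it != lists.end()); // the rightmost list always qualifies
            it->push_front(e);
        } else if (e <= lists.back().back()) {
            // Step 2.2: append to the leftmost list whose tail is not smaller
            auto it = std::find_if(lists.begin(), lists.end(),
                [e](const std::deque<int>& l) { return l.back() >= e; });
            assert(it != lists.end());
            it->push_back(e);
        } else {
            // Last step: the element fits no existing list
            lists.push_back(std::deque<int>{e});
        }
    }
    return lists.size();
}

// M_Enc: 0 for sorted input, otherwise Enc of what follows the leading
// ascending run.
std::size_t m_enc(const std::vector<int>& seq)
{
    auto tail_begin = std::is_sorted_until(seq.begin(), seq.end());
    return enc_lists(std::vector<int>(tail_begin, seq.end()));
}
```

With these definitions, `m_enc({10, 0, 9, 1, 8, 2, 7, 3, 6, 4, 5})` evaluates to 5, which matches the documented worst case for 11 elements, and any sorted sequence evaluates to 0.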
include/cpp-sort/probes/enc.h

Lines changed: 8 additions & 5 deletions
@@ -18,6 +18,7 @@
 #include <cpp-sort/utility/as_function.h>
 #include <cpp-sort/utility/branchless_traits.h>
 #include <cpp-sort/utility/functional.h>
+#include "../detail/is_sorted_until.h"
 #include "../detail/iterator_traits.h"
 #include "../detail/lower_bound.h"
 #include "../detail/type_traits.h"
@@ -70,9 +71,11 @@ namespace cppsort::probe
                 auto&& comp = utility::as_function(compare);
                 auto&& proj = utility::as_function(projection);
 
-                if (first == last || std::next(first) == last) {
-                    return 0;
-                }
+                // Ignore the first monotonic run of the collection, technically
+                // implements M_Enc as proposed by V. Estivill-Castro in *Sorting
+                // and Measures of Disorder*
+                first = cppsort::detail::is_sorted_until(first, last, compare, projection);
+                if (first == last) return 0;
 
                 // Heads and tails of encroaching lists
                 std::vector<std::pair<ForwardIterator, ForwardIterator>> lists;
@@ -109,14 +112,14 @@ namespace cppsort::probe
                     ++first;
                 }
 
-                return lists.size() - 1;
+                return lists.size();
             }
 
             template<typename Integer>
             static constexpr auto max_for_size(Integer n)
                 -> Integer
             {
-                return n == 0 ? 0 : (n + 1) / 2 - 1;
+                return n / 2;
             }
         };
     }
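
The same idea can be sketched on plain values to show why storing only the heads and tails is enough and how a binary search finds the target list. This is a hedged approximation rather than the header's code: `m_enc_fast` is a hypothetical name, it works on `int` values instead of iterators, it uses `std::partition_point` where the header relies on its own lower-bound helper, and here the head is the smallest element of a list and the tail its largest.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical value-based counterpart of the probe: counts the encroaching
// lists of the part of `seq` that follows its leading ascending run.
std::size_t m_enc_fast(const std::vector<int>& seq)
{
    auto first = std::is_sorted_until(seq.begin(), seq.end());
    if (first == seq.end()) return 0; // sorted (or empty) input

    // Only the bounds matter: (head, tail) == (smallest, largest) of a list.
    // Invariant: from left to right, heads never decrease and tails never
    // increase, which is what makes the binary searches below valid.
    std::vector<std::pair<int, int>> lists;
    lists.emplace_back(*first, *first);

    for (++first; first != seq.end(); ++first) {
        const int value = *first;
        if (value >= lists.back().second) {
            // Leftmost list whose tail is not greater than value
            auto it = std::partition_point(lists.begin(), lists.end(),
                [value](const std::pair<int, int>& l) { return l.second > value; });
            assert(it != lists.end());
            it->second = value;
        } else if (value <= lists.back().first) {
            // Leftmost list whose head is not smaller than value
            auto it = std::partition_point(lists.begin(), lists.end(),
                [value](const std::pair<int, int>& l) { return l.first < value; });
            assert(it != lists.end());
            it->first = value;
        } else {
            // Strictly inside the bounds of the last list: open a new list
            lists.emplace_back(value, value);
        }
    }
    return lists.size();
}

// Same formula as the new max_for_size in the hunk above
constexpr std::size_t max_for_size(std::size_t n)
{
    return n / 2;
}
```

Each element costs one binary search over at most n/2 lists, which is consistent with the documented n log n complexity.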

tests/probes/enc.cpp

Lines changed: 20 additions & 8 deletions
@@ -16,22 +16,34 @@ TEST_CASE( "presortedness measure: enc", "[probe][enc]" )
     SECTION( "simple test" )
     {
         const std::forward_list<int> li = { 4, 6, 5, 2, 9, 1, 3, 8, 0, 7 };
-        CHECK( enc(li) == 2 );
-        CHECK( enc(li.begin(), li.end()) == 2 );
+        CHECK( enc(li) == 3 );
+        CHECK( enc(li.begin(), li.end()) == 3 );
 
         std::vector<internal_compare<int>> tricky(li.begin(), li.end());
-        CHECK( enc(tricky, &internal_compare<int>::compare_to) == 2 );
+        CHECK( enc(tricky, &internal_compare<int>::compare_to) == 3 );
     }
 
     SECTION( "upper bound" )
     {
         // The upper bound should correspond to half the size
         // of the input sequence minus one
 
-        const std::forward_list<int> li = { 10, 0, 9, 1, 8, 2, 7, 3, 6, 4, 5 };
-        auto max_n = enc.max_for_size(cppsort::utility::size(li));
-        CHECK( max_n == 5 );
-        CHECK( enc(li) == max_n );
-        CHECK( enc(li.begin(), li.end()) == max_n );
+        {
+            // Even number of elements
+            const std::forward_list<int> li = { 11, 10, 0, 9, 1, 8, 2, 7, 3, 6, 4, 5 };
+            auto max_n = enc.max_for_size(cppsort::utility::size(li));
+            CHECK( max_n == 6 );
+            CHECK( enc(li) == max_n );
+            CHECK( enc(li.begin(), li.end()) == max_n );
+        }
+
+        {
+            // Odd number of elements
+            const std::forward_list<int> li = { 11, 10, 0, 9, 1, 8, 2, 7, 3, 6, 4 };
+            auto max_n = enc.max_for_size(cppsort::utility::size(li));
+            CHECK( max_n == 5 );
+            CHECK( enc(li) == max_n );
+            CHECK( enc(li.begin(), li.end()) == max_n );
+        }
     }
 }
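
The two hand-picked sequences above can be complemented by an exhaustive check on small sizes. The sketch below is not part of the test suite: `m_enc`, `brute_force_max` and `check_upper_bound` are hypothetical helpers, and `m_enc` is a simplified value-based model of the probe (ties are assumed to join an existing list). It enumerates every permutation of a given size and compares the largest value found with `n / 2`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Simplified model of the probe: skip the leading ascending run, then count
// encroaching lists while tracking only their smallest and largest values.
std::size_t m_enc(const std::vector<int>& seq)
{
    auto it = std::is_sorted_until(seq.begin(), seq.end());
    std::vector<int> heads, tails; // smallest and largest value of each list
    for (; it != seq.end(); ++it) {
        const int v = *it;
        std::size_t i = 0;
        if (heads.empty() || (heads.back() < v && v < tails.back())) {
            // Strictly inside the last list's bounds: open a new list
            heads.push_back(v);
            tails.push_back(v);
        } else if (v >= tails.back()) {
            while (tails[i] > v) ++i; // leftmost list with tail <= v
            tails[i] = v;
        } else {
            while (heads[i] < v) ++i; // leftmost list with head >= v
            heads[i] = v;
        }
    }
    return heads.size();
}

// Largest M_Enc over all permutations of {0, ..., n-1}
std::size_t brute_force_max(std::size_t n)
{
    std::vector<int> perm(n);
    std::iota(perm.begin(), perm.end(), 0);
    std::size_t best = 0;
    do {
        best = std::max(best, m_enc(perm));
    } while (std::next_permutation(perm.begin(), perm.end()));
    return best;
}

// Checks that the brute-force maximum equals n / 2 for every n <= max_n
void check_upper_bound(std::size_t max_n)
{
    for (std::size_t n = 0; n <= max_n; ++n) {
        assert(brute_force_max(n) == n / 2);
    }
}
```

Calling `check_upper_bound(8)` walks through fewer than 50,000 permutations in total, so it stays cheap while covering both even and odd sizes.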

tests/probes/relations.cpp

Lines changed: 4 additions & 6 deletions
@@ -109,10 +109,8 @@ TEST_CASE( "relations between measures of presortedness", "[probe]" )
         return enc(sequence) <= std::min<long long>(rem + 1, size - rem);
     });
 
-    rc::prop("(Enc(X) + 1) ≤ 2 Exc(X)", [](const std::vector<int>& sequence) {
-        auto exc = cppsort::probe::exc(sequence);
-        auto enc = cppsort::probe::enc(sequence);
-        return (enc == 0 && exc == 0) || (enc + 1 <= 2 * exc);
+    rc::prop("Enc(X) ≤ 2 Exc(X)", [](const std::vector<int>& sequence) {
+        return cppsort::probe::enc(sequence) <= 2 * cppsort::probe::exc(sequence);
     });
 
     rc::prop("Conjecture: Enc(X) ≤ Exc(X)", [](const std::vector<int>& sequence) {
@@ -130,8 +128,8 @@ TEST_CASE( "relations between measures of presortedness", "[probe]" )
         return sus(sequence) <= max(sequence);
     });
 
-    rc::prop("Enc(X) ≤ SUS(X)", [](const std::vector<int>& sequence) {
-        return enc(sequence) <= sus(sequence);
+    rc::prop("Enc(X) ≤ SUS(X) + 1", [](const std::vector<int>& sequence) {
+        return enc(sequence) <= sus(sequence) + 1;
     });
 
     // Heapsort - Adapted for Presorted Files
