
Commit 764c2d4

Expose gfx::timmerge in the public interface (#36)
* Expose gfx::timmerge in the public interface
* Improve the timmerge performance description
* Use markdown code in README
* Improve TimSort::merge() logging
* Make setting initial vector size more efficient and explicit
* AVERAGE => approx. average (trapezoidal rule)
* Add a Wiki link to detailed bench_merge results
* Fix MSVC build
* Do not duplicate Result size
* Optimize TimSort::merge
* timmerge: do not skip assertions
1 parent 1656423 commit 764c2d4

13 files changed: +1078 -308 lines changed


README.md

Lines changed: 53 additions & 4 deletions
@@ -26,10 +26,19 @@ can't fallback to a O(n log² n) algorithm when there isn't enough extra heap me
 type such as `void`.


+Merging sorted ranges efficiently is an important part of the TimSort algorithm. This library exposes its merge
+algorithm in the public API. According to the benchmarks, `gfx::timmerge` is slower than `std::inplace_merge` on
+heavily/randomly overlapping subranges of simple elements, but it is faster for complex elements such as `std::string`
+and on sparsely overlapping subranges. `gfx::timmerge` should be usable as a drop-in replacement for
+`std::inplace_merge`, with the difference that it can't fallback to a O(n log n) algorithm when there isn't enough
+extra heap memory available. Like `gfx::timsort`, `gfx::timmerge` can take a projection function and avoids using the
+postfix `++` or `--` operators.
+
+
 The full list of available signatures is as follows (in namespace `gfx`):

 ```cpp
-// Overloads taking a pair of iterators
+// timsort overloads taking a pair of iterators

 template <typename RandomAccessIterator>
 void timsort(RandomAccessIterator const first, RandomAccessIterator const last);
@@ -42,7 +51,7 @@ template <typename RandomAccessIterator, typename Compare, typename Projection>
 void timsort(RandomAccessIterator const first, RandomAccessIterator const last,
              Compare compare, Projection projection);

-// Overloads taking a range
+// timsort overloads taking a range

 template <typename RandomAccessRange>
 void timsort(RandomAccessRange &range);
@@ -52,6 +61,20 @@ void timsort(RandomAccessRange &range, Compare compare);

 template <typename RandomAccessRange, typename Compare, typename Projection>
 void timsort(RandomAccessRange &range, Compare compare, Projection projection);
+
+// timmerge overloads
+
+template <typename RandomAccessIterator>
+void timmerge(RandomAccessIterator first, RandomAccessIterator middle,
+              RandomAccessIterator last);
+
+template <typename RandomAccessIterator, typename Compare>
+void timmerge(RandomAccessIterator first, RandomAccessIterator middle,
+              RandomAccessIterator last, Compare compare);
+
+template <typename RandomAccessIterator, typename Compare, typename Projection>
+void timmerge(RandomAccessIterator first, RandomAccessIterator middle,
+              RandomAccessIterator last, Compare compare, Projection projection);
 ```

 ## EXAMPLE
@@ -102,7 +125,7 @@ conan install timsort/2.0.2

 ## DIAGNOSTICS & INFORMATION

-A few configuration macros allow gfx::timsort to emit diagnostic, which might be helpful to diagnose issues:
+A few configuration macros allow `gfx::timsort` and `gfx::timmerge` to emit diagnostic, which might be helpful to diagnose issues:
 * Defining `GFX_TIMSORT_ENABLE_ASSERT` inserts assertions in key locations in the algorithm to avoid logic errors.
 * Defining `GFX_TIMSORT_ENABLE_LOG` inserts logs in key locations, which allow to follow more closely the flow of the algorithm.

@@ -130,7 +153,7 @@ built with CMake:
 Benchmarks are available in the `benchmarks` subdirectory, and can be constructed directly by passing `BUILD_BENCHMARKS=ON`
 variable to CMake during the configuration step.

-Example output (timing scale: sec.):
+Example bench_sort output (timing scale: sec.):

     c++ -v
     Apple LLVM version 7.0.0 (clang-700.0.72)
@@ -171,3 +194,29 @@ Example output (timing scale: sec.):
     std::sort        0.402458
     std::stable_sort 2.436326
     timsort          0.298639
+
+Example bench_merge output (timing scale: milliseconds; omitted detailed results for different
+middle iterator positions, reformatted to improve readability):
+
+    c++ -v
+    Using built-in specs.
+    ...
+    Target: x86_64-pc-linux-gnu
+    ...
+    gcc version 10.2.0 (GCC)
+    c++ -I ../include -Wall -Wextra -g -DNDEBUG -O2 -std=c++11 bench_merge.cpp -o bench_merge
+    ./bench_merge
+    size 100000
+    element type\algorithm:          std::inplace_merge    timmerge
+    RANDOMIZED SEQUENCE
+    [int] approx. average                     33.404430   37.047990
+    [std::string] approx. average            324.964249  210.297207
+    REVERSED SEQUENCE
+    [int] approx. average                     11.441404    4.017482
+    [std::string] approx. average            305.649503  114.773898
+    SORTED SEQUENCE
+    [int] approx. average                      4.291098    0.105571
+    [std::string] approx. average            158.238114    0.273858
+
+Detailed bench_merge results for different middle iterator positions can be found at
+https://github.com/timsort/cpp-TimSort/wiki/Benchmark-results
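
The README text added above describes `gfx::timmerge` as usable in place of `std::inplace_merge`, and the benchmark in this commit prepares its input by sorting the two subranges on either side of a middle position before merging. A minimal sketch of that call shape follows. It uses only `std::inplace_merge`; the helper name `sort_halves_and_merge` is hypothetical, and the idea that `gfx::timmerge` could be substituted on the merge line is an assumption based on the signatures listed in the README, not something verified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts [0, middle) and [middle, size) independently,
// then merges the two sorted subranges in place.
std::vector<int> sort_halves_and_merge(std::vector<int> values, std::size_t middle) {
    assert(middle <= values.size());
    const auto mid = values.begin() + static_cast<std::ptrdiff_t>(middle);

    // Precondition of an in-place merge: both subranges are already sorted.
    std::sort(values.begin(), mid);
    std::sort(mid, values.end());

    // Assumption: per the README signatures, gfx::timmerge takes the same
    // three iterators and could presumably replace this call.
    std::inplace_merge(values.begin(), mid, values.end());
    return values;
}
```

Calling `sort_halves_and_merge({5, 1, 4, 2, 3, 0}, 3)` should yield a fully sorted vector regardless of where the middle position falls, including the edge cases `0` and `values.size()`.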

benchmarks/CMakeLists.txt

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@

-foreach(filename bench.cpp)
+foreach(filename bench_merge.cpp bench_sort.cpp)
     get_filename_component(name ${filename} NAME_WE)
     add_executable(${name} ${filename})
     target_link_libraries(${name} PRIVATE gfx::timsort)

benchmarks/bench.cpp

Lines changed: 0 additions & 126 deletions
This file was deleted.

benchmarks/bench_merge.cpp

Lines changed: 130 additions & 0 deletions
@@ -0,0 +1,130 @@
/*
 * Copyright (c) 2021 Igor Kushnir <[email protected]>.
 *
 * SPDX-License-Identifier: MIT
 */
#include <cstdlib>
#include <algorithm>
#include <chrono>
#include <iomanip>
#include <iostream>
#include <string>
#include <valarray>
#include <vector>
#include <gfx/timsort.hpp>
#include "benchmarker.hpp"

namespace
{
    std::vector<int> generate_middle_positions(int size) {
        std::vector<int> result = {
            0, 1, 2, 5, 100, size/100, size/20, size/5, size/3, size/2, 3*size/4,
            6*size/7, 24*size/25, 90*size/91, size-85, size-8, size-2, size-1, size
        };

        // The code below can remove or reorder elements if size is small.

        auto logical_end = std::remove_if(result.begin(), result.end(), [size](int middle) {
            return middle < 0 || middle > size;
        });
        result.erase(logical_end, result.end());

        std::sort(result.begin(), result.end());
        logical_end = std::unique(result.begin(), result.end());
        result.erase(logical_end, result.end());

        return result;
    }

    using Result = std::valarray<double>;
    Result zeroResult() { return Result(2); }
}

template <typename value_t>
struct Bench {
    void operator()(const std::vector<value_t> &source) const {
        const int size = static_cast<int>(source.size());
        const auto middle_positions = generate_middle_positions(size);

        int prev_middle = 0;
        auto prev_result = zeroResult();
        auto result_sum = zeroResult();

        std::cerr << "middle\\algorithm:\tstd::inplace_merge\ttimmerge" << std::endl;
        constexpr int width = 10;
        constexpr const char* padding = " \t";

        std::vector<value_t> a(source.size());
        for (auto middle : middle_positions) {
            std::copy(source.begin(), source.end(), a.begin());
            std::sort(a.begin(), a.begin() + middle);
            std::sort(a.begin() + middle, a.end());
            const auto result = run(a, middle);

            if (middle != prev_middle) {
                // Trapezoidal rule for approximating the definite integral.
                result_sum += 0.5 * (result + prev_result)
                              * static_cast<double>(middle - prev_middle);
                prev_middle = middle;
            }
            prev_result = result;

            std::cerr << std::setw(width) << middle
                      << " \t" << std::setw(width) << result[0]
                      << padding << std::setw(width) << result[1]
                      << std::endl;
        }

        if (size != 0) {
            result_sum /= static_cast<double>(size);
            std::cerr << "approx. average"
                      << " \t" << std::setw(width) << result_sum[0]
                      << padding << std::setw(width) << result_sum[1]
                      << std::endl;
        }
    }

private:
    static Result run(const std::vector<value_t> &a, const int middle) {
        std::vector<value_t> b(a.size());
        const auto assert_is_sorted = [&b] {
            if (!std::is_sorted(b.cbegin(), b.cend())) {
                std::cerr << "Not sorted!" << std::endl;
                std::abort();
            }
        };

        auto result = zeroResult();
        for (auto *total_time_ms : { &result[0], &result[1] }) {
            using Clock = std::chrono::steady_clock;
            decltype(Clock::now() - Clock::now()) total_time{0};

            for (int i = 0; i < 100; ++i) {
                std::copy(a.begin(), a.end(), b.begin());
                const auto time_begin = Clock::now();

                if (total_time_ms == &result[0]) {
                    std::inplace_merge(b.begin(), b.begin() + middle, b.end());
                } else {
                    gfx::timmerge(b.begin(), b.begin() + middle, b.end());
                }

                const auto time_end = Clock::now();
                total_time += time_end - time_begin;

                // Verifying that b is sorted should prevent the compiler from optimizing anything out.
                assert_is_sorted();
            }

            *total_time_ms = std::chrono::duration_cast<
                    std::chrono::microseconds>(total_time).count() / 1000.0;
        }
        return result;
    }
};

int main(int argc, const char *argv[]) {
    const int size = argc > 1 ? std::stoi(argv[1]) : 100 * 1000;
    Benchmarker<Bench> benchmarker(size);
    benchmarker.run();
}
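
The "approx. average" figure computed in `Bench::operator()` weights each timing by the distance between neighbouring middle positions, which is the trapezoidal rule applied to unevenly spaced samples. The standalone sketch below isolates that calculation. The function name `trapezoidal_average` is hypothetical, and unlike the benchmark, which starts from an implicit zero sample at position 0 and divides by `size`, this version divides by the span actually covered by the samples.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: approximates the average of f over [xs.front(), xs.back()]
// from samples ys[i] = f(xs[i]). xs must be strictly increasing.
double trapezoidal_average(const std::vector<double> &xs, const std::vector<double> &ys) {
    assert(xs.size() == ys.size());
    assert(xs.size() >= 2);

    double integral = 0.0;
    for (std::size_t i = 1; i < xs.size(); ++i) {
        assert(xs[i] > xs[i - 1]);
        // Area of the trapezoid between two neighbouring samples.
        integral += 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1]);
    }
    // Dividing the integral by the covered span gives the mean value.
    return integral / (xs.back() - xs.front());
}
```

Because the sampled middle positions are dense near both ends of the range and sparse in between, a plain arithmetic mean of the timings would over-represent the edge cases; weighting by interval width avoids that.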
