Skip to content

Describe bottom-up poplar heap operations #3

@Morwenn

Description

@Morwenn

Inspired by bottom-up heapsort, it consists in changing the sift operation to perform a deep-dive to find the biggest leaf of a semipoplar while performing a single operation per level, then bubble up to find where the root must be sifted to transform the semipoplar into a proper poplar. Following what happens with bottom-up heapsort, it should lead to the following differences:

  • Fewer comparisons are performed on average (it might even change the upper bound).
  • More comparisons are performed for already sorted or almost sorted collections.
  • It is less cache-friendly and might be slower when comparisons are cheap.
  • We can use moves instead of swaps to reduce the number of moves performed (might also be done with original poplar sort?).

The only real difficulty compared to bottom-up heapsort is to compute the parent of a node. In order to do that we keep track in a bitset of whether we took left or right children when finding the leaf (shouldn't need to be bigger that the iterator's difference_type), then we can backtrack using this bitset.

Here is a very rough and unpolished preliminary implementation of the modified sift function (using cpp-sort utilities):

template<typename RandomAccessIterator, typename Size,
            typename Compare, typename Projection>
auto sift(RandomAccessIterator first, Size size,
            Compare compare, Projection projection)
    -> void
{
    using utility::iter_move;
    auto&& comp = utility::as_function(compare);
    auto&& proj = utility::as_function(projection);

    auto size_orig = size;

    if (size < 2) return;

    auto root = first + (size - 1);
    auto child_root1 = root - 1;
    auto child_root2 = first + (size / 2 - 1);

    // Find the smallest leaf
    int step = 0;
    unsigned long long track = 0;
    auto biggest = root;
    while (true) {
        if (comp(proj(*child_root1), proj(*child_root2))) {
            biggest = child_root2;
        } else {
            biggest = child_root1;
            track |= 1;
        }

        size /= 2;
        if (size < 2) break;

        child_root1 = biggest - 1;
        child_root2 = biggest - (size - size / 2);
        track <<= 1;
        ++step;
    }

    CPPSORT_ASSERT(size == 1);

    // Find where to place the root
    while (true) {
        if (comp(proj(*root), proj(*biggest))) break;

        if ((track & 1) == 0) {
            std::advance(biggest, size + 1);
        } else {
            ++biggest;
        }
        if (biggest == root) return;

        track >>= 1;
        --step;
        size = size * 2 + 1;
    }

    // Move nodes as needed until they're all in the right place
    auto tmp = iter_move(root);
    size = size_orig;
    child_root1 = root - 1;
    child_root2 = first + (size / 2 - 1);

    while (true) {
        auto child = (((track >> step) & 1) == 1) ? child_root1 : child_root2;
        *root = iter_move(child);
        if (child == biggest) {
            *biggest = std::move(tmp);
            return;
        }

        root = child;
        size /= 2;
        child_root1 = root - 1;
        child_root2 = root - (size - size / 2);
        --step;
    }
}

The following benchmark comparing the number of performed comparisons between top-down and bottom-up poplar sorts show a behaviour that seems to match the expected one:

Graph of top-down vs. bottom-up poplar sort against different patterns on 10^6 elements

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions