Commit db6c845

Merge pull request #673 from jlumpe/heaps-custom-ordering
Use custom orderings with heaps and nlargest/nsmallest
2 parents 2122aa2 + bfd5303 commit db6c845

7 files changed: +218 -50 lines changed

Project.toml

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,7 @@
 name = "DataStructures"
 uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
-version = "0.18.4"
+version = "0.18.5"
+
 
 [deps]
 Compat = "34da2185-b29b-5c13-b0c7-acf172513d20"

docs/src/heaps.md

Lines changed: 69 additions & 25 deletions
@@ -33,13 +33,13 @@ provides the following interface:
 ```julia
 # Let `h` be a heap, `i` be a handle, and `v` be a value.
 
-i = push!(h, v) # adds a value to the heap and and returns a handle to v
+i = push!(h, v) # adds a value to the heap and and returns a handle to v
 
-update!(h, i, v) # updates the value of an element (referred to by the handle i)
+update!(h, i, v) # updates the value of an element (referred to by the handle i)
 
-delete!(h, i) # deletes the node with handle i from the heap
+delete!(h, i) # deletes the node with handle i from the heap
 
-v, i = top_with_handle(h) # returns the top value of a heap and its handle
+v, i = top_with_handle(h) # returns the top value of a heap and its handle
 ```
 
 Currently, both min/max versions of binary heap (type `BinaryHeap`) and
@@ -49,38 +49,52 @@ Examples of constructing a heap:
 
 ```julia
 h = BinaryMinHeap{Int}()
-h = BinaryMaxHeap{Int}() # create an empty min/max binary heap of integers
+h = BinaryMaxHeap{Int}() # create an empty min/max binary heap of integers
 
 h = BinaryMinHeap([1,4,3,2])
-h = BinaryMaxHeap([1,4,3,2]) # create a min/max heap from a vector
+h = BinaryMaxHeap([1,4,3,2]) # create a min/max heap from a vector
 
 h = MutableBinaryMinHeap{Int}()
-h = MutableBinaryMaxHeap{Int}() # create an empty mutable min/max heap
+h = MutableBinaryMaxHeap{Int}() # create an empty mutable min/max heap
 
 h = MutableBinaryMinHeap([1,4,3,2])
-h = MutableBinaryMaxHeap([1,4,3,2]) # create a mutable min/max heap from a vector
+h = MutableBinaryMaxHeap([1,4,3,2]) # create a mutable min/max heap from a vector
 ```
 
-Heaps may be constructed with a custom ordering. One use case for custom orderings
-is to achieve faster performance with `Float` elements with the risk of random ordering
-if any elements are `NaN`. The provided `DataStructures.FasterForward` and
-`DataStructures.FasterReverse` orderings are optimized for this purpose.
-Custom orderings may also be used for defining the order of structs as heap elements.
+## Using alternate orderings
+
+Heaps can also use alternate orderings apart from the default one defined by
+`Base.isless`. This is accomplished by passing an instance of `Base.Ordering`
+as the first argument to the constructor. The top of the heap will then be the
+element that comes first according to this ordering.
+
+The following example uses 2-tuples to track the index of each element in the
+original array, but sorts only by the data value:
+
 ```julia
-h = BinaryHeap{Float64, DataStructures.FasterForward}() # faster min heap
-h = BinaryHeap{Float64, DataStructures.FasterReverse}() # faster max heap
+data = collect(enumerate(["foo", "bar", "baz"]))
 
-h = MutableBinaryHeap{Float64, DataStructures.FasterForward}() # faster mutable min heap
-h = MutableBinaryHeap{Float64, DataStructures.FasterReverse}() # faster mutable max heap
+h1 = BinaryHeap(data) # Standard lexicographic ordering for tuples
+first(h1) # => (1, "foo")
 
-h = BinaryHeap{MyStruct, MyStructOrdering}() # heap containing custom struct
+h2 = BinaryHeap(Base.By(last), data) # Order by 2nd element only
+first(h2) # => (2, "bar")
+```
+
+If the ordering type is a singleton it can be passed as a type parameter to the
+constructor instead:
+
+```julia
+BinaryHeap{T, O}() # => BinaryHeap{T}(O())
+MutableBinaryHeap{T, O}() # => MutableBinaryHeap{T}(O())
 ```
 
 ## Min-max heaps
 Min-max heaps maintain the minimum _and_ the maximum of a set,
 allowing both to be retrieved in constant (`O(1)`) time.
 The min-max heaps in this package are subtypes of `AbstractMinMaxHeap <: AbstractHeap`
 and have the same interface as other heaps with the following additions:
+
 ```julia
 # Let h be a min-max heap, k an integer
 minimum(h) # return the smallest element
@@ -95,6 +109,7 @@ popmax!(h, k) # remove and return the largest k elements
 popall!(h) # remove and return all the elements, sorted smallest to largest
 popall!(h, o) # remove and return all the elements according to ordering o
 ```
+
 The usual `first(h)` and `pop!(h)` are defined to be `minimum(h)` and `popmin!(h)`,
 respectively.
 
@@ -104,7 +119,7 @@ This package includes an implementation of a binary min-max heap (`BinaryMinMaxH
 
 Examples:
 ```julia
-h = BinaryMinMaxHeap{Int}() # create an empty min-max heap with integer values
+h = BinaryMinMaxHeap{Int}() # create an empty min-max heap with integer values
 
 h = BinaryMinMaxHeap([1, 2, 3, 4]) # create a min-max heap from a vector
 ```
@@ -115,13 +130,42 @@ Heaps can be used to extract the largest or smallest elements of an
 array without sorting the entire array first:
 
 ```julia
-nlargest(3, [0,21,-12,68,-25,14]) # => [68,21,14]
-nsmallest(3, [0,21,-12,68,-25,14]) # => [-25,-12,0]
+data = [0,21,-12,68,-25,14]
+nlargest(3, data) # => [68,21,14]
+nsmallest(3, data) # => [-25,-12,0]
+```
+
+Both methods also support the `by` and `lt` keywords to customize the sort order,
+as in `Base.sort`:
+
+```julia
+nlargest(3, data, by=x -> x^2) # => [68,-25,21]
+nsmallest(3, data, by=x -> x^2) # => [0,-12,14]
 ```
 
-Note that if the array contains floats and is free of NaN values,
-then the following alternatives may be used to achieve a 2x performance boost.
+The lower-level `DataStructures.nextreme` function takes a `Base.Ordering`
+instance as the first argument and returns the first `n` elements according to
+this ordering:
+
+```julia
+DataStructures.nextreme(Base.Forward, n, a) # Equivalent to nsmallest(n, a)
 ```
-DataStructures.nextreme(DataStructures.FasterReverse(), n, a) # faster nlargest(n, a)
-DataStructures.nextreme(DataStructures.FasterForward(), n, a) # faster nsmallest(n, a)
+
+
+# Improving performance with Float data
+
+One use case for custom orderings is to achieve faster performance with `Float`
+elements with the risk of random ordering if any elements are `NaN`.
+The provided `DataStructures.FasterForward` and `DataStructures.FasterReverse`
+orderings are optimized for this purpose and may achieve a 2x performance boost:
+
+```julia
+h = BinaryHeap{Float64, DataStructures.FasterForward}() # faster min heap
+h = BinaryHeap{Float64, DataStructures.FasterReverse}() # faster max heap
+
+h = MutableBinaryHeap{Float64, DataStructures.FasterForward}() # faster mutable min heap
+h = MutableBinaryHeap{Float64, DataStructures.FasterReverse}() # faster mutable max heap
+
+DataStructures.nextreme(DataStructures.FasterReverse(), n, a) # faster nlargest(n, a)
+DataStructures.nextreme(DataStructures.FasterForward(), n, a) # faster nsmallest(n, a)
 ```
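
Note on the NaN caveat in the docs change for docs/src/heaps.md: the docstrings in this commit describe the faster orderings as equivalent to sorting with `lt = <` (or `lt = >`). The snippet below is a minimal Base-only sketch, not taken from the commit and not calling DataStructures, of why that comparison is risky when NaN is present: `isless` gives NaN a fixed position, while `<` returns `false` in both directions, so an algorithm relying on `<` has no consistent place to put NaN.

```julia
# Base-only sketch: compare how `isless` and `<` treat NaN.
x = 1.0

@show isless(x, NaN)   # true: isless orders NaN after every other float
@show isless(NaN, x)   # false
@show x < NaN          # false
@show NaN < x          # false: under `<`, NaN is neither before nor after x

# Sorting with the default `isless`-based ordering places NaN last.
sorted_default = sort([3.0, NaN, 1.0, 2.0])
@show sorted_default
```

With NaN-free data the two comparisons agree, which is the condition under which the docs recommend `FasterForward` and `FasterReverse`.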

src/heaps.jl

Lines changed: 18 additions & 10 deletions
@@ -129,37 +129,45 @@ function nextreme(ord::Base.Ordering, n::Int, arr::AbstractVector{T}) where T
 end
 
 """
-    nlargest(n, arr)
+    nlargest(n, arr; kw...)
 
 Return the `n` largest elements of the array `arr`.
 
 Equivalent to:
-    sort(arr, order = Base.Reverse)[1:min(n, end)]
+    sort(arr, kw..., rev=true)[1:min(n, end)]
 
 Note that if `arr` contains floats and is free of NaN values,
-then the following alternative may be used to achieve 2x performance.
+then the following alternative may be used to achieve 2x performance:
+
     DataStructures.nextreme(DataStructures.FasterReverse(), n, arr)
+
 This faster version is equivalent to:
+
     sort(arr, lt = >)[1:min(n, end)]
 """
-function nlargest(n::Int, arr::AbstractVector)
-    return nextreme(Base.Reverse, n, arr)
+function nlargest(n::Int, arr::AbstractVector; lt=isless, by=identity)
+    order = Base.ReverseOrdering(Base.ord(lt, by, nothing))
+    return nextreme(order, n, arr)
 end
 
 """
-    nsmallest(n, arr)
+    nsmallest(n, arr; kw...)
 
 Return the `n` smallest elements of the array `arr`.
 
 Equivalent to:
-    sort(arr, order = Base.Forward)[1:min(n, end)]
+    sort(arr; kw...)[1:min(n, end)]
 
 Note that if `arr` contains floats and is free of NaN values,
-then the following alternative may be used to achieve 2x performance.
+then the following alternative may be used to achieve 2x performance:
+
     DataStructures.nextreme(DataStructures.FasterForward(), n, arr)
+
 This faster version is equivalent to:
+
     sort(arr, lt = <)[1:min(n, end)]
 """
-function nsmallest(n::Int, arr::AbstractVector)
-    return nextreme(Base.Forward, n, arr)
+function nsmallest(n::Int, arr::AbstractVector; lt=isless, by=identity)
+    order = Base.ord(lt, by, nothing)
+    return nextreme(order, n, arr)
 end
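
Note on the src/heaps.jl change: the updated docstrings describe `nlargest` and `nsmallest` in terms of `Base.sort` with the same keywords. The sketch below spells out that documented equivalence using Base only. The helpers `nlargest_ref` and `nsmallest_ref` are hypothetical names introduced for illustration and are not part of DataStructures; the sample data and expected values are the ones from the docs change in this commit. The last lines rebuild the ordering object that the new `nlargest` constructs from its keywords, assuming a Julia version where `Base.Order.ord` accepts `nothing` for `rev`, as the commit itself does.

```julia
using Base.Order: ReverseOrdering, ord

# Hypothetical reference helpers (not part of DataStructures), built on Base.sort.
nlargest_ref(n, arr; kw...) = sort(arr; kw..., rev=true)[1:min(n, end)]
nsmallest_ref(n, arr; kw...) = sort(arr; kw...)[1:min(n, end)]

data = [0, 21, -12, 68, -25, 14]

largest = nlargest_ref(3, data)                       # three largest values
smallest = nsmallest_ref(3, data)                     # three smallest values
largest_sq = nlargest_ref(3, data; by = x -> x^2)     # largest by squared value
smallest_sq = nsmallest_ref(3, data; by = x -> x^2)   # smallest by squared value

# The same ordering the new nlargest builds from lt=isless and by=x -> x^2.
order = ReverseOrdering(ord(isless, x -> x^2, nothing))
via_order = sort(data; order = order)[1:3]

@show largest smallest largest_sq smallest_sq via_order
```

These helpers sort the whole array, so they only serve as a reference for the results; the heap-based functions exist to avoid that full sort.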

src/heaps/binary_heap.jl

Lines changed: 13 additions & 5 deletions
@@ -34,22 +34,30 @@ mutable struct BinaryHeap{T, O <: Base.Ordering} <: AbstractHeap{T}
     ordering::O
     valtree::Vector{T}
 
-    function BinaryHeap{T, O}() where {T,O}
-        new{T,O}(O(), Vector{T}())
+    function BinaryHeap{T}(ordering::Base.Ordering) where T
+        new{T, typeof(ordering)}(ordering, Vector{T}())
     end
 
-    function BinaryHeap{T, O}(xs) where {T,O}
-        ordering = O()
+    function BinaryHeap{T}(ordering::Base.Ordering, xs::AbstractVector) where T
         valtree = heapify(xs, ordering)
-        new{T,O}(ordering, valtree)
+        new{T, typeof(ordering)}(ordering, valtree)
     end
 end
 
+BinaryHeap(ordering::Base.Ordering, xs::AbstractVector{T}) where T = BinaryHeap{T}(ordering, xs)
+
+# Constructors using singleton order types as type parameters rather than arguments
+BinaryHeap{T, O}() where {T, O<:Base.Ordering} = BinaryHeap{T}(O())
+BinaryHeap{T, O}(xs::AbstractVector) where {T, O<:Base.Ordering} = BinaryHeap{T}(O(), xs)
+
+# Forward/reverse ordering type aliases
 const BinaryMinHeap{T} = BinaryHeap{T, Base.ForwardOrdering}
 const BinaryMaxHeap{T} = BinaryHeap{T, Base.ReverseOrdering}
+
 BinaryMinHeap(xs::AbstractVector{T}) where T = BinaryMinHeap{T}(xs)
 BinaryMaxHeap(xs::AbstractVector{T}) where T = BinaryMaxHeap{T}(xs)
 
+
 #################################################
 #
 # interfaces

src/heaps/mutable_binary_heap.jl

Lines changed: 12 additions & 6 deletions
@@ -160,26 +160,32 @@ mutable struct MutableBinaryHeap{T, O <: Base.Ordering} <: AbstractMutableHeap{T
     nodes::Vector{MutableBinaryHeapNode{T}}
     node_map::Vector{Int}
 
-    function MutableBinaryHeap{T, O}() where {T, O}
-        ordering = O()
+    function MutableBinaryHeap{T}(ordering::Base.Ordering) where T
         nodes = Vector{MutableBinaryHeapNode{T}}()
         node_map = Vector{Int}()
-        new{T, O}(ordering, nodes, node_map)
+        new{T, typeof(ordering)}(ordering, nodes, node_map)
     end
 
-    function MutableBinaryHeap{T, O}(xs::AbstractVector{T}) where {T, O}
-        ordering = O()
+    function MutableBinaryHeap{T}(ordering::Base.Ordering, xs::AbstractVector) where T
         nodes, node_map = _make_mutable_binary_heap(ordering, T, xs)
-        new{T, O}(ordering, nodes, node_map)
+        new{T, typeof(ordering)}(ordering, nodes, node_map)
     end
 end
 
+MutableBinaryHeap(ordering::Base.Ordering, xs::AbstractVector{T}) where T = MutableBinaryHeap{T}(ordering, xs)
+
+# Constructors using singleton order types as type parameters rather than arguments
+MutableBinaryHeap{T, O}() where {T, O<:Base.Ordering} = MutableBinaryHeap{T}(O())
+MutableBinaryHeap{T, O}(xs::AbstractVector) where {T, O<:Base.Ordering} = MutableBinaryHeap{T}(O(), xs)
+
+# Forward/reverse ordering type aliases
 const MutableBinaryMinHeap{T} = MutableBinaryHeap{T, Base.ForwardOrdering}
 const MutableBinaryMaxHeap{T} = MutableBinaryHeap{T, Base.ReverseOrdering}
 
 MutableBinaryMinHeap(xs::AbstractVector{T}) where T = MutableBinaryMinHeap{T}(xs)
 MutableBinaryMaxHeap(xs::AbstractVector{T}) where T = MutableBinaryMaxHeap{T}(xs)
 
+
 function Base.show(io::IO, h::MutableBinaryHeap)
     print(io, "MutableBinaryHeap(")
     nodes = h.nodes
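
Note on the constructor changes in src/heaps/binary_heap.jl and src/heaps/mutable_binary_heap.jl: both files move to the same pattern, where the inner constructor takes an ordering instance and derives the `O` parameter from `typeof(ordering)`, and an outer `{T, O}` constructor calls `O()` for singleton ordering types. The sketch below reproduces that pattern on a toy type using Base only. `ToyOrdered` and its `items` field are hypothetical names for illustration; it keeps a sorted vector rather than a heap and is not how DataStructures stores its data.

```julia
using Base.Order: Ordering, ForwardOrdering, Forward, By

# Hypothetical stand-in for the heap types: stores a sorted vector, not a heap.
struct ToyOrdered{T, O <: Ordering}
    ordering::O
    items::Vector{T}

    # Inner constructor takes an ordering *instance*; O comes from its type.
    function ToyOrdered{T}(ordering::Ordering, xs::AbstractVector) where T
        items = sort!(collect(T, xs); order = ordering)
        new{T, typeof(ordering)}(ordering, items)
    end
end

# Element type inferred from the input vector.
ToyOrdered(ordering::Ordering, xs::AbstractVector{T}) where T = ToyOrdered{T}(ordering, xs)

# Singleton ordering type passed as a type parameter, instantiated with O().
ToyOrdered{T, O}(xs::AbstractVector) where {T, O <: Ordering} = ToyOrdered{T}(O(), xs)

Base.first(c::ToyOrdered) = first(c.items)

data = collect(enumerate(["foo", "bar", "baz"]))

by_tuple = ToyOrdered(Forward, data)          # lexicographic tuple ordering
by_string = ToyOrdered(By(last), data)        # order by the string only
ints = ToyOrdered{Int, ForwardOrdering}([3, 1, 2])

@show first(by_tuple) first(by_string) first(ints)
```

An ordering such as `By(last)` carries a function, so it has to be passed as an instance; only orderings that can be built with a zero-argument `O()` call fit the type-parameter form.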
