Commit 1ee43ef

Update performance comparison docs, fix names and formatting
1 parent 8826cce commit 1ee43ef

9 files changed, 281 insertions(+), 225 deletions(-)

README.rst

Lines changed: 3 additions & 8 deletions
@@ -3,15 +3,10 @@ Python Sorted Containers
 
 .. todo::
 
-   * Rename github repo
-   * Add sortedmap, using std::map in C++ standard library.
-     https://pypi.org/project/sortedmap/
-   * Research
-     https://bitbucket.org/mojaves/pyskiplist/
-     https://pypi.org/project/skipdict/
-     https://github.com/tailhook/sortedsets
-   * Re-run performance benchmarks
    * Update history and document v3 milestone
+   * Update landing page
+   * Re-run performance benchmarks
+   * Rename github repo
    * Tell Doug Hellmann about Sorted Containers and relation to bisect module
 
 `Sorted Containers`_ is an Apache2 licensed `sorted collections library`_,

docs/performance-workload.rst

Lines changed: 15 additions & 0 deletions
@@ -54,8 +54,23 @@ interference while sorted list operations are performed. The frequency of each
 operation is also estimated because no projects had performance benchmarks that
 were easily evaluated.
 
+The legends of the graphs below name each Python project by its underlying data
+structure. The mapping is as follows:
+
 .. currentmodule:: sortedcontainers
 
+====================== ==================================
+Data Structure         Project
+====================== ==================================
+:class:`SortedList`    :doc:`Sorted Containers<index>`
+:class:`SortedKeyList` :doc:`Sorted Containers<index>`
+B-Tree                 `blist on PyPI`_
+List                   `sortedcollection recipe`_
+====================== ==================================
+
+.. _`blist on PyPI`: https://pypi.org/project/blist/
+.. _`sortedcollection recipe`: http://code.activestate.com/recipes/577197-sortedcollection/
+
 Sorted List
 -----------
 
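The "List" row in the table above stands for a sorted list kept in one plain Python list. As a hedged illustration of why that design is cheap to search but expensive to modify, here is a minimal sketch built on the standard `bisect` module. The class name `ListSortedList` is hypothetical, and this is not the sortedcollection recipe's actual API.

```python
import bisect


class ListSortedList:
    """Hypothetical sorted list backed by one plain Python list.

    Illustrative only; this is not the sortedcollection recipe's API.
    """

    def __init__(self, iterable=()):
        # All values live in a single ordinary list, kept in ascending order.
        self._items = sorted(iterable)

    def add(self, value):
        # bisect finds the slot in O(log n), but list.insert shifts every
        # later element, so each insertion is O(n) overall.
        bisect.insort(self._items, value)

    def remove(self, value):
        # Binary search for the leftmost match; deleting from the middle of
        # a list is O(n) for the same reason as insertion.
        index = bisect.bisect_left(self._items, value)
        if index == len(self._items) or self._items[index] != value:
            raise ValueError(f"{value!r} not in list")
        del self._items[index]

    def __contains__(self, value):
        # Membership is a pure binary search: O(log n).
        index = bisect.bisect_left(self._items, value)
        return index < len(self._items) and self._items[index] == value

    def __getitem__(self, index):
        # Positional indexing is O(1) because storage is a plain list.
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)


numbers = ListSortedList([5, 1, 3])
numbers.add(2)
print(list(numbers))  # [1, 2, 3, 5]
```

The O(n) cost of every write is what makes a single flat list fall behind at large sizes, even though reads stay fast.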
docs/performance.rst

Lines changed: 117 additions & 82 deletions
@@ -9,96 +9,96 @@ Containers<index>` is performance so we would be remiss not to produce this
 page with comparisons.
 
 The source for all benchmarks can be found under the "tests" directory in the
-files prefixed "benchmark." Measurements are made from the min, max, and mean
-of 5 repetitions. In the graphs below, the line follows the mean and at each
-point, the min/max displays the bounds. Note that the axes are log-log so
-properly reading two different lines would describe one metric as "X times"
-faster rather than "X seconds" faster. In all graphs, lower is
-better. Measurements are made by powers of ten: 100 through 1,000,000.
-
-Measurements up to 10,000,000,000 elements have been successfully tested and
+files prefixed "benchmark." Measurements are made from the min, max, and median
+of 5 repetitions. In the graphs below, the line follows the median at each
+point. Note that the axes are log-log, so the gap between two lines describes
+one metric as "X times" faster rather than "X seconds" faster. In all graphs,
+lower is better. Measurements are made by powers of ten: 100 through
+10,000,000.
+
+Measurements up to ten billion elements have been successfully tested and
 benchmarked. Read :doc:`performance-scale` for details. Only a couple
 implementations (including :doc:`Sorted Containers<index>`) are capable of
 handling so many elements. The major limiting factor at that size is
 memory. Consider the simple case of storing CPython's integers in a
 :doc:`sortedlist`. Each integer object requires ~24 bytes so one hundred
 million elements will require about three gigabytes of memory. If the
-implemenation adds significant overhead then most systems will run out of
+implementation adds significant overhead then most systems will run out of
 memory. For all datasets which may be kept in memory, :doc:`Sorted
 Containers<index>` is an excellent choice.
 
-A good effort has been made to find competing implementations. Six in total
+A good effort has been made to find competing implementations. Seven in total
 were found with various list, set, and dict implementations.
 
-blist
-    Provides list, dict, and set containers based on the blist data-type.
-    Implemented in Python and C. Last updated March, 2014. `blist on PyPI
-    <https://pypi.org/project/blist/>`_
-
-bintrees
-    Provides several tree-based implementations for dict and set containers.
-    Fastest were AVL and Red-Black trees. Extends the conventional API to provide
-    set operations for the dict type. Implemented in C. Last updated April, 2017.
-    `bintrees on PyPI <https://pypi.org/project/bintrees/>`_
-
-banyan
-    Provides a fast, C++-implementation for dict and set data types. Offers some
-    features also found in sortedcontainers like accessing the n-th item in a set
-    or dict. Last updated April, 2013. `banyan on PyPI
-    <https://pypi.org/project/Banyan/>`_
-
-treap
-    Uses Cython for improved performance and provides a dict container. Last
-    updated June, 2017. `treap on PyPI <https://pypi.org/project/treap/>`_
-
-skiplistcollections
-    Pure-Python implementation based on skip-lists providing a limited API for
-    dict and set types. Last updated January, 2014. `skiplistcollections on PyPI
-    <https://pypi.org/project/skiplistcollections/>`_
-
-sortedcollection
-    Pure-Python implementation of sorted list based solely on a list.
-    Feature-poor and inefficient for writes but included because it is written by
-    Raymond Hettinger and linked from the official Python docs. Last updated
-    April, 2011. `sortedcollection on ActiveState
-    <http://code.activestate.com/recipes/577197-sortedcollection/>`_
-
-Several competing implementations were omitted because they were not easily
-installable or failed to build.
-
-rbtree
-    C-implementation that only supports Python 2. Last updated
-    March, 2012. Provides a fast, C-implementation for dict and set data types.
-    `rbtree on PyPI <https://pypi.org/project/rbtree/>`_
-
-ruamel.ordereddict.sorteddict
-    C-implementation that only supports Python 2. Performance was measured in
-    correspondence with the module author. Performance was generally very good
-    except for ``__delitem__``. At scale, deleting entries became exceedingly
-    slow. Last updated July, 2017. `ruamel.ordereddict on PyPI
-    <https://pypi.org/project/ruamel.ordereddict/>`_
-
-rbtree from NewCenturyComputers
-    Pure-Python tree-based implementation. Not sure when this was last updated.
-    Unlikely to be fast. `rbtree from NewCenturyComputers
-    <http://newcenturycomputers.net/projects/rbtree.html>`_
-
-python-avl-tree from Github user pgrafov
-    Pure-Python tree-based implementation. Last updated October, 2010. Unlikely
-    to be fast. `python-avl-tree from Github user pgrafov
-    <https://github.com/pgrafov/python-avl-tree>`_
-
-pyavl
-    C-implementation for AVL tree-based dict and set containers. Claims to be
-    fast. Lacking documentation and failed to build. Last updated December, 2008.
-    `pyavl on PyPI <https://pypi.org/project/pyavl/>`_
-
-Several projects have deprecated themselves in favor of :doc:`Sorted
-Containers<index>`. Most notably those are `bintrees
-<https://pypi.org/project/bintrees/>`_ and `sorteddict
-<https://pypi.org/project/sorteddict/>`_. All of the projects above also use
-Python 2 semantics for :doc:`sorteddict` data types. Wherever possible,
-:doc:`Sorted Containers<index>` has adopted Python 3 semantics.
+1. *blist* -- Provides list, dict, and set containers based on the blist
+   data-type. Uses a `B-Tree`_ data structure. Implemented in Python and C. BSD
+   License. Last updated March, 2014. `blist on PyPI`_
+
+2. *bintrees* -- Provides several tree-based implementations for dict and set
+   containers. Fastest were AVL-Tree and Red-Black-Tree data
+   structures. Extends the conventional API to provide set operations for the
+   dict type. Now deprecated in favor of :doc:`Sorted Containers<index>`.
+   Implemented in C. MIT License. Last updated April, 2017. `bintrees on
+   PyPI`_
+
+3. *sortedmap* -- Provides a fast, C++ implementation for dict data types. Uses
+   the C++ standard library ``std::map``, which is usually a red-black tree.
+   Last updated February, 2016. `sortedmap on PyPI`_
+
+4. *banyan* -- Provides a fast, C++ implementation for dict and set data
+   types. Offers some features also found in sortedcontainers like accessing
+   the n-th item in a set or dict. Uses sources from the `tree implementation`_
+   in GNU libstdc++. GPLv3 License. Last updated April, 2013. `banyan on PyPI`_
+
+5. *treap* -- Uses Cython for improved performance and provides a dict
+   container. Apache V2 License. Last updated June, 2017. `treap on PyPI`_
+
+6. *skiplistcollections* -- Pure-Python implementation based on skip-lists
+   providing a limited API for dict and set types. MIT License. Last updated
+   January, 2014. `skiplistcollections on PyPI`_
+
+7. *sortedcollection* -- Pure-Python implementation of sorted list based solely
+   on a list. Feature-poor and inefficient for writes but included because it
+   is written by Raymond Hettinger and linked from the official Python
+   docs. MIT License. Last updated April, 2011. `sortedcollection recipe`_
+
+Several alternative implementations were omitted for reasons documented below:
+
+A. *rbtree* -- Fast C-implementation for dict and set data types that only
+   supports Python 2. GPLv3 License. Last updated March, 2012. `rbtree on
+   PyPI`_
+
+B. *ruamel.ordereddict.sorteddict* -- C-implementation that only supports
+   Python 2. Performance was measured in correspondence with the module
+   author. Performance was generally very good except for ``__delitem__``. At
+   scale, deleting entries became exceedingly slow. MIT License. Last updated
+   July, 2017. `ruamel.ordereddict on PyPI`_
+
+C. *pyskiplist* -- Pure-Python skip-list based implementation supporting a
+   sorted-list-like interface. Now deprecated in favor of :doc:`Sorted
+   Containers<index>`. MIT License. Last updated July, 2015. `pyskiplist on
+   PyPI`_
+
+D. *sorteddict* -- Pure-Python lazily-computed sorted dict implementation. Now
+   deprecated in favor of :doc:`Sorted Containers<index>`. GPLv3 License. Last
+   updated September, 2007. `sorteddict on PyPI`_
+
+E. *rbtree from NewCenturyComputers* -- Pure-Python tree-based
+   implementation. Unlikely to be fast. Unknown license. Unknown last
+   update. `rbtree from
+   NewCenturyComputers`_
+
+F. *python-avl-tree from GitHub user pgrafov* -- Pure-Python tree-based
+   implementation. Unlikely to be fast. MIT License. Last updated
+   October, 2010. `python-avl-tree from GitHub user pgrafov`_
+
+G. *pyavl* -- C-implementation for AVL tree-based dict and set
+   containers. Claims to be fast. Lacking documentation and failed to
+   build. Public Domain License. Last updated December, 2008. `pyavl on PyPI`_
+
+H. *skiplist* -- C-implementation of sorted list based on skip-list data
+   structure. Only supports Python 2. Zlib/libpng License. Last updated
+   September, 2013. `skiplist from Bitbucket user mojaves`_
 
 The most similar module to :doc:`Sorted Containers<index>` is
 skiplistcollections given that each is implemented in Python. But as is
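The memory paragraph in this hunk (about 24 bytes per integer, about three gigabytes for one hundred million elements) can be checked with back-of-the-envelope arithmetic. The sketch below adds one assumption not stated in the text: a 64-bit build where each container slot also holds an 8-byte reference to the integer object. `estimate_bytes` is a hypothetical helper, and the exact per-object size varies between interpreter versions (`sys.getsizeof` reports it for the running interpreter).

```python
INT_OBJECT_BYTES = 24  # per-integer figure quoted in the documentation text
POINTER_BYTES = 8      # assumption: 64-bit build, one reference per list slot


def estimate_bytes(count, object_bytes=INT_OBJECT_BYTES,
                   pointer_bytes=POINTER_BYTES):
    """Rough lower bound on memory for `count` distinct integer objects.

    Each element costs its own object plus one reference in the container.
    Per-container bookkeeping is deliberately left out; that overhead is
    what differs between implementations.
    """
    return count * (object_bytes + pointer_bytes)


total = estimate_bytes(100_000_000)
print(f"{total / 10**9:.1f} GB")  # 3.2 GB
```

Any per-element overhead an implementation adds on top of this baseline scales with the same hundred-million multiplier, which is why it matters so much at that size.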
@@ -123,16 +123,51 @@ been made to simulate real-world workloads. The :doc:`simulated workload
 performance comparison<performance-workload>` contains examples with
 comparisons to other implementations, load factors, and runtimes.
 
-A couple final notes about the graphs below. Missing data indicates the
-benchmark either took too long or failed. The set operations with tiny, small,
-medium, and large variations indicate the size of the container involved in the
+Some final notes about the graphs below. Missing data indicates the benchmark
+either took too long or failed. The set operations with tiny, small, medium,
+and large variations indicate the size of the container involved in the
 right-hand-side of the operation: tiny is exactly 10 elements; small is 10% of
 the size of the left-hand-side; medium is 50%; and large is 100%. :doc:`Sorted
 Containers<index>` uses a different algorithm based on the size of the
 right-hand-side of the operation for a dramatic improvement in performance.
 
+The legends of the graphs below name each Python project by its underlying data
+structure. The mapping is as follows:
+
 .. currentmodule:: sortedcontainers
 
+====================== ==================================
+Data Structure         Project
+====================== ==================================
+:class:`SortedList`    :doc:`Sorted Containers<index>`
+:class:`SortedKeyList` :doc:`Sorted Containers<index>`
+B-Tree                 `blist on PyPI`_
+List                   `sortedcollection recipe`_
+AVL-Tree               `bintrees on PyPI`_
+RB-Tree                `banyan on PyPI`_
+Skip-List              `skiplistcollections on PyPI`_
+std::map               `sortedmap on PyPI`_
+Treap                  `treap on PyPI`_
+====================== ==================================
+
+.. _`B-Tree`: https://en.wikipedia.org/wiki/B-tree
+.. _`blist on PyPI`: https://pypi.org/project/blist/
+.. _`bintrees on PyPI`: https://pypi.org/project/bintrees/
+.. _`sortedmap on PyPI`: https://pypi.org/project/sortedmap/
+.. _`sorteddict on PyPI`: https://pypi.org/project/sorteddict/
+.. _`pyskiplist on PyPI`: https://pypi.org/project/pyskiplist/
+.. _`banyan on PyPI`: https://pypi.org/project/Banyan/
+.. _`treap on PyPI`: https://pypi.org/project/treap/
+.. _`skiplistcollections on PyPI`: https://pypi.org/project/skiplistcollections/
+.. _`sortedcollection recipe`: http://code.activestate.com/recipes/577197-sortedcollection/
+.. _`rbtree on PyPI`: https://pypi.org/project/rbtree/
+.. _`ruamel.ordereddict on PyPI`: https://pypi.org/project/ruamel.ordereddict/
+.. _`rbtree from NewCenturyComputers`: http://newcenturycomputers.net/projects/rbtree.html
+.. _`python-avl-tree from GitHub user pgrafov`: https://github.com/pgrafov/python-avl-tree
+.. _`pyavl on PyPI`: https://pypi.org/project/pyavl/
+.. _`skiplist from Bitbucket user mojaves`: https://bitbucket.org/mojaves/pyskiplist/
+.. _`tree implementation`: https://gcc.gnu.org/onlinedocs/libstdc%2B%2B/ext/pb_ds/tree_based_containers.html
+
 Sorted List
 -----------
 
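The tiny/small/medium/large naming in this hunk maps to a concrete right-hand-side size. As a hedged sketch, the hypothetical helpers `rhs_size` and `make_operands` below compute that size from the left-hand-side size and build a pair of plain Python sets. How the actual benchmark files construct their operands is not shown in this commit, so the random value range used here is an assumption.

```python
import random

# "tiny" is a fixed 10 elements; the others are a percentage of the
# left-hand-side size, per the documentation text.
RHS_PERCENT = {"small": 10, "medium": 50, "large": 100}


def rhs_size(variation, lhs_size):
    """Hypothetical helper: right-hand-side size for a set-operation benchmark."""
    if variation == "tiny":
        return 10
    # Integer arithmetic avoids float rounding surprises.
    return lhs_size * RHS_PERCENT[variation] // 100


def make_operands(variation, lhs_size, seed=0):
    """Build an illustrative (lhs, rhs) pair of sets from random integers.

    The value range is arbitrary; the benchmarks' own data may differ.
    """
    rng = random.Random(seed)
    population = range(lhs_size * 10)
    lhs = set(rng.sample(population, lhs_size))
    rhs = set(rng.sample(population, rhs_size(variation, lhs_size)))
    return lhs, rhs


for name in ("tiny", "small", "medium", "large"):
    print(name, rhs_size(name, 1000))
```

Because "tiny" stays at 10 elements while the left-hand side grows to millions, it is the variation where choosing an algorithm by right-hand-side size can pay off most.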
tests/benchmark.py

Lines changed: 12 additions & 7 deletions
@@ -26,10 +26,13 @@ def measure(test, func, size):
 
 def benchmark(test, name, ctor, setup, func_name, limit):
     if args.load > 0:
-        if name == 'SortedDict':
-            ctor = partial(ctor, args.load)
-        else:
-            ctor = partial(ctor, load=args.load)
+        load = args.load
+        ctor_original = ctor
+        def ctor_load():
+            obj = ctor_original()
+            obj._reset(load)
+            return obj
+        ctor = ctor_load
 
     for size in sizes:
         if not args.no_limit and size > limit:
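The new `ctor_load` closure replaces the removed `partial(...)` calls: it builds the object with default arguments and then calls `_reset(load)` on it. The sketch below reproduces that wrapping pattern in isolation. `FakeSortedList` and `with_load` are hypothetical names, and the stand-in `_reset` only records the load factor; it does not model what the real containers do internally.

```python
class FakeSortedList:
    """Hypothetical stand-in for a container with a tunable load factor."""

    DEFAULT_LOAD = 1000  # arbitrary value for illustration

    def __init__(self):
        self.load = self.DEFAULT_LOAD

    def _reset(self, load):
        # Stand-in for the _reset(load) call in the diff; here it only
        # records the new load factor.
        self.load = load


def with_load(ctor, load):
    """Wrap a zero-argument constructor so every new object gets `load`."""
    def ctor_load():
        obj = ctor()  # `ctor` is this function's parameter, never rebound
        obj._reset(load)
        return obj
    return ctor_load


make = with_load(FakeSortedList, 50)
print(make().load)            # 50
print(FakeSortedList().load)  # 1000
```

Inside `benchmark`, the diff copies `ctor` into `ctor_original` first because a closure looks up names when it is called: if `ctor_load` referred to `ctor` directly, the later `ctor = ctor_load` rebinding would make it call itself until Python raised `RecursionError`. A separate factory function such as `with_load` gets the same isolation from its own parameter.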
@@ -45,14 +48,16 @@ def benchmark(test, name, ctor, setup, func_name, limit):
         # record
 
         times = []
+
         for rpt in range(5):
             obj = ctor()
             setup(obj, size)
             func = getattr(obj, func_name)
             times.append(measure(test, func, size))
 
-        print(getattr(test, name_attr), name + args.suffix, size, min(times),
-              max(times), times[2], sum(times) / len(times))
+        times.sort()
+        print(getattr(test, name_attr), name + args.suffix, size, times[0],
+              times[-1], times[2], sum(times) / len(times))
 
 def register_test(func):
     tests[getattr(func, name_attr)] = func
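In the removed `print` line, `times[2]` indexed the unsorted list, so it reported the third repetition rather than the median. Sorting first makes index 0 the minimum, index -1 the maximum and, for exactly five repetitions, index 2 the median. The hypothetical `summarize` function below is a minimal sketch of that logic, cross-checked against `statistics.median`; the sample timings are made up.

```python
import statistics


def summarize(times):
    """Return (min, max, median, mean) for exactly five timing samples.

    Mirrors the updated print() call: after sorting, index 0 is the minimum,
    index -1 the maximum, and index 2 the median of five values.
    """
    if len(times) != 5:
        raise ValueError("index 2 is the median only for five samples")
    ordered = sorted(times)  # sorted() leaves the caller's list untouched
    return ordered[0], ordered[-1], ordered[2], sum(ordered) / len(ordered)


samples = [0.31, 0.29, 0.35, 0.30, 0.90]  # made-up timings with one outlier
low, high, median, mean = summarize(samples)
print(median == statistics.median(samples))  # True
print(median, round(mean, 2))  # 0.31 0.43
```

The single slow run pulls the mean up to about 0.43 while the median stays at 0.31, which is why plotting the median is more robust to one noisy repetition.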
@@ -96,7 +101,7 @@ def main(name):
     detail('Seed:', args.seed)
     random.seed(args.seed)
 
-    sizes.extend(args.size or [100, 1000, 10000, 100000, 1000000, 10000000])
+    sizes.extend(args.size or [100, 1000]) # , 10000, 100000, 1000000, 10000000])
 
     detail('Sizes:', sizes)
 
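The performance documentation says sizes are powers of ten and both axes are log-log. A small sketch shows why those two choices fit together: consecutive powers of ten sit exactly one decade apart on a log axis, and a fixed vertical gap between two lines means a fixed ratio ("X times faster") regardless of absolute runtime. `SIZES`, `log_gap`, and `times_faster` are hypothetical names, and the timings passed to them are made up.

```python
import math

# Powers of ten from 100 through 10,000,000, as the documentation describes.
SIZES = [10 ** exponent for exponent in range(2, 8)]


def log_gap(slower, faster):
    """Vertical distance between two points on a base-10 log axis."""
    return math.log10(slower) - math.log10(faster)


def times_faster(slower, faster):
    """Turn that vertical distance back into an "X times faster" ratio."""
    return 10 ** log_gap(slower, faster)


# Consecutive sizes are one decade apart on the log-scaled x-axis.
print([math.log10(size) for size in SIZES])
# Hypothetical timings: the same 4x gap looks identical at any scale.
print(round(times_faster(2.0, 0.5), 6), round(times_faster(0.002, 0.0005), 6))
```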