Commit cb968f9

Update more docs: implementation and development
1 parent 8bf352c commit cb968f9

4 files changed: +69 -130 lines changed

README.rst

Lines changed: 5 additions & 4 deletions
@@ -3,17 +3,18 @@ Python Sorted Containers

 .. todo::

-* Review implementation page
-* Review development page
-* Re-run performance benchmarks
 * Add __delitem__ to sorted dict views?
 * Document migrating bintrees insert to soco
+* Rename github repo
+* Update coverage
+* Pass pylint
+* Update development page with test and coverage output.
+* Re-run performance benchmarks
 * Replace bintrees in
 * https://github.com/astropy/astropy
 * https://github.com/netzob/netzob
 * https://github.com/danpaquin/gdax-python
 * https://github.com/dyn4mik3/OrderBook/
-* Rename github repo
 * Tell Doug Hellmann about Sorted Containers and relation to bisect module
 * Send email update to Python announce list.

docs/development.rst

Lines changed: 28 additions & 93 deletions
@@ -4,13 +4,12 @@ Developing and Contributing
 Collaborators are welcome!

 #. Check for open issues or open a fresh issue to start a discussion around a
-bug. There is a Contributor Friendly tag for issues that should be used by
-people who are not very familiar with the codebase yet.
+bug.
 #. Fork `the repository <https://github.com/grantjenks/sorted_containers>`_ on
 GitHub and start making your changes to a new branch.
 #. Write a test which shows that the bug was fixed.
 #. Send a pull request and bug the maintainer until it gets merged and
-published. :)
+published :)

 Development Lead
 ----------------
@@ -28,32 +27,26 @@ Get the Code
 ------------

 :doc:`Sorted Containers<index>` is actively developed on GitHub, where the code
-is `always available <https://github.com/grantjenks/sorted_containers>`_.
-
-You can either clone the public repository::
+is `open source`_. The recommended way to get a copy of the source repository
+is to clone the repository from GitHub::

 $ git clone git://github.com/grantjenks/sorted_containers.git

-Download the `tarball <https://github.com/grantjenks/sorted_containers/tarball/master>`_::
-
-$ curl -OL https://github.com/grantjenks/sorted_containers/tarball/master
-
-Or, download the `zipball <https://github.com/grantjenks/sorted_containers/zipball/master>`_::
-
-$ curl -OL https://github.com/grantjenks/sorted_containers/zipball/master
+.. _`open source`: https://github.com/grantjenks/sorted_containers

 Development Dependencies
 ------------------------

-Install development dependencies with `pip <http://www.pip-installer.org/>`_::
+Install development dependencies with `pip <https://pypi.org/project/pip/>`_::

 $ pip install -r requirements.txt

 This includes everything for building/running tests, benchmarks and
 documentation.

-Note that installing the Banyan module on Windows requires `patching the source
-<https://code.google.com/p/banyan/issues/detail?id=3>`_ in a couple places.
+Some alternative implementations, such as `banyan`, may have issues when
+installing on Windows. You can still develop :doc:`Sorted Containers<index>`
+without these packages. They will be omitted from benchmarking.

 Testing
 -------
@@ -64,84 +57,24 @@ simply run::

 $ python setup.py test

-The test argument to setup.py will download a minimal testing infrastructure
+The test argument to `setup.py` will download a minimal testing infrastructure
 and run the tests.

 ::

-$ tox
-GLOB sdist-make: /repos/sorted_containers/setup.py
-py26 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-py26 runtests: PYTHONHASHSEED='1205144536'
-py26 runtests: commands[0] | nosetests
-...
-----------------------------------------------------------------------
-Ran 150 tests in 7.080s
-
-OK
-py27 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-py27 runtests: PYTHONHASHSEED='1205144536'
-py27 runtests: commands[0] | nosetests
-...
-----------------------------------------------------------------------
-Ran 150 tests in 6.670s
-
-OK
-py32 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-py32 runtests: PYTHONHASHSEED='1205144536'
-py32 runtests: commands[0] | nosetests
-...
-----------------------------------------------------------------------
-Ran 150 tests in 10.254s
-
-OK
-py33 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-py33 runtests: PYTHONHASHSEED='1205144536'
-py33 runtests: commands[0] | nosetests
-...
-----------------------------------------------------------------------
-Ran 150 tests in 10.485s
-
-OK
-py34 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-py34 runtests: PYTHONHASHSEED='1205144536'
-py34 runtests: commands[0] | nosetests
-...
-----------------------------------------------------------------------
-Ran 150 tests in 11.350s
-
-OK
-___________________ summary _______________________
-py26: commands succeeded
-py27: commands succeeded
-py32: commands succeeded
-py33: commands succeeded
-py34: commands succeeded
-congratulations :)
-
-Coverage testing uses `nose <https://nose.readthedocs.org>`_:
+$ python setup.py test
+<todo>
+
+Coverage testing uses `pytest-cov <https://pypi.org/project/pytest-cov/>`_:

 ::

-$ nosetests --with-coverage
-...................................................
-Name Stmts Miss Cover Missing
------------------------------------------------------------
-sortedcontainers 4 0 100%
-sortedcontainers.sorteddict 220 10 95% 18, 21, 96, 106, 115, 149, 158, 183, 220, 253
-sortedcontainers.sortedlist 452 1 99% 16
-sortedcontainers.sortedset 163 10 94% 51, 62, 65, 70, 75, 80, 84, 86, 88, 90
------------------------------------------------------------
-TOTAL 839 21 97%
-----------------------------------------------------------------------
-Ran 146 tests in 15.447s
-
-OK
-
-It's normal not to see 100% coverage. Some code is specific to the Python
-runtime.
-
-Stress testing is also based on nose but can be run independently as a
+$ todo
+
+It's normal to see coverage a little less than 100%. Some code is specific to
+the Python runtime.
+
+Stress testing is also based on pytest but can be run independently as a
 module. Stress tests are kept in the tests directory and prefixed with
 test_stress. Stress tests accept two arguments: an iteration count and random
 seed value. For example, to run stress on the SortedList data type:
@@ -155,24 +88,24 @@ seed value. For example, to run stress on the SortedList data type:
 Exiting after 0:00:00.846000

 If stress exits normally then it worked successfully. Some stress is run by tox
-and nose but the iteration count is limited at 1,000. More rigorous testing
+and pytest but the iteration count is limited at 1,000. More rigorous testing
 requires increasing the iteration count to millions. At that level, it's best
 to just let it run overnight. Stress testing will stop at the first failure.

 Running Benchmarks
 ------------------

 Running and plotting benchmarks is a two step process. Each is a Python script
-in the tests directory. To run the benchmarks for SortedList, plot the results,
-and save the resulting graphs, run:
+in the tests directory. To run the benchmarks for :class:`SortedList`, plot the
+results, and save the resulting graphs, run:

 ::

 $ python -m tests.benchmark_sortedlist --bare > tests/results_sortedlist.txt
 $ python -m tests.benchmark_plot tests/results_sortedlist.txt SortedList --save

 Each script has a handful of useful arguments. Use ``--help`` to display
-those. Consult the source for details. The file ``tests/benchmark_plot.py``
+those. Consult the source for details. The file `tests/benchmark_plot.py`
 contains notes about benchmarking different Python runtimes against each other.

 If you simply want to run the benchmarks to observe the performance on your
@@ -189,7 +122,8 @@ local machine, then run:
 $ python -m tests.benchmark_sortedset

 The benchmarks will warn if some packages are not importable. This limits the
-possible comparisons. In all cases, you can install missing packages from PyPI.
+possible comparisons. See `requirements.txt` for the package names that can be
+installed from PyPI.

 Tested Runtimes
 ---------------
@@ -202,8 +136,9 @@ of Python:
 * CPython 3.3
 * CPython 3.4
 * CPython 3.5
+* CPython 3.6
 * PyPy
 * PyPy3

 Life will feel much saner if you use `virtualenv <http://www.virtualenv.org/>`_
-to manage each of the runtimes.
+and `tox` to manage and test each of the runtimes.

docs/implementation.rst

Lines changed: 30 additions & 32 deletions
@@ -1,25 +1,25 @@
 Implementation Details
 ======================

-The :doc:`Sorted Containers<index>` internal implementation is based on a couple
-observations. The first is that Python lists are fast, *really fast*. They have
-great characteristics for memory management and random access. The second is
-that bisect.insort is fast. This is somewhat counter-intuitive since it
-involves shifting a series of items in a list. But modern processors do this
-really well. A lot of time has been spent optimizing mem-copy/mem-move-like
-operations both in hardware and software.
-
-But using only one list and bisect.insort would produce sluggish behavior for
-lengths exceeding ten thousand. So the implementation of
-:doc:`SortedList<sortedlist>` uses a list of lists to store elements. In this
-way, inserting or deleting is most often performed on a short list. Only rarely
-does a new list need to be added or deleted.
-
-:doc:`SortedList<sortedlist>` maintains three internal variables: ``_lists``,
-``_maxes``, and ``_index``. The first is simply the list of lists; each member
-is a sorted sublist of elements. The second contains the maximum element in
-each of the sublists. This is used for fast binary-search. The last maintains a
-tree of pair-wise sums of the lengths of the lists.
+The :doc:`Sorted Containers<index>` internal implementation is based on a
+couple observations. The first is that Python's `list` is fast, *really
+fast*. Lists have great characteristics for memory management and random
+access. The second is that `bisect.insort` is fast. This is somewhat
+counter-intuitive since it involves shifting a series of items in a list. But
+modern processors do this really well. A lot of time has been spent optimizing
+mem-copy/mem-move-like operations both in hardware and software.
+
+But using only one list and `bisect.insort` would produce sluggish behavior for
+lengths exceeding ten thousand. So the implementation of :doc:`sortedlist` uses
+a list of lists to store elements. In this way, inserting or deleting is most
+often performed on a short list. Only rarely does a new list need to be added
+or deleted.
+
+:doc:`sortedlist` maintains three internal variables: `_lists`, `_maxes`, and
+`_index`. The first is simply the list of lists, each member is a sorted
+sublist of elements. The second contains the maximum element in each of the
+sublists. This is used for fast binary-search. The last maintains a tree of
+pair-wise sums of the lengths of the lists.

 Lists are kept balanced using the load factor. If a sublist's length exceeds
 double the load then it is split in two. Likewise at half the load it is
@@ -29,10 +29,10 @@ factor that is the square root to cube root of the average length. (Although
 you will probably exhaust the memory of your machine before that point.)
 Experimentation is also recommended. A :doc:`load factor performance
 comparison<performance-load>` is also provided. For more in-depth analysis,
-read :doc:`Performance at Scale<performance-scale>` which benchmarks
-:doc:`Sorted Containers<index>` with ten billion elements.
+read :doc:`performance-scale` which benchmarks :doc:`Sorted Containers<index>`
+with ten billion elements.

-Finding an element is a two step process. First the ``_maxes`` list, also known
+Finding an element is a two step process. First the `_maxes` list, also known
 as the "maxes" index, is bisected which yields the position of a sorted
 sublist. Then that sublist is bisected for the location of the element.
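
The list-of-lists layout and the two-step lookup described in this hunk can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's code: `add`, `contains`, and the tiny `LOAD` value are hypothetical names chosen for the example, and two plain lists stand in for `_lists` and `_maxes`.

```python
from bisect import bisect_left, insort

LOAD = 4  # hypothetical, deliberately tiny load factor for illustration


def add(lists, maxes, value):
    """Insert value while keeping each sublist sorted and maxes in sync."""
    if not maxes:
        lists.append([value])
        maxes.append(value)
        return
    # Step 1: bisect the maxes to choose the sublist that should hold value.
    pos = bisect_left(maxes, value)
    if pos == len(maxes):
        pos -= 1  # larger than every maximum: use the last sublist
    # Step 2: insert into that short sublist.
    insort(lists[pos], value)
    maxes[pos] = lists[pos][-1]
    # Split a sublist once its length exceeds double the load.
    if len(lists[pos]) > 2 * LOAD:
        half = lists[pos][LOAD:]
        del lists[pos][LOAD:]
        maxes[pos] = lists[pos][-1]
        lists.insert(pos + 1, half)
        maxes.insert(pos + 1, half[-1])


def contains(lists, maxes, value):
    """Two-step lookup: bisect maxes, then bisect the chosen sublist."""
    pos = bisect_left(maxes, value)
    if pos == len(maxes):
        return False  # value is larger than every stored element
    sublist = lists[pos]
    idx = bisect_left(sublist, value)
    return idx < len(sublist) and sublist[idx] == value
```

Both bisections run over short sequences, so each insertion shifts at most a couple of loads' worth of items, which is the point the text makes about keeping individual lists short.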

@@ -55,12 +55,12 @@ Traditional tree-based designs have better big-O notation but that ignores the
 realities of today's software and hardware. For a more in-depth analysis, read
 :doc:`Performance at Scale<performance-scale>`.

-Indexing uses the ``_index`` list which operates as a tree of pair-wise sums of
+Indexing uses the `_index` list which operates as a tree of pair-wise sums of
 the lengths of the lists. The tree is maintained as a dense binary tree. It's
-easiest to explain with an example. Suppose ``_lists`` contains sublists with
+easiest to explain with an example. Suppose `_lists` contains sublists with
 these lengths (in this example, we assume the load factor is 4)::

-map(len, _lists) -> [3, 5, 4, 5, 6]
+list(map(len, _lists)) -> [3, 5, 4, 5, 6]

 Given these lengths, the first row in the index is the pair-wise sums::

@@ -79,12 +79,10 @@ finally::

 With this list, we can efficiently compute the index of an item in a sublist
 and, vice-versa, find an item given an index. Details of the algorithms to do
-so are contained in the docstring for ``SortedList._loc`` and
-``SortedList._pos``.
-
+so are contained in the docstring for `SortedList._loc` and `SortedList._pos`.

 For example, indexing requires traversing the tree to a leaf node. Each node
-has two children which are easily computable. Given an index, ``pos``, the
+has two children which are easily computable. Given an index, `pos`, the
 left-child is at ``pos * 2 + 1`` and the right-child is at ``pos * 2 + 2``.

 When the index is less than the left-child, traversal moves to the left
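
The pair-wise-sum tree and the traversal rule quoted in this hunk can be tried out with a small standalone sketch. It is not the code behind `SortedList._pos`: `build_index` and `locate` are hypothetical helpers, and the zero padding of the first row to a power of two and the returned `offset` of the first leaf are assumptions of this sketch rather than details shown in the diff.

```python
def build_index(lengths):
    """Return (index, offset) for a dense tree of pair-wise sums.

    Assumption: rows are stored root first, and the first row of sums is
    padded with zeros to a power of two so child positions line up.
    """
    row0 = list(lengths)
    if len(row0) == 1:
        return row0, 0
    # First row: sums of adjacent pairs (a trailing odd length stands alone).
    row1 = [sum(row0[i:i + 2]) for i in range(0, len(row0), 2)]
    if len(row1) == 1:
        return row1 + row0, 1
    size = 1
    while size < len(row1):
        size *= 2
    row1 += [0] * (size - len(row1))
    tree = [row0, row1]
    while len(tree[-1]) > 1:
        prev = tree[-1]
        tree.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    index = [total for row in reversed(tree) for total in row]
    return index, size * 2 - 1  # offset: position of the first leaf


def locate(index, offset, idx):
    """Map a flat position idx to (sublist number, position in sublist)."""
    pos = 0
    child = 1  # left child of the root
    while child < len(index):
        left = index[child]
        if idx < left:
            pos = child       # descend into the left child
        else:
            idx -= left       # skip everything under the left child
            pos = child + 1   # descend into the right child
        child = pos * 2 + 1
    return pos - offset, idx


index, offset = build_index([3, 5, 4, 5, 6])
print(index)                     # [23, 17, 6, 8, 9, 6, 0, 3, 5, 4, 5, 6]
print(locate(index, offset, 8))  # (2, 0)
```

With the lengths from the example, flat position 8 resolves to the first element of the third sublist, because the first two sublists hold 3 + 5 = 8 elements.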
@@ -125,7 +123,7 @@ the sorted list.
 Maintaining the position index in this way has several advantages:

 * It's easy to traverse to children/parent. The children of a position in the
-``_index`` are at ``(pos * 2) + 1`` and ``(pos * 2) + 2``. The parent is at
+`_index` are at ``(pos * 2) + 1`` and ``(pos * 2) + 2``. The parent is at
 ``(pos - 1) // 2``. We can even identify left/right-children easily. Each
 left-child is at an odd index and each right-child is at an even index.

@@ -136,7 +134,7 @@ Maintaining the position index in this way has several advantages:
 all be done within C-routines in the Python interpreter.

 * It's space efficient. The whole index is no more than twice the size of the
-length of the ``_lists`` and contains only integers.
+length of the `_lists` and contains only integers.

 * It's easy to update. Adding or removing an item involves incrementing or
 decrementing only ``log2(len(_index))`` items in the index. The only caveat
@@ -148,5 +146,5 @@ to other traditional designs. Whether the design is novel, I (Grant Jenks) do
 not know. Until shown otherwise, I would like to refer to it as the "Jenks"
 index.

-Each sorted container has a function named ``_check`` for verifying
+Each sorted container has a function named `_check` for verifying
 consistency. This function details the data-type invariants.

tox.ini

Lines changed: 6 additions & 1 deletion
@@ -6,7 +6,12 @@ deps=pytest
 commands=python -m pytest

 [pytest]
-addopts=--doctest-modules --doctest-glob "*.rst"
+addopts=
+--doctest-modules
+--doctest-glob "*.rst"
+--ignore tests/benchmark_plot.py
+--ignore tests/plot_lengths_histogram_add.py
+--ignore tests/plot_lengths_histogram_delitem.py
 testpaths=docs sortedcontainers tests

 [testenv:lint]
