Update more docs: implementation and development

grantjenks · grantjenks · commit cb968f95aadd · 2018-05-15T16:10:13.000-07:00
diff --git a/README.rst b/README.rst
@@ -3,17 +3,18 @@ Python Sorted Containers
 
 .. todo::
 
-   * Review implementation page
-   * Review development page
-   * Re-run performance benchmarks
    * Add __delitem__ to sorted dict views?
    * Document migrating bintrees insert to soco
+   * Rename github repo
+   * Update coverage
+   * Pass pylint
+   * Update development page with test and coverage output.
+   * Re-run performance benchmarks
    * Replace bintrees in
      * https://github.com/astropy/astropy
      * https://github.com/netzob/netzob
      * https://github.com/danpaquin/gdax-python
      * https://github.com/dyn4mik3/OrderBook/
-   * Rename github repo
    * Tell Doug Hellmann about Sorted Containers and relation to bisect module
    * Send email update to Python announce list.
 
diff --git a/docs/development.rst b/docs/development.rst
@@ -4,13 +4,12 @@ Developing and Contributing
 Collaborators are welcome!
 
 #. Check for open issues or open a fresh issue to start a discussion around a
-   bug.  There is a Contributor Friendly tag for issues that should be used by
-   people who are not very familiar with the codebase yet.
+   bug.
 #. Fork `the repository <https://github.com/grantjenks/sorted_containers>`_ on
    GitHub and start making your changes to a new branch.
 #. Write a test which shows that the bug was fixed.
 #. Send a pull request and bug the maintainer until it gets merged and
-   published. :)
+   published :)
 
 Development Lead
 ----------------
@@ -28,32 +27,26 @@ Get the Code
 ------------
 
 :doc:`Sorted Containers<index>` is actively developed on GitHub, where the code
-is `always available <https://github.com/grantjenks/sorted_containers>`_.
-
-You can either clone the public repository::
+is `open source`_. The recommended way to get a copy of the source repository
+is to clone the repository from GitHub::
 
     $ git clone git://github.com/grantjenks/sorted_containers.git
 
-Download the `tarball <https://github.com/grantjenks/sorted_containers/tarball/master>`_::
-
-    $ curl -OL https://github.com/grantjenks/sorted_containers/tarball/master
-
-Or, download the `zipball <https://github.com/grantjenks/sorted_containers/zipball/master>`_::
-
-    $ curl -OL https://github.com/grantjenks/sorted_containers/zipball/master
+.. _`open source`: https://github.com/grantjenks/sorted_containers
 
 Development Dependencies
 ------------------------
 
-Install development dependencies with `pip <http://www.pip-installer.org/>`_::
+Install development dependencies with `pip <https://pypi.org/project/pip/>`_::
 
     $ pip install -r requirements.txt
 
 This includes everything for building/running tests, benchmarks and
 documentation.
 
-Note that installing the Banyan module on Windows requires `patching the source
-<https://code.google.com/p/banyan/issues/detail?id=3>`_ in a couple places.
+Some alternative implementations, such as `banyan`, may have issues when
+installing on Windows. You can still develop :doc:`Sorted Containers<index>`
+without these packages. They will be omitted from benchmarking.
 
 Testing
 -------
@@ -64,84 +57,24 @@ simply run::
 
     $ python setup.py test
 
-The test argument to setup.py will download a minimal testing infrastructure
+The test argument to `setup.py` will download a minimal testing infrastructure
 and run the tests.
 
 ::
 
-    $ tox
-    GLOB sdist-make: /repos/sorted_containers/setup.py
-    py26 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-    py26 runtests: PYTHONHASHSEED='1205144536'
-    py26 runtests: commands[0] | nosetests
-    ...
-    ----------------------------------------------------------------------
-    Ran 150 tests in 7.080s
-
-    OK
-    py27 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-    py27 runtests: PYTHONHASHSEED='1205144536'
-    py27 runtests: commands[0] | nosetests
-    ...
-    ----------------------------------------------------------------------
-    Ran 150 tests in 6.670s
-
-    OK
-    py32 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-    py32 runtests: PYTHONHASHSEED='1205144536'
-    py32 runtests: commands[0] | nosetests
-    ...
-    ----------------------------------------------------------------------
-    Ran 150 tests in 10.254s
-
-    OK
-    py33 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-    py33 runtests: PYTHONHASHSEED='1205144536'
-    py33 runtests: commands[0] | nosetests
-    ...
-    ----------------------------------------------------------------------
-    Ran 150 tests in 10.485s
-
-    OK
-    py34 inst-nodeps: /repos/sorted_containers/.tox/dist/sortedcontainers-0.8.0.zip
-    py34 runtests: PYTHONHASHSEED='1205144536'
-    py34 runtests: commands[0] | nosetests
-    ...
-    ----------------------------------------------------------------------
-    Ran 150 tests in 11.350s
-
-    OK
-    ___________________ summary _______________________
-      py26: commands succeeded
-      py27: commands succeeded
-      py32: commands succeeded
-      py33: commands succeeded
-      py34: commands succeeded
-      congratulations :)
-
-Coverage testing uses `nose <https://nose.readthedocs.org>`_:
+    $ python setup.py test
+    <todo>
+
+Coverage testing uses `pytest-cov <https://pypi.org/project/pytest-cov/>`_:
 
 ::
 
-    $ nosetests --with-coverage
-    ...................................................
-    Name                          Stmts   Miss  Cover   Missing
-    -----------------------------------------------------------
-    sortedcontainers                  4      0   100%
-    sortedcontainers.sorteddict     220     10    95%   18, 21, 96, 106, 115, 149, 158, 183, 220, 253
-    sortedcontainers.sortedlist     452      1    99%   16
-    sortedcontainers.sortedset      163     10    94%   51, 62, 65, 70, 75, 80, 84, 86, 88, 90
-    -----------------------------------------------------------
-    TOTAL                           839     21    97%
-    ----------------------------------------------------------------------
-    Ran 146 tests in 15.447s
-
-    OK
-
-It's normal not to see 100% coverage. Some code is specific to the Python
-runtime.
-
-Stress testing is also based on nose but can be run independently as a
+    $ todo
+
+It's normal to see coverage a little less than 100%. Some code is specific to
+the Python runtime.
+
+Stress testing is also based on pytest but can be run independently as a
 module. Stress tests are kept in the tests directory and prefixed with
 test_stress. Stress tests accept two arguments: an iteration count and random
 seed value. For example, to run stress on the SortedList data type:
@@ -155,24 +88,24 @@ seed value. For example, to run stress on the SortedList data type:
     Exiting after 0:00:00.846000
 
 If stress exits normally then it worked successfully. Some stress is run by tox
-and nose but the iteration count is limited at 1,000. More rigorous testing
+and pytest but the iteration count is limited to 1,000. More rigorous testing
 requires increasing the iteration count to millions. At that level, it's best
 to just let it run overnight. Stress testing will stop at the first failure.
 
 Running Benchmarks
 ------------------
 
 Running and plotting benchmarks is a two-step process. Each is a Python script
-in the tests directory. To run the benchmarks for SortedList, plot the results,
-and save the resulting graphs, run:
+in the tests directory. To run the benchmarks for :class:`SortedList`, plot the
+results, and save the resulting graphs, run:
 
 ::
 
     $ python -m tests.benchmark_sortedlist --bare > tests/results_sortedlist.txt
     $ python -m tests.benchmark_plot tests/results_sortedlist.txt SortedList --save
 
 Each script has a handful of useful arguments. Use ``--help`` to display
-those. Consult the source for details. The file ``tests/benchmark_plot.py``
+those. Consult the source for details. The file `tests/benchmark_plot.py`
 contains notes about benchmarking different Python runtimes against each other.
 
 If you simply want to run the benchmarks to observe the performance on your
@@ -189,7 +122,8 @@ local machine, then run:
     $ python -m tests.benchmark_sortedset
 
 The benchmarks will warn if some packages are not importable. This limits the
-possible comparisons. In all cases, you can install missing packages from PyPI.
+possible comparisons. See `requirements.txt` for the package names that can be
+installed from PyPI.
 
 Tested Runtimes
 ---------------
@@ -202,8 +136,9 @@ of Python:
 * CPython 3.3
 * CPython 3.4
 * CPython 3.5
+* CPython 3.6
 * PyPy
 * PyPy3
 
 Life will feel much saner if you use `virtualenv <http://www.virtualenv.org/>`_
-to manage each of the runtimes.
+and `tox` to manage and test each of the runtimes.
diff --git a/docs/implementation.rst b/docs/implementation.rst
@@ -1,25 +1,25 @@
 Implementation Details
 ======================
 
-The :doc:`Sorted Containers<index>` internal implementation is based on a couple
-observations. The first is that Python lists are fast, *really fast*. They have
-great characteristics for memory management and random access. The second is
-that bisect.insort is fast. This is somewhat counter-intuitive since it
-involves shifting a series of items in a list. But modern processors do this
-really well. A lot of time has been spent optimizing mem-copy/mem-move-like
-operations both in hardware and software.
-
-But using only one list and bisect.insort would produce sluggish behavior for
-lengths exceeding ten thousand. So the implementation of
-:doc:`SortedList<sortedlist>` uses a list of lists to store elements. In this
-way, inserting or deleting is most often performed on a short list. Only rarely
-does a new list need to be added or deleted.
-
-:doc:`SortedList<sortedlist>` maintains three internal variables: ``_lists``,
-``_maxes``, and ``_index``. The first is simply the list of lists; each member
-is a sorted sublist of elements. The second contains the maximum element in
-each of the sublists. This is used for fast binary-search. The last maintains a
-tree of pair-wise sums of the lengths of the lists.
+The :doc:`Sorted Containers<index>` internal implementation is based on a
+couple of observations. The first is that Python's `list` is fast, *really
+fast*. Lists have great characteristics for memory management and random
+access. The second is that `bisect.insort` is fast. This is somewhat
+counter-intuitive since it involves shifting a series of items in a list. But
+modern processors do this really well. A lot of time has been spent optimizing
+mem-copy/mem-move-like operations both in hardware and software.
+
+But using only one list and `bisect.insort` would produce sluggish behavior for
+lengths exceeding ten thousand. So the implementation of :doc:`sortedlist` uses
+a list of lists to store elements. In this way, inserting or deleting is most
+often performed on a short list. Only rarely does a new list need to be added
+or deleted.
+
+:doc:`sortedlist` maintains three internal variables: `_lists`, `_maxes`, and
+`_index`. The first is simply the list of lists; each member is a sorted
+sublist of elements. The second contains the maximum element in each of the
+sublists. This is used for fast binary-search. The last maintains a tree of
+pair-wise sums of the lengths of the lists.
 
 Lists are kept balanced using the load factor. If a sublist's length exceeds
 double the load then it is split in two. Likewise at half the load it is
@@ -29,10 +29,10 @@ factor that is the square root to cube root of the average length.  (Although
 you will probably exhaust the memory of your machine before that point.)
 Experimentation is also recommended. A :doc:`load factor performance
 comparison<performance-load>` is also provided. For more in-depth analysis,
-read :doc:`Performance at Scale<performance-scale>` which benchmarks
-:doc:`Sorted Containers<index>` with ten billion elements.
+read :doc:`performance-scale` which benchmarks :doc:`Sorted Containers<index>`
+with ten billion elements.
 
-Finding an element is a two step process. First the ``_maxes`` list, also known
+Finding an element is a two-step process. First the `_maxes` list, also known
 as the "maxes" index, is bisected which yields the position of a sorted
 sublist. Then that sublist is bisected for the location of the element.
 
@@ -55,12 +55,12 @@ Traditional tree-based designs have better big-O notation but that ignores the
 realities of today's software and hardware. For a more in-depth analysis, read
 :doc:`Performance at Scale<performance-scale>`.
 
-Indexing uses the ``_index`` list which operates as a tree of pair-wise sums of
+Indexing uses the `_index` list which operates as a tree of pair-wise sums of
 the lengths of the lists. The tree is maintained as a dense binary tree. It's
-easiest to explain with an example. Suppose ``_lists`` contains sublists with
+easiest to explain with an example. Suppose `_lists` contains sublists with
 these lengths (in this example, we assume the load factor is 4)::
 
-    map(len, _lists) -> [3, 5, 4, 5, 6]
+    list(map(len, _lists)) -> [3, 5, 4, 5, 6]
 
 Given these lengths, the first row in the index is the pair-wise sums::
 
@@ -79,12 +79,10 @@ finally::
 
 With this list, we can efficiently compute the index of an item in a sublist
 and, vice-versa, find an item given an index. Details of the algorithms to do
-so are contained in the docstring for ``SortedList._loc`` and
-``SortedList._pos``.
-
+so are contained in the docstring for `SortedList._loc` and `SortedList._pos`.
 
 For example, indexing requires traversing the tree to a leaf node. Each node
-has two children which are easily computable. Given an index, ``pos``, the
+has two children which are easily computable. Given an index, `pos`, the
 left-child is at ``pos * 2 + 1`` and the right-child is at ``pos * 2 + 2``.
 
 When the index is less than the left-child, traversal moves to the left
@@ -125,7 +123,7 @@ the sorted list.
 Maintaining the position index in this way has several advantages:
 
 * It's easy to traverse to children/parent. The children of a position in the
-  ``_index`` are at ``(pos * 2) + 1`` and ``(pos * 2) + 2``. The parent is at
+  `_index` are at ``(pos * 2) + 1`` and ``(pos * 2) + 2``. The parent is at
   ``(pos - 1) // 2``. We can even identify left/right-children easily. Each
   left-child is at an odd index and each right-child is at an even index.
 
@@ -136,7 +134,7 @@ Maintaining the position index in this way has several advantages:
   all be done within C-routines in the Python interpreter.
 
 * It's space efficient. The whole index is no more than twice the size of the
-  length of the ``_lists`` and contains only integers.
+  length of the `_lists` and contains only integers.
 
 * It's easy to update. Adding or removing an item involves incrementing or
   decrementing only ``log2(len(_index))`` items in the index. The only caveat
@@ -148,5 +146,5 @@ to other traditional designs. Whether the design is novel, I (Grant Jenks) do
 not know. Until shown otherwise, I would like to refer to it as the "Jenks"
 index.
 
-Each sorted container has a function named ``_check`` for verifying
+Each sorted container has a function named `_check` for verifying
 consistency. This function details the data-type invariants.
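The two-step lookup and the positional index described in docs/implementation.rst can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the library's code: the helper names `pair_sums`, `build_index`, `locate`, and `find` are hypothetical, and the padding strategy is one simple way to keep the tree dense. The library's own routines are the ones the text names, `SortedList._loc` and `SortedList._pos`.

```python
from bisect import bisect_left


def pair_sums(row):
    """Sum adjacent pairs; an unpaired last item is carried up unchanged."""
    return [sum(row[i:i + 2]) for i in range(0, len(row), 2)]


def build_index(lengths):
    """Return (index, offset) for a dense binary tree of pair-wise sums.

    `offset` is the position in `index` of the first leaf (sublist length).
    """
    if not lengths:
        return [], 0
    if len(lengths) == 1:
        return list(lengths), 0
    rows = [list(lengths)]
    first = pair_sums(rows[-1])
    # Pad the first row of sums to a power of two so every row above the
    # leaves is full and children sit at pos * 2 + 1 and pos * 2 + 2.
    size = 1
    while size < len(first):
        size *= 2
    first.extend([0] * (size - len(first)))
    rows.append(first)
    while len(rows[-1]) > 1:
        rows.append(pair_sums(rows[-1]))
    # Flatten from the root row down to the leaves.
    index = [total for level in reversed(rows) for total in level]
    return index, size * 2 - 1


def locate(index, offset, idx):
    """Map a positional index (0 <= idx < total length) to (sublist, item)."""
    pos, child = 0, 1
    while child < len(index):
        left = index[child]
        if idx < left:
            pos = child          # target lies in the left subtree
        else:
            idx -= left          # skip everything counted by the left child
            pos = child + 1
        child = pos * 2 + 1
    return pos - offset, idx


def find(lists, maxes, value):
    """Bisect the maxes, then bisect the chosen sublist."""
    pos = bisect_left(maxes, value)
    if pos == len(maxes):
        return None              # larger than every stored value
    i = bisect_left(lists[pos], value)
    return (pos, i) if lists[pos][i] == value else None


# Sublists with the lengths used in the example: [3, 5, 4, 5, 6].
lengths = [3, 5, 4, 5, 6]
values = iter(range(sum(lengths)))
lists = [[next(values) for _ in range(n)] for n in lengths]
maxes = [sub[-1] for sub in lists]
index, offset = build_index([len(sub) for sub in lists])

print(index)                      # [23, 17, 6, 8, 9, 6, 0, 3, 5, 4, 5, 6]
print(locate(index, offset, 8))   # (2, 0)
print(find(lists, maxes, 8))      # (2, 0)
```

In this sketch the leaves (the sublist lengths) start at position `offset` in the flat list, so subtracting `offset` from the final tree position gives the sublist number. Keeping everything in one flat list of integers is what makes the child and parent arithmetic from the text work.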
diff --git a/tox.ini b/tox.ini
@@ -6,7 +6,12 @@ deps=pytest
 commands=python -m pytest
 
 [pytest]
-addopts=--doctest-modules --doctest-glob "*.rst"
+addopts=
+    --doctest-modules
+    --doctest-glob "*.rst"
+    --ignore tests/benchmark_plot.py
+    --ignore tests/plot_lengths_histogram_add.py
+    --ignore tests/plot_lengths_histogram_delitem.py
 testpaths=docs sortedcontainers tests
 
 [testenv:lint]