Commit 6227e09

author
Frankie Robertson
committed
First pass at documentation
1 parent 7e0a0cf commit 6227e09

8 files changed

+240
-52
lines changed

docs/background.rst

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
Background and API design
2+
=========================
3+
4+
There have been long-standing efficiency issues with scikit-learn's nearest neighbour search. In
5+
particular, the `ball tree`_ and `k-d tree`_ do not scale well to high
6+
dimensional spaces. The decision was taken that the best way to integrate other
7+
techniques was to allow all applicable unsupervised estimators to take
8+
a sparse matrix, typically being a KNN-graph of the points, but potentially
9+
being any estimate. These `slides from PyParis 2018`_ explain some background,
10+
while `issue #10463`_ and `pull request #10482`_ give discussion, justification,
11+
benchmarks and more detail regarding the approach.
12+
13+
The main advantage of this technique is that the sparse matrix/KNN-graph can be built from the data by a transformer, and the transformer and estimator can then be sequenced using the scikit-learn pipeline mechanism. This approach allows, for example, parameter search to be done on the KNN-graph construction technique together with the estimator. Typically the transformer should closely follow the interface of KNeighborsTransformer. The `exact contract is outlined in the user guide`_. There is also `an example notebook with early versions of the transformers in this library`_.
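
As an illustration of sequencing graph construction with an estimator, here is a minimal sketch that uses only scikit-learn's own KNeighborsTransformer rather than a transformer from this library. The step names ``graph`` and ``classifier``, the toy data and the parameter grid are assumptions made for the example. A classifier is used only because it gives the grid search a score to optimise.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier, KNeighborsTransformer
from sklearn.pipeline import Pipeline

# Toy data, assumed purely for illustration.
X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# The first step builds a sparse KNN-graph of distances; the second step
# consumes that graph instead of raw features via metric="precomputed".
pipeline = Pipeline(
    steps=[
        ("graph", KNeighborsTransformer(mode="distance")),
        ("classifier", KNeighborsClassifier(metric="precomputed")),
    ]
)

# Search over the graph construction and the estimator at the same time.
# The graph must hold at least as many neighbours as the classifier uses.
param_grid = {
    "graph__n_neighbors": [10, 20],
    "classifier__n_neighbors": [3, 5],
}
search = GridSearchCV(pipeline, param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
```

A transformer that follows the same interface should be usable in place of the ``graph`` step, with its own parameters added to the grid.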
14+
15+
.. _`ball tree`: https://en.wikipedia.org/wiki/Ball_tree
16+
.. _`k-d tree`: https://en.wikipedia.org/wiki/K-d_tree
17+
.. _`slides from PyParis 2018`: https://tomdlt.github.io/decks/2018_pyparis/
18+
.. _`issue #10463`: https://github.com/scikit-learn/scikit-learn/issues/10463
19+
.. _`pull request #10482`: https://github.com/scikit-learn/scikit-learn/pull/10482
20+
.. _`exact contract is outlined in the user guide`: https://scikit-learn.org/stable/modules/neighbors.html#neighbors-transformer
21+
.. _`an example notebook with early versions of the transformers in this library`: https://scikit-learn.org/stable/auto_examples/neighbors/approximate_nearest_neighbors.html#sphx-glr-auto-examples-neighbors-approximate-nearest-neighbors-py

docs/clustering.rst

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
Clustering
2+
==========
3+
4+
While it is possible to use the transformers of the sklearn_ann.kneighbors module together with clustering algorithms from scikit-learn directly, there is often a mismatch between techniques like DBSCAN, which require for each node its neighbors within a certain radius, and a kNN-graph, which has a fixed number of neighbors for each node. This mismatch may result in k being set too high, to make sure that all neighbors within the radius are included, slowing things down.
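
The mismatch can be seen by counting neighbors per node in the two kinds of graph. This is a rough sketch using scikit-learn's exact graph builders; the data set and the values of ``k`` and ``eps`` are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import kneighbors_graph, radius_neighbors_graph

# Toy data and parameters, assumed purely for illustration.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)
k = 10
eps = 1.0

# A kNN-graph stores exactly k neighbors for every node.
knn = kneighbors_graph(X, n_neighbors=k, mode="distance")
knn_counts = np.diff(knn.indptr)

# A radius graph, which is what DBSCAN needs, stores however many
# neighbors fall within eps, so the count varies with local density.
radius = radius_neighbors_graph(X, radius=eps, mode="distance")
radius_counts = np.diff(radius.indptr)

# Nodes whose eps-neighborhood holds more than k points are truncated
# by the kNN-graph unless k is raised.
n_truncated = int((radius_counts > k).sum())
```

Raising ``k`` until ``n_truncated`` reaches zero makes the graph large enough for the densest node, which is more than most nodes need.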
5+
6+
This module contains an implementation of RNN-DBSCAN, which is based on the kNN-graph structure.
7+
8+
.. automodule:: sklearn_ann.cluster.rnn_dbscan
9+
:members:

docs/conf.py

Lines changed: 12 additions & 2 deletions
@@ -14,6 +14,8 @@
1414
# import sys
1515
# sys.path.insert(0, os.path.abspath('.'))
1616

17+
import sphinx_rtd_theme
18+
1719

1820
# -- Project information -----------------------------------------------------
1921

@@ -28,6 +30,11 @@
2830
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
2931
# ones.
3032
extensions = [
33+
'sphinx.ext.autodoc',
34+
'sphinx.ext.autosummary',
35+
'numpydoc',
36+
'sphinx_issues',
37+
'sphinx.ext.viewcode',
3138
]
3239

3340
# Add any paths that contain templates here, relative to this directory.
@@ -44,9 +51,12 @@
4451
# The theme to use for HTML and HTML Help pages. See the documentation for
4552
# a list of builtin themes.
4653
#
47-
html_theme = 'alabaster'
54+
html_theme = 'sphinx_rtd_theme'
4855

4956
# Add any paths that contain custom static files (such as style sheets) here,
5057
# relative to this directory. They are copied after the builtin static files,
5158
# so a file named "default.css" will overwrite the builtin "default.css".
52-
html_static_path = ['_static']
59+
html_static_path = ['_static']
60+
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
61+
62+
autodoc_mock_imports = ["annoy", "faiss", "pynndescent", "nmslib"]

docs/index.rst

Lines changed: 11 additions & 0 deletions
@@ -14,6 +14,17 @@ sklearn-ann
1414
:start-after: inclusion-marker-do-not-remove
1515

1616

17+
User Guide
18+
---------------------
19+
20+
.. toctree::
21+
:maxdepth: 2
22+
23+
background
24+
kneighbors
25+
clustering
26+
27+
1728
Indices and tables
1829
==================
1930

docs/kneighbors.rst

Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
Implementations of the KNeighborsTransformer interface
2+
======================================================
3+
4+
This module contains transformers which transform from array-like structures of
5+
shape (n_samples, n_features) to KNN-graphs encoded as scipy.sparse.csr_matrix.
6+
They conform to the KNeighborsTransformer interface. Each submodule in this
7+
module provides facilities for exactly one external nearest neighbour library.
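
To make the input and output shapes concrete, here is a small sketch using scikit-learn's own KNeighborsTransformer, which defines the interface. The random data and ``n_neighbors=5`` are assumptions for illustration; the transformers in this module are expected to produce a graph of the same form.

```python
import numpy as np
from scipy import sparse
from sklearn.neighbors import KNeighborsTransformer

# Array-like input of shape (n_samples, n_features).
rng = np.random.RandomState(0)
X = rng.rand(100, 8)

transformer = KNeighborsTransformer(n_neighbors=5, mode="distance")
graph = transformer.fit_transform(X)

# The result is a sparse (n_samples, n_samples) matrix in CSR format.
# Entry (i, j) holds the distance from sample i to its neighbor j.
is_csr = sparse.issparse(graph) and graph.format == "csr"

# In "distance" mode each row stores n_neighbors + 1 entries, because
# every sample is also returned as its own nearest neighbor.
n_stored_per_row = np.diff(graph.indptr)
```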
8+
9+
Annoy
10+
-----
11+
12+
`Annoy (Approximate Nearest Neighbors Oh Yeah)`_ is a C++ library with Python
13+
bindings to search for points in space that are close to a given query point. The project originates from Spotify.
14+
It uses a forest of random projection trees.
15+
16+
.. _`Annoy (Approximate Nearest Neighbors Oh Yeah)`: https://github.com/spotify/annoy
17+
18+
19+
.. automodule:: sklearn_ann.kneighbors.annoy
20+
:members:
21+
22+
FAISS
23+
-----
24+
25+
`FAISS (Facebook AI Similarity Search)`_ is a library for efficient similarity
26+
search and clustering of dense vectors. The project originates from Facebook AI
27+
Research (FAIR). It contains multiple algorithms including algorithms for
28+
exact/brute force nearest neighbour, methods based on quantization and product
29+
quantization, and methods based on Hierarchical Navigable Small World graphs
30+
(HNSW). There are some `guidelines on how to choose the best index for your
31+
purposes`_.
32+
33+
.. _`FAISS (Facebook AI Similarity Search)`: https://github.com/facebookresearch/faiss
34+
35+
.. _`guidelines on how to choose the best index for your purposes`: https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index
36+
37+
38+
.. automodule:: sklearn_ann.kneighbors.faiss
39+
:members:
40+
41+
nmslib
42+
------
43+
44+
`nmslib (non-metric space library)` is a library for similarity search supporting
45+
metric and non-metric spaces. It contains multiple algorithms.
46+
47+
48+
.. automodule:: sklearn_ann.kneighbors.nmslib
49+
:members:
50+
51+
PyNNDescent
52+
-----------
53+
54+
`PyNNDescent`_ is a Python implementation of nearest neighbor descent for approximate nearest
55+
neighbors. It iteratively improves a kNN-graph using the transitive property,
56+
using random projections for initialisation. This transformer is actually
57+
implemented as part of PyNNDescent, and simply re-exported here for (foolish)
58+
consistency. If you only need this transformer, just use PyNNDescent directly.
59+
60+
.. _`PyNNDescent`: https://github.com/lmcinnes/pynndescent
61+
.. automodule:: sklearn_ann.kneighbors.pynndescent
62+
:members:
63+
64+
sklearn
65+
-------
66+
67+
`scikit-learn` itself contains ball tree and k-d tree indices. KNeighborsTransformer is re-exported here, specialised for these two types of index, for consistency.
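
The two index types can also be selected directly through the ``algorithm`` parameter of scikit-learn's KNeighborsTransformer. The sketch below does this with stock scikit-learn only; the names of the specialised classes re-exported by this module are not shown, and the random data is an assumption for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsTransformer

# Toy data, assumed purely for illustration.
rng = np.random.RandomState(0)
X = rng.rand(200, 3)

# Same transformer, backed by a ball tree and by a k-d tree.
ball = KNeighborsTransformer(n_neighbors=5, mode="distance", algorithm="ball_tree")
kd = KNeighborsTransformer(n_neighbors=5, mode="distance", algorithm="kd_tree")

graph_ball = ball.fit_transform(X)
graph_kd = kd.fit_transform(X)

# Both indices perform exact search, so on data without tied distances
# they should yield the same graph up to floating point error.
same_graph = np.allclose(graph_ball.toarray(), graph_kd.toarray())
```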
68+
69+
70+
.. automodule:: sklearn_ann.kneighbors.sklearn
71+
:members:

poetry.lock

Lines changed: 63 additions & 43 deletions
Some generated files are not rendered by default.
