Commit 9a82c4e

Author: Jelmer Bot (committed)
add missing doc-page; specify long description filetype
1 parent 731cfe3 commit 9a82c4e

2 files changed: +139 -0 lines changed

doc/basic_usage.rst

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
==============================================
Manifold Modelling with Minimum Spanning Trees
==============================================

Dimensionality reduction (DR) algorithms typically assume the data they are
given is uniformly sampled from some underlying manifold. When this is not the
case, and there are observation gaps along the manifold, these algorithms may
fail to detect a single connected entity. This repository presents two manifold
approximation approaches based on minimum spanning trees (MST) for
non-uniformly sampled data.

---------------------------------
Noisy Minimum Spanning Tree Union
---------------------------------

The noisy minimum spanning tree union ($n$-MST) is inspired by Pathfinder
networks, which, with a specific parameter selection, yield the union of all
possible MSTs in a network (see, e.g., [`1`_], [`2`_]). We compute noisy MSTs to
detect alternative connectivity at all distance scales, even when the distances
contain few identically weighted connections.

We add Gaussian noise ($\mu=0$) to every candidate edge. The noise parameter $n$
is specified as a fraction of the points' nearest neighbour distance and
controls the Gaussian's standard deviation. This formulation makes the noise
scale with the data's density, avoiding more edges being added in dense regions
than in sparse regions and retaining a reasonably uniform manifold
approximation graph.

.. code:: python

    import matplotlib.pyplot as plt
    import matplotlib.collections as mc
    from sklearn.datasets import make_swiss_roll
    from multi_mst.noisy_mst import NoisyMST

    # Fit the noisy MST union to a noisy swiss roll with a hole.
    X, t = make_swiss_roll(n_samples=2000, noise=0.5, hole=True)
    projector = NoisyMST(num_trees=10, noise_fraction=1.0).fit(X)

    # Embedding coordinates and the manifold approximation graph's edges.
    xs = projector.embedding_[:, 0]
    ys = projector.embedding_[:, 1]
    coo_matrix = projector.graph_.tocoo()
    sources = coo_matrix.row
    targets = coo_matrix.col

    # Scatter the embedded points coloured by the roll parameter and draw the
    # graph's edges underneath them.
    plt.figure(figsize=(4, 3))
    plt.scatter(xs, ys, c=t, s=1, edgecolors="none", linewidth=0, cmap="viridis")
    lc = mc.LineCollection(
        list(zip(zip(xs[sources], ys[sources]), zip(xs[targets], ys[targets]))),
        linewidth=0.2,
        zorder=-1,
        alpha=0.5,
        color="k",
    )
    ax = plt.gca()
    ax.add_collection(lc)
    ax.set_aspect("equal")
    plt.subplots_adjust(0, 0, 1, 1)
    plt.axis("off")
    plt.show()

.. figure:: _static/noisy_mst.png
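
For intuition, the following sketch illustrates the noisy MST union idea on a
small, dense distance matrix: draw zero-mean Gaussian noise whose scale follows
each point's nearest neighbour distance, build an MST per draw, and keep the
union of all edges. It is an illustration only, not the package's
implementation (which builds on the numba KDTree and Borůvka construction
mentioned in the Acknowledgements); the helper name and the choice to average
the two endpoints' nearest-neighbour distances are assumptions.

.. code:: python

    # Illustrative sketch only; `noisy_mst_union_edges` is not part of multi_mst.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform


    def noisy_mst_union_edges(X, num_trees=10, noise_fraction=1.0, seed=0):
        rng = np.random.default_rng(seed)
        dists = squareform(pdist(X))
        # Each point's nearest-neighbour distance sets its local noise scale;
        # averaging the two endpoints' scales per edge is an assumption here.
        nn_dist = np.where(dists > 0, dists, np.inf).min(axis=1)
        sigma = noise_fraction * 0.5 * (nn_dist[:, None] + nn_dist[None, :])

        edges = set()
        for _ in range(num_trees):
            noisy = dists + rng.normal(0.0, sigma)
            noisy = np.clip((noisy + noisy.T) / 2, 1e-12, None)  # symmetric, positive
            np.fill_diagonal(noisy, 0.0)                         # no self loops
            mst = minimum_spanning_tree(noisy).tocoo()
            edges |= {(min(i, j), max(i, j)) for i, j in zip(mst.row, mst.col)}
        return edges  # union of edges over all noisy MSTs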

---------------------------------
$k$-Nearest Minimum Spanning Tree
---------------------------------

The $k$-nearest minimum spanning tree ($k$-MST) generalises $k$-nearest
neighbour networks ($k$-NN) to minimum spanning trees. It adds the $k$ shortest
edges between components. Since data points start as distinct components, all
$k$-NN edges are included in the $k$-MST.

To avoid creating shortcuts in the manifold, a distance threshold $\epsilon$ can
be applied. The parameter is specified as a fraction of the shortest edge
between components and provides an upper distance limit for the $2$-to-$k$
alternative edges. A sketch of this procedure follows the usage example below.

.. code:: python

    import matplotlib.pyplot as plt
    import matplotlib.collections as mc
    from sklearn.datasets import make_swiss_roll
    from multi_mst.k_mst import KMST

    X, t = make_swiss_roll(n_samples=2000, noise=0.5, hole=True)
    projector = KMST(num_neighbors=3, epsilon=2.0).fit(X)

    xs = projector.embedding_[:, 0]
    ys = projector.embedding_[:, 1]
    coo_matrix = projector.graph_.tocoo()
    sources = coo_matrix.row
    targets = coo_matrix.col

    plt.figure(figsize=(4, 3))
    plt.scatter(xs, ys, c=t, s=1, edgecolors="none", linewidth=0, cmap="viridis")
    lc = mc.LineCollection(
        list(zip(zip(xs[sources], ys[sources]), zip(xs[targets], ys[targets]))),
        linewidth=0.2,
        zorder=-1,
        alpha=0.5,
        color="k",
    )
    ax = plt.gca()
    ax.add_collection(lc)
    ax.set_aspect("equal")
    plt.subplots_adjust(0, 0, 1, 1)
    plt.axis("off")
    plt.show()

.. figure:: _static/k_mst.png
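
For intuition, the sketch below spells out one possible reading of this
procedure as Borůvka-style rounds over a dense distance matrix: each component
keeps its $k$ shortest outgoing edges, drops those longer than $\epsilon$ times
its shortest one, and merges along that shortest edge. It is an illustration
only; the helper name, the per-component (rather than per component-pair)
reading, and the round structure are assumptions, not the package's
implementation.

.. code:: python

    # Illustrative sketch only; `k_mst_edges` is not part of multi_mst.
    import numpy as np
    from scipy.spatial.distance import pdist, squareform


    def find(parent, i):
        """Path-compressing find for a minimal union-find."""
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i


    def k_mst_edges(X, num_neighbors=3, epsilon=2.0):
        """Boruvka-style reading of the k-MST idea (dense, O(n^2) memory)."""
        dists = squareform(pdist(X))
        parent = np.arange(len(X))
        edges = set()
        while True:
            roots = np.array([find(parent, i) for i in range(len(X))])
            components = np.unique(roots)
            if len(components) == 1:
                break
            merges = []
            for c in components:
                inside = np.flatnonzero(roots == c)
                outside = np.flatnonzero(roots != c)
                sub = dists[np.ix_(inside, outside)]
                # The k shortest edges leaving this component, shortest first.
                order = np.argsort(sub, axis=None)[:num_neighbors]
                shortest = sub.flat[order[0]]
                for flat in order:
                    if sub.flat[flat] > epsilon * shortest:
                        break  # alternative edge exceeds the epsilon limit
                    i = inside[flat // len(outside)]
                    j = outside[flat % len(outside)]
                    edges.add((min(i, j), max(i, j)))
                # Standard Boruvka merge along the single shortest outgoing edge.
                i0 = inside[order[0] // len(outside)]
                j0 = outside[order[0] % len(outside)]
                merges.append((i0, j0))
            for i, j in merges:
                parent[find(parent, i)] = find(parent, j)
        return edges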

-------------------------
Installation Instructions
-------------------------

The `multi_mst` package can be installed from PyPI:

.. code:: bash

    pip install multi_mst

----------------
Acknowledgements
----------------

Most code, including the numba KDTree, disjoint set, and Borůvka MST
construction implementations, is adapted from `fast_hdbscan`_.

-------
License
-------

`multi_mst` uses the same license as `fast_hdbscan`: BSD (2-clause). See the
LICENSE file for details.

.. _1: https://onlinelibrary.wiley.com/doi/10.1002/asi.20904
.. _2: https://ieeexplore.ieee.org/document/8231853
.. _fast_hdbscan: https://github.com/TutteInstitute/fast_hdbscan

setup.cfg

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ maintainer = Jelmer Bot
 maintainer_email = [email protected]
 description = Minimum spanning tree based manifold approximations.
 long_description = file: README.md
+long_description_content_type = text/markdown
 keywords = minimum spanning tree, dimensionality reduction
 url = https://github.com/vda-lab/multi_mst
 license = BSD 2-Clause License

0 commit comments
