
Commit 0d47531 (parent 4704bf8)

Update links and references to the new repository location.

11 files changed: +44 −38 lines

README.rst

Lines changed: 11 additions & 11 deletions
@@ -5,13 +5,13 @@
    :target: https://anaconda.org/conda-forge/hdbscan
    :alt: Conda-forge Version
 .. image:: https://img.shields.io/pypi/l/hdbscan.svg
-   :target: https://github.com/lmcinnes/hdbscan/blob/master/LICENSE
+   :target: https://github.com/scikit-learn-contrib/hdbscan/blob/master/LICENSE
    :alt: License
-.. image:: https://travis-ci.org/lmcinnes/hdbscan.svg
-   :target: https://travis-ci.org/lmcinnes/hdbscan
+.. image:: https://travis-ci.org/scikit-learn-contrib/hdbscan.svg
+   :target: https://travis-ci.org/scikit-learn-contrib/hdbscan
    :alt: Travis Build Status
-.. image:: https://coveralls.io/repos/github/lmcinnes/hdbscan/badge.svg?branch=master
-   :target: https://coveralls.io/github/lmcinnes/hdbscan?branch=master
+.. image:: https://coveralls.io/repos/github/scikit-learn-contrib/hdbscan/badge.svg?branch=master
+   :target: https://coveralls.io/github/scikit-learn-contrib/hdbscan?branch=master
    :alt: Test Coverage
 .. image:: https://readthedocs.org/projects/hdbscan/badge/?version=latest
    :target: https://hdbscan.readthedocs.org
@@ -45,7 +45,7 @@ Based on the paper:
 
 Documentation, including tutorials, are available on ReadTheDocs at http://hdbscan.readthedocs.io/en/latest/ .
 
-Notebooks `comparing HDBSCAN to other clustering algorithms <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`_, explaining `how HDBSCAN works <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ and `comparing performance with other python clustering implementations <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ are available.
+Notebooks `comparing HDBSCAN to other clustering algorithms <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`_, explaining `how HDBSCAN works <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ and `comparing performance with other python clustering implementations <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ are available.
 
 ------------------
 How to use HDBSCAN
@@ -69,10 +69,10 @@ Performance
 -----------
 
 Significant effort has been put into making the hdbscan implementation as fast as
-possible. It is `orders of magnitude faster than the reference implementation <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Python%20vs%20Java.ipynb>`_ in Java,
+possible. It is `orders of magnitude faster than the reference implementation <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Python%20vs%20Java.ipynb>`_ in Java,
 and is currently faster than highly optimized single linkage implementations in C and C++.
-`version 0.7 performance can be seen in this notebook <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ .
-In particular `performance on low dimensional data is better than sklearn's DBSCAN <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations%202D%20v0.7.ipynb>`_ ,
+`version 0.7 performance can be seen in this notebook <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ .
+In particular `performance on low dimensional data is better than sklearn's DBSCAN <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations%202D%20v0.7.ipynb>`_ ,
 and via support for caching with joblib, re-clustering with different parameters
 can be almost free.
@@ -90,7 +90,7 @@ object has attributes for:
 
 All of which come equipped with methods for plotting and converting
 to Pandas or NetworkX for further analysis. See the notebook on
-`how HDBSCAN works <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ for examples and further details.
+`how HDBSCAN works <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ for examples and further details.
 
 The clusterer objects also have an attribute providing cluster membership
 strengths, resulting in optional soft clustering (and no further compute
@@ -173,7 +173,7 @@ For a manual install get this package:
 
 .. code:: bash
 
-    wget https://github.com/lmcinnes/hdbscan/archive/master.zip
+    wget https://github.com/scikit-learn-contrib/hdbscan/archive/master.zip
     unzip master.zip
     rm master.zip
     cd hdbscan-master

circle.yml

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ machine:
   environment:
     # The github organization or username of the repository which hosts the
     # project and documentation.
-    USERNAME: "lmcinnes"
+    USERNAME: "scikit-learn-contrib"
 
     # The repository where the documentation will be hosted
     DOC_REPO: "hdbscan"

docs/comparing_clustering_algorithms.rst

Lines changed: 1 addition & 1 deletion
@@ -585,7 +585,7 @@ So, in summary:
 
 How does HDBSCAN perform on our test dataset? Unfortunately HDBSCAN is
 not part of ``sklearn``. Fortunately we can just import the `hdbscan
-library <https://github.com/lmcinnes/hdbscan>`__ and use it as if it
+library <https://github.com/scikit-learn-contrib/hdbscan>`__ and use it as if it
 were part of ``sklearn``.
 
 .. code:: python

docs/how_hdbscan_works.rst

Lines changed: 2 additions & 2 deletions
@@ -54,7 +54,7 @@ one hundred data points.
 Now, the best way to explain HDBSCAN is actually just use it and then go
 through the steps that occurred along the way teasing out what is
 happening at each step. So let's load up the `hdbscan
-library <https://github.com/lmcinnes/hdbscan>`__ and get to work.
+library <https://github.com/scikit-learn-contrib/hdbscan>`__ and get to work.
 
 .. code:: python
 
@@ -396,7 +396,7 @@ are a fair number of moving parts to the algorithm -- but ultimately
 each part is actually very straightforward and can be optimized well.
 Hopefully with a better understanding both of the intuitions and some of
 the implementation details of HDBSCAN you will feel motivated to `try it
-out <https://github.com/lmcinnes/hdbscan>`__. The library continues to
+out <https://github.com/scikit-learn-contrib/hdbscan>`__. The library continues to
 develop, and will provide a base for new ideas including a near
 parameterless Persistent Density Clustering algorithm, and a new
 semi-supervised clustering algorithm.

docs/performance_and_scalability.rst

Lines changed: 3 additions & 3 deletions
@@ -39,7 +39,7 @@ The implementations being test are:
   provides very fast agglomerative clustering in C++)
 - `DeBaCl <https://github.com/CoAxLab/DeBaCl>`__ (Density Based
   Clustering; similar to a mix of DBSCAN and Agglomerative)
-- `HDBSCAN <https://github.com/lmcinnes/hdbscan>`__ (A robust
+- `HDBSCAN <https://github.com/scikit-learn-contrib/hdbscan>`__ (A robust
   hierarchical version of DBSCAN)
 
 Obviously a major factor in performance will be the algorithm itself.
@@ -510,7 +510,7 @@ to be very constrained in what algorithms you can apply: if you get
 enough datapoints only K-Means, DBSCAN, and HDBSCAN will be left. This
 is somewhat disappointing, paritcularly as `K-Means is not a
 particularly good clustering
-algorithm <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__,
+algorithm <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__,
 paricularly for exploratory data analysis.
 
 With this in mind it is worth looking at how these last several
@@ -781,7 +781,7 @@ hierarchical density based clustering than DeBaCl, and sklearn has by
 far the best K-Means implementation). For anything beyond toy datasets,
 however, your algorithm options are greatly constrained. In my
 (obviously biased) opinion `HDBSCAN is the best algorithm for
-clustering <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__.
+clustering <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__.
 If you need to cluster data beyond the scope that HDBSCAN can reasonably
 handle then the only algorithm options on the table are DBSCAN and
 K-Means; DBSCAN is the slower of the two, especially for very large

notebooks/Benchmarking scalability of clustering implementations 2D v0.7.ipynb

Lines changed: 3 additions & 3 deletions
@@ -21,7 +21,7 @@
    " * Agglomerative clustering\n",
    "* [Fastcluster](http://danifold.net/fastcluster.html) (which provides very fast agglomerative clustering in C++)\n",
    "* [DeBaCl](https://github.com/CoAxLab/DeBaCl) (Density Based Clustering; similar to a mix of DBSCAN and Agglomerative)\n",
-   "* [HDBSCAN](https://github.com/lmcinnes/hdbscan) (A robust hierarchical version of DBSCAN)\n"
+   "* [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan) (A robust hierarchical version of DBSCAN)\n"
   ]
  },
  {
@@ -944,7 +944,7 @@
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
-   "version": 2
+   "version": 2.0
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
@@ -956,4 +956,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}

notebooks/Benchmarking scalability of clustering implementations-v0.7.ipynb

Lines changed: 5 additions & 5 deletions
@@ -23,7 +23,7 @@
    " * Agglomerative clustering\n",
    "* [Fastcluster](http://danifold.net/fastcluster.html) (which provides very fast agglomerative clustering in C++)\n",
    "* [DeBaCl](https://github.com/CoAxLab/DeBaCl) (Density Based Clustering; similar to a mix of DBSCAN and Agglomerative)\n",
-   "* [HDBSCAN](https://github.com/lmcinnes/hdbscan) (A robust hierarchical version of DBSCAN)\n",
+   "* [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan) (A robust hierarchical version of DBSCAN)\n",
    "\n",
    "Obviously a major factor in performance will be the algorithm itself. Some algorithms are simply slower -- often, but not always, because they are doing more work to provide a better clustering."
   ]
@@ -568,7 +568,7 @@
  "source": [
   "If we're looking for scaling we can write off the scipy single linkage implementation -- if even we didn't hit the RAM limit the $O(n^2)$ scaling is going to quickly catch up with us. Fastcluster has the same asymptotic scaling, but is heavily optimized to being the constant down much lower -- at this point it is still keeping close to the faster algorithms. It's asymtotics will still catch up with it eventually however.\n",
   "\n",
-  "In practice this is going to mean that for larger datasets you are going to be very constrained in what algorithms you can apply: if you get enough datapoints only K-Means, DBSCAN, and HDBSCAN will be left. This is somewhat disappointing, paritcularly as [K-Means is not a particularly good clustering algorithm](http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb), paricularly for exploratory data analysis.\n",
+  "In practice this is going to mean that for larger datasets you are going to be very constrained in what algorithms you can apply: if you get enough datapoints only K-Means, DBSCAN, and HDBSCAN will be left. This is somewhat disappointing, paritcularly as [K-Means is not a particularly good clustering algorithm](http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb), paricularly for exploratory data analysis.\n",
   "\n",
   "With this in mind it is worth looking at how these last several implementations perform at much larger sizes, to see, for example, when fastscluster starts to have its asymptotic complexity start to pull it away."
  ]
@@ -863,7 +863,7 @@
  "source": [
   "## Conclusions\n",
   "\n",
-  "Performance obviously depends on the algorithm chosen, but can also vary significantly upon the specific implementation (HDBSCAN is far better hierarchical density based clustering than DeBaCl, and sklearn has by far the best K-Means implementation). For anything beyond toy datasets, however, your algorithm options are greatly constrained. In my (obviously biased) opinion [HDBSCAN is the best algorithm for clustering](http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb). If you need to cluster data beyond the scope that HDBSCAN can reasonably handle then the only algorithm options on the table are DBSCAN and K-Means; DBSCAN is the slower of the two, especially for very large data, but K-Means clustering can be remarkably poor -- it's a tough choice."
+  "Performance obviously depends on the algorithm chosen, but can also vary significantly upon the specific implementation (HDBSCAN is far better hierarchical density based clustering than DeBaCl, and sklearn has by far the best K-Means implementation). For anything beyond toy datasets, however, your algorithm options are greatly constrained. In my (obviously biased) opinion [HDBSCAN is the best algorithm for clustering](http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb). If you need to cluster data beyond the scope that HDBSCAN can reasonably handle then the only algorithm options on the table are DBSCAN and K-Means; DBSCAN is the slower of the two, especially for very large data, but K-Means clustering can be remarkably poor -- it's a tough choice."
  ]
 }
 ],
@@ -876,7 +876,7 @@
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
-   "version": 2
+   "version": 2.0
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
@@ -888,4 +888,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}

notebooks/Comparing Clustering Algorithms.ipynb

Lines changed: 6 additions & 4 deletions
@@ -450,7 +450,7 @@
    "* **Stability**: HDBSCAN is stable over runs and subsampling (since the variable density clustering will still cluster sparser subsampled clusters with the same parameter choices), and has good stability over parameter choices.\n",
    "* **Performance**: When implemented well HDBSCAN can be very efficient. The current implementation has similar performance to `fastcluster`'s agglomerative clustering (and will use `fastcluster` if it is available), but we expect future implementations that take advantage of newer data structure such as cover trees to scale significantly better.\n",
    "\n",
-   "How does HDBSCAN perform on our test dataset? Unfortunately HDBSCAN is not part of `sklearn`. Fortunately we can just import the [hdbscan library](https://github.com/lmcinnes/hdbscan) and use it as if it were part of `sklearn`."
+   "How does HDBSCAN perform on our test dataset? Unfortunately HDBSCAN is not part of `sklearn`. Fortunately we can just import the [hdbscan library](https://github.com/scikit-learn-contrib/hdbscan) and use it as if it were part of `sklearn`."
   ]
  },
  {
@@ -503,7 +503,9 @@
   "collapsed": true
  },
  "outputs": [],
- "source": []
+ "source": [
+  ""
+ ]
 }
 ],
 "metadata": {
@@ -515,7 +517,7 @@
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
-   "version": 2
+   "version": 2.0
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
@@ -527,4 +529,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}

notebooks/How HDBSCAN Works.ipynb

Lines changed: 5 additions & 3 deletions
@@ -427,7 +427,9 @@
   "collapsed": true
  },
  "outputs": [],
- "source": []
+ "source": [
+  ""
+ ]
 }
 ],
 "metadata": {
@@ -439,7 +441,7 @@
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
-   "version": 2
+   "version": 2.0
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
@@ -451,4 +453,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}

notebooks/Python vs Java.ipynb

Lines changed: 6 additions & 4 deletions
@@ -21,7 +21,7 @@
    "\n",
    "This is the story of how our codebase evolved and was optimized, and how it compares with the Java version at different stages of that journey.\n",
    "\n",
-   "To make the comparisons we'll need data on runtimes of both algorithms, ranging over dataset size, and dataset dimension. To save time and space I've done that work in [another notebook](http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Performance%20data%20generation%20.ipynb) and will just load the data in here."
+   "To make the comparisons we'll need data on runtimes of both algorithms, ranging over dataset size, and dataset dimension. To save time and space I've done that work in [another notebook](http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Performance%20data%20generation%20.ipynb) and will just load the data in here."
   ]
  },
  {
@@ -478,7 +478,9 @@
   "collapsed": true
  },
  "outputs": [],
- "source": []
+ "source": [
+  ""
+ ]
 }
 ],
 "metadata": {
@@ -490,7 +492,7 @@
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
-   "version": 2
+   "version": 2.0
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
@@ -502,4 +504,4 @@
 },
 "nbformat": 4,
 "nbformat_minor": 0
-}
+}
