Update links and references from lmcinnes/hdbscan to the new repository location, scikit-learn-contrib/hdbscan.
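
A bulk change like this one could be scripted rather than edited by hand. The following is a minimal sketch and not necessarily how this commit was produced: the helper name `rewrite_repo_links` and the sample string are hypothetical, while the old and new owner/repo slugs are the ones that appear in the diff.

```python
# Hypothetical helper; not part of the hdbscan repository.
OLD_SLUG = "lmcinnes/hdbscan"
NEW_SLUG = "scikit-learn-contrib/hdbscan"


def rewrite_repo_links(text, old=OLD_SLUG, new=NEW_SLUG):
    """Return text with every occurrence of the old owner/repo slug replaced."""
    return text.replace(old, new)


# Sample line modelled on the manual-install instructions in README.rst.
sample = "wget https://github.com/lmcinnes/hdbscan/archive/master.zip"
print(rewrite_repo_links(sample))
```

Matching on the full `owner/repo` slug rather than the bare username leaves unrelated strings, such as the maintainer fields in setup.py, untouched. It also means a value such as `USERNAME: "lmcinnes"` in circle.yml would not be caught and would still need a separate edit.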

lmcinnes · lmcinnes · commit 0d4753165602 · 2016-09-02T16:40:56.000-04:00
diff --git a/README.rst b/README.rst
@@ -5,13 +5,13 @@
     :target: https://anaconda.org/conda-forge/hdbscan
     :alt: Conda-forge Version
 .. image:: https://img.shields.io/pypi/l/hdbscan.svg
-    :target: https://github.com/lmcinnes/hdbscan/blob/master/LICENSE
+    :target: https://github.com/scikit-learn-contrib/hdbscan/blob/master/LICENSE
     :alt: License
-.. image:: https://travis-ci.org/lmcinnes/hdbscan.svg
-    :target: https://travis-ci.org/lmcinnes/hdbscan
+.. image:: https://travis-ci.org/scikit-learn-contrib/hdbscan.svg
+    :target: https://travis-ci.org/scikit-learn-contrib/hdbscan
     :alt: Travis Build Status
-.. image:: https://coveralls.io/repos/github/lmcinnes/hdbscan/badge.svg?branch=master
-    :target: https://coveralls.io/github/lmcinnes/hdbscan?branch=master
+.. image:: https://coveralls.io/repos/github/scikit-learn-contrib/hdbscan/badge.svg?branch=master
+    :target: https://coveralls.io/github/scikit-learn-contrib/hdbscan?branch=master
     :alt: Test Coverage
 .. image:: https://readthedocs.org/projects/hdbscan/badge/?version=latest
     :target: https://hdbscan.readthedocs.org
@@ -45,7 +45,7 @@ Based on the paper:
     
 Documentation, including tutorials, are available on ReadTheDocs at http://hdbscan.readthedocs.io/en/latest/ .  
     
-Notebooks `comparing HDBSCAN to other clustering algorithms <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`_, explaining `how HDBSCAN works <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ and `comparing performance with other python clustering implementations <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ are available.
+Notebooks `comparing HDBSCAN to other clustering algorithms <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`_, explaining `how HDBSCAN works <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ and `comparing performance with other python clustering implementations <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ are available.
 
 ------------------
 How to use HDBSCAN
@@ -69,10 +69,10 @@ Performance
 -----------
 
 Significant effort has been put into making the hdbscan implementation as fast as 
-possible. It is `orders of magnitude faster than the reference implementation <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Python%20vs%20Java.ipynb>`_ in Java,
+possible. It is `orders of magnitude faster than the reference implementation <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Python%20vs%20Java.ipynb>`_ in Java,
 and is currently faster than highly optimized single linkage implementations in C and C++.
-`version 0.7 performance can be seen in this notebook <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ .
-In particular `performance on low dimensional data is better than sklearn's DBSCAN <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations%202D%20v0.7.ipynb>`_ ,
+`version 0.7 performance can be seen in this notebook <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations-v0.7.ipynb>`_ .
+In particular `performance on low dimensional data is better than sklearn's DBSCAN <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Benchmarking%20scalability%20of%20clustering%20implementations%202D%20v0.7.ipynb>`_ ,
 and via support for caching with joblib, re-clustering with different parameters
 can be almost free.
 
@@ -90,7 +90,7 @@ object has attributes for:
 
 All of which come equipped with methods for plotting and converting
 to Pandas or NetworkX for further analysis. See the notebook on
-`how HDBSCAN works <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ for examples and further details.
+`how HDBSCAN works <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/How%20HDBSCAN%20Works.ipynb>`_ for examples and further details.
 
 The clusterer objects also have an attribute providing cluster membership
 strengths, resulting in optional soft clustering (and no further compute 
@@ -173,7 +173,7 @@ For a manual install get this package:
 
 .. code:: bash
 
-    wget https://github.com/lmcinnes/hdbscan/archive/master.zip
+    wget https://github.com/scikit-learn-contrib/hdbscan/archive/master.zip
     unzip master.zip
     rm master.zip
     cd hdbscan-master
diff --git a/circle.yml b/circle.yml
@@ -2,7 +2,7 @@ machine:
   environment:
     # The github organization or username of the repository which hosts the
     # project and documentation.
-    USERNAME: "lmcinnes"
+    USERNAME: "scikit-learn-contrib"
 
     # The repository where the documentation will be hosted
     DOC_REPO: "hdbscan"
diff --git a/docs/comparing_clustering_algorithms.rst b/docs/comparing_clustering_algorithms.rst
@@ -585,7 +585,7 @@ So, in summary:
 
 How does HDBSCAN perform on our test dataset? Unfortunately HDBSCAN is
 not part of ``sklearn``. Fortunately we can just import the `hdbscan
-library <https://github.com/lmcinnes/hdbscan>`__ and use it as if it
+library <https://github.com/scikit-learn-contrib/hdbscan>`__ and use it as if it
 were part of ``sklearn``.
 
 .. code:: python
diff --git a/docs/how_hdbscan_works.rst b/docs/how_hdbscan_works.rst
@@ -54,7 +54,7 @@ one hundred data points.
 Now, the best way to explain HDBSCAN is actually just use it and then go
 through the steps that occurred along the way teasing out what is
 happening at each step. So let's load up the `hdbscan
-library <https://github.com/lmcinnes/hdbscan>`__ and get to work.
+library <https://github.com/scikit-learn-contrib/hdbscan>`__ and get to work.
 
 .. code:: python
 
@@ -396,7 +396,7 @@ are a fair number of moving parts to the algorithm -- but ultimately
 each part is actually very straightforward and can be optimized well.
 Hopefully with a better understanding both of the intuitions and some of
 the implementation details of HDBSCAN you will feel motivated to `try it
-out <https://github.com/lmcinnes/hdbscan>`__. The library continues to
+out <https://github.com/scikit-learn-contrib/hdbscan>`__. The library continues to
 develop, and will provide a base for new ideas including a near
 parameterless Persistent Density Clustering algorithm, and a new
 semi-supervised clustering algorithm.
diff --git a/docs/performance_and_scalability.rst b/docs/performance_and_scalability.rst
@@ -39,7 +39,7 @@ The implementations being test are:
    provides very fast agglomerative clustering in C++)
 -  `DeBaCl <https://github.com/CoAxLab/DeBaCl>`__ (Density Based
    Clustering; similar to a mix of DBSCAN and Agglomerative)
--  `HDBSCAN <https://github.com/lmcinnes/hdbscan>`__ (A robust
+-  `HDBSCAN <https://github.com/scikit-learn-contrib/hdbscan>`__ (A robust
    hierarchical version of DBSCAN)
 
 Obviously a major factor in performance will be the algorithm itself.
@@ -510,7 +510,7 @@ to be very constrained in what algorithms you can apply: if you get
 enough datapoints only K-Means, DBSCAN, and HDBSCAN will be left. This
 is somewhat disappointing, paritcularly as `K-Means is not a
 particularly good clustering
-algorithm <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__,
+algorithm <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__,
 paricularly for exploratory data analysis.
 
 With this in mind it is worth looking at how these last several
@@ -781,7 +781,7 @@ hierarchical density based clustering than DeBaCl, and sklearn has by
 far the best K-Means implementation). For anything beyond toy datasets,
 however, your algorithm options are greatly constrained. In my
 (obviously biased) opinion `HDBSCAN is the best algorithm for
-clustering <http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__.
+clustering <http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb>`__.
 If you need to cluster data beyond the scope that HDBSCAN can reasonably
 handle then the only algorithm options on the table are DBSCAN and
 K-Means; DBSCAN is the slower of the two, especially for very large
diff --git a/notebooks/Benchmarking scalability of clustering implementations 2D v0.7.ipynb b/notebooks/Benchmarking scalability of clustering implementations 2D v0.7.ipynb
@@ -21,7 +21,7 @@
     " * Agglomerative clustering\n",
     "* [Fastcluster](http://danifold.net/fastcluster.html) (which provides very fast agglomerative clustering in C++)\n",
     "* [DeBaCl](https://github.com/CoAxLab/DeBaCl) (Density Based Clustering; similar to a mix of DBSCAN and Agglomerative)\n",
-    "* [HDBSCAN](https://github.com/lmcinnes/hdbscan) (A robust hierarchical version of DBSCAN)\n"
+    "* [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan) (A robust hierarchical version of DBSCAN)\n"
    ]
   },
   {
@@ -944,7 +944,7 @@
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 2.0
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
@@ -956,4 +956,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
diff --git a/notebooks/Benchmarking scalability of clustering implementations-v0.7.ipynb b/notebooks/Benchmarking scalability of clustering implementations-v0.7.ipynb
@@ -23,7 +23,7 @@
     " * Agglomerative clustering\n",
     "* [Fastcluster](http://danifold.net/fastcluster.html) (which provides very fast agglomerative clustering in C++)\n",
     "* [DeBaCl](https://github.com/CoAxLab/DeBaCl) (Density Based Clustering; similar to a mix of DBSCAN and Agglomerative)\n",
-    "* [HDBSCAN](https://github.com/lmcinnes/hdbscan) (A robust hierarchical version of DBSCAN)\n",
+    "* [HDBSCAN](https://github.com/scikit-learn-contrib/hdbscan) (A robust hierarchical version of DBSCAN)\n",
     "\n",
     "Obviously a major factor in performance will be the algorithm itself. Some algorithms are simply slower -- often, but not always, because they are doing more work to provide a better clustering."
    ]
@@ -568,7 +568,7 @@
    "source": [
     "If we're looking for scaling we can write off the scipy single linkage implementation -- if even we didn't hit the RAM limit the $O(n^2)$ scaling is going to quickly catch up with us. Fastcluster has the same asymptotic scaling, but is heavily optimized to being the constant down much lower -- at this point it is still keeping close to the faster algorithms. It's asymtotics will still catch up with it eventually however.\n",
     "\n",
-    "In practice this is going to mean that for larger datasets you are going to be very constrained in what algorithms you can apply: if you get enough datapoints only K-Means, DBSCAN, and HDBSCAN will be left. This is somewhat disappointing, paritcularly as [K-Means is not a particularly good clustering algorithm](http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb), paricularly for exploratory data analysis.\n",
+    "In practice this is going to mean that for larger datasets you are going to be very constrained in what algorithms you can apply: if you get enough datapoints only K-Means, DBSCAN, and HDBSCAN will be left. This is somewhat disappointing, paritcularly as [K-Means is not a particularly good clustering algorithm](http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb), paricularly for exploratory data analysis.\n",
     "\n",
     "With this in mind it is worth looking at how these last several implementations perform at much larger sizes, to see, for example, when fastscluster starts to have its asymptotic complexity start to pull it away."
    ]
@@ -863,7 +863,7 @@
    "source": [
     "## Conclusions\n",
     "\n",
-    "Performance obviously depends on the algorithm chosen, but can also vary significantly upon the specific implementation (HDBSCAN is far better hierarchical density based clustering than DeBaCl, and sklearn has by far the best K-Means implementation). For anything beyond toy datasets, however, your algorithm options are greatly constrained. In my (obviously biased) opinion [HDBSCAN is the best algorithm for clustering](http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb). If you need to cluster data beyond the scope that HDBSCAN can reasonably handle then the only algorithm options on the table are DBSCAN and K-Means; DBSCAN is the slower of the two, especially for very large data, but K-Means clustering can be remarkably poor -- it's a tough choice."
+    "Performance obviously depends on the algorithm chosen, but can also vary significantly upon the specific implementation (HDBSCAN is far better hierarchical density based clustering than DeBaCl, and sklearn has by far the best K-Means implementation). For anything beyond toy datasets, however, your algorithm options are greatly constrained. In my (obviously biased) opinion [HDBSCAN is the best algorithm for clustering](http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Comparing%20Clustering%20Algorithms.ipynb). If you need to cluster data beyond the scope that HDBSCAN can reasonably handle then the only algorithm options on the table are DBSCAN and K-Means; DBSCAN is the slower of the two, especially for very large data, but K-Means clustering can be remarkably poor -- it's a tough choice."
    ]
   }
  ],
@@ -876,7 +876,7 @@
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 2.0
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
@@ -888,4 +888,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
diff --git a/notebooks/Comparing Clustering Algorithms.ipynb b/notebooks/Comparing Clustering Algorithms.ipynb
@@ -450,7 +450,7 @@
     "* **Stability**: HDBSCAN is stable over runs and subsampling (since the variable density clustering will still cluster sparser subsampled clusters with the same parameter choices), and has good stability over parameter choices.\n",
     "* **Performance**: When implemented well HDBSCAN can be very efficient. The current implementation has similar performance to `fastcluster`'s agglomerative clustering (and will use `fastcluster` if it is available), but we expect future implementations that take advantage of newer data structure such as cover trees to scale significantly better.\n",
     "\n",
-    "How does HDBSCAN perform on our test dataset? Unfortunately HDBSCAN is not part of `sklearn`. Fortunately we can just import the [hdbscan library](https://github.com/lmcinnes/hdbscan) and use it as if it were part of `sklearn`."
+    "How does HDBSCAN perform on our test dataset? Unfortunately HDBSCAN is not part of `sklearn`. Fortunately we can just import the [hdbscan library](https://github.com/scikit-learn-contrib/hdbscan) and use it as if it were part of `sklearn`."
    ]
   },
   {
@@ -503,7 +503,9 @@
     "collapsed": true
    },
    "outputs": [],
-   "source": []
+   "source": [
+    ""
+   ]
   }
  ],
  "metadata": {
@@ -515,7 +517,7 @@
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 2.0
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
@@ -527,4 +529,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
diff --git a/notebooks/How HDBSCAN Works.ipynb b/notebooks/How HDBSCAN Works.ipynb
@@ -427,7 +427,9 @@
     "collapsed": true
    },
    "outputs": [],
-   "source": []
+   "source": [
+    ""
+   ]
   }
  ],
  "metadata": {
@@ -439,7 +441,7 @@
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 2.0
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
@@ -451,4 +453,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
diff --git a/notebooks/Python vs Java.ipynb b/notebooks/Python vs Java.ipynb
@@ -21,7 +21,7 @@
     "\n",
     "This is the story of how our codebase evolved and was optimized, and how it compares with the Java version at different stages of that journey.\n",
     "\n",
-    "To make the comparisons we'll need data on runtimes of both algorithms, ranging over dataset size, and dataset dimension. To save time and space I've done that work in [another notebook](http://nbviewer.jupyter.org/github/lmcinnes/hdbscan/blob/master/notebooks/Performance%20data%20generation%20.ipynb) and will just load the data in here."
+    "To make the comparisons we'll need data on runtimes of both algorithms, ranging over dataset size, and dataset dimension. To save time and space I've done that work in [another notebook](http://nbviewer.jupyter.org/github/scikit-learn-contrib/hdbscan/blob/master/notebooks/Performance%20data%20generation%20.ipynb) and will just load the data in here."
    ]
   },
   {
@@ -478,7 +478,9 @@
     "collapsed": true
    },
    "outputs": [],
-   "source": []
+   "source": [
+    ""
+   ]
   }
  ],
  "metadata": {
@@ -490,7 +492,7 @@
   "language_info": {
    "codemirror_mode": {
     "name": "ipython",
-    "version": 2
+    "version": 2.0
    },
    "file_extension": ".py",
    "mimetype": "text/x-python",
@@ -502,4 +504,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 0
-}
+}
diff --git a/setup.py b/setup.py
@@ -54,7 +54,7 @@ def readme():
         'Programming Language :: Python :: 3.4',
     ],
     'keywords' : 'cluster clustering density hierarchical',
-    'url' : 'http://github.com/lmcinnes/hdbscan',
+    'url' : 'http://github.com/scikit-learn-contrib/hdbscan',
     'maintainer' : 'Leland McInnes',
     'maintainer_email' : 'leland.mcinnes@gmail.com',
     'license' : 'BSD',