# Scikit-learn integration package for Apache Spark

This package contains some tools to integrate the [Spark computing framework](https://spark.apache.org/) with the popular [scikit-learn machine learning library](https://scikit-learn.org/stable/). Among other things, it can:

- train and evaluate multiple scikit-learn models in parallel. It is a distributed analog to the [multicore implementation](https://pythonhosted.org/joblib/parallel.html) included by default in [scikit-learn](https://scikit-learn.org/stable/).
- convert Spark's DataFrames seamlessly into numpy `ndarray`s or sparse matrices (see the sketch after this list).
- (experimental) distribute Scipy's sparse matrices as a dataset of sparse vectors.
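
As a rough illustration of the DataFrame conversion idea, the following sketch uses plain pyspark, numpy, and scipy rather than this package's own helpers, and assumes a DataFrame small enough to collect on the driver:

```python
import numpy as np
from scipy.sparse import csr_matrix
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1.0, 0.0), (0.0, 2.0)], ["x", "y"])

# Collect the (small) DataFrame to the driver and stack its rows into a
# numpy ndarray; pyspark Rows behave like tuples, so np.array handles them.
dense = np.array(df.collect())

# The same data as a scipy sparse matrix, for estimators that accept
# sparse input.
sparse = csr_matrix(dense)
```
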

It focuses on problems that have a small amount of data and that can be run in parallel:

- for small datasets, it distributes the search for estimator parameters (`GridSearchCV` in scikit-learn) using Spark (see the example below);
- for datasets that do not fit in memory, we recommend using the [distributed implementation in Spark MLlib](https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html).

> NOTE: This package distributes simple tasks like grid-search cross-validation. It does not distribute individual learning algorithms (unlike Spark MLlib).
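
Here is a minimal usage sketch of the distributed grid search. It assumes a local Spark installation and that this package's `GridSearchCV` mirrors scikit-learn's interface while taking the `SparkContext` as its first argument:

```python
from pyspark import SparkContext
from sklearn import datasets, svm
from spark_sklearn import GridSearchCV

# In the pyspark shell a SparkContext named `sc` already exists;
# otherwise, create (or reuse) one here.
sc = SparkContext.getOrCreate()

iris = datasets.load_iris()
parameters = {'kernel': ('linear', 'rbf'), 'C': [1, 10]}

# Candidate parameter settings are evaluated in parallel as Spark tasks,
# while each individual model is still trained by scikit-learn.
clf = GridSearchCV(sc, svm.SVC(gamma='auto'), parameters)
clf.fit(iris.data, iris.target)
```

The fitted object is meant to behave like scikit-learn's own `GridSearchCV`, so attributes such as `best_params_` should be available after fitting.
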
**Difference with the [sparkit-learn project](https://github.com/lensacom/sparkit-learn):** The sparkit-learn project aims at a comprehensive integration between Spark and scikit-learn. In particular, it adds some primitives to distribute numerical data using Spark, and it reimplements some of the most common algorithms found in scikit-learn.
## License
This package is released under the Apache 2.0 license. See the LICENSE file.
## Installation
This package is available on PyPI:

    pip install spark-sklearn

This project is also available as a [Spark package](https://spark-packages.org/package/databricks/spark-sklearn).

The developer version has the following requirements:
- a recent release of scikit-learn. Releases 0.18.1 and 0.19.0 have been tested; older versions may work too.
- Spark >= 2.1.1. Spark may be downloaded from the [Spark official website](https://spark.apache.org/). To use this package, you need to use the pyspark interpreter or another Spark-compliant Python interpreter. See the [Spark guide](https://spark.apache.org/docs/latest/programming-guide.html#overview) for more details.