Convert README to .rst for better rendering in Python docs (#93)

srowen · web-flow · commit b21cf102483c · 2018-12-09T14:56:38.000-06:00
diff --git a/README.md b/README.md
diff --git a/README.rst b/README.rst
@@ -0,0 +1,81 @@
+Scikit-learn integration package for Apache Spark
+=================================================
+
+This package contains some tools to integrate the `Spark computing framework <https://spark.apache.org/>`_
+with the popular `scikit-learn machine learning library <https://scikit-learn.org/stable/>`_. Among other things, it can:
+
+- train and evaluate multiple scikit-learn models in parallel. It is a distributed analog to the
+  `multicore implementation <https://pythonhosted.org/joblib/parallel.html>`_ included by default in ``scikit-learn``
+- convert Spark DataFrames seamlessly into NumPy ``ndarray`` or sparse matrices
+- (experimental) distribute SciPy's sparse matrices as a dataset of sparse vectors
+
+It focuses on problems that have a small amount of data and that can be run in parallel.
+For small datasets, it distributes the search for estimator parameters (``GridSearchCV`` in scikit-learn),
+using Spark. For datasets that do not fit in memory, we recommend using the distributed implementation in
+`Spark MLlib <https://spark.apache.org/docs/latest/api/python/pyspark.mllib.html>`_.
+
+This package distributes simple tasks like grid-search cross-validation.
+It does not distribute individual learning algorithms (unlike Spark MLlib).
+
+Installation
+------------
+
+This package is available on PyPI:
+
+::
+
+	pip install spark-sklearn
+
+This project is also available as a `Spark package <https://spark-packages.org/package/databricks/spark-sklearn>`_.
+
+The developer version has the following requirements:
+
+- a recent release of scikit-learn. Releases 0.18.1 and 0.19.0 have been tested; older versions may work too.
+- Spark >= 2.1.1. Spark may be downloaded from the `Spark website <https://spark.apache.org/>`_.
+  In order to use this package, you need to use the pyspark interpreter or another Spark-compliant Python
+  interpreter. See the `Spark guide <https://spark.apache.org/docs/latest/programming-guide.html#overview>`_
+  for more details.
+- `nose <https://nose.readthedocs.org>`_ (testing dependency only)
+- pandas, if using the pandas integration or testing. pandas==0.18 has been tested.
+
+If you want to use a developer version, you just need to make sure the ``python/`` subdirectory is in the
+``PYTHONPATH`` when launching the pyspark interpreter:
+
+::
+
+	PYTHONPATH=$PYTHONPATH:./python $SPARK_HOME/bin/pyspark
+
+You can directly run tests:
+
+::
+
+    cd python && ./run-tests.sh
+
+This requires the environment variable ``SPARK_HOME`` to point to your local copy of Spark.
+
+Example
+-------
+
+Here is a simple example that runs a grid search with Spark. See the `Installation <#installation>`_ section
+on how to install the package.
+
+.. code:: python
+
+    from sklearn import svm, datasets
+    from spark_sklearn import GridSearchCV
+    iris = datasets.load_iris()
+    parameters = {'kernel':('linear', 'rbf'), 'C':[1, 10]}
+    svr = svm.SVC()
+    clf = GridSearchCV(sc, svr, parameters)  # sc: the SparkContext provided by the pyspark shell
+    clf.fit(iris.data, iris.target)
+
+This classifier can be used as a drop-in replacement for any scikit-learn classifier, with the same API.
+
+Documentation
+-------------
+
+`API documentation <http://databricks.github.io/spark-sklearn-docs>`_ is currently hosted on GitHub Pages. To
+build the docs yourself, see the instructions in ``docs/``.
+
+.. image:: https://travis-ci.org/databricks/spark-sklearn.svg?branch=master
+    :target: https://travis-ci.org/databricks/spark-sklearn
diff --git a/python/MANIFEST.in b/python/MANIFEST.in
@@ -1 +1 @@
-include README.md
+include README.rst
diff --git a/python/README.md b/python/README.md
diff --git a/python/README.rst b/python/README.rst
@@ -0,0 +1 @@
+../README.rst
diff --git a/python/setup.cfg b/python/setup.cfg
@@ -1,2 +1,2 @@
 [metadata]
-description-file = README.md
+description-file = README.rst
diff --git a/python/setup.py b/python/setup.py
@@ -44,7 +44,7 @@ def read(*parts):
 	maintainer="Tim Hunter",
 	maintainer_email="timhunter@databricks.com",
 	keywords=KEYWORDS,
-	long_description=read("README.md"),
+	long_description=read("README.rst"),
 	packages=PACKAGES,
 	classifiers=CLASSIFIERS,
 	zip_safe=False,
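
The `setup.py` hunk above passes `read("README.rst")` to `long_description`, but the body of `read` is not part of this diff. As a minimal sketch only, assuming the helper joins its arguments into a path and returns the file contents decoded as UTF-8, it might look like the following. The `base_dir` parameter is a hypothetical addition so the sketch can run outside the repository.

```python
import os
import tempfile


def read(*parts, base_dir=None):
    """Return the text of the file at base_dir/parts, decoded as UTF-8.

    Hypothetical sketch: the real helper in setup.py is not shown in the diff.
    """
    if base_dir is None:
        base_dir = os.getcwd()
    with open(os.path.join(base_dir, *parts), encoding="utf-8") as f:
        return f.read()


# Usage: write a small README.rst to a temporary directory and read it back,
# the way setup() would receive it through long_description.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "README.rst"), "w", encoding="utf-8") as f:
    f.write("Scikit-learn integration package for Apache Spark\n")

long_description = read("README.rst", base_dir=demo_dir)
```

Because `python/README.rst` is added as a link to `../README.rst`, a helper of this kind would presumably return the same text from either location.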
