separate getting-started page from intro in doc

agoscinski · PicoCentauri · commit b0b7451a4b46 · 2023-08-15T09:36:12.000+02:00
* put intro of scikit-matter into the index page and abbreviate
* add getting started page which gives an overview of important
  implementations
* include existing introductory text for reconstruction measures
  into API reference so all introductory texts from the API are
  included into the getting started
* reword text a bit for more soundness within the getting started
* move doc and code corresponding to a module to __init__ files to
  include them by automodule into the doc
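
A note on the last bullet: Sphinx's `automodule` directive renders the module-level docstring, which is why the introductory text moves into the modules themselves. The sketch below only illustrates where that text lives. `DEMO_SOURCE` is a hypothetical stand-in, not the actual content of `_selection.py`, and the standard-library `ast` module is used here in place of Sphinx.

```python
import ast

# Hypothetical stand-in for a module whose docstring carries the intro text.
DEMO_SOURCE = '''r"""
This module contains data sub-selection methods.

These selectors are available:

* CUR
* FPS
"""

import numbers
'''

# The module docstring is the first statement of the file; this is the text
# that a docstring-based documentation tool has available to render.
module_doc = ast.get_docstring(ast.parse(DEMO_SOURCE))
print(module_doc.splitlines()[0])
```

Because the text lives in one place, the API reference and the getting-started page can both pull it in without duplicating it.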
diff --git a/docs/src/getting-started.rst b/docs/src/getting-started.rst
@@ -0,0 +1,46 @@
+Getting started
+===============
+
+This guide illustrates the main functionalities that ``scikit-matter`` provides. It
+assumes a very basic working knowledge of how ``scikit-learn`` works. Please refer to
+our :ref:`installation` instructions for installing ``scikit-matter``.
+
+For a detailed explanation of the functionalities, please look at the
+:ref:`selection-api`.
+
+Features and Samples Selection
+------------------------------
+
+.. automodule:: skmatter._selection
+   :noindex:
+
+Notebook Examples
+^^^^^^^^^^^^^^^^^
+
+.. include:: examples/selection/index.rst
+   :start-line: 4
+
+
+Reconstruction Measures
+-----------------------
+
+.. automodule:: skmatter.metrics
+   :noindex:
+
+Notebook Examples
+^^^^^^^^^^^^^^^^^
+
+.. include:: examples/reconstruction/index.rst
+   :start-line: 4
+
+Principal Covariates Regression
+-------------------------------
+
+.. automodule:: skmatter.decomposition
+   :noindex:
+
+Notebook Examples
+^^^^^^^^^^^^^^^^^
+
+.. include:: examples/pcovr/index.rst
+   :start-line: 4
diff --git a/docs/src/gfrm.rst b/docs/src/gfrm.rst
@@ -1,31 +1,45 @@
 .. _gfrm:
 
 Reconstruction Measures
-======================================
+=======================
 
-.. currentmodule:: skmatter.metrics
+.. marker-reconstruction-introduction-begin
+
+.. automodule:: skmatter.metrics
+
+These reconstruction measures are available:
 
+* :ref:`GRE-api` (GRE) computes the amount of linearly-decodable information
+  recovered through a global linear reconstruction.
+* :ref:`GRD-api` (GRD) computes the amount of distortion contained in a global linear
+  reconstruction.
+* :ref:`LRE-api` (LRE) computes the amount of decodable information recovered through
+  a local linear reconstruction for the k-nearest neighborhood of each sample.
+
+.. marker-reconstruction-introduction-end
+
+.. currentmodule:: skmatter.metrics
 
 .. _GRE-api:
 
 Global Reconstruction Error
-###########################
+---------------------------
 
 .. autofunction:: pointwise_global_reconstruction_error
 .. autofunction:: global_reconstruction_error
 
 .. _GRD-api:
 
 Global Reconstruction Distortion
-################################
+--------------------------------
 
 .. autofunction:: pointwise_global_reconstruction_distortion
 .. autofunction:: global_reconstruction_distortion
 
 .. _LRE-api:
 
 Local Reconstruction Error
-##########################
+--------------------------
 
 .. autofunction:: pointwise_local_reconstruction_error
 .. autofunction:: local_reconstruction_error
diff --git a/docs/src/index.rst b/docs/src/index.rst
@@ -1,23 +1,11 @@
-scikit-matter documentation
-===========================
+scikit-matter
+=============
 
-``scikit-matter`` is a collection of `scikit-learn <https://scikit.org/>`_ compatible
-utilities that implement methods born out of the materials science and chemistry
-communities.
-
-Convenient-to-use libraries such as scikit-learn have accelerated the adoption and
-application of machine learning (ML) workflows and data-driven methods. Such libraries
-have gained great popularity partly because the implemented methods are generally
-applicable in multiple domains. While developments in the atomistic learning community
-have put forward general-use machine learning methods, their deployment is commonly
-entangled with domain-specific functionalities, preventing access to a wider audience.
-
-scikit-matter targets domain-agnostic implementations of methods developed in the
-computational chemical and materials science community, following the scikit-learn API
+scikit-matter is a toolbox of methods developed in the
+computational chemical and materials science community, following the
+`scikit-learn <https://scikit-learn.org/>`_ API
 and coding guidelines to promote usability and interoperability with existing workflows.
-scikit-matter contains a toolbox of methods for unsupervised and supervised analysis of
-ML datasets, including the comparison, decomposition, and selection of features and
-samples.
+
 
 .. include:: ../../README.rst
    :start-after: marker-issues
@@ -27,9 +15,13 @@ samples.
   :maxdepth: 1
   :caption: Contents:
 
-  intro
+  getting-started
   installation
   reference
   tutorials
   contributing
   bibliography
+
+
+If you would like to contribute to scikit-matter, check out our :ref:`contributing`
+page!
diff --git a/docs/src/installation.rst b/docs/src/installation.rst
@@ -1,3 +1,5 @@
+.. _installation:
+
 .. include:: ../../README.rst
    :start-after: marker-installation
    :end-before: marker-ci-tests
diff --git a/docs/src/intro.rst b/docs/src/intro.rst
diff --git a/docs/src/selection.rst b/docs/src/selection.rst
@@ -1,67 +1,9 @@
+.. _selection-api:
 
 Feature and Sample Selection
 ============================
 
-`scikit-matter` contains multiple data sub-selection modules,
-primarily corresponding to methods derived from CUR matrix decomposition
-and Farthest Point Sampling. In their classical form, CUR and FPS determine
-a data subset that maximizes the
-variance (CUR) or distribution (FPS) of the features or samples. These methods
-can be modified to combine supervised and unsupervised learning, in a formulation
-denoted `PCov-CUR` and `PCov-FPS`.
-For further reading, refer to [Imbalzano2018]_ and [Cersonsky2021]_.
-
-These selectors can be used for both feature and sample selection, with similar
-instantiations. This can be executed using:
-
-.. doctest::
-
-    >>> # feature selection
-    >>> import numpy as np
-    >>> from skmatter.feature_selection import CUR, FPS, PCovCUR, PCovFPS
-    >>> selector = CUR(
-    ...     # the number of selections to make
-    ...     # if None, set to half the samples or features
-    ...     # if float, fraction of the total dataset to select
-    ...     # if int, absolute number of selections to make
-    ...     n_to_select=2,
-    ...     # option to use `tqdm <https://tqdm.github.io/>`_ progress bar
-    ...     progress_bar=True,
-    ...     # float, cutoff score to stop selecting
-    ...     score_threshold=1e-12,
-    ...     # boolean, whether to select randomly after non-redundant selections
-    ...     # are exhausted
-    ...     full=False,
-    ... )
-    >>> X = np.array(
-    ...     [
-    ...         [0.12, 0.21, 0.02],  # 3 samples, 3 features
-    ...         [-0.09, 0.32, -0.10],
-    ...         [-0.03, -0.53, 0.08],
-    ...     ]
-    ... )
-    >>> y = np.array([0.0, 0.0, 1.0])  # classes of each sample
-    >>> selector.fit(X)
-    CUR(n_to_select=2, progress_bar=True, score_threshold=1e-12)
-    >>> Xr = selector.transform(X)
-    >>> print(Xr.shape)
-    (3, 2)
-    >>> selector = PCovCUR(n_to_select=2)
-    >>> selector.fit(X, y)
-    PCovCUR(n_to_select=2)
-    >>> Xr = selector.transform(X)
-    >>> print(Xr.shape)
-    (3, 2)
-    >>>
-    >>> # Now sample selection
-    >>> from skmatter.sample_selection import CUR, FPS, PCovCUR, PCovFPS
-    >>> selector = CUR(n_to_select=2)
-    >>> selector.fit(X)
-    CUR(n_to_select=2)
-    >>> Xr = X[selector.selected_idx_]
-    >>> print(Xr.shape)
-    (2, 3)
-
+.. automodule:: skmatter._selection
 
 .. _CUR-api:
 
diff --git a/src/skmatter/_selection.py b/src/skmatter/_selection.py
@@ -1,5 +1,77 @@
-"""
-Sequential selection
+r"""
+This module contains data sub-selection methods, primarily those derived from
+CUR matrix decomposition and Farthest Point Sampling (FPS). In their classical
+form, CUR and FPS determine a data subset that maximizes the variance (CUR) or
+distribution (FPS) of the features or samples. These methods can be modified
+to incorporate supervised target information, yielding the variants denoted
+`PCov-CUR` and `PCov-FPS`. For further reading, refer to [Imbalzano2018]_ and
+[Cersonsky2021]_. These selectors can be used for both feature and sample
+selection, with similar instantiations. Each sub-selection method scores every
+feature or sample (without an estimator) and chooses the one with the maximum
+score. A simple example of usage:
+
+.. doctest::
+
+    >>> # feature selection
+    >>> import numpy as np
+    >>> from skmatter.feature_selection import CUR, FPS, PCovCUR, PCovFPS
+    >>> selector = CUR(
+    ...     # the number of selections to make
+    ...     # if None, set to half the samples or features
+    ...     # if float, fraction of the total dataset to select
+    ...     # if int, absolute number of selections to make
+    ...     n_to_select=2,
+    ...     # option to use `tqdm <https://tqdm.github.io/>`_ progress bar
+    ...     progress_bar=True,
+    ...     # float, cutoff score to stop selecting
+    ...     score_threshold=1e-12,
+    ...     # boolean, whether to select randomly after non-redundant selections
+    ...     # are exhausted
+    ...     full=False,
+    ... )
+    >>> X = np.array(
+    ...     [
+    ...         [0.12, 0.21, 0.02],  # 3 samples, 3 features
+    ...         [-0.09, 0.32, -0.10],
+    ...         [-0.03, -0.53, 0.08],
+    ...     ]
+    ... )
+    >>> y = np.array([0.0, 0.0, 1.0])  # classes of each sample
+    >>> selector.fit(X)
+    CUR(n_to_select=2, progress_bar=True, score_threshold=1e-12)
+    >>> Xr = selector.transform(X)
+    >>> print(Xr.shape)
+    (3, 2)
+    >>> selector = PCovCUR(n_to_select=2)
+    >>> selector.fit(X, y)
+    PCovCUR(n_to_select=2)
+    >>> Xr = selector.transform(X)
+    >>> print(Xr.shape)
+    (3, 2)
+    >>>
+    >>> # Now sample selection
+    >>> from skmatter.sample_selection import CUR, FPS, PCovCUR, PCovFPS
+    >>> selector = CUR(n_to_select=2)
+    >>> selector.fit(X)
+    CUR(n_to_select=2)
+    >>> Xr = X[selector.selected_idx_]
+    >>> print(Xr.shape)
+    (2, 3)
+
+These selectors are available:
+
+* :ref:`CUR-api` decomposition: an iterative feature selection method based upon the
+  singular value decomposition.
+* :ref:`PCov-CUR-api` decomposition: extends CUR by using augmented right or left
+  singular vectors inspired by Principal Covariates Regression.
+* :ref:`FPS-api`: a common selection technique intended to exploit the diversity of
+  the input space. The selection of the first point is made at random or by a
+  separate metric.
+* :ref:`PCov-FPS-api`: extends FPS much like PCov-CUR extends CUR.
+* :ref:`Voronoi-FPS-api`: conducts FPS selection, taking advantage of Voronoi
+  tessellations to accelerate selection.
+* :ref:`DCH-api`: selects samples by constructing a directional convex hull and
+  determining which samples lie on the bounding surface.
 """
 
 import numbers
diff --git a/src/skmatter/decomposition/__init__.py b/src/skmatter/decomposition/__init__.py
@@ -1,6 +1,28 @@
-"""
-The :mod:`skmatter.decomposition` module includes the two distance
-measures, as defined by Principal Covariates Regression (PCovR)
+r"""
+Often, one wants to construct new ML features from their current representation
+in order to compress data or visualise trends in the dataset. In the archetypal
+method for this dimensionality reduction, principal components analysis (PCA),
+features are transformed into the latent space which best preserves the
+variance of the original data. This module provides Principal Covariates
+Regression (PCovR), introduced by [deJong1992]_, a modification to PCA
+that incorporates target information, such that the resulting embedding can
+be tuned using a mixing parameter α to improve performance in regression tasks
+(:math:`\alpha = 0` corresponding to linear regression and :math:`\alpha = 1`
+corresponding to PCA). [Helfrecht2020]_ introduced the non-linear version,
+Kernel Principal Covariates Regression (KPCovR), where the mixing parameter α
+now interpolates between kernel ridge regression (:math:`\alpha = 0`) and
+kernel principal components analysis (KPCA, :math:`\alpha = 1`).
+
+The module includes:
+
+* :ref:`PCovR-api`: the standard Principal Covariates Regression. Utilises a
+  combination of a PCA-like and an LR-like loss, and therefore attempts to find
+  a low-dimensional projection of the feature vectors that simultaneously minimises
+  information loss and error in predicting the target properties using only the
+  latent space vectors :math:`\mathbf{T}`.
+* :ref:`KPCovR-api`: the Kernel Principal Covariates Regression,
+  a kernel-based variation on the
+  original PCovR method, proposed in [Helfrecht2020]_.
 """
 
 from ._kernel_pcovr import KernelPCovR
diff --git a/src/skmatter/metrics/__init__.py b/src/skmatter/metrics/__init__.py

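
For readers of the selection docstring in this change, the idea behind Farthest Point Sampling can be sketched in plain Python. This is a minimal illustration, not skmatter's implementation: the function name `farthest_point_sampling`, its `initial` argument, and the use of Euclidean distance are assumptions made for the example. The data are the three samples from the doctest in `_selection.py`.

```python
import math


def farthest_point_sampling(points, n_to_select, initial=0):
    """Greedily pick indices of points that are far from those already picked.

    Hypothetical helper for illustration; not part of skmatter.
    """
    selected = [initial]
    # Distance from every point to its nearest selected point so far.
    min_dist = [math.dist(p, points[initial]) for p in points]
    while len(selected) < n_to_select:
        # The next pick is the point farthest from the current selection.
        nxt = max(range(len(points)), key=min_dist.__getitem__)
        selected.append(nxt)
        min_dist = [
            min(d, math.dist(p, points[nxt])) for d, p in zip(min_dist, points)
        ]
    return selected


X = [
    [0.12, 0.21, 0.02],
    [-0.09, 0.32, -0.10],
    [-0.03, -0.53, 0.08],
]
selected_idx = farthest_point_sampling(X, n_to_select=2)
print(selected_idx)
```

Starting from the first sample, the sketch selects samples 0 and 2, since the third row is the farthest from the first. The library's selectors may initialise differently and expose further options such as `score_threshold`, so their selections need not match this sketch.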