* adding general description of the library
* adding intro page for new users with an overview of the
functionalities
* adding contribution guidelines for adding datasets, submitting issues,
and bugfixes
* tweaking reconstruction measures doc inside the code a bit
docs/source/intro.rst
27 lines changed: 27 additions & 0 deletions
@@ -12,4 +12,31 @@ Currently, scikit-COSMO contains models described in [Imbalzano2018]_, [Helfrech
12
12
as some modifications to sklearn functionalities and minimal datasets that are useful within the field
13
13
of computational materials science and chemistry.
14
14
15
+
16
+
17
+
- Fingerprint Selection:
18
+
Multiple data sub-selection modules, for selecting the most relevant features and samples out of a large set of candidates [Imbalzano2018]_, [Helfrecht2020]_ and [Cersonsky2021]_.
19
+
20
+
* :ref:`CUR-api` decomposition: an iterative feature selection method based upon the singular value decomposition.
21
+
* :ref:`PCov-CUR-api` decomposition: extends CUR by using augmented right or left singular vectors, inspired by Principal Covariates Regression.
22
+
* :ref:`FPS-api`: a common selection technique intended to exploit the diversity of the input space. The selection of the first point is made at random or by a separate metric.
23
+
* :ref:`PCov-FPS-api`: extends FPS in the same way that PCov-CUR extends CUR.
24
+
* :ref:`Voronoi-FPS-api`: conducts FPS selection, using Voronoi tessellations to accelerate the selection.
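The idea behind farthest point sampling can be illustrated with a minimal NumPy sketch. This is not the scikit-COSMO implementation; the function name ``farthest_point_sampling`` and its arguments are hypothetical, the distance is assumed to be Euclidean, and the first point is fixed by index rather than drawn at random. The sketch selects samples (rows); passing ``X.T`` would select features instead.

```python
import numpy as np


def farthest_point_sampling(X, n_to_select, initial=0):
    """Hypothetical helper: pick rows of X by farthest point sampling."""
    X = np.asarray(X, dtype=float)
    selected = [initial]
    # Squared Euclidean distance from every row to its nearest selected row.
    min_dist = np.sum((X - X[initial]) ** 2, axis=1)
    for _ in range(n_to_select - 1):
        # The next pick is the row farthest from everything selected so far.
        new = int(np.argmax(min_dist))
        selected.append(new)
        dist_to_new = np.sum((X - X[new]) ** 2, axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


# Five points in the plane; two of them are near-duplicates.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 0.0], [0.0, 4.0], [5.0, 3.0]])
selected = farthest_point_sampling(X, n_to_select=3, initial=0)
print(selected)
```

Because each step maximises the distance to the points already chosen, the near-duplicate of the starting point is picked last, which is the diversity-seeking behaviour described above.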
25
+
26
+
- Reconstruction Measures:
27
+
A set of easily-interpretable error measures of the relative information capacity of feature space `F` with respect to feature space `F'`.
28
+
The methods return a value between 0 and 1. Because the measures are reconstruction errors, 0 means that `F'` is contained in `F` (it can be reconstructed without error), and 1 means that `F` and `F'` are completely distinct in terms of linearly-decodable information.
29
+
All methods are implemented as the root mean-square error for the regression of the feature matrix `X_F'` (sometimes called `Y` in the documentation) from `X_F` (sometimes called `X` in the documentation) for transformations with different constraints (linear, orthogonal, locally-linear).
30
+
By default, a custom 2-fold cross-validation :py:class:`skcosmo.linear_model.RidgeRegression2FoldCV` is used to ensure the generalization of the transformation and the efficiency of the computation, since we deal with a multi-target regression problem.
31
+
These methods were applied to compare different forms of featurizations through different hyperparameters and induced metrics and kernels [Goscinski2021]_.
32
+
33
+
* :ref:`GRE-api` (GRE) computes the amount of linearly-decodable information recovered through a global linear reconstruction.
34
+
* :ref:`GRD-api` (GRD) computes the amount of distortion contained in a global linear reconstruction.
35
+
* :ref:`LRE-api` (LRE) computes the amount of decodable information recovered through a local linear reconstruction for the k-nearest neighborhood of each sample.
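As a rough illustration of a global linear reconstruction error, the sketch below regresses one feature matrix on another and reports the root mean-square error on held-out samples. It is a simplification under stated assumptions, not the library's code: the function name ``global_reconstruction_error`` is hypothetical, plain least squares stands in for the cross-validated ridge regression, and both matrices are centred and scaled to unit total variance using the training half only.

```python
import numpy as np


def global_reconstruction_error(X, Y, train_fraction=0.5):
    """Hypothetical helper: held-out RMSE of a linear reconstruction of Y from X."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    n_train = int(train_fraction * len(X))

    def standardize(A):
        # Centre the columns and scale to unit total variance (training rows only).
        centred = A - A[:n_train].mean(axis=0)
        scale = np.sqrt(np.mean(np.sum(centred[:n_train] ** 2, axis=1)))
        return centred / scale

    X, Y = standardize(X), standardize(Y)
    # Assumption: ordinary least squares instead of a regularised, cross-validated fit.
    W, *_ = np.linalg.lstsq(X[:n_train], Y[:n_train], rcond=None)
    residual = Y[n_train:] - X[n_train:] @ W
    return float(np.sqrt(np.mean(np.sum(residual ** 2, axis=1))))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y_linear = X @ rng.normal(size=(3, 2))      # exact linear function of X
Y_unrelated = rng.normal(size=(200, 2))     # independent of X

gre_linear = global_reconstruction_error(X, Y_linear)
gre_unrelated = global_reconstruction_error(X, Y_unrelated)
```

In this sketch the error is essentially zero when ``Y`` is an exact linear function of ``X`` and close to one when the two are unrelated, since predicting the mean already gives an error of about one after scaling.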
36
+
37
+
- Principal Covariates Regression:
38
+
39
+
* PCovR: the standard Principal Covariates Regression [deJong1992]_. It combines a PCA-like loss with an LR-like loss, and therefore seeks a low-dimensional projection of the feature vectors that simultaneously minimises the information loss and the error in predicting the target properties using only the latent-space vectors :math:`\mathbf{T}` (see :ref:`PCovR-api`).
40
+
* Kernel Principal Covariates Regression (KPCovR): a kernel-based variation on the original PCovR method, proposed in [Helfrecht2020]_ (see :ref:`KPCovR-api`).
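The combination of a PCA-like and an LR-like loss can be sketched in sample space with NumPy. This is an illustrative outline only, not the scikit-COSMO estimator: the function name ``pcovr_latent``, the ``mixing`` argument, and the choice to scale both matrices to unit Frobenius norm are assumptions made for the example.

```python
import numpy as np


def pcovr_latent(X, Y, mixing=0.5, n_components=2):
    """Hypothetical helper: latent projection T from a sample-space PCovR sketch."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float).reshape(len(X), -1)
    # Centre both blocks and scale them so the two loss terms are comparable.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    X = X / np.linalg.norm(X)
    Y = Y / np.linalg.norm(Y)
    # LR-like part: least-squares approximation of Y from X.
    W, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ W
    # Modified Gram matrix mixing the PCA-like and LR-like contributions.
    K = mixing * (X @ X.T) + (1.0 - mixing) * (Y_hat @ Y_hat.T)
    eigvals, eigvecs = np.linalg.eigh(K)
    # Keep the leading eigenpairs; clip tiny negative values from round-off.
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(eigvals)


rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
Y = 2.0 * X[:, :1] + 0.1 * rng.normal(size=(50, 1))

T_mixed = pcovr_latent(X, Y, mixing=0.5, n_components=2)
T_pca_like = pcovr_latent(X, Y, mixing=1.0, n_components=2)
```

With ``mixing=1.0`` the Gram matrix contains only the feature term, so the projection reduces to the PCA scores of the scaled features (up to sign); lowering ``mixing`` shifts weight toward reproducing the regression of the targets.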
41
+
15
42
If you would like to contribute to scikit-COSMO, check out our :ref:`contributing` page!