README.rst: 32 additions, 43 deletions
@@ -102,87 +102,76 @@ The full documentation can be found `on this link <https://qolmat.readthedocs.io
**How does Qolmat work ?**
-Qolmat simplifies the process of selecting a data imputation algorithm by comparing various methods based on different evaluation metrics. It is compatible with scikit-learn. The evaluation and comparison are based on the standard approach of selecting certain observations, setting their status to missing, and comparing their imputed values with their true values.
+Qolmat allows model selection for scikit-learn-compatible imputation algorithms by performing the three steps pictured below:
+1) For each of the N folds, Qolmat artificially masks a set of observed values using a default or user-specified `hole generator <explanation.html#hole-generator>`_.
+2) For each fold and each compared `imputation method <imputers.html>`_, Qolmat fills in both the missing and the masked values, then computes each of the default or user-specified `performance metrics <explanation.html#metrics>`_.
+3) For each compared imputer, Qolmat pools the computed metrics from the N folds into a single value.
-More specifically, from the initial dataframe with missing values, we generate additional missing values (N folds).
-On each sample, different imputation models are tested and reconstruction errors are computed on these artificially missing entries. Then the errors of each imputation model are averaged, and we eventually obtain a unique error score per model. This procedure allows the comparison of different models on the same dataset.
+This is very similar in spirit to scikit-learn's `cross_val_score <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html>`_ function.
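The three steps described in the new README text (mask, impute and score, pool) can be illustrated with a minimal NumPy sketch. It is not Qolmat's implementation or API: `mask_observed`, `ColumnStatImputer` and `compare_imputers` are hypothetical names, the hole generator is assumed to mask observed cells uniformly at random, RMSE on the masked cells is the only metric, and pooling is assumed to be a plain mean over folds.

```python
import numpy as np


def mask_observed(X, frac, rng):
    """Hypothetical hole generator: hide a random fraction of the observed entries.

    Returns a copy of X with extra NaNs and the boolean mask of the hidden cells.
    """
    observed = ~np.isnan(X)
    holes = observed & (rng.random(X.shape) < frac)  # only observed cells can be masked
    X_masked = X.copy()
    X_masked[holes] = np.nan
    return X_masked, holes


class ColumnStatImputer:
    """Hypothetical scikit-learn-style imputer filling NaNs with a column statistic."""

    def __init__(self, stat=np.nanmean):
        self.stat = stat

    def fit_transform(self, X):
        X_filled = X.copy()
        fill = self.stat(X, axis=0)  # per-column statistic, ignoring NaNs
        rows, cols = np.where(np.isnan(X_filled))
        X_filled[rows, cols] = fill[cols]
        return X_filled


def compare_imputers(X, imputers, n_folds=5, frac=0.1, seed=0):
    """Return the mean RMSE on artificially masked cells for each imputer."""
    rng = np.random.default_rng(seed)
    scores = {name: [] for name in imputers}
    for _ in range(n_folds):
        # Step 1: mask a set of observed values (the same holes for every imputer).
        X_masked, holes = mask_observed(X, frac, rng)
        for name, imputer in imputers.items():
            # Step 2: fill missing and masked values, then score on masked cells only.
            X_imputed = imputer.fit_transform(X_masked)
            err = X_imputed[holes] - X[holes]
            scores[name].append(np.sqrt(np.mean(err ** 2)))
    # Step 3: pool the per-fold metrics into a single value per imputer.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


# Usage on synthetic data that already contains some missing values.
data_rng = np.random.default_rng(42)
X = data_rng.normal(loc=10.0, scale=2.0, size=(200, 3))
X[data_rng.random(X.shape) < 0.05] = np.nan
result = compare_imputers(
    X,
    {"mean": ColumnStatImputer(np.nanmean), "median": ColumnStatImputer(np.nanmedian)},
)
print(result)
```

Masking the same cells for every imputer within a fold keeps the comparison paired, much as one would reuse the same cross-validation splits when comparing estimators with `cross_val_score`.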