Commit 73ae899

Julien Roussel authored and committed
readme updated
1 parent c2bf976 commit 73ae899

1 file changed

+32
-43
lines changed

README.rst

Lines changed: 32 additions & 43 deletions
@@ -102,87 +102,76 @@ The full documentation can be found `on this link <https://qolmat.readthedocs.io
102102

103103
**How does Qolmat work?**
104104

105-
Qolmat simplifies the process of selecting a data imputation algorithm by comparing various methods based on different evaluation metrics. It is compatible with scikit-learn. The evaluation and comparison are based on the standard approach of selecting certain observations, setting their status to missing, and comparing their imputed values with their true values.
105+
Qolmat allows model selection for scikit-learn compatible imputation algorithms, by performing three steps pictured below:
106+
1) For each of the N folds, Qolmat artificially masks a set of observed values using a default or user-specified `hole generator <explanation.html#hole-generator>`_.
107+
2) For each fold and each compared `imputation method <imputers.html>`_, Qolmat fills both the missing and the masked values, then computes each of the default or user-specified `performance metrics <explanation.html#metrics>`_.
108+
3) For each compared imputer, Qolmat pools the computed metrics from the N folds into a single value.
106109

107-
More specifically, from the initial dataframe with missing value, we generate additional missing values (N folds).
108-
On each sample, different imputation models are tested and reconstruction errors are computed on these artificially missing entries. Then the errors of each imputation model are averaged and we eventually obtained a unique error score per model. This procedure allows the comparison of different models on the same dataset.
110+
This is very similar in spirit to the `cross_val_score <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html>`_ function of scikit-learn.
109111

110112
.. image:: https://raw.githubusercontent.com/Quantmetry/qolmat/main/docs/images/schema_qolmat.png
111113
:align: center
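
The three steps described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not Qolmat's implementation: ``impute_mean``, ``impute_median`` and ``compare_imputers`` are hypothetical names, the masking is uniform at random as a stand-in for a hole generator, and the metric is the mean absolute error on the masked entries.

```python
import numpy as np


def impute_mean(X):
    """Fill NaNs with the mean of the observed values of each column."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def impute_median(X):
    """Fill NaNs with the median of the observed values of each column."""
    return np.where(np.isnan(X), np.nanmedian(X, axis=0), X)


def compare_imputers(X, imputers, n_folds=3, mask_ratio=0.2, seed=0):
    """Return one pooled error per imputer (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    errors = {name: [] for name in imputers}
    for _ in range(n_folds):
        # Step 1: artificially mask a random subset of the observed values.
        mask = observed & (rng.random(X.shape) < mask_ratio)
        X_masked = np.where(mask, np.nan, X)
        for name, impute in imputers.items():
            # Step 2: fill missing and masked values, then score the
            # reconstruction on the masked entries only.
            X_imputed = impute(X_masked)
            errors[name].append(np.mean(np.abs(X_imputed[mask] - X[mask])))
    # Step 3: pool the per-fold metrics into a single value per imputer.
    return {name: float(np.mean(vals)) for name, vals in errors.items()}


# Synthetic data with roughly 10% genuinely missing values.
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan

scores = compare_imputers(X, {"mean": impute_mean, "median": impute_median})
print(scores)
```

Scoring only the artificially masked entries is what makes the comparison possible: the true values of the genuinely missing entries are unknown, so they cannot contribute to the error.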
112114

113115
**Imputation methods**
114116

115-
The following table contains the available imputation methods:
117+
The following table contains the available imputation methods.
116118

117119
.. list-table::
118120
:widths: 25 70 15 15 20
119121
:header-rows: 1
120122

121123
* - Method
122124
- Description
123-
- Tabular
124-
- Time series
125-
- Minimised criterion
125+
- Tabular or Time series
126+
- Single or multiple
126127
* - mean
127128
- Imputes the missing values using the mean along each column
128-
- yes
129-
- no
130-
- point
129+
- tabular
130+
- single
131131
* - median
132132
- Imputes the missing values using the median along each column
133-
- yes
134-
- no
135-
- point
133+
- tabular
134+
- single
136135
* - LOCF
137136
- Imputes missing entries by carrying the last observation forward for each column
138-
- yes
139-
- yes
140-
- point
137+
- time series
138+
- single
141139
* - shuffle
142140
- Imputes missing entries with a random value drawn from each column
143-
- yes
144-
- no
145-
- point
141+
- tabular
142+
- multiple
146143
* - interpolation
147144
- Imputes missing values using one of the interpolation strategies supported by pd.Series.interpolate
148-
- yes
149-
- yes
150-
- point
145+
- time series
146+
- single
151147
* - impute on residuals
152148
- The series are de-seasonalised, residuals are imputed via linear interpolation, then residuals are re-seasonalised
153-
- no
154-
- yes
155-
- point
149+
- time series
150+
- single
156151
* - MICE
157152
- Multiple Imputation by Chained Equations
158-
- yes
159-
- no
160-
- point
153+
- tabular
154+
- both
161155
* - RPCA
162156
- Robust Principal Component Analysis
163-
- yes
164-
- yes
165-
- point
157+
- both
158+
- single
166159
* - SoftImpute
167160
- Iterative method for matrix completion that uses nuclear-norm regularization
168-
- yes
169-
- no
170-
- point
161+
- tabular
162+
- single
171163
* - KNN
172164
- K-nearest neighbors
173-
- yes
174-
- no
175-
- point
165+
- tabular
166+
- single
176167
* - EM sampler
177168
- Imputes missing values via EM algorithm
178-
- yes
179-
- yes
180-
- point/distribution
169+
- both
170+
- both
181171
* - TabDDPM
182172
- Imputer based on Denoising Diffusion Probabilistic Models
183-
- yes
184-
- yes
185-
- distribution
173+
- both
174+
- both
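
To make the simplest rows of the table concrete, the sketch below reproduces the mean, median, LOCF and interpolation strategies with plain pandas calls on a toy series. It is an illustration of the ideas only and does not use Qolmat's imputer classes; the variable names are chosen for this example.

```python
import numpy as np
import pandas as pd

# Toy series with three missing entries.
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

# mean / median: replace every missing entry with one column statistic.
mean_filled = s.fillna(s.mean())
median_filled = s.fillna(s.median())

# LOCF: carry the last observation forward.
locf_filled = s.ffill()

# interpolation: linear strategy of pd.Series.interpolate.
interpolated = s.interpolate(method="linear")

print(pd.DataFrame({
    "original": s,
    "mean": mean_filled,
    "median": median_filled,
    "LOCF": locf_filled,
    "interpolation": interpolated,
}))
```

The mean and median ignore the ordering of the rows, whereas LOCF and interpolation depend on it, which is why the table lists the latter two as time series methods.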
186175

187176

188177
