A parameter dictionary can be defined for an imputer, giving three pieces of information for each hyperparameter: min, max and type. The dictionary describes the search space used to determine the optimal parameters for data imputation. Here, we call this dictionary ``dict_config_opti``.
-
-.. code-block:: python
-
-  search_params = {
-    "RPCA_opti": {
-      "tau": {"min": .5, "max": 5, "type":"Real"},
-      "lam": {"min": .1, "max": 1, "type":"Real"},
-    }
+    "VAR(1) process": imputer_var1
   }
-
-Then, with the comparator imported via ``from qolmat.benchmark import comparator``, we can compare the different imputation methods. This **does not use knowledge of the missing values**; it relies on data masking instead. For more details on how the imputers and the comparator work, please see the following `link <https://qolmat.readthedocs.io/en/latest/explanation.html>`_.
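To illustrate the min/max/type format of the search dictionary above, the following sketch draws one random candidate per hyperparameter using only the standard library. The ``sample_params`` helper and the ``"Integer"`` type name are assumptions made for this example; this is not Qolmat's optimisation routine, which may explore the space differently.

```python
import random

# Search space in the min/max/type format shown in the README example.
search_params = {
    "RPCA_opti": {
        "tau": {"min": 0.5, "max": 5, "type": "Real"},
        "lam": {"min": 0.1, "max": 1, "type": "Real"},
    }
}


def sample_params(space, rng):
    """Draw one candidate value per hyperparameter (hypothetical helper)."""
    candidate = {}
    for name, spec in space.items():
        if spec["type"] == "Real":
            # Continuous parameter: uniform draw between min and max.
            candidate[name] = rng.uniform(spec["min"], spec["max"])
        elif spec["type"] == "Integer":  # assumed type name, for illustration
            candidate[name] = rng.randint(spec["min"], spec["max"])
        else:
            raise ValueError(f"Unsupported type: {spec['type']}")
    return candidate


rng = random.Random(0)  # fixed seed so the draw is reproducible
candidate = sample_params(search_params["RPCA_opti"], rng)
print(candidate)
```

Each candidate drawn this way could then be scored by masking and imputing, with the best-scoring values kept.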
Qolmat allows model selection for scikit-learn compatible imputation algorithms, by performing three steps pictured below:
+1) For each of the K folds, Qolmat artificially masks a set of observed values using a default or user-specified `hole generator <explanation.html#hole-generator>`_.
+2) For each fold and each compared `imputation method <imputers.html>`_, Qolmat fills both the missing and the masked values, then computes each of the default or user-specified `performance metrics <explanation.html#metrics>`_.
+3) For each compared imputer, Qolmat pools the computed metrics from the K folds into a single value.

-  plt.figure(figsize=(25,5))
-  plt.plot(df['y'],'.g')
-  plt.plot(dfs_imputed['y'],'.r')
-  plt.plot(df_with_nan['y'],'.b')
-  plt.show()
+This is very similar in spirit to the `cross_val_score <https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_val_score.html>`_ function of scikit-learn.
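The three-step procedure listed above (mask, impute and score, pool) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not Qolmat's comparator: the function names, the uniform random masking (standing in for a hole generator), and the mean absolute error metric are all choices made for this example.

```python
import numpy as np


def impute_mean(X):
    """Fill NaNs with the mean of the observed values in each column."""
    return np.where(np.isnan(X), np.nanmean(X, axis=0), X)


def impute_median(X):
    """Fill NaNs with the median of the observed values in each column."""
    return np.where(np.isnan(X), np.nanmedian(X, axis=0), X)


def compare_imputers(X, imputers, n_folds=3, ratio_masked=0.1, seed=0):
    """Hypothetical helper: score imputers on artificially masked entries."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(X)
    scores = {name: [] for name in imputers}
    for _ in range(n_folds):
        # Step 1: hide a random subset of the observed entries.
        mask = observed & (rng.random(X.shape) < ratio_masked)
        X_masked = np.where(mask, np.nan, X)
        for name, impute in imputers.items():
            # Step 2: impute everything, then score on masked entries only,
            # since those are the only ones whose true value is known.
            X_imputed = impute(X_masked)
            mae = np.mean(np.abs(X_imputed[mask] - X[mask]))
            scores[name].append(mae)
    # Step 3: pool the per-fold metrics into a single value per imputer.
    return {name: float(np.mean(vals)) for name, vals in scores.items()}


# Toy data: Gaussian noise with about 10% genuinely missing entries.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.1] = np.nan

results = compare_imputers(X, {"mean": impute_mean, "median": impute_median})
print(results)
```

Scoring only on the masked entries is what lets the comparison run without any knowledge of the genuinely missing values.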
The following table contains the available imputation methods. We distinguish single imputation methods (aiming for pointwise accuracy, mostly deterministic) from multiple imputation methods (aiming for distribution similarity, mostly stochastic).
+
+.. list-table::
+   :widths: 25 70 15 15
+   :header-rows: 1
+
+   * - Method
+     - Description
+     - Tabular or Time series
+     - Single or Multiple
+   * - mean
+     - Imputes the missing values using the mean along each column
+     - tabular
+     - single
+   * - median
+     - Imputes the missing values using the median along each column
+     - tabular
+     - single
+   * - LOCF
+     - Imputes missing entries by carrying the last observation forward for each column
+     - time series
+     - single
+   * - shuffle
+     - Imputes missing entries with a random value drawn from each column
+     - tabular
+     - multiple
+   * - interpolation
+     - Imputes missing values using one of the interpolation strategies supported by ``pd.Series.interpolate``
+     - time series
+     - single
+   * - impute on residuals
+     - The series are de-seasonalised, residuals are imputed via linear interpolation, then residuals are re-seasonalised
+     - time series
+     - single
+   * - MICE
+     - Multiple Imputation by Chained Equations
+     - tabular
+     - both
+   * - RPCA
+     - Robust Principal Component Analysis
+     - both
+     - single
+   * - SoftImpute
+     - Iterative method for matrix completion that uses nuclear-norm regularization
+     - tabular
+     - single
+   * - KNN
+     - K-nearest neighbors
+     - tabular
+     - single
+   * - EM sampler
+     - Imputes missing values via the EM algorithm
+     - both
+     - both
+   * - TabDDPM
+     - Imputer based on Denoising Diffusion Probabilistic Models
+     - both
+     - both
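Several of the simpler strategies in the table above (mean, median, LOCF, interpolation, shuffle) can be illustrated on a toy series with plain pandas. This sketch is for intuition only and assumes a single numeric column; it does not use or reproduce Qolmat's imputer classes, and the variable names are chosen for this example.

```python
import numpy as np
import pandas as pd

# Toy series with three missing entries.
s = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, 6.0])

# mean / median: replace NaNs with a column-level statistic.
imputed_mean = s.fillna(s.mean())
imputed_median = s.fillna(s.median())

# LOCF: carry the last observed value forward.
# A NaN at the very start of the series would stay missing.
imputed_locf = s.ffill()

# interpolation: linear interpolation between neighbouring observations.
imputed_interp = s.interpolate(method="linear")

# shuffle: draw each missing entry at random from the observed values,
# which makes the result stochastic (hence "multiple" imputation).
rng = np.random.default_rng(0)
observed = s.dropna().to_numpy()
imputed_shuffle = s.copy()
imputed_shuffle.loc[s.isna()] = rng.choice(observed, size=int(s.isna().sum()))

print(pd.DataFrame({
    "original": s,
    "mean": imputed_mean,
    "median": imputed_median,
    "locf": imputed_locf,
    "interpolation": imputed_interp,
    "shuffle": imputed_shuffle,
}))
```

The deterministic strategies always return the same completion, whereas the shuffle draw changes with the random seed.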

-📘 Documentation
-================

-The full documentation can be found `on this link <https://qolmat.readthedocs.io/en/latest/>`_.

 📝 Contributing
 ===============
@@ -222,8 +198,6 @@ Qolmat has been developed by Quantmetry.
 🔍 References
 ==============

-Qolmat methods belong to the field of conformal inference.
-
 [1] Candès, Emmanuel J., et al. “Robust principal component analysis?.”
 Journal of the ACM (JACM) 58.3 (2011): 1-37,
 (`pdf <https://arxiv.org/abs/0912.3599>`__)
@@ -234,15 +208,13 @@ Journal of advanced transportation 2018 (2018).