All presented methods are group-wise: here, each station is imputed independently.
Some methods require hyperparameters. The user can specify them directly, or instead determine them through an optimization step using the `search_params` dictionary. Its keys are the imputation methods' names; each value is a dictionary describing the search space: the minimum, maximum, or list of categories, and the type of values to search (Integer, Real, Category, or a dictionary indexed by the variable names).
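As a sketch of the structure described above (the method and hyperparameter names below are illustrative assumptions, not the library's exact API), a `search_params` dictionary might look like:

```python
# Hypothetical `search_params` dictionary: keys are imputation method
# names; each value describes, per hyperparameter, the bounds or
# categories to search and the type of value.
search_params = {
    "RPCA": {
        "lam": {"min": 0.1, "max": 10.0, "type": "Real"},
        "max_iter": {"min": 100, "max": 1000, "type": "Integer"},
    },
    "KNN": {
        "weights": {"categories": ["uniform", "distance"], "type": "Category"},
    },
}
```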
In practice, we rely on cross-validation to find the hyperparameter values minimizing a reconstruction error.
To compare the methods, we $i)$ artificially create missing data (for missing-data mechanisms, see the docs); $ii)$ impute it with each of the chosen methods; and $iii)$ compute the reconstruction error. These three steps are repeated `n_splits` times, and for each method we report the average error.
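The three steps above can be sketched as follows (a minimal NumPy illustration, assuming an MCAR masking mechanism and RMSE as the reconstruction error; the function names are hypothetical, not the library's API):

```python
import numpy as np

def compare_imputation(X, impute_fn, prop_missing=0.1, n_splits=4, seed=0):
    """Sketch of the comparison loop: mask values, impute, score, repeat."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_splits):
        # i) artificially create missing data (MCAR mask for simplicity)
        mask = rng.random(X.shape) < prop_missing
        X_holes = X.copy()
        X_holes[mask] = np.nan
        # ii) impute with the chosen method
        X_imputed = impute_fn(X_holes)
        # iii) reconstruction error (RMSE) on the masked entries only
        errors.append(np.sqrt(np.mean((X_imputed[mask] - X[mask]) ** 2)))
    # average error over the n_splits repetitions
    return float(np.mean(errors))

def mean_impute(X):
    # Baseline method: replace each NaN by its column mean
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)

X = np.arange(100, dtype=float).reshape(20, 5)
score = compare_imputation(X, mean_impute)
```

Running the loop with several `impute_fn` candidates and comparing the returned averages mirrors the selection procedure described above.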
Concretely, the comparator takes as input a dataframe to impute and a proportion of data to artificially mask as missing.
Note that these metrics measure reconstruction error: they say nothing about the distance between the "true" and "imputed" distributions.