88 format_version: '1.3'
99 jupytext_version: 1.14.5
1010 kernelspec:
11- display_name: Python 3 (ipykernel)
11+ display_name: env_qolmat_dev
1212 language: python
13- name: python3
13+ name: env_qolmat_dev
1414---
1515
1616** This notebook aims to present the Qolmat repo through an example of a multivariate time series.
@@ -62,24 +62,24 @@ The dataset `Beijing` is the Beijing Multi-Site Air-Quality Data Set. It consist
6262This dataset only contains numerical variables.
6363
6464``` python
65- # df_data = data.get_data_corrupted("Beijing", ratio_masked=.2, mean_size=120)
65+ df_data = data.get_data_corrupted("Beijing", ratio_masked=.2, mean_size=120)
6666
6767# cols_to_impute = ["TEMP", "PRES", "DEWP", "NO2", "CO", "O3", "WSPM"]
6868# cols_to_impute = df_data.columns[df_data.isna().any()]
69- # cols_to_impute = ["TEMP", "PRES"]
69+ cols_to_impute = ["TEMP", "PRES"]
7070
7171```
7272
7373The dataset `Artificial` is built as the sum of a periodic signal, white noise, and some outliers.
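
To make that composition concrete, here is a minimal sketch of how such a series could be generated. It is not the code behind `data.get_data_corrupted("Artificial", ...)`: the helper name `make_artificial_signal` and all of its parameters are hypothetical and chosen only for illustration.

```python
import numpy as np
import pandas as pd


def make_artificial_signal(n_samples=500, period=50, noise_std=0.1,
                           n_outliers=5, outlier_scale=5.0, seed=0):
    """Hypothetical helper: periodic signal + white noise + sparse outliers."""
    rng = np.random.default_rng(seed)
    t = np.arange(n_samples)

    # Periodic component with amplitude 1.
    periodic = np.sin(2 * np.pi * t / period)

    # White noise: independent Gaussian draws with constant variance.
    noise = rng.normal(0.0, noise_std, size=n_samples)

    # Outliers: a few large spikes at distinct random positions.
    outliers = np.zeros(n_samples)
    positions = rng.choice(n_samples, size=n_outliers, replace=False)
    outliers[positions] = outlier_scale * rng.choice([-1.0, 1.0], size=n_outliers)

    return pd.DataFrame({"signal": periodic + noise + outliers})


df_artificial = make_artificial_signal()
```

The column is named `signal` only to match the `cols_to_impute` value used for this dataset in the notebook.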
7474
7575``` python
76- df_data = data.get_data_corrupted("Artificial", ratio_masked=.2, mean_size=10)
77- cols_to_impute = ["signal"]
76+ # df_data = data.get_data_corrupted("Artificial", ratio_masked=.2, mean_size=10)
77+ # cols_to_impute = ["signal"]
7878```
7979
8080``` python
81- df_data = data.get_data("SNCF", n_groups_max=2)
82- cols_to_impute = ["val_in"]
81+ # df_data = data.get_data("SNCF", n_groups_max=2)
82+ # cols_to_impute = ["val_in"]
8383```
8484
8585``` python
@@ -132,14 +132,14 @@ imputer_nocb = imputers.ImputerNOCB(groups=["station"])
132132imputer_interpol = imputers.ImputerInterpolation(groups=["station"], method="linear")
133133imputer_spline = imputers.ImputerInterpolation(groups=["station"], method="spline", order=2)
134134imputer_shuffle = imputers.ImputerShuffle(groups=["station"])
135- imputer_residuals = imputers.ImputerResiduals(groups=["station"], period=7, model_tsa="additive", extrapolate_trend="freq", method_interpolation="linear")
135+ imputer_residuals = imputers.ImputerResiduals(groups=["station"], period=365, model_tsa="additive", extrapolate_trend="freq", method_interpolation="linear")
136136
137- imputer_rpca = imputers.ImputerRPCA(groups=["station"], columnwise=True, period=7, max_iter=1000, tau=2, lam=1)
137+ imputer_rpca = imputers.ImputerRPCA(groups=["station"], columnwise=False, max_iter=256, tau=2, lam=1)
138138# imputer_rpca_opti = imputers.ImputerRPCA(groups=["station"], columnwise=True, period=7, max_iter=100)
139139
140140imputer_ou = imputers.ImputerEM(groups=["station"], model="multinormal", method="sample", max_iter_em=34, n_iter_ou=15, dt=1e-3)
141141imputer_tsou = imputers.ImputerEM(groups=["station"], model="VAR1", method="sample", max_iter_em=34, n_iter_ou=15, dt=1e-3)
142- imputer_tsmle = imputers.ImputerEM(groups=["station"], model="VAR1", method="mle", max_iter_em=100, n_iter_ou=15, dt=1e-3, period=7)
142+ imputer_tsmle = imputers.ImputerEM(groups=["station"], model="VAR1", method="mle", max_iter_em=100, n_iter_ou=15, dt=1e-3)
143143
144144
145145imputer_knn = imputers.ImputerKNN(groups=["station"], k=10)
@@ -155,7 +155,7 @@ dict_imputers = {
155155 "shuffle": imputer_shuffle,
156156 # "residuals": imputer_residuals,
157157 # "OU": imputer_ou,
158- # "TSOU": imputer_tsou,
158+ "TSOU": imputer_tsou,
159159 "TSMLE": imputer_tsmle,
160160 "RPCA": imputer_rpca,
161161 # "RPCA_opti": imputer_rpca_opti,
@@ -184,9 +184,6 @@ In order to compare the methods, we $i)$ artificially create missing data (for m
184184</p>
185185
186186
187- ``` python
188- imputer_tsmle.hyperparams_user
189- ```
190187
191188Concretely, the comparator takes as input a dataframe to impute, a proportion of NaN values to create, a dictionary of imputers (those defined above), a list of the column names to impute, a generator of holes specifying the type of holes to create, and the search dictionary `search_params` for hyperparameter optimization.
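
The comparison procedure described here can be sketched in plain pandas. This is a simplified illustration, not Qolmat's comparator: the function `compare_imputers` is hypothetical, imputers are modelled as plain callables that take and return a dataframe, holes are drawn uniformly at random rather than by a dedicated hole generator, and hyperparameter search is left out.

```python
import numpy as np
import pandas as pd


def compare_imputers(df, dict_imputers, cols_to_impute, ratio_masked=0.1, seed=0):
    """Hypothetical sketch: mask observed cells, impute, score on the masked cells."""
    rng = np.random.default_rng(seed)

    # Only cells that are actually observed can be used as ground truth.
    observed = df[cols_to_impute].notna().to_numpy()
    mask = observed & (rng.random(observed.shape) < ratio_masked)

    # Create the artificial holes on a copy of the data.
    df_corrupted = df.copy()
    df_corrupted[cols_to_impute] = df[cols_to_impute].mask(mask)

    scores = {}
    for name, impute in dict_imputers.items():
        df_imputed = impute(df_corrupted)
        abs_errors = (df_imputed[cols_to_impute] - df[cols_to_impute]).abs().to_numpy()
        # Mean absolute error restricted to the artificially masked cells.
        scores[name] = float(abs_errors[mask].mean())
    return pd.Series(scores, name="mae")


# Toy usage on a smooth periodic series with two simple pandas-based imputers.
t = np.arange(200)
df_toy = pd.DataFrame({"signal": np.sin(2 * np.pi * t / 50)})
toy_imputers = {
    "mean": lambda d: d.fillna(d.mean()),
    "interpolation": lambda d: d.interpolate(method="linear", limit_direction="both"),
}
scores = compare_imputers(df_toy, toy_imputers, ["signal"], ratio_masked=0.2)
```

Scoring only on the masked cells is the point of the design: those are the only positions where both the true value and the imputed value are known.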
192189