This repository was archived by the owner on Dec 6, 2023. It is now read-only.
ValueError: all the input array dimensions except for the concatenation axis must match exactly #29
Open
Description
The following cell from my notebook
import numpy as np
from sklearn.feature_selection import VarianceThreshold
from sklearn.feature_selection import SelectFpr, SelectFdr, SelectFwe
from sklearn.preprocessing import FunctionTransformer, StandardScaler, MaxAbsScaler, RobustScaler
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.pipeline import Pipeline
from stability_selection import StabilitySelection
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
vt = VarianceThreshold()
scl = RobustScaler(with_centering=False)
lr = LogisticRegression(random_state=42, solver='lbfgs', C=.05, max_iter=10000, n_jobs=-1, verbose=1)
pipe = Pipeline([('vt', vt), ('scl', scl), ('lr', lr)])
clf = StabilitySelection(base_estimator=pipe, lambda_name='lr__C', lambda_grid=np.array([.075, .05, .025]), n_bootstrap_iterations=3)
clf.fit(X_train, y_train)
gives the output
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 34.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 32.4s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 1 out of 1 | elapsed: 32.9s finished
and then the traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-41-f62a92b4924d> in <module>
38 clf = StabilitySelection(base_estimator=pipe, lambda_name='lr__C', lambda_grid=np.array([.075, .05, .025]), n_bootstrap_iterations=3)
39
---> 40 clf.fit(X_train, y_train)
41 #print(lr.C_)
42 print(pd.Series(features)[vt.get_support(indices=True)[sel.get_support(indices=True)[clf.get_support()]]].sort_values().tail(60))
/opt/conda/lib/python3.6/site-packages/stability_selection/stability_selection.py in fit(self, X, y)
344 for subsample in bootstrap_samples)
345
--> 346 stability_scores[:, idx] = np.vstack(selected_variables).mean(axis=0)
347
348 self.stability_scores_ = stability_scores
/opt/conda/lib/python3.6/site-packages/numpy/core/shape_base.py in vstack(tup)
281 """
282 _warn_for_nonsequence(tup)
--> 283 return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
284
285
ValueError: all the input array dimensions except for the concatenation axis must match exactly
X_train is a scipy csr_matrix. Is this expected behaviour?
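One possible explanation, offered as a guess rather than a confirmed diagnosis: the pipeline starts with VarianceThreshold, which by default drops zero-variance features, so different bootstrap subsamples could keep different numbers of columns. If the per-bootstrap selection vectors then have different lengths, np.vstack would fail exactly as in the traceback. The sketch below reproduces that failure mode with NumPy only. The toy data, the helper nonzero_variance_mask (a stand-in for VarianceThreshold) and the way the per-draw vectors are built are all assumptions made for illustration, not code taken from stability_selection.

```python
import numpy as np

# Toy dense data (hypothetical): the last column is constant in the first
# three rows only, so whether it survives depends on which rows are drawn.
X = np.array([
    [1.0, 0.0, 5.0],
    [2.0, 1.0, 5.0],
    [3.0, 0.0, 5.0],
    [4.0, 1.0, 7.0],
])


def nonzero_variance_mask(X_sub):
    """Stand-in for VarianceThreshold() with its default threshold of 0."""
    return X_sub.var(axis=0) > 0


# Two hypothetical subsamples standing in for bootstrap draws.
mask_a = nonzero_variance_mask(X[:3])  # last column is constant here
mask_b = nonzero_variance_mask(X[1:])  # last column varies here

n_kept_a = int(mask_a.sum())
n_kept_b = int(mask_b.sum())

# Assumption: the final estimator yields one boolean entry per feature it
# actually received, so the per-draw vectors differ in length.
selected_variables = [
    np.ones(n_kept_a, dtype=bool),
    np.ones(n_kept_b, dtype=bool),
]

try:
    np.vstack(selected_variables)
    stacking_failed = False
except ValueError as exc:
    stacking_failed = True
    print("vstack failed:", exc)
```

If this is indeed the cause, a possible workaround would be to apply VarianceThreshold once to the full X_train before handing the data to StabilitySelection, so that every bootstrap sample has the same number of columns.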