This repository was archived by the owner on Dec 6, 2023. It is now read-only.

ValueError: all the input array dimensions except for the concatenation axis must match exactly #29

@goerch

The following cell from my notebook

import numpy as np  # needed for lambda_grid below
from sklearn.feature_selection import VarianceThreshold
from sklearn.feature_selection import SelectFpr, SelectFdr, SelectFwe
from sklearn.preprocessing import FunctionTransformer, StandardScaler, MaxAbsScaler, RobustScaler
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV
from sklearn.pipeline import Pipeline
from stability_selection import StabilitySelection
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score

# Pipeline: drop zero-variance features, scale without centering (X_train is sparse), then fit logistic regression
vt = VarianceThreshold()
scl = RobustScaler(with_centering=False)
lr = LogisticRegression(random_state=42, solver='lbfgs', C=.05, max_iter=10000, n_jobs=-1, verbose=1)
pipe = Pipeline([('vt', vt), ('scl', scl), ('lr', lr)])

# Stability selection over the regularization parameter of the final pipeline step
clf = StabilitySelection(base_estimator=pipe, lambda_name='lr__C', lambda_grid=np.array([.075, .05, .025]), n_bootstrap_iterations=3)

# X_train and y_train are defined earlier in the notebook; this is the call that fails
clf.fit(X_train, y_train)

gives the output

[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:   34.1s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:   32.4s finished
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done   1 out of   1 | elapsed:   32.9s finished

and then the traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-41-f62a92b4924d> in <module>
     38 clf = StabilitySelection(base_estimator=pipe, lambda_name='lr__C', lambda_grid=np.array([.075, .05, .025]), n_bootstrap_iterations=3)
     39 
---> 40 clf.fit(X_train, y_train)
     41 #print(lr.C_)
     42 print(pd.Series(features)[vt.get_support(indices=True)[sel.get_support(indices=True)[clf.get_support()]]].sort_values().tail(60))

/opt/conda/lib/python3.6/site-packages/stability_selection/stability_selection.py in fit(self, X, y)
    344               for subsample in bootstrap_samples)
    345 
--> 346             stability_scores[:, idx] = np.vstack(selected_variables).mean(axis=0)
    347 
    348         self.stability_scores_ = stability_scores

/opt/conda/lib/python3.6/site-packages/numpy/core/shape_base.py in vstack(tup)
    281     """
    282     _warn_for_nonsequence(tup)
--> 283     return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    284 
    285 

ValueError: all the input array dimensions except for the concatenation axis must match exactly
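
For reference, the error itself can be reproduced outside the library. This is a minimal sketch, assuming that the per-subsample arrays passed to np.vstack ended up with different lengths; the names mask_a and mask_b are hypothetical and do not come from stability_selection.

```python
import numpy as np

# Hypothetical boolean "selected variables" masks from two bootstrap
# subsamples; they deliberately differ in length (3 vs. 4 entries).
mask_a = np.array([True, False, True])
mask_b = np.array([True, False, True, False])

try:
    # vstack needs every row to have the same number of columns
    np.vstack([mask_a, mask_b])
    error_message = None
except ValueError as exc:
    # The exact wording of the message depends on the NumPy version
    error_message = str(exc)

print(error_message)
```

If that assumption holds, the question becomes why the masks differ in length between subsamples.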

X_train is a scipy.sparse csr_matrix. Is this expected behaviour?
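
One possible explanation, which I have not confirmed against the library source: VarianceThreshold inside the pipeline may keep a different number of columns on each bootstrap subsample, so anything derived from the final estimator would differ in length between subsamples. The sketch below only demonstrates that effect on a toy sparse matrix; the data and the row subsets are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.feature_selection import VarianceThreshold

# Toy sparse matrix (illustrative data only):
# column 0 varies, column 1 is nonzero in a single row, column 2 is all zeros.
X = csr_matrix(np.array([[1.0, 0.0, 0.0],
                         [2.0, 0.0, 0.0],
                         [3.0, 5.0, 0.0],
                         [4.0, 0.0, 0.0]]))

# Two hypothetical subsamples: the first omits row 2, the second includes it.
subsamples = [[0, 1, 3], [0, 2, 3]]

kept_counts = []
for rows in subsamples:
    # Fit on the subsample only, as a pipeline step would during bootstrapping
    vt = VarianceThreshold().fit(X[rows])
    kept_counts.append(int(vt.get_support().sum()))

print(kept_counts)  # [1, 2]
```

In the first subsample column 1 is constant and gets dropped, in the second it survives, so the downstream estimator would see 1 and 2 features respectively.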
