Welcome to verstack Discussions! #3
Replies: 10 comments 61 replies
-
Hi Danil, I wanted to try it myself, but I cannot install it via pip. The trouble starts at 'Preparing metadata (pyproject.toml)'; the error is 'subprocess-exited-with-error'. Do you have any advice on how to get past this? I am using the latest scikit-learn version. Thank you,
-
This is very unlikely to be the cause, but try downgrading pandas to 1.3.0.
-
Okay. And regarding n_estimators - I have run three experiments, two classification and one regression, and in each of them I received a different number of n_estimators.
Seems to be working as expected. Let's have a look at the versions of the LGBMTuner dependencies you have installed.
I've got: lightgbm: 3.3.2
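For completeness, the installed versions of the other dependencies can be listed with the standard library; the package names below are the usual suspects for an LGBMTuner setup (adjust as needed):

```python
from importlib.metadata import version, PackageNotFoundError

# Print the installed version of each relevant package, if present.
for pkg in ("lightgbm", "optuna", "pandas", "scikit-learn"):
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```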
-
Hi Danil, I started testing scsplit. It looks like it introduces NaNs into the dataframe it is splitting into train and test. I use a dataframe "dataset" with continuous labels in the last column. Before using scsplit ("train, test = scsplit(dataset, stratify=dataset['trns_cnt'])") I check for NaNs with "dataset.isna().sum().sum()" and get zero, but when I run scsplit as "X_train, X_test, y_train, y_test = scsplit(X, y, stratify=y, test_size=0.3)" I get "ValueError: Input y contains NaN". What could be the reason for that? Thank you, Nick
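One way to narrow this down, independent of scsplit, is to verify that the X/y pair passed to the second call is itself NaN-free; a common source of such NaNs in pandas is index misalignment when X and y are built from different objects. A minimal pandas-only diagnostic (the column names are placeholders standing in for the real "dataset"):

```python
import pandas as pd

# Hypothetical stand-in for the user's "dataset"; 'trns_cnt' is the
# continuous label column mentioned above.
dataset = pd.DataFrame({"feature": [1.0, 2.0, 3.0, 4.0],
                        "trns_cnt": [0.1, 0.2, 0.3, 0.4]})

X = dataset.drop(columns="trns_cnt")
y = dataset["trns_cnt"]

# If all three checks pass, the NaN reported by scsplit appears inside
# the split itself rather than in the inputs.
assert dataset.isna().sum().sum() == 0
assert not y.isna().any()
assert X.index.equals(y.index)  # misaligned indices silently produce NaN
```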
-
Hi Danil, hope you had a nice New Year celebration. Have a nice Staryi Novyi God as well. I wanted to share some thoughts on the package related to dealing with categorical features. In my view, LightGBM has a great advantage in treating categorical features, especially on very large datasets: you do not need one-hot encoding, which blows up memory requirements for sets with a very large number of records. When I use verstack to get the integration with Optuna, I am forced to encode the categorical columns first, and this severely limits the size of the dataset I can process with verstack. Do you plan to eliminate the need for categorical column encoding, to enable the use of your package with large datasets? Nikolay
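For context, LightGBM's native handling only requires that categorical columns carry pandas' "category" dtype, with no one-hot expansion; a minimal sketch of that preparation step, with made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"city": ["kyiv", "lviv", "kyiv", "odesa"],
                   "target": [0, 1, 0, 1]})

# LightGBM consumes 'category' columns directly, working off the compact
# integer codes instead of a memory-hungry one-hot encoding.
df["city"] = df["city"].astype("category")
print(df["city"].cat.codes.tolist())  # → [0, 1, 0, 2]
```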
-
Yes, Danil, I was talking about LGBMTuner when I mentioned Optuna. My point here is very simple: LightGBM has a great built-in capability to handle categorical features. Why give that up and process these features in an additional step, at the expense of performance and resources? In my case, for instance, the dataset size limitation is prohibitive to the extent that I have to give up LGBMTuner and use Optuna together with LightGBM as if there were no verstack. So it's a wish-list item for me: keep verstack's great integration with Optuna, but without the separate encoding of categorical features (which LightGBM can already do itself).
-
Hi Danil, the version of verstack on conda is 0.4.0, while the latest pip version is 3.6.6. Is it possible to push an updated verstack to conda? Nikolay
-
I know, but it is for the cloud, and the very old version of verstack on conda is considered a risk by our IT :-(
-
Hi Danil, how can I suppress the automatic output of "Best threshold(s)" while running ThreshTuner()? Nick
-
@nicktishchenko
It is covered in the documentation under 'LGBMTuner/Examples': https://verstack.readthedocs.io/en/latest/#id16
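As a generic fallback, if no suitable verbosity switch is found in those docs, any printed output can be silenced with the standard library's `contextlib.redirect_stdout`; the `noisy_fit` function below is only a stand-in for a ThreshTuner call:

```python
import io
from contextlib import redirect_stdout

def noisy_fit():
    # Hypothetical stand-in for a tuner call that prints its results.
    print("Best threshold(s): 0.42")
    return 0.42

# Capture stdout so the tuner's prints never reach the console.
buf = io.StringIO()
with redirect_stdout(buf):
    result = noisy_fit()

print(result)  # → 0.42 (the return value is unaffected; only prints are captured)
```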
-
👋 Welcome!
We’re using Discussions as a place to connect with other members of our community. We hope that you:
Remember that this is a community we build together 💪.
To get started, comment below with an introduction of yourself and tell us about what you do with this community.