Hello again Cédric,
Following your help with transformers, I am now trying to use GridSearchCV to optimize the hyperparameters of a RandomForest.
I have a pipeline with many transformers. It works well for cross-validation and for actual prediction, but I get a type error when I use it in a GridSearchCV. It looks like an extra argument of type ScikitLearn.Skcore.ParameterGrid is being passed in my setup:
pipe = Pipelines.Pipeline([ # This is working fine for cross validation, fitting and predicting
    ("extract_deck", PP_DeckTransformer()),
    ... # A list of 15 transformers
    ("featurize", mapper), # This is a DataFrameMapper to convert to Array
    ("forest", RandomForestClassifier(ntrees=200)) # Hyperparams: nsubfeatures, partialsampling, maxdepth
])
X_train = train
Y_train = convert(Array, train[:Survived])
# #Cross Validation - check model accuracy -- This is working fine
# crossval = round(cross_val_score(pipe, X_train, Y_train, cv =10), 2)
# print("\n",crossval,"\n")
# print(mean(crossval))
# GridSearch
grid = Dict(:ntrees => 10:30:240,
            :nsubfeatures => 0:1:13,
            :partialsampling => 0.2:0.1:1.0,
            :maxdepth => -1:2:13
            )
gridsearch = GridSearchCV(pipe, grid)
fit!(gridsearch, X_train, Y_train)
println("Best hyper-parameters: $(gridsearch.best_params_)")

The error I get is:
ERROR: LoadError: MethodError: no method matching _fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}, ::ScikitLearn.Skcore.ParameterGrid)
Closest candidates are:
_fit!(::ScikitLearn.Skcore.BaseSearchCV, !Matched::AbstractArray{T,N}, ::Any, ::Any) at /Users/<user>/.julia/v0.5/ScikitLearn/src/grid_search.jl:254
in fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}) at /Users/<user>/.julia/v0.5/ScikitLearn/src/grid_search.jl:526
in include_from_node1(::String) at ./loading.jl:488
in include_from_node1(::String) at /usr/local/Cellar/julia/0.5.0/lib/julia/sys.dylib:?
in process_options(::Base.JLOptions) at ./client.jl:262
in _start() at ./client.jl:318
in _start() at /usr/local/Cellar/julia/0.5.0/lib/julia/sys.dylib:?
while loading /Users/<path>/Kaggle-001-Julia-MagicalForest.jl, in expression starting on line 538

So the call being dispatched is _fit!(::ScikitLearn.Skcore.GridSearchCV, ::DataFrames.DataFrame, ::Array{Int64,1}, ::ScikitLearn.Skcore.ParameterGrid), but the closest candidate expects an AbstractArray as its second argument, not a DataFrame. The thing is, I expected the DataFrameMapper in the pipeline to have already converted the DataFrame to an array.
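For anyone reading along, the kind of dispatch failure in the trace can be reproduced in isolation. This is only a minimal sketch: `demo_fit` is a hypothetical function that copies the shape of the candidate signature from the trace (second argument restricted to AbstractArray), and a Dict stands in for the DataFrame because it is likewise not an AbstractArray. It does not use or reproduce ScikitLearn.jl's actual code.

```julia
# Hypothetical stand-in: only the signature shape comes from the stack trace.
demo_fit(X::AbstractArray, y, grid) = "dispatched"

y = [1, 0, 1]

# A Matrix is an AbstractArray, so this call finds a matching method.
ok = demo_fit([1.0 2.0; 3.0 4.0; 5.0 6.0], y, nothing)

# A Dict plays the role of the DataFrame here: it is not an AbstractArray,
# so no method matches and Julia throws a MethodError.
not_an_array = Dict(:a => [1, 2, 3])
failed = try
    demo_fit(not_an_array, y, nothing)
    false
catch err
    isa(err, MethodError)
end

println(ok)
println(failed)
```

The point of the sketch is that the error is raised at the call itself, based on the type of the data argument, before any step inside a pipeline would get a chance to convert it.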
If needed, the full code is here: https://github.com/mratsim/MachineLearning_Kaggle/blob/9c07a64a981a6512e021ae01623212a278fd05d1/Kaggle%20-%20001%20-%20Titanic%20Survivors/Kaggle-001-Julia-MagicalForest.jl#L530