Description
Hello,
I am trying to register a custom sklearn pipeline in a PiML experiment, but I am getting this error:
  File "/tmp/ipykernel_35500/19077422.py", line 76, in objective
    exp.register(piml_pipeline, "pipeline")
  File "piml/api.py", line 2691, in piml.api.Experiment.register
  File "piml/workflow/model_train_api.py", line 61, in piml.workflow.model_train_api.ModelAPI.register_model
  File "piml/workflow/pipeline.py", line 123, in piml.workflow.pipeline.ModelPipeline.get_data
  ValueError: could not convert string to float: 'DUMMY STR'
It seems that get_data expects the input data to be already preprocessed; however, all my preprocessing steps are included in the sklearn pipeline. I want to keep the entire pipeline as a single object because I am going to test multiple pipelines with distinct preprocessing methods. The cause seems to be that there is a categorical column; I think that is the problem.
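For what it's worth, the same message can be reproduced outside PiML with plain NumPy. This is only a minimal sketch of my guess: I am assuming that get_data casts the raw input to float somewhere, which I have not confirmed in the PiML source. The toy array and the helper `try_float_cast` are made up for illustration.

```python
import numpy as np

# Toy feature matrix: one numeric column and one string (categorical) column.
X = np.array([[1.0, "DUMMY STR"], [2.0, "OTHER"]], dtype=object)


def try_float_cast(data):
    """Return the error message raised by a float cast, or None if it succeeds.

    Hypothetical helper; it only mimics what I assume happens inside get_data.
    """
    try:
        np.asarray(data, dtype=float)
    except ValueError as exc:
        return str(exc)
    return None


# Casting the full matrix fails on the string column.
error_message = try_float_cast(X)
print(error_message)

# Casting only the numeric column succeeds, so the helper returns None.
numeric_only_error = try_float_cast(X[:, :1])
print(numeric_only_error)
```

If that guess is right, the raw categorical strings never reach the imputers or CatBoost, because the cast happens before the pipeline is applied.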
Here is the code I used:
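  import numpy as np
  from catboost import CatBoostClassifier
  from piml import Experiment
  from sklearn.compose import ColumnTransformer
  from sklearn.impute import SimpleImputer
  from sklearn.pipeline import Pipeline

  # NUMERICAL_COLS, CATEGORICAL_COLS, cat_features_idxs and the
  # train/validation splits are defined earlier (not shown here).

  # Define model
  model_pipeline = Pipeline([("model", CatBoostClassifier(verbose=0, cat_features=cat_features_idxs))])

  # Impute numerical columns with the mean and categorical columns with the mode
  pre_processing_pipeline = Pipeline([
      ('imputers',
          ColumnTransformer(transformers=[
              ('numerical_imputer', SimpleImputer(missing_values=np.nan, strategy='mean'), NUMERICAL_COLS),
              ('categorical_imputer', SimpleImputer(missing_values=None, strategy='most_frequent'), CATEGORICAL_COLS)
          ])
      ),
  ])

  # Concat Pipelines
  pipeline = Pipeline([
      ('pre_processing', pre_processing_pipeline),
      ('model', model_pipeline)
  ])

  # Fit the pipeline, then wrap it and register it with PiML
  pipeline.fit(X_train_, y_train_)
  exp = Experiment()
  piml_pipeline = exp.make_pipeline(pipeline, task_type="classification", train_x=X_train_, train_y=y_train_, test_x=X_val_, test_y=y_val_)
  exp.register(piml_pipeline, "pipeline")  # raises the ValueError shown above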
Is there a way for me to make it work?
Thanks!