Full Sklearn pipeline as External Model #53

@jp-varela

Hello,
I am trying to register a custom sklearn pipeline in a PiML experiment, but I am getting this error:
  File "/tmp/ipykernel_35500/19077422.py", line 76, in objective
    exp.register(piml_pipeline, "pipeline")
  File "piml/api.py", line 2691, in piml.api.Experiment.register
  File "piml/workflow/model_train_api.py", line 61, in piml.workflow.model_train_api.ModelAPI.register_model
  File "piml/workflow/pipeline.py", line 123, in piml.workflow.pipeline.ModelPipeline.get_data
  ValueError: could not convert string to float: 'DUMMY STR'

It seems that get_data expects the input data to be already preprocessed, but all my preprocessing steps are included in the sklearn pipeline. I want to keep the entire pipeline as a single object because I am going to test multiple pipelines with distinct preprocessing methods. The cause seems to be that there is a categorical column, which I think is the problem.
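
For reference, the message in the traceback is the one Python raises when a non-numeric string is cast to float. The sketch below reproduces it with plain Python, assuming get_data performs some float conversion on the raw inputs (the traceback suggests this, but nothing here confirms it). The helper `to_float_matrix` is hypothetical and only stands in for that conversion.

```python
def to_float_matrix(rows):
    """Cast every cell to float, as a numeric-only consumer would (hypothetical stand-in)."""
    return [[float(cell) for cell in row] for row in rows]


numeric_rows = [[1.0, "2.5"], [3, 4]]   # numeric strings parse fine
mixed_rows = [[1.0, "DUMMY STR"]]       # a raw categorical value does not

converted = to_float_matrix(numeric_rows)

try:
    to_float_matrix(mixed_rows)
    error_message = None
except ValueError as exc:
    # Same wording as the ValueError in the traceback
    error_message = str(exc)

print(error_message)
```

If that assumption holds, any raw string column passed as train_x or test_x would trigger the error before the pipeline's own preprocessing has a chance to run.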

Here is the code I used:

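  import numpy as np
  from catboost import CatBoostClassifier
  from piml import Experiment
  from sklearn.compose import ColumnTransformer
  from sklearn.impute import SimpleImputer
  from sklearn.pipeline import Pipeline

  # NUMERICAL_COLS, CATEGORICAL_COLS, cat_features_idxs and the
  # X_train_/y_train_/X_val_/y_val_ splits are defined earlier (not shown).

  # Define model
  model_pipeline = Pipeline([("model", CatBoostClassifier(verbose=0, cat_features=cat_features_idxs))])

  # Impute missing values: mean for numerical columns, most frequent value for categorical ones
  pre_processing_pipeline = Pipeline([
      ('imputers',
          ColumnTransformer(transformers=[
              ('numerical_imputer', SimpleImputer(missing_values=np.nan, strategy='mean'), NUMERICAL_COLS),
              ('categorical_imputer', SimpleImputer(missing_values=None, strategy='most_frequent'), CATEGORICAL_COLS)
          ])
      ),
  ])

  # Concat Pipelines
  pipeline = Pipeline([
      ('pre_processing', pre_processing_pipeline),
      ('model', model_pipeline)
  ])

  # Fit the pipeline
  pipeline.fit(X_train_, y_train_)

  # Wrap the fitted pipeline for PiML and register it (register raises the ValueError above)
  exp = Experiment()
  piml_pipeline = exp.make_pipeline(pipeline, task_type="classification", train_x=X_train_, train_y=y_train_, test_x=X_val_, test_y=y_val_)
  exp.register(piml_pipeline, "pipeline")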

Is there a way for me to make it work?
Thanks 😄
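
One possible direction, sketched under assumptions: if PiML only needs the arrays passed to make_pipeline to be numeric, the categorical columns could be integer-coded before that call, while the imputers and the model stay inside the pipeline. The helpers `fit_category_codes` and `encode_rows` below are hypothetical (they are not part of PiML or sklearn) and use plain Python lists to keep the idea visible. Whether PiML then accepts the pipeline is not confirmed anywhere in this thread.

```python
def fit_category_codes(rows, cat_idxs):
    """Learn a value -> integer code mapping for each categorical column index."""
    codes = {}
    for idx in cat_idxs:
        values = sorted({row[idx] for row in rows if row[idx] is not None})
        codes[idx] = {value: code for code, value in enumerate(values)}
    return codes


def encode_rows(rows, codes, unknown=-1):
    """Replace categorical cells with their codes; unseen or missing values get `unknown`."""
    encoded = []
    for row in rows:
        new_row = list(row)
        for idx, mapping in codes.items():
            new_row[idx] = mapping.get(row[idx], unknown)
        encoded.append(new_row)
    return encoded


# Toy data: column 0 is numerical, column 1 is categorical
train_rows = [[1.5, "DUMMY STR"], [2.0, "OTHER"], [0.5, None]]
val_rows = [[3.0, "OTHER"], [4.0, "NEW"]]

codes = fit_category_codes(train_rows, cat_idxs=[1])  # fit on training data only
train_encoded = encode_rows(train_rows, codes)
val_encoded = encode_rows(val_rows, codes)

print(train_encoded)
print(val_encoded)
```

In a real project, sklearn's OrdinalEncoder covers the same idea, including a sentinel for unseen categories. Note that in this sketch missing values also map to the sentinel, so a categorical imputer inside the pipeline would have to treat that sentinel as its missing value.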
