Commit 382350d

Copilot and thinkall committed
Add documentation for preprocess() API methods
Co-authored-by: thinkall <3197038+thinkall@users.noreply.github.com>
1 parent ca40748 commit 382350d

1 file changed: +58 / -0 lines

website/docs/Use-Cases/Task-Oriented-AutoML.md

Lines changed: 58 additions & 0 deletions
@@ -704,6 +704,64 @@ plt.barh(

![png](images/feature_importance.png)

### Preprocess data

FLAML provides two levels of preprocessing that can be accessed as public APIs:

1. **Task-level preprocessing** (`automl.preprocess()`): applies transformations specific to the task type, such as handling data types and sparse matrices, and applying the feature transformations learned during training.

2. **Estimator-level preprocessing** (`estimator.preprocess()`): applies transformations specific to the estimator type (e.g., LightGBM, XGBoost).

When applying both levels manually, apply task-level preprocessing first, then estimator-level preprocessing.
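
Why the order matters can be illustrated with a toy, stdlib-only sketch. The functions below are hypothetical stand-ins, not FLAML's actual transformations: the task-level step encodes a categorical column using a mapping learned from training data, and the estimator-level step assumes the estimator only accepts numeric input.

```python
# Hypothetical stand-ins for the two levels; NOT FLAML's real transformations.


def fit_task_preprocessor(rows):
    """Learn a category -> integer code mapping for column 1 of the training rows."""
    categories = sorted({row[1] for row in rows})
    return {cat: code for code, cat in enumerate(categories)}


def task_preprocess(rows, mapping):
    """Task-level step: encode the categorical column with the learned mapping."""
    return [[row[0], mapping[row[1]]] for row in rows]


def estimator_preprocess(rows):
    """Estimator-level step: this toy estimator only accepts float features."""
    return [[float(v) for v in row] for row in rows]


train = [[1.5, "red"], [2.0, "blue"], [0.5, "green"]]
test = [[3.0, "green"], [1.0, "red"]]

mapping = fit_task_preprocessor(train)  # learned at "fit" time
ready = estimator_preprocess(task_preprocess(test, mapping))
print(ready)  # [[3.0, 1.0], [1.0, 2.0]]

# Reversing the order fails: the estimator-level step cannot parse raw strings.
wrong_order_error = None
try:
    estimator_preprocess(test)
except ValueError as err:
    wrong_order_error = err
print("wrong order raised:", type(wrong_order_error).__name__)
```

The task-level mapping depends on state learned from the training data, which is also why these steps are only meaningful after training.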

#### Task-level preprocessing

```python
from flaml import AutoML
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load example data and hold out a test split
X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Train the model
automl = AutoML()
automl.fit(X_train, y_train, task="classification", time_budget=60)

# Apply task-level preprocessing to new data
X_test_preprocessed = automl.preprocess(X_test)

# The best estimator applies its own estimator-level preprocessing inside predict()
predictions = automl.model.predict(X_test_preprocessed)
```

#### Estimator-level preprocessing

```python
# Continuing from the task-level example: get the trained estimator
estimator = automl.model

# Apply task-level preprocessing first
X_test_task = automl.preprocess(X_test)

# Then apply estimator-level preprocessing
X_test_estimator = estimator.preprocess(X_test_task)

# Use the fully preprocessed data with the underlying model
# (`estimator.estimator` is the fitted model object, e.g., an LGBMClassifier)
predictions = estimator.estimator.predict(X_test_estimator)
```
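
The relationship between an estimator's `predict()` and its `preprocess()` can be sketched with a simplified wrapper. This assumes the wrapper's `predict()` runs estimator-level preprocessing before delegating to the underlying model exposed through an `estimator` property; `SketchEstimator` and `ThresholdModel` are hypothetical illustrative classes, not FLAML classes.

```python
class ThresholdModel:
    """Toy underlying model: predicts 1 when the first feature exceeds 2.0."""

    def predict(self, X):
        return [1 if row[0] > 2.0 else 0 for row in X]


class SketchEstimator:
    """Hypothetical, simplified wrapper (not FLAML's estimator class).

    predict() applies the estimator-level preprocess() and then delegates
    to the wrapped model, which is exposed through the `estimator` property.
    """

    def __init__(self, model):
        self._model = model

    @property
    def estimator(self):
        return self._model

    def preprocess(self, X):
        # Assumed estimator-specific step: cast every value to float.
        return [[float(v) for v in row] for row in X]

    def predict(self, X):
        return self._model.predict(self.preprocess(X))


est = SketchEstimator(ThresholdModel())
X_task = [[3, 1], [1, 2]]  # pretend this is task-level preprocessed data

via_wrapper = est.predict(X_task)
via_manual = est.estimator.predict(est.preprocess(X_task))
print(via_wrapper, via_manual)  # [1, 0] [1, 0]
```

Under this assumption, calling `preprocess()` and then the underlying model by hand reproduces what the wrapper's `predict()` does in one call.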

#### Complete preprocessing pipeline

For most use cases, `automl.predict()` already applies both levels of preprocessing internally. If you need to apply them separately (e.g., for a custom inference pipeline or for debugging), you can use:

```python
# Complete preprocessing pipeline
X_task_preprocessed = automl.preprocess(X_test)
X_final = automl.model.preprocess(X_task_preprocessed)

# automl.predict() applies the same two preprocessing steps internally
predictions = automl.predict(X_test)
```

Besides preprocessing, `automl.predict()` can also post-process the model output (for example, mapping encoded class labels back to the original labels), so its result is not always identical to calling `automl.model.estimator.predict(X_final)` directly.

**Note**: The `preprocess()` methods can only be called after `fit()` has been executed, because they rely on the transformations learned during training.
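
A custom inference pipeline might wrap both calls in one helper that fails early when no model has been trained yet. The helper `preprocess_for_inference` is hypothetical, and it assumes that `automl.model` is `None` until `fit()` has produced a trained model; the stub classes only exist so the sketch runs without FLAML.

```python
def preprocess_for_inference(automl, X):
    """Hypothetical helper: task-level, then estimator-level preprocessing.

    Assumes `automl.model` is None until fit() has produced a trained model.
    """
    model = getattr(automl, "model", None)
    if model is None:
        raise RuntimeError("No trained model found: call fit() before preprocess().")
    X_task = automl.preprocess(X)  # task-level step first
    return model.preprocess(X_task)  # then the estimator-level step


# Minimal stand-ins so the sketch runs without FLAML (not FLAML classes).
class StubEstimator:
    def preprocess(self, X):
        return [x * 10 for x in X]


class StubAutoML:
    def __init__(self):
        self.model = None  # nothing trained yet

    def preprocess(self, X):
        return [x + 1 for x in X]


stub = StubAutoML()
try:
    preprocess_for_inference(stub, [1, 2])
    guard_triggered = False
except RuntimeError:
    guard_triggered = True

stub.model = StubEstimator()  # simulate a completed fit()
result = preprocess_for_inference(stub, [1, 2])
print(guard_triggered, result)  # True [20, 30]
```

With a fitted `AutoML` object, the same helper would be called as `preprocess_for_inference(automl, X_test)`.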

### Get best configuration

We can find the best estimator's name and best configuration by:
