
[BUG] BorutaShap Initialization with Unfitted Models #136

@xuxu-wei

Description


When initializing a BorutaShap object with an unfitted model (e.g., RandomForestClassifier or RandomForestRegressor), the check_model method immediately checks for the feature_importances_ attribute. However, this attribute only exists after the model has been fitted, leading to the following issue:

Unfitted Models: If the base model has not yet been fitted, calling check_model triggers a NotFittedError, which results in the attribute check_feature_importance being set to False. When importance_measure is set to 'gini', this later raises:

AttributeError('Model must contain the feature_importances_ method to use Gini try Shap instead')

Behavior Explanation

  • No Model Provided: When no model is passed during initialization, check_model internally initializes a new Random Forest model. Since these models are pre-defined and known to support gini importance measures, the check is skipped. This is why the issue does not occur when a model is not explicitly provided.

  • Custom Model Passed: If a custom Random Forest model is passed (even if it's of the same type as the default), the immediate check for feature_importances_ fails because the model has not been fitted, causing the error.
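The two cases above boil down to a scikit-learn convention: feature_importances_ is a fit-dependent attribute, so any hasattr-style check on a freshly constructed estimator fails. A minimal standalone sketch (not BorutaShap internals):

```python
# feature_importances_ only appears after fit(), so an attribute check
# on an unfitted estimator is always False.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(random_state=0)
print(hasattr(model, "feature_importances_"))  # False before fitting

X, y = make_classification(n_samples=20, n_features=5, random_state=0)
model.fit(X, y)
print(hasattr(model, "feature_importances_"))  # True after fitting
```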

Steps to Reproduce

from BorutaShap import BorutaShap
from sklearn.ensemble import RandomForestClassifier

# Pass an unfitted model
model = RandomForestClassifier()
boruta_shap = BorutaShap(model=model, importance_measure='gini')

Suggested Solutions

Below are three potential solutions to address this issue:

  1. Pre-fit the Model on a Simple Dataset: Before checking the presence of the feature_importances_ attribute, clone the provided model and fit it on a predefined small dataset (aligned with the problem type: classification or regression). This ensures minimal performance impact and requires little modification to the existing codebase.

Example:

from sklearn.base import clone
from sklearn.datasets import make_classification, make_regression

def prefit_model(model, classification=True):
    """Fit a clone of `model` on a tiny synthetic dataset so that
    fit-dependent attributes such as feature_importances_ exist."""
    if classification:
        X, y = make_classification(n_samples=10, n_features=5, random_state=0)
    else:
        X, y = make_regression(n_samples=10, n_features=5, random_state=0)
    model_clone = clone(model)  # avoid mutating the user's model
    model_clone.fit(X, y)
    return model_clone

This can then be used during check_model to validate the feature_importances_ attribute.

  2. Defer feature_importances_ Check Until Needed: Postpone the check for feature_importances_ until the first usage of the feature importance, typically after the model's fit method has been called. However, this could delay the discovery of compatibility issues, especially for models with long training times, potentially leading to user frustration.
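A simplified sketch of this deferred check (the function name and wiring are illustrative, not the actual BorutaShap code path): the attribute is validated only when Gini importance is actually requested, i.e., after fitting.

```python
# Deferred check (illustrative): validate feature_importances_ only at the
# point where Gini importance is consumed, after the model has been fitted.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def gini_importance(model, X, y):
    model.fit(X, y)  # fit happens first ...
    if not hasattr(model, "feature_importances_"):  # ... then the check
        raise AttributeError(
            "Model must contain the feature_importances_ method "
            "to use Gini try Shap instead"
        )
    return model.feature_importances_

X, y = make_classification(n_samples=20, n_features=5, random_state=0)
imps = gini_importance(RandomForestClassifier(random_state=0), X, y)
print(imps.shape)  # one importance value per feature
```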

  3. Explicitly Require Fitted Models: Add a model fit check (e.g., using sklearn.utils.validation.check_is_fitted) before attempting to access feature_importances_. This would provide immediate feedback to the user if the model is not ready. While practical, this may not align well with typical usage patterns.
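For option 3, a standalone sketch of what the early check could look like (the error message is illustrative):

```python
# Explicit fitted-model requirement: check_is_fitted raises NotFittedError
# on an unfitted estimator, giving the user immediate feedback.
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import NotFittedError
from sklearn.utils.validation import check_is_fitted

model = RandomForestClassifier()
try:
    check_is_fitted(model)
except NotFittedError:
    print("Model must be fitted before being passed to BorutaShap")
```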

Preferred Solution

From my perspective, Option (1) (pre-fitting the model on a simple dataset) strikes a balance between performance, user experience, and minimal codebase changes.

Additional Notes

The current behavior of bypassing the feature_importances_ check for internally initialized models is intentional and appropriate. However, it highlights why user-provided models, even of the same type (e.g., RandomForestClassifier), fail if not pre-fitted. This discrepancy should be addressed to provide a consistent and user-friendly experience.

@Ekeany Let me know if more details or assistance are needed! 😊
