
Commit afc0518

10.18.1
1 parent 420e2ab commit afc0518

5 files changed: 76 additions & 6 deletions


CHANGELOG.md

Lines changed: 22 additions & 0 deletions
```diff
@@ -0,0 +1,22 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+## [10.18.1] - 2025-10-30
+
+### Fixed
+- **Improved Backward Compatibility for Saved Models:** Resolved an issue where loading models trained with older versions of `aplr` would fail due to missing attributes. The `__setstate__` method now initializes new preprocessing-related attributes to `None` for older models, ensuring they can be loaded and used without `AttributeError` exceptions.
+- **Stability for Unfitted Models:** Fixed a crash that occurred when calling `predict` on an unfitted `APLRClassifier`. The model now raises a `RuntimeError` with an informative message in this scenario.
+- **Restored Flexibility for `X_names` Parameter:** Fixed a regression from v10.18.0 where the `X_names` parameter no longer accepted `numpy.ndarray` or other list-like inputs. The parameter handles these types again, restoring flexibility for non-DataFrame inputs.
+
+## [10.18.0] - 2025-10-29
+
+### Added
+- **Automatic Data Preprocessing with `pandas.DataFrame`**:
+  - When a `pandas.DataFrame` is passed as input `X`, the model now automatically handles missing values and categorical features.
+  - **Missing Value Imputation**: Columns with missing values (`NaN`) are imputed using the column's median. A new binary feature (e.g., `feature_name_missing`) is created to indicate where imputation occurred. The median calculation correctly handles `sample_weight`.
+  - **Categorical Feature Encoding**: Columns with `object` or `category` data types are automatically one-hot encoded. During prediction, the model creates columns for all categories seen during training and sets the columns of categories absent from the prediction data to zero.
+
+### Changed
+- **Enhanced Flexibility in `APLRClassifier`**: The classifier now automatically converts numeric target arrays (e.g., `[0, 1, 0, 1]`) into string representations, simplifying setup for classification tasks.
+- **Updated Documentation and Examples**: The API reference and examples have been updated to reflect the new automatic preprocessing capabilities.
```
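The 10.18.0 entry says missing values are filled with a sample-weighted median and flagged in a `<column>_missing` indicator. The code that computes `median_values_` is not part of this diff, so the following is only a sketch under assumptions: `weighted_median` and `impute_with_indicator` are hypothetical helpers, and the "lower weighted median" rule used here (smallest value whose cumulative weight reaches half the total) is an assumed definition whose tie-breaking may differ from aplr's.

```python
import math


def weighted_median(values, weights=None):
    """Lower weighted median of the non-NaN entries (hypothetical helper)."""
    if weights is None:
        weights = [1.0] * len(values)
    # Drop missing values and non-positive weights, then sort by value.
    pairs = sorted(
        (v, w) for v, w in zip(values, weights) if not math.isnan(v) and w > 0
    )
    if not pairs:
        raise ValueError("no non-missing values with positive weight")
    half = sum(w for _, w in pairs) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half:
            return value
    return pairs[-1][0]  # not reached; kept as a guard against float drift


def impute_with_indicator(column, weights=None):
    """Return (filled column, 0/1 missing indicator, median used)."""
    median = weighted_median(column, weights)
    indicator = [1 if math.isnan(v) else 0 for v in column]
    filled = [median if math.isnan(v) else v for v in column]
    return filled, indicator, median


income = [10.0, float("nan"), 30.0, 20.0]
filled, missing, median = impute_with_indicator(income, [5.0, 1.0, 1.0, 1.0])
print(median, missing, filled)  # 10.0 [0, 1, 0, 0] [10.0, 10.0, 30.0, 20.0]
```

With equal weights the same column would be filled with 20.0; the heavy weight on the first row pulls the weighted median down to 10.0.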

aplr/aplr.py

Lines changed: 40 additions & 5 deletions
```diff
@@ -22,6 +22,9 @@ def _common_X_preprocessing(self, X, is_fitting: bool, X_names=None):
         """Common preprocessing for fit and predict."""
         is_dataframe_input = isinstance(X, pd.DataFrame)
 
+        if X_names is not None:
+            X_names = list(X_names)
+
         if not is_dataframe_input:
             try:
                 X_numeric = np.array(X, dtype=np.float64)
```
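The hunk above coerces `X_names` to a plain list before anything else touches it. The diff does not show which line broke in 10.18.0; one plausible failure, assumed here, is a truthiness test such as the `self.X_names_` checks in the next hunk, which raise for a multi-element `numpy.ndarray`. `normalize_x_names` is a hypothetical helper that repeats the two added lines.

```python
import numpy as np


def normalize_x_names(X_names):
    """Same coercion as the lines added to _common_X_preprocessing."""
    if X_names is not None:
        X_names = list(X_names)
    return X_names


names = np.array(["age", "income"])
try:
    if names:  # a multi-element ndarray has no single truth value
        pass
except ValueError as err:
    print("ndarray in a boolean context:", err)

normalized = normalize_x_names(names)
if normalized:  # a non-empty list is simply truthy
    print(len(normalized), "names")
```

Converting once at the entry point means every later check sees a list, whatever list-like the caller passed (ndarray, tuple, `pd.Index`).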
```diff
@@ -35,11 +38,11 @@ def _common_X_preprocessing(self, X, is_fitting: bool, X_names=None):
                     X.columns = X_names
                 else:
                     X.columns = [f"X{i}" for i in range(X.shape[1])]
-            elif hasattr(self, "X_names_") and len(self.X_names_) == X.shape[1]:
+            elif self.X_names_ and len(self.X_names_) == X.shape[1]:
                 X.columns = self.X_names_
         else:  # X is already a DataFrame
             X = X.copy()  # Always copy to avoid modifying original
-            if not is_fitting and hasattr(self, "X_names_"):
+            if not is_fitting and self.X_names_:
                 # Check if input columns for prediction match training columns (before OHE)
                 if set(X.columns) != set(self.X_names_):
                     raise ValueError(
```
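Why `hasattr` was replaced by a truthiness test: once `__init__` sets `X_names_ = []` (visible in the `__init__` hunks) or `__setstate__` fills in `None`, the attribute always exists, so `hasattr` no longer tells you anything. For a `None` value the old condition goes on to call `len(None)`. The `Model` class and the two guard functions below are hypothetical stand-ins that isolate just these conditions.

```python
class Model:
    """Hypothetical stand-in holding only the attribute under test."""

    def __init__(self, X_names_):
        self.X_names_ = X_names_


def old_guard(model, n_columns):
    # 10.18.0 condition: only checks that the attribute exists.
    return hasattr(model, "X_names_") and len(model.X_names_) == n_columns


def new_guard(model, n_columns):
    # 10.18.1 condition: None and [] are falsy, so len() is never reached.
    return bool(model.X_names_) and len(model.X_names_) == n_columns


restored = Model(None)  # what __setstate__ assigns for a pre-10.18 pickle
print(new_guard(restored, 3))  # False
try:
    old_guard(restored, 3)
except TypeError as err:
    print("old guard:", err)  # object of type 'NoneType' has no len()
```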
```diff
@@ -52,11 +55,18 @@ def _common_X_preprocessing(self, X, is_fitting: bool, X_names=None):
             self.categorical_features_ = list(
                 X.select_dtypes(include=["category", "object"]).columns
             )
+            # Ensure it's an empty list if no categorical features, not None
+            if not self.categorical_features_:
+                self.categorical_features_ = []
 
+        # Apply OHE if categorical_features_ were found during fitting.
         if self.categorical_features_:
             X = pd.get_dummies(X, columns=self.categorical_features_, dummy_na=False)
             if is_fitting:
                 self.ohe_columns_ = list(X.columns)
+                # Ensure it's an empty list if no OHE columns, not None
+                if not self.ohe_columns_:
+                    self.ohe_columns_ = []
             else:
                 missing_cols = set(self.ohe_columns_) - set(X.columns)
                 for c in missing_cols:
```
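The hunk cuts off inside `for c in missing_cols:`, so the exact fill logic is not visible; the changelog says absent training categories become zero columns. The sketch below reaches the same alignment with `DataFrame.reindex(columns=..., fill_value=0)`, which is a swapped-in technique, not necessarily what aplr does: besides adding the missing dummies as zeros, it also drops dummies for levels never seen in training and restores the training column order in one step. `fit_one_hot` and `transform_one_hot` are hypothetical names.

```python
import pandas as pd


def fit_one_hot(X, categorical):
    """One-hot encode at fit time and remember the resulting column list."""
    encoded = pd.get_dummies(X, columns=categorical, dummy_na=False)
    return encoded, list(encoded.columns)


def transform_one_hot(X, categorical, ohe_columns):
    """One-hot encode at predict time and align to the training columns."""
    encoded = pd.get_dummies(X, columns=categorical, dummy_na=False)
    # reindex adds training-time dummies absent here as all-zero columns,
    # drops dummies for levels never seen in training, and restores the
    # training column order.
    aligned = encoded.reindex(columns=ohe_columns, fill_value=0)
    return aligned.astype(float)


train = pd.DataFrame({"size": [1.0, 2.0, 3.0], "color": ["red", "green", "blue"]})
_, ohe_columns = fit_one_hot(train, ["color"])

new = pd.DataFrame({"size": [4.0, 5.0], "color": ["red", "purple"]})
print(transform_one_hot(new, ["color"], ohe_columns))
```

The `astype(float)` call smooths over `get_dummies` returning `bool` columns in recent pandas while `fill_value=0` produces integer columns.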
```diff
@@ -65,13 +75,17 @@ def _common_X_preprocessing(self, X, is_fitting: bool, X_names=None):
 
         if is_fitting:
             self.na_imputed_cols_ = [col for col in X.columns if X[col].isnull().any()]
+            # Ensure it's an empty list if no NA imputed columns, not None
+            if not self.na_imputed_cols_:
+                self.na_imputed_cols_ = []
 
+        # Apply NA indicator if na_imputed_cols_ were found during fitting.
         if self.na_imputed_cols_:
             for col in self.na_imputed_cols_:
                 X[col + "_missing"] = X[col].isnull().astype(int)
 
-        if not is_fitting:
-            for col in self.median_values_:
+        if not is_fitting and self.median_values_:
+            for col in self.median_values_:  # Iterate over keys if it's a dict
                 if col in X.columns:
                     X[col] = X[col].fillna(self.median_values_[col])
 
```
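The order in this hunk matters: indicator columns are built before `fillna`, because after filling the `NaN` positions are gone. Only columns that had missing values during fitting get an indicator. A minimal sketch of that fit/predict split, assuming `median_values_` holds medians only for the imputed columns and using pandas' unweighted `Series.median` (the changelog says aplr's median honours `sample_weight`); `fit_missing` and `transform_missing` are hypothetical names.

```python
import numpy as np
import pandas as pd


def fit_missing(X):
    """Record columns with NaN at fit time and their (unweighted) medians."""
    na_cols = [col for col in X.columns if X[col].isnull().any()]
    medians = {col: X[col].median() for col in na_cols}
    return na_cols, medians


def transform_missing(X, na_cols, medians):
    """Add <col>_missing indicators, then fill NaN with the stored medians."""
    X = X.copy()  # avoid modifying the caller's frame
    for col in na_cols:
        X[col + "_missing"] = X[col].isnull().astype(int)
    for col, median in medians.items():
        if col in X.columns:
            X[col] = X[col].fillna(median)
    return X


train = pd.DataFrame({"age": [20.0, np.nan, 40.0], "income": [1.0, 2.0, 3.0]})
na_cols, medians = fit_missing(train)

new = pd.DataFrame({"age": [np.nan, 50.0], "income": [4.0, 5.0]})
print(transform_missing(new, na_cols, medians))
```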

```diff
@@ -131,11 +145,30 @@ def _preprocess_X_fit(self, X, X_names, sample_weight):
     def _preprocess_X_predict(self, X):
         X = self._common_X_preprocessing(X, is_fitting=False)
 
-        if hasattr(self, "final_training_columns_"):
+        # Enforce column order from training if it was set.
+        if self.final_training_columns_:
             X = X[self.final_training_columns_]
 
         return X.values.astype(np.float64)
 
+    def __setstate__(self, state):
+        """Handles unpickling for backward compatibility."""
+        self.__dict__.update(state)
+
+        # For backward compatibility, initialize new attributes to None if they don't exist,
+        # indicating the model was trained before these features were introduced.
+        new_attributes = [
+            "X_names_",
+            "categorical_features_",
+            "ohe_columns_",
+            "na_imputed_cols_",
+            "median_values_",
+            "final_training_columns_",
+        ]
+        for attr in new_attributes:
+            if not hasattr(self, attr):
+                setattr(self, attr, None)
+
 
 
 class APLRRegressor(BaseAPLR):
     def __init__(
```
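`pickle` restores an object without calling `__init__`; if the class defines `__setstate__`, it is called with the saved attribute dictionary. That is the hook the new method uses to back-fill attributes an older pickle never had. The `Model` class below is a hypothetical, self-contained stand-in (its attribute values are illustrative) that simulates a pre-10.18 object by deleting the new attributes before pickling.

```python
import pickle


class Model:
    """Hypothetical estimator mirroring the __setstate__ pattern from the diff."""

    NEW_ATTRIBUTES = ["X_names_", "median_values_", "final_training_columns_"]

    def __init__(self):
        self.coef_ = [0.5, -1.25]  # stands in for state every version had
        self.X_names_ = []
        self.median_values_ = {}
        self.final_training_columns_ = []

    def __setstate__(self, state):
        # pickle calls this instead of __init__ when loading.
        self.__dict__.update(state)
        for attr in self.NEW_ATTRIBUTES:
            if not hasattr(self, attr):
                setattr(self, attr, None)


# Simulate an object saved by an older version that lacked the new attributes.
old = Model()
del old.X_names_, old.median_values_, old.final_training_columns_
payload = pickle.dumps(old)

restored = pickle.loads(payload)
print(restored.coef_, restored.X_names_, restored.final_training_columns_)
# [0.5, -1.25] None None
```

Using `None` rather than `[]` for the back-filled values keeps "trained before this feature existed" distinguishable from "trained, but nothing to record", while the truthiness checks elsewhere in the diff treat both the same way.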
```diff
@@ -261,6 +294,7 @@ def __init__(
         self.ohe_columns_ = []
         self.na_imputed_cols_ = []
         self.X_names_ = []
+        self.final_training_columns_ = []
 
         # Creating aplr_cpp and setting parameters
         self.APLRRegressor = aplr_cpp.APLRRegressor()
```
```diff
@@ -702,6 +736,7 @@ def __init__(
         self.ohe_columns_ = []
         self.na_imputed_cols_ = []
         self.X_names_ = []
+        self.final_training_columns_ = []
 
         # Creating aplr_cpp and setting parameters
         self.APLRClassifier = aplr_cpp.APLRClassifier()
```
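Because both `__init__` methods now set `final_training_columns_ = []`, the old `hasattr` test in `_preprocess_X_predict` would always pass, and `X[[]]` selects zero columns. The truthiness check skips reordering for `[]` and for the `None` that `__setstate__` assigns. `to_model_matrix` is a hypothetical function reproducing the tail of `_preprocess_X_predict` from the diff.

```python
import numpy as np
import pandas as pd


def to_model_matrix(X, final_training_columns):
    """Sketch of the tail of _preprocess_X_predict shown in the diff."""
    # [] (fresh __init__) and None (old pickle) are both falsy, so the
    # reorder is skipped instead of selecting zero columns with X[[]].
    if final_training_columns:
        X = X[final_training_columns]
    return X.values.astype(np.float64)


X = pd.DataFrame({"b": [2.0], "a": [1.0]})
print(to_model_matrix(X, ["a", "b"]))  # training order restored
print(to_model_matrix(X, []))  # column order left as given
print(X[[]].shape)  # what an unconditional X[[]] would select: no columns
```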

cpp/APLRClassifier.h

Lines changed: 13 additions & 0 deletions
```diff
@@ -21,6 +21,7 @@ class APLRClassifier
     void invert_second_model_in_two_class_case(APLRRegressor &second_model);
     void calculate_validation_metrics();
     void calculate_unique_term_affiliations();
+    void throw_error_if_not_fitted();
     void cleanup_after_fit();
 
 public:
```
```diff
@@ -306,8 +307,18 @@ void APLRClassifier::cleanup_after_fit()
     response_values.clear();
 }
 
+void APLRClassifier::throw_error_if_not_fitted()
+{
+    if (categories.empty())
+    {
+        throw std::runtime_error("This APLRClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this estimator.");
+    }
+}
+
 MatrixXd APLRClassifier::predict_class_probabilities(const MatrixXd &X, bool cap_predictions_to_minmax_in_training)
 {
+    throw_error_if_not_fitted();
+
     MatrixXd predictions{MatrixXd::Constant(X.rows(), categories.size(), 0.0)};
     for (size_t i = 0; i < categories.size(); ++i)
     {
```
```diff
@@ -328,6 +339,7 @@ MatrixXd APLRClassifier::predict_class_probabilities(const MatrixXd &X, bool cap
 
 std::vector<std::string> APLRClassifier::predict(const MatrixXd &X, bool cap_predictions_to_minmax_in_training)
 {
+    throw_error_if_not_fitted();
     std::vector<std::string> predictions(X.rows());
     MatrixXd predicted_class_probabilities{predict_class_probabilities(X, cap_predictions_to_minmax_in_training)};
     for (size_t row = 0; row < predicted_class_probabilities.rows(); ++row)
```
```diff
@@ -342,6 +354,7 @@ std::vector<std::string> APLRClassifier::predict(const MatrixXd &X, bool cap_pre
 
 MatrixXd APLRClassifier::calculate_local_feature_contribution(const MatrixXd &X)
 {
+    throw_error_if_not_fitted();
     MatrixXd output{MatrixXd::Constant(X.rows(), unique_term_affiliations.size(), 0)};
     std::vector<std::string> predictions{predict(X, false)};
     for (size_t row = 0; row < predictions.size(); ++row)
```
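The C++ guard treats an empty `categories` vector as "not fitted" and throws before any matrix is sized from it. On the Python side this surfaces as `RuntimeError`, assuming the `aplr_cpp` bindings rely on pybind11's default translation of `std::runtime_error`. The class below is a hypothetical pure-Python mirror of the guard and its call sites, with a placeholder probability model; it is not aplr's API.

```python
class ClassifierSketch:
    """Hypothetical pure-Python mirror of the C++ not-fitted guard."""

    def __init__(self):
        self.categories = []  # filled by fit; empty means "not fitted"

    def _throw_error_if_not_fitted(self):
        if not self.categories:
            raise RuntimeError(
                "This APLRClassifier instance is not fitted yet. Call 'fit' "
                "with appropriate arguments before using this estimator."
            )

    def fit(self, y):
        self.categories = sorted(set(y))
        return self

    def predict_class_probabilities(self, n_rows):
        self._throw_error_if_not_fitted()
        # Placeholder model: uniform probabilities over the fitted categories.
        p = 1.0 / len(self.categories)
        return [[p] * len(self.categories) for _ in range(n_rows)]

    def predict(self, n_rows):
        self._throw_error_if_not_fitted()
        probs = self.predict_class_probabilities(n_rows)
        # Pick the category with the highest probability in each row.
        return [self.categories[row.index(max(row))] for row in probs]


try:
    ClassifierSketch().predict(1)
except RuntimeError as err:
    print("RuntimeError:", err)

print(ClassifierSketch().fit(["b", "a", "b"]).predict(2))
```

As in the C++ change, `predict` checks before delegating to `predict_class_probabilities`, which checks again; the repeat is harmless and keeps each public entry point safe on its own.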
Binary file not shown.

setup.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -25,7 +25,7 @@
 
 setuptools.setup(
     name="aplr",
-    version="10.18.0",
+    version="10.18.1",
     description="Automatic Piecewise Linear Regression",
     ext_modules=[sfc_module],
     author="Mathias von Ottenbreit",
```
