Releases: ottenbreit-data-science/aplr
Bugfixes
Fixed
- **Improved Backward Compatibility for Saved Models:** Resolved an issue where loading models trained with older versions of `aplr` would fail due to missing attributes. The `__setstate__` method now initializes new preprocessing-related attributes to `None` for older models, ensuring they can be loaded and used without `AttributeError` exceptions.
- **Stability for Unfitted Models:** Fixed a crash that occurred when calling `predict` on an unfitted `APLRClassifier`. The model now correctly raises a `RuntimeError` with an informative message in this scenario, improving stability and user feedback.
- **Restored Flexibility for the `X_names` Parameter:** Fixed a regression from v10.18.0 where the `X_names` parameter no longer accepted `numpy.ndarray` or other list-like inputs. The parameter now correctly handles these types again, restoring flexibility for non-DataFrame inputs (both fixes are sketched below).
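A minimal sketch of the fixed behaviours (the data is synthetic, the exact error message will differ, and passing `X_names` to `fit` is assumed from the library's usual API):

```python
import numpy as np
from aplr import APLRClassifier

# Calling predict on an unfitted model now raises RuntimeError instead of crashing.
model = APLRClassifier()
try:
    model.predict(np.zeros((3, 2)))
except RuntimeError as err:
    print(err)

# X_names may once again be a numpy.ndarray (or any list-like), not only a plain list.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
model.fit(X, y, X_names=np.array(["f0", "f1"]))
```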
APLR version 10.18.0 - automatic handling of missing values and categorical features
This release introduces significant enhancements to streamline the data preprocessing workflow and improve overall usability. The model now intelligently handles `pandas.DataFrame` inputs, automating common data preparation steps and making it easier to go from raw data to a trained model.
Key Features and Enhancements
1. Automatic Data Preprocessing with pandas.DataFrame
When a `pandas.DataFrame` is passed as input `X` to `APLRRegressor` or `APLRClassifier`, the model now automatically performs the following preprocessing steps, reducing the need for manual data preparation (a sketch follows the list below):
- **Missing Value Imputation:**
  - For columns containing missing values (`NaN`), the model automatically imputes them using the column's median.
  - To preserve information about the original missingness, a new binary feature (e.g., `feature_name_missing`) is created for each column that had values imputed.
  - The median calculation correctly handles `sample_weight` if provided during fitting, ensuring a weighted median is used for imputation.
- **Categorical Feature Encoding:**
  - Columns with `object` or `category` data types are automatically identified and one-hot encoded.
  - The model gracefully handles unseen category levels during prediction by creating columns for all categories seen during training and setting them to zero for new data.
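A minimal sketch of the new workflow on a small synthetic `DataFrame` (column names are illustrative; the generated indicator follows the `feature_name_missing` naming pattern described above):

```python
import numpy as np
import pandas as pd
from aplr import APLRRegressor

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame(
    {
        "age": rng.normal(40.0, 10.0, size=n),
        "city": rng.choice(["a", "b", "c"], size=n),  # object dtype, so one-hot encoded
    }
)
X.loc[:9, "age"] = np.nan  # NaNs are imputed with the (weighted) median,
                           # and a binary missingness indicator column is added
y = rng.normal(size=n)

model = APLRRegressor()
model.fit(X, y)           # no manual imputation or encoding required
preds = model.predict(X)  # unseen "city" levels at predict time map to all-zero columns
```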
2. Enhanced Flexibility in APLRClassifier
The `APLRClassifier` is now more versatile with respect to the target variable `y`. It automatically converts numeric target arrays (e.g., `[0, 1, 0, 1]`) into string representations. This simplifies the setup for classification tasks, as you no longer need to manually pre-convert your target variable.
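A minimal sketch with a numeric target (synthetic data):

```python
import numpy as np
from aplr import APLRClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = rng.integers(0, 2, size=200)  # numeric labels, e.g. array([0, 1, 0, 1, ...])

clf = APLRClassifier()
clf.fit(X, y)  # y is converted to string categories internally; no manual step needed
```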
3. Updated Documentation and Examples
The API reference and examples have been updated to reflect these new automatic preprocessing capabilities, providing clearer guidance on leveraging these new, user-friendly features.
APLR version 10.17.1
This release focuses on improving robustness and documentation clarity.
Improvements
- **Input Validation for `validation_tuning_metric`:** Added validation to check that the provided `validation_tuning_metric` is a valid option before model training begins. This prevents runtime errors from invalid metric names and provides clearer error messages to the user, as sketched below.
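A minimal sketch of the fail-fast behaviour (the metric name is deliberately invalid; the concrete exception type and message are not specified in these notes):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = rng.normal(size=100)

model = APLRRegressor(validation_tuning_metric="not_a_metric")
try:
    model.fit(X, y)  # now fails fast, before any boosting work is done
except Exception as err:  # exact exception type not stated in these notes
    print(err)
```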
Documentation
- **Clarified `faster_convergence` Usage:** The documentation for the `faster_convergence` parameter in `APLRRegressor` has been updated. It now states that this option can be useful not only when the algorithm converges too slowly, but also when it converges prematurely.
APLR Version 10.17.0 Release Notes
This release introduces a new flexible loss function, a parameter to accelerate model training, and a significant improvement to the validation and tuning process for more stable and robust modeling.
✨ New Features
1. New exponential_power Loss Function
A new `loss_function="exponential_power"` has been added. This is a versatile loss function where the shape is controlled by the `dispersion_parameter` (`p`), making it a generalization of other common loss functions:
- When `p=2`, it is equivalent to the `mse` loss function.

This allows for greater flexibility in modeling distributions with different tail behaviors. For best results with this new loss function, it is recommended to also set `faster_convergence=True`.
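A minimal sketch using the new loss together with the recommended `faster_convergence=True` (the parameter introduced in the next subsection); the `dispersion_parameter` value and the data are illustrative:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.1 * rng.normal(size=500)

model = APLRRegressor(
    loss_function="exponential_power",
    dispersion_parameter=1.5,  # p; p=2 would reproduce mse
    faster_convergence=True,   # recommended with this loss function
)
model.fit(X, y)
```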
2. New faster_convergence Parameter
A new boolean constructor parameter, `faster_convergence` (default `False`), has been introduced to speed up model training, especially for loss functions that might otherwise converge slowly.
- When set to `True`, it applies a scaling to the negative gradient during boosting.
- This is currently only applied for the `identity` and `log` link functions.
- Note: This option is not effective for all loss functions (e.g., `mae`, `quantile`) and is not needed for `mse` with an `identity` link, as this combination is already optimized for speed.
🚀 Improvements and Behavioral Changes
`validation_tuning_metric` Logic Update
The logic for `validation_tuning_metric` has been updated to improve both training stability and evaluation flexibility.
- **For Training and Convergence:** During the boosting process and for selecting the optimal number of boosting steps (`m`), the model now always uses the `"default"` metric (which corresponds to the `loss_function`). This ensures the most stable convergence with respect to the loss being minimized.
- **For Final Evaluation and Tuning:** After each cross-validation fold is fitted, the `validation_tuning_metric` provided by the user is calculated. This final metric is what `get_cv_error()` returns and what `APLRTuner` uses to select the best set of hyperparameters.
This change ensures stable model training while still allowing you to evaluate and tune your model based on a business-relevant metric that may differ from the training loss function.
**Important Note:** When tuning hyperparameters like `dispersion_parameter` or `loss_function`, you should not use `validation_tuning_metric="default"`, as the resulting CV errors will not be comparable across different parameter values.
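A minimal sketch of tuning against a user-chosen metric via a manual grid; it assumes `"mae"` is among the valid metric names and uses the existing `max_interaction_level` hyperparameter as the illustrative grid dimension:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + rng.normal(size=500)

best_model, best_error = None, np.inf
for max_interaction_level in [0, 1]:  # illustrative one-dimensional grid
    model = APLRRegressor(
        max_interaction_level=max_interaction_level,
        validation_tuning_metric="mae",  # used for evaluation; boosting itself
                                         # still converges on the loss_function
    )
    model.fit(X, y)
    if model.get_cv_error() < best_error:  # get_cv_error() reflects the chosen metric
        best_model, best_error = model, model.get_cv_error()
```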
Improved performance when multi-threading
This release introduces a performance enhancement by integrating a persistent thread pool to manage multi-threading. Previously, the algorithm created and destroyed threads in each boosting step, which incurred overhead. The performance gain is largest on smaller datasets.
New loss function and an optional mean bias correction
✨ New Features
- **Huber Loss Function:** A new `huber` loss function has been added to `APLRRegressor`. This provides a robust alternative to Mean Squared Error (MSE) that is less sensitive to outliers while remaining differentiable everywhere, improving model stability on noisy datasets. The `delta` parameter for this loss function is controlled via `dispersion_parameter` (a sketch combining both new features follows this list).
- **Explicit Mean Bias Correction:**
  - A new `mean_bias_correction` constructor parameter (default `False`) has been introduced to apply an explicit post-processing step to the model's intercept.
  - When enabled, this feature adjusts the intercept to make the model's predictions on the training data have the same (weighted) mean as the response variable. This can be particularly useful for loss functions like `huber`, which can otherwise produce biased predictions.
  - The correction is currently implemented for models using the `identity` and `log` link functions.
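A minimal sketch combining the two features (synthetic data; the `dispersion_parameter` value is an illustrative choice for `delta`):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + rng.normal(size=500)
y[:10] += 25.0  # a few outliers that would dominate an mse fit

model = APLRRegressor(
    loss_function="huber",
    dispersion_parameter=1.35,  # serves as huber's delta
    mean_bias_correction=True,  # re-centres the intercept so the mean training
                                # prediction matches the (weighted) mean response
)
model.fit(X, y)
```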
📚 Documentation
- The API reference for `APLRRegressor` has been updated to describe the above.
Release Notes - APLR Version 10.14.0
This release focuses on streamlining dependencies and improving the plotting functionality within the library.
🚀 Enhancements
- **Removed `pandas` Dependency for Plotting:** The `plot_affiliation_shape` method in `APLRRegressor` has been refactored to use `numpy` exclusively for data manipulation. This change removes the `pandas` dependency for users who want to use the optional plotting features, leading to a lighter and more focused installation.
🔧 Dependency Updates
- The optional dependency for plotting is now solely `matplotlib>=3.0`. `pandas` is no longer required or installed with the `[plots]` extra.
- The plotting functionality has been verified to be fully compatible with `matplotlib` versions 3.0 and newer.
✅ Compatibility
- The refactored plotting code maintains compatibility with `numpy>=1.11`.
Release Notes: APLR Version 10.13.0
This release introduces a major new feature for model interpretation through visualization, along with several documentation improvements and a key enhancement for the classification module.
New Features
- **Visualizing Model Components:** A new method, `plot_affiliation_shape`, has been added to the `APLRRegressor` class. This allows for easy one-line plotting of main effects (as line plots) and two-way interactions (as heatmaps) directly from a fitted model object. This greatly simplifies model interpretation and debugging (see the sketch after this list).
- **Optional Plotting Dependencies:** To support the new plotting feature, `pandas` and `matplotlib` can now be installed as optional dependencies using `pip install aplr[plots]`. The core library remains lightweight for users who do not require plotting.
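A hedged sketch of the new plotting workflow: the loop below assumes that affiliation identifiers can be obtained from `get_unique_term_affiliations()` and passed directly to `plot_affiliation_shape`; the exact signature is not shown in these notes, so consult the API reference:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X[:, 0] * X[:, 1] + rng.normal(size=500)

model = APLRRegressor(max_interaction_level=1)
model.fit(X, y, X_names=["f0", "f1"])

# Main effects are drawn as line plots, two-way interactions as heatmaps.
for affiliation in model.get_unique_term_affiliations():
    model.plot_affiliation_shape(affiliation)
```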
Improvements
- **Classification Model Interpretation:** The `get_logit_model` method has been improved to return a full-featured Python `APLRRegressor` object. This enables the use of Python-specific methods, making it straightforward to visualize model components for each category by calling the new `plot_affiliation_shape` method on the returned logit model (sketched after this list).
- **Example Scripts:** All example scripts have been updated to use the new `plot_affiliation_shape` method, making them cleaner and demonstrating the current best practices for model interpretation.
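A hedged sketch for classification: the category argument to `get_logit_model` and the plotting calls are assumptions based on the description above; consult the API reference for the exact signatures:

```python
import numpy as np
from aplr import APLRClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = np.where(X[:, 0] > 0, "yes", "no")  # string category labels

clf = APLRClassifier()
clf.fit(X, y)

# get_logit_model now returns a full Python APLRRegressor,
# so the new plotting method is available per category.
logit_model = clf.get_logit_model("yes")
for affiliation in logit_model.get_unique_term_affiliations():
    logit_model.plot_affiliation_shape(affiliation)
```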
Documentation
- The API Reference for `APLRRegressor` has been updated with detailed documentation for the new `plot_affiliation_shape` method.
- The main README.md now includes instructions for installing the optional plotting dependencies.
- The Model Interpretation guides for both regression and classification have been significantly improved to reflect the new, simplified plotting workflow and to provide clearer instructions.
Minor bugfix
- Fixed an issue in the computation of validation error when using the `log` link function. Previously, the validation error was incorrectly scaled; this is now corrected. As a result, for the same non-default `validation_tuning_metric`, validation errors are comparable across link functions.
- Minor documentation updates related to `validation_tuning_metric`.
Added new validation tuning metrics
Added new validation tuning metrics:
- `neg_top_quantile_mean_response` – Computes the negative of the sample-weighted mean response for observations with predictions in the top quantile (as defined by the `quantile` parameter).
- `bottom_quantile_mean_response` – Computes the sample-weighted mean response for observations with predictions in the bottom quantile (as defined by the `quantile` parameter).

These metrics may be useful, for example, when the objective is to increase lift. Please note that when using these new metrics, a significantly higher `early_stopping_rounds` than the default of 200 might be needed.
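A minimal sketch (synthetic data; the `quantile` value and the raised `early_stopping_rounds` are illustrative choices, and `early_stopping_rounds` is assumed to be a constructor parameter as implied above):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X[:, 0] + rng.normal(size=1000)

model = APLRRegressor(
    validation_tuning_metric="neg_top_quantile_mean_response",
    quantile=0.9,                # defines the "top quantile" of predictions
    early_stopping_rounds=1000,  # well above the default of 200, as suggested above
)
model.fit(X, y)
```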