
Releases: ottenbreit-data-science/aplr

Bugfixes

30 Oct 20:00


Fixed

  • Improved Backward Compatibility for Saved Models: Resolved an issue where loading models trained with older versions of aplr would fail due to missing attributes. The __setstate__ method now initializes new preprocessing-related attributes to None for older models, ensuring they can be loaded and used without AttributeError exceptions.
  • Stability for Unfitted Models: Fixed a crash that occurred when calling predict on an unfitted APLRClassifier. The model now correctly raises a RuntimeError with an informative message in this scenario, improving stability and user feedback.
  • Restored Flexibility for X_names Parameter: Fixed a regression from v10.18.0 where the X_names parameter no longer accepted numpy.ndarray or other list-like inputs. The parameter now correctly handles these types again, restoring flexibility for non-DataFrame inputs.
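A minimal sketch of the two behavioral fixes, assuming the usual aplr fit/predict API and that X_names is passed to fit (the exact error message may differ):

```python
import numpy as np
from aplr import APLRClassifier, APLRRegressor

# Calling predict on an unfitted classifier now raises an informative
# RuntimeError instead of crashing.
clf = APLRClassifier()
try:
    clf.predict(np.zeros((3, 2)))
except RuntimeError as e:
    print(e)

# X_names again accepts list-like inputs such as numpy arrays.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] + rng.normal(0.0, 0.1, 100)
model = APLRRegressor()
model.fit(X, y, X_names=np.array(["feature_1", "feature_2"]))
```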

APLR version 10.18.0 - automatic handling of missing values and categorical features

29 Oct 20:13


This release introduces significant enhancements to streamline the data preprocessing workflow and improve overall usability. The model now intelligently handles pandas.DataFrame inputs, automating common data preparation steps and making it easier to go from raw data to a trained model.

Key Features and Enhancements

1. Automatic Data Preprocessing with pandas.DataFrame

When a pandas.DataFrame is passed as input X to APLRRegressor or APLRClassifier, the model now automatically performs the following preprocessing steps, reducing the need for manual data preparation (a usage sketch follows the list):

  • Missing Value Imputation:

    • For columns containing missing values (NaN), the model automatically imputes them using the column's median.
    • To preserve information about the original missingness, a new binary feature (e.g., feature_name_missing) is created for each column that had values imputed.
    • The median calculation correctly handles sample_weight if provided during fitting, ensuring a weighted median is used for imputation.
  • Categorical Feature Encoding:

    • Columns with object or category data types are automatically identified and one-hot encoded.
    • The model gracefully handles unseen category levels during prediction by creating columns for all categories seen during training and setting them to zero for new data.
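A minimal sketch of the new DataFrame workflow on synthetic data (the indicator column name follows the feature_name_missing pattern above; data and column names are illustrative):

```python
import numpy as np
import pandas as pd
from aplr import APLRRegressor

rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "age": rng.normal(40.0, 10.0, n),
    "city": rng.choice(["oslo", "bergen"], n),
})
X.loc[::10, "age"] = np.nan  # introduce missing values
y = (
    np.where(X["city"] == "oslo", 1.0, 0.0)
    + X["age"].fillna(40.0).to_numpy() * 0.1
    + rng.normal(0.0, 0.1, n)
)

model = APLRRegressor()
model.fit(X, y)  # "age" is median-imputed and an indicator column
                 # (e.g. "age_missing") is added; "city" is one-hot encoded

# Unseen categories at prediction time map to zeros in all one-hot
# columns created during training.
X_new = pd.DataFrame({"age": [np.nan, 35.0], "city": ["tromso", "oslo"]})
print(model.predict(X_new))
```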

2. Enhanced Flexibility in APLRClassifier

The APLRClassifier is now more versatile with respect to the target variable y. It automatically converts numeric target arrays (e.g., [0, 1, 0, 1]) into string representations. This simplifies the setup for classification tasks, as you no longer need to manually pre-convert your target variable.
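For example, a minimal sketch with numeric labels on synthetic data (the internal string conversion is described above):

```python
import numpy as np
from aplr import APLRClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)  # numeric 0/1 labels, no manual conversion needed

clf = APLRClassifier()
clf.fit(X, y)  # numeric targets are converted to string categories internally
print(clf.predict(X[:5]))
```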

3. Updated Documentation and Examples

The API reference and examples have been updated to reflect these new automatic preprocessing capabilities, providing clearer guidance on leveraging these new, user-friendly features.

APLR version 10.17.1

25 Oct 11:36


This release focuses on improving robustness and documentation clarity.

Improvements

  • Input Validation for validation_tuning_metric: Added validation to check if the provided validation_tuning_metric is a valid option before model training begins. This prevents runtime errors from invalid metric names and provides clearer error messages to the user.
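A sketch of the new check, assuming validation_tuning_metric is set via the constructor (as in the 10.17.0 notes below) and that validation runs at the start of fit; the release notes do not specify the exception type:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X[:, 0] + rng.normal(0.0, 0.1, 100)

model = APLRRegressor(validation_tuning_metric="not_a_metric")
try:
    model.fit(X, y)  # rejected up front with a clear message
except Exception as e:  # concrete exception type is an assumption
    print(e)
```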

Documentation

  • Clarified faster_convergence Usage: The documentation for the faster_convergence parameter in APLRRegressor has been updated. It now states that this option can be useful not only when the algorithm converges too slowly, but also when it converges prematurely.

APLR Version 10.17.0 Release Notes

23 Oct 20:47


This release introduces a new flexible loss function, a parameter to accelerate model training, and a significant improvement to the validation and tuning process for more stable and robust modeling.

✨ New Features

1. New exponential_power Loss Function

A new loss_function="exponential_power" has been added. This is a versatile loss function where the shape is controlled by the dispersion_parameter (p), making it a generalization of other common loss functions:

  • When p=2, it is equivalent to the mse loss function.

This allows for greater flexibility in modeling distributions with different tail behaviors. For best results with this new loss function, it is recommended to also set faster_convergence=True.
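A usage sketch on synthetic data; the dispersion_parameter value of 1.5 is illustrative, sitting between mse-like behavior (p=2) and a more outlier-tolerant shape:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(0.0, 0.1, 500)

# faster_convergence=True is recommended for this loss function.
model = APLRRegressor(
    loss_function="exponential_power",
    dispersion_parameter=1.5,
    faster_convergence=True,
)
model.fit(X, y)
```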

2. New faster_convergence Parameter

A new boolean constructor parameter, faster_convergence (default False), has been introduced to speed up model training, especially for loss functions that might otherwise converge slowly.

  • When set to True, it applies a scaling to the negative gradient during boosting.
  • This is currently only applied for the identity and log link functions.
  • Note: This option is not effective for all loss functions (e.g., mae, quantile) and is not needed for mse with an identity link, as this combination is already optimized for speed.

🚀 Improvements and Behavioral Changes

validation_tuning_metric Logic Update

The logic for validation_tuning_metric has been updated to improve both training stability and evaluation flexibility.

  • For Training and Convergence: During the boosting process and for selecting the optimal number of steps (m), the model now always uses the "default" metric (which corresponds to the loss_function). This ensures the most stable convergence with respect to the loss being minimized.
  • For Final Evaluation and Tuning: After each cross-validation fold is fitted, the validation_tuning_metric provided by the user is calculated. This final metric is what get_cv_error() returns and what APLRTuner uses to select the best set of hyperparameters.

This change ensures stable model training while still allowing you to evaluate and tune your model based on a business-relevant metric that may differ from the training loss function.

Important Note: When tuning hyperparameters like dispersion_parameter or loss_function, you should not use validation_tuning_metric="default", as the resulting CV errors will not be comparable across different parameter values.
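A sketch of the evaluation flow, assuming "mae" is among the valid metric options:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X[:, 0] + rng.normal(0.0, 0.1, 500)

# Boosting and step selection always use the loss function's "default"
# metric internally; "mae" below is only computed per CV fold for
# evaluation and tuning.
model = APLRRegressor(loss_function="mse", validation_tuning_metric="mae")
model.fit(X, y)
print(model.get_cv_error())  # reported in terms of "mae", as used by APLRTuner
```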

Improved multi-threading performance

13 Oct 22:00


This release introduces a performance enhancement by integrating a persistent thread pool to manage multi-threading. Previously, the algorithm created and destroyed threads in each boosting step, which incurred overhead. The performance gain is largest on smaller datasets.

New loss function and an optional mean bias correction

04 Oct 13:21


✨ New Features

  • Huber Loss Function: A new huber loss function has been added to APLRRegressor. This provides a robust alternative to Mean Squared Error (MSE) that is less sensitive to outliers while remaining differentiable everywhere, improving model stability on noisy datasets. The delta parameter for this loss function is controlled via dispersion_parameter. A usage sketch follows this list.

  • Explicit Mean Bias Correction:

    • A new mean_bias_correction constructor parameter (default False) has been introduced to apply an explicit post-processing step to the model's intercept.
    • When enabled, this feature adjusts the intercept to make the model's predictions on the training data have the same (weighted) mean as the response variable. This can be particularly useful for loss functions like huber, which can otherwise produce biased predictions.
    • The correction is currently implemented for models using the identity and log link functions.
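A minimal sketch combining both features on synthetic data with a few outliers (delta = 1.0 is an illustrative value):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X[:, 0] + rng.normal(0.0, 0.5, 500)
y[::50] += 20.0  # a few outliers that would dominate mse

# Huber loss with delta set via dispersion_parameter, plus the explicit
# mean bias correction so that training predictions match the (weighted)
# mean of the response.
model = APLRRegressor(
    loss_function="huber",
    dispersion_parameter=1.0,
    mean_bias_correction=True,
)
model.fit(X, y)
print(float(np.mean(model.predict(X))), float(np.mean(y)))  # should be close
```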

📚 Documentation

  • The API reference for APLRRegressor has been updated to describe the above.

Release Notes - APLR Version 10.14.0

22 Sep 15:55


This release focuses on streamlining dependencies and improving the plotting functionality within the library.

🚀 Enhancements

  • Removed pandas Dependency for Plotting: The plot_affiliation_shape method in APLRRegressor has been refactored to use numpy exclusively for data manipulation. This change removes the pandas dependency for users who want to use the optional plotting features, leading to a lighter and more focused installation.

🔧 Dependency Updates

  • The optional dependency for plotting is now solely matplotlib>=3.0. pandas is no longer required or installed with the [plots] extra.
  • The plotting functionality has been verified to be fully compatible with matplotlib versions 3.0 and newer.

✅ Compatibility

  • The refactored plotting code maintains compatibility with numpy>=1.11.

Release Notes: APLR Version 10.13.0

21 Sep 19:24


This release introduces a major new feature for model interpretation through visualization, along with several documentation improvements and a key enhancement for the classification module.

New Features

  • Visualizing Model Components: A new method, plot_affiliation_shape, has been added to the APLRRegressor class. This allows for easy one-line plotting of main effects (as line plots) and two-way interactions (as heatmaps) directly from a fitted model object. This greatly simplifies model interpretation and debugging (see the sketch after this list).

  • Optional Plotting Dependencies: To support the new plotting feature, pandas and matplotlib can now be installed as optional dependencies using pip install aplr[plots]. The core library remains lightweight for users who do not require plotting.
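A hedged sketch of the new plotting workflow (requires pip install aplr[plots]; the argument to plot_affiliation_shape is an assumption here, so consult the API reference for the exact signature):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.sin(X[:, 0]) + X[:, 0] * X[:, 1] + rng.normal(0.0, 0.1, 500)

model = APLRRegressor()
model.fit(X, y, X_names=["x1", "x2"])

# One line per plot: main effects are drawn as line plots and two-way
# interactions as heatmaps. The index argument below is hypothetical.
model.plot_affiliation_shape(0)
```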

Improvements

  • Classification Model Interpretation: The get_logit_model method has been improved to return a full-featured Python APLRRegressor object. This enables the use of Python-specific methods, making it straightforward to visualize model components for each category by calling the new plot_affiliation_shape method on the returned logit model (see the sketch after this list).

  • Example Scripts: All example scripts have been updated to use the new plot_affiliation_shape method, making them cleaner and demonstrating the current best practices for model interpretation.
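A sketch of the classifier workflow, assuming get_logit_model takes the class label as a string (the plot argument is hypothetical, as above):

```python
import numpy as np
from aplr import APLRClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = np.where(X[:, 0] + X[:, 1] > 0.0, "yes", "no")  # string labels

clf = APLRClassifier()
clf.fit(X, y, X_names=["x1", "x2"])

# The returned logit model is a full Python APLRRegressor, so the new
# plotting method is directly available per category.
logit_model = clf.get_logit_model("yes")
logit_model.plot_affiliation_shape(0)  # hypothetical argument, as above
```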

Documentation

  • The API Reference for APLRRegressor has been updated with detailed documentation for the new plot_affiliation_shape method.
  • The main README.md now includes instructions for installing the optional plotting dependencies.
  • The Model Interpretation guides for both regression and classification have been significantly improved to reflect the new, simplified plotting workflow and to provide clearer instructions.

Minor bugfix

21 Sep 07:14


  • Fixed an issue in the computation of validation error when using the log link function. Previously, the validation error was incorrectly scaled; this is now corrected. As a result, for the same non-default validation_tuning_metric, validation errors are comparable across link functions.
  • Minor documentation updates related to validation_tuning_metric.

Added new validation tuning metrics

20 Sep 20:55


Two new validation tuning metrics have been added:

  • neg_top_quantile_mean_response – Computes the negative of the sample-weighted mean response for observations with predictions in the top quantile (as defined by the quantile parameter).
  • bottom_quantile_mean_response – Computes the sample-weighted mean response for observations with predictions in the bottom quantile (as defined by the quantile parameter).

These metrics may be useful, for example, when the objective is to increase lift. Please note that when using these new metrics, a significantly higher early_stopping_rounds than the default of 200 might be needed.
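A usage sketch on synthetic data; quantile=0.9 and early_stopping_rounds=1000 are illustrative values, and how quantile defines the cut-off is described in the API reference:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))
y = X[:, 0] + rng.normal(0.0, 0.1, 500)

# Evaluate on the (negated) sample-weighted mean response in the top part
# of the prediction distribution, with extra patience for early stopping.
model = APLRRegressor(
    validation_tuning_metric="neg_top_quantile_mean_response",
    quantile=0.9,
    early_stopping_rounds=1000,
)
model.fit(X, y)
print(model.get_cv_error())
```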