# ruff: noqa
"""
=======================================
Release Highlights for scikit-learn 1.6
=======================================

.. currentmodule:: sklearn

We are pleased to announce the release of scikit-learn 1.6! Many bug fixes
and improvements were added, as well as some key new features. Below we
detail the highlights of this release. **For an exhaustive list of
all the changes**, please refer to the :ref:`release notes <release_notes_1_6>`.

To install the latest version (with pip)::

    pip install --upgrade scikit-learn

or with conda::

    conda install -c conda-forge scikit-learn

"""

# %%
# FrozenEstimator: Freezing an estimator
# --------------------------------------
#
# This meta-estimator allows you to take an estimator and freeze its fit method, meaning
# that calling `fit` does not perform any operations; also, `fit_predict` and
# `fit_transform` call `predict` and `transform` respectively without calling `fit`. The
# original estimator's other methods and properties are left unchanged. An interesting
# use case for this is to use a pre-fitted model as a transformer step in a pipeline
# or to pass a pre-fitted model to some of the meta-estimators. Here's a short example:

import time

from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import FixedThresholdClassifier

X, y = make_classification(n_samples=1000, random_state=0)

start = time.time()
classifier = SGDClassifier().fit(X, y)
print(f"Fitting the classifier took {(time.time() - start) * 1_000:.2f} milliseconds")

start = time.time()
threshold_classifier = FixedThresholdClassifier(
    estimator=FrozenEstimator(classifier), threshold=0.9
).fit(X, y)
print(
    f"Fitting the threshold classifier took {(time.time() - start) * 1_000:.2f} "
    "milliseconds"
)

# %%
# Fitting the threshold classifier skipped fitting the inner `SGDClassifier`. For more
# details refer to the example :ref:`sphx_glr_auto_examples_frozen_plot_frozen_examples.py`.

# %%
# Transforming data other than X in a Pipeline
# --------------------------------------------
#
# The :class:`~pipeline.Pipeline` now supports transforming passed data other than `X`
# if necessary. This can be done by setting the new `transform_input` parameter. This
# is particularly useful when passing a validation set through the pipeline.
#
# As an example, imagine `EstimatorWithValidationSet` is an estimator which accepts
# a validation set. We can now have a pipeline which will transform the validation set
# and pass it to the estimator::
#
#     sklearn.set_config(enable_metadata_routing=True)
#     est_gs = GridSearchCV(
#         Pipeline(
#             [
#                 ("scaler", StandardScaler()),
#                 (
#                     "estimator",
#                     EstimatorWithValidationSet(...).set_fit_request(
#                         X_val=True, y_val=True
#                     ),
#                 ),
#             ],
#             # telling the pipeline to transform these inputs up to the step
#             # which is requesting them
#             transform_input=["X_val"],
#         ),
#         param_grid={"estimator__param_to_optimize": list(range(5))},
#         cv=5,
#     ).fit(X, y, X_val=X_val, y_val=y_val)
#
# In the above code, the key parts are the call to `set_fit_request` to specify that
# `X_val` and `y_val` are required by the `EstimatorWithValidationSet.fit` method, and
# the `transform_input` parameter to tell the pipeline to transform `X_val` before
# passing it to `EstimatorWithValidationSet.fit`.
#
# Note that at this time scikit-learn estimators have not yet been extended to accept
# user-specified validation sets. This feature is released early to collect feedback
# from third-party libraries that might benefit from it.
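#
# A rough sketch of what such a third-party estimator could look like (the class and
# its parameters here are hypothetical, not part of scikit-learn)::
#
#     class EstimatorWithValidationSet(ClassifierMixin, BaseEstimator):
#         def fit(self, X, y, X_val=None, y_val=None):
#             # (X_val, y_val) can be used e.g. for early stopping; with metadata
#             # routing enabled, extra `fit` parameters like these are what
#             # `set_fit_request` refers to.
#             ...
#             return self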

# %%
# Multiclass support for `LogisticRegression(solver="newton-cholesky")`
# ---------------------------------------------------------------------
#
# The `"newton-cholesky"` solver (originally introduced in scikit-learn version
# 1.2) was previously limited to binary
# :class:`~linear_model.LogisticRegression` and some other generalized linear
# regression estimators (namely :class:`~linear_model.PoissonRegressor`,
# :class:`~linear_model.GammaRegressor` and
# :class:`~linear_model.TweedieRegressor`).
#
# This new release includes support for multiclass (multinomial)
# :class:`~linear_model.LogisticRegression`.
#
# This solver is particularly useful when the number of features is small to
# medium. It has been empirically shown to converge more reliably and faster
# than other solvers on some medium-sized datasets with one-hot encoded
# categorical features, as can be seen in the `benchmark results of the
# pull-request
# <https://github.com/scikit-learn/scikit-learn/pull/28840#issuecomment-2065368727>`_.
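#
# As a short illustration (a minimal sketch on a synthetic three-class dataset; only
# public scikit-learn API is used, but the data and settings are arbitrary):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X_multi, y_multi = make_classification(
    n_samples=1_000, n_classes=3, n_informative=6, random_state=0
)
clf = LogisticRegression(solver="newton-cholesky").fit(X_multi, y_multi)
print(f"Classes seen by the multinomial model: {clf.classes_}")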

# %%
# Missing value support for Extra Trees
# -------------------------------------
#
# The classes :class:`ensemble.ExtraTreesClassifier` and
# :class:`ensemble.ExtraTreesRegressor` now support missing values. More details in the
# :ref:`User Guide <tree_missing_value_support>`.
import numpy as np

from sklearn.ensemble import ExtraTreesClassifier

X = np.array([0, 1, 6, np.nan]).reshape(-1, 1)
y = [0, 0, 1, 1]

forest = ExtraTreesClassifier(random_state=0).fit(X, y)
forest.predict(X)

# %%
# Download any dataset from the web
# ---------------------------------
#
# The function :func:`datasets.fetch_file` allows downloading a file from any given URL.
# This convenience function provides built-in local disk caching, sha256 digest
# integrity check and an automated retry mechanism on network error.
#
# The goal is to provide the same convenience and reliability as dataset fetchers while
# giving the flexibility to work with data from arbitrary online sources and file
# formats.
#
# The downloaded file can then be loaded with generic or domain-specific functions such
# as `pandas.read_csv`, `pandas.read_parquet`, etc.
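#
# For instance (a minimal sketch shown as a literal block to avoid a network access
# when building this example; the URL is only a placeholder)::
#
#     import pandas as pd
#     from sklearn.datasets import fetch_file
#
#     local_path = fetch_file("https://example.com/path/to/some_dataset.csv")
#     df = pd.read_csv(local_path)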

# %%
# Array API support
# -----------------
#
# Many more estimators and functions have been updated to support array API compatible
# inputs since version 1.5, in particular the meta-estimators for hyperparameter tuning
# from the :mod:`sklearn.model_selection` module and the metrics from the
# :mod:`sklearn.metrics` module.
#
# Please refer to the :ref:`array API support<array_api>` page for instructions to use
# scikit-learn with array API compatible libraries such as PyTorch or CuPy.
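#
# For instance (a minimal sketch shown as a literal block since it assumes that
# PyTorch and the `array-api-compat` package are installed)::
#
#     import sklearn
#     import torch
#     from sklearn.metrics import r2_score
#
#     sklearn.set_config(array_api_dispatch=True)
#     y_true = torch.tensor([1.0, 2.0, 3.0])
#     y_pred = torch.tensor([1.1, 1.9, 3.2])
#     # the score is computed directly on the torch tensors, without converting
#     # them to NumPy arrays
#     r2_score(y_true, y_pred)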

# %%
# Almost complete Metadata Routing support
# ----------------------------------------
#
# Support for routing metadata has been added to all remaining estimators and
# functions except AdaBoost. See :ref:`Metadata Routing User Guide <metadata_routing>`
# for more details.

# %%
# Free-threaded CPython 3.13 support
# ----------------------------------
#
# scikit-learn has preliminary support for free-threaded CPython; in particular,
# free-threaded wheels are available for all of our supported platforms.
#
# Free-threaded (also known as nogil) CPython 3.13 is an experimental version of
# CPython 3.13 which aims at enabling efficient multi-threaded use cases by
# removing the Global Interpreter Lock (GIL).
#
# For more details about free-threaded CPython, see the `py-free-threading documentation <https://py-free-threading.github.io>`_,
# in particular `how to install a free-threaded CPython <https://py-free-threading.github.io/installing_cpython/>`_
# and `Ecosystem compatibility tracking <https://py-free-threading.github.io/tracking/>`_.
#
# Feel free to try free-threaded CPython on your use case and report any issues!

# %%
# Improvements to the developer API for third party libraries
# ------------------------------------------------------------
#
# We have been working on improving the developer API for third party libraries.
# This is still a work in progress, but a fair amount of work has been done in this
# release, which includes:
#
# - :func:`sklearn.utils.validation.validate_data` is introduced and replaces the
#   previously private `BaseEstimator._validate_data` method. This function extends
#   :func:`~sklearn.utils.validation.check_array` and adds support for remembering
#   input feature counts and names.
# - Estimator tags have been revamped and are now part of the public API via
#   :class:`sklearn.utils.Tags`. Estimators should now override the
#   :meth:`BaseEstimator.__sklearn_tags__` method instead of implementing a `_more_tags`
#   method. If you'd like to support multiple scikit-learn versions, you can implement
#   both methods in your class.
# - As a consequence of developing a public tag API, we've removed the `_xfail_checks`
#   tag; tests which are expected to fail are now directly passed to
#   :func:`~sklearn.utils.estimator_checks.check_estimator` and
#   :func:`~sklearn.utils.estimator_checks.parametrize_with_checks`. See their
#   corresponding API docs for more details.
# - Many tests in the common test suite are updated and raise more helpful error
#   messages. We've also added some new tests, which should help you more easily fix
#   potential issues with your estimators.
#
# An updated version of our :ref:`develop` is also available, which we recommend you
# check out.
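#
# The following is a minimal sketch of how a third-party estimator might use the new
# helpers. The estimator itself is hypothetical, and the `poor_score` tag is flipped
# purely for illustration:

import numpy as np

from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils.validation import validate_data


class ConstantRegressor(RegressorMixin, BaseEstimator):
    """Toy regressor that always predicts the mean of the training target."""

    def fit(self, X, y):
        # validate_data replaces the previously private `_validate_data` and
        # remembers the input feature count and names for later consistency checks.
        X, y = validate_data(self, X, y)
        self.constant_ = y.mean()
        return self

    def predict(self, X):
        # reset=False checks the input against what was seen during `fit`.
        X = validate_data(self, X, reset=False)
        return np.full(shape=X.shape[0], fill_value=self.constant_)

    def __sklearn_tags__(self):
        # Tags are now a dataclass: start from the parent's defaults and adjust.
        tags = super().__sklearn_tags__()
        tags.regressor_tags.poor_score = True
        return tags


X_demo = np.arange(12.0).reshape(6, 2)
y_demo = np.arange(6.0)
ConstantRegressor().fit(X_demo, y_demo).predict(X_demo[:3])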