Commit a23aef1

jeremiedbb, adrinjalali, lesteve, ogrisel
authored
DOC Release Highlights for version 1.6 (scikit-learn#30392)
Co-authored-by: adrinjalali <[email protected]> Co-authored-by: Loïc Estève <[email protected]> Co-authored-by: Olivier Grisel <[email protected]>
1 parent 507c432 commit a23aef1

2 files changed, +219 -0 lines changed


examples/frozen/README.txt

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
.. _frozen_examples:

Frozen Estimators
-----------------

Examples concerning the :mod:`sklearn.frozen` module.

Lines changed: 212 additions & 0 deletions
@@ -0,0 +1,212 @@
# ruff: noqa
"""
=======================================
Release Highlights for scikit-learn 1.6
=======================================

.. currentmodule:: sklearn

We are pleased to announce the release of scikit-learn 1.6! Many bug fixes
and improvements were added, as well as some key new features. Below we
detail the highlights of this release. **For an exhaustive list of
all the changes**, please refer to the :ref:`release notes <release_notes_1_6>`.

To install the latest version (with pip)::

    pip install --upgrade scikit-learn

or with conda::

    conda install -c conda-forge scikit-learn

"""

# %%
# FrozenEstimator: Freezing an estimator
# --------------------------------------
#
# This meta-estimator allows you to take an already fitted estimator and freeze its
# `fit` method: calling `fit` performs no operation, and `fit_predict` and
# `fit_transform` call `predict` and `transform` respectively without calling `fit`.
# The original estimator's other methods and properties are left unchanged. A typical
# use case is using a pre-fitted model as a transformer step in a pipeline, or passing
# a pre-fitted model to some of the meta-estimators. Here's a short example:

import time
from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import FixedThresholdClassifier

X, y = make_classification(n_samples=1000, random_state=0)

start = time.time()
classifier = SGDClassifier().fit(X, y)
print(f"Fitting the classifier took {(time.time() - start) * 1_000:.2f} milliseconds")

start = time.time()
threshold_classifier = FixedThresholdClassifier(
    estimator=FrozenEstimator(classifier), threshold=0.9
).fit(X, y)
print(
    f"Fitting the threshold classifier took {(time.time() - start) * 1_000:.2f} "
    "milliseconds"
)

# %%
# Fitting the threshold classifier skipped fitting the inner `SGDClassifier`. For more
# details, refer to the example :ref:`sphx_glr_auto_examples_frozen_plot_frozen_examples.py`.
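# %%
# As a minimal sketch of what "frozen" means (the `LogisticRegression` model and the
# deliberately mismatched refitting data below are arbitrary choices for illustration),
# calling `fit` on the wrapper leaves the wrapped model's learned coefficients
# untouched, while prediction is still delegated to the inner estimator:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.frozen import FrozenEstimator
from sklearn.linear_model import LogisticRegression

X_demo, y_demo = make_classification(n_samples=200, random_state=0)
fitted_model = LogisticRegression().fit(X_demo, y_demo)
coef_before = fitted_model.coef_.copy()

frozen_model = FrozenEstimator(fitted_model)
# Refitting an ordinary model on a tiny subset with flipped labels would change it;
# on the frozen wrapper, `fit` is a no-op and the inner model is not refitted.
frozen_model.fit(X_demo[:20], 1 - y_demo[:20])

print("Coefficients unchanged:", np.array_equal(fitted_model.coef_, coef_before))
print(
    "Same predictions as the inner model:",
    np.array_equal(frozen_model.predict(X_demo), fitted_model.predict(X_demo)),
)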

# %%
# Transforming data other than X in a Pipeline
# --------------------------------------------
#
# The :class:`~pipeline.Pipeline` can now transform data passed to it other than `X`,
# if necessary. This is done by setting the new `transform_input` parameter, and is
# particularly useful when passing a validation set through the pipeline.
#
# As an example, imagine `EstimatorWithValidationSet` is an estimator which accepts
# a validation set. We can now have a pipeline which will transform the validation set
# and pass it to the estimator::
#
#   sklearn.set_config(enable_metadata_routing=True)
#   est_gs = GridSearchCV(
#       Pipeline(
#           [
#               ("standardscaler", StandardScaler()),
#               (
#                   "estimatorwithvalidationset",
#                   EstimatorWithValidationSet(...).set_fit_request(
#                       X_val=True, y_val=True
#                   ),
#               ),
#           ],
#           # telling pipeline to transform these inputs up to the step which is
#           # requesting them.
#           transform_input=["X_val"],
#       ),
#       param_grid={"estimatorwithvalidationset__param_to_optimize": list(range(5))},
#       cv=5,
#   ).fit(X, y, X_val=X_val, y_val=y_val)
#
# In the above code, the key parts are the call to `set_fit_request` to specify that
# `X_val` and `y_val` are required by the `EstimatorWithValidationSet.fit` method, and
# the `transform_input` parameter to tell the pipeline to transform `X_val` before
# passing it to `EstimatorWithValidationSet.fit`.
#
# Note that at this time scikit-learn estimators have not yet been extended to accept
# user-specified validation sets. This feature is released early to collect feedback
# from third-party libraries that might benefit from it.
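# %%
# Since `EstimatorWithValidationSet` above is only a placeholder, here is a minimal
# runnable sketch with a hypothetical estimator of that name (its `C` parameter and
# its `X_val_mean_` and `val_score_` attributes are made up for this illustration).
# It fits a :class:`~linear_model.LogisticRegression` on the training data, scores it
# on the validation set it receives, and stores the column means of `X_val`. That lets
# us check that the pipeline standardized `X_val` before handing it to the final step.
import numpy as np
import sklearn
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class EstimatorWithValidationSet(ClassifierMixin, BaseEstimator):
    """Hypothetical estimator accepting a validation set in `fit`."""

    def __init__(self, C=1.0):
        self.C = C

    def fit(self, X, y, X_val=None, y_val=None):
        self.model_ = LogisticRegression(C=self.C).fit(X, y)
        self.classes_ = self.model_.classes_
        if X_val is not None:
            # X_val arrives here already transformed by the preceding steps.
            self.X_val_mean_ = np.asarray(X_val).mean(axis=0)
            self.val_score_ = self.model_.score(X_val, y_val)
        return self

    def predict(self, X):
        return self.model_.predict(X)


# `transform_input` requires metadata routing to be enabled.
sklearn.set_config(enable_metadata_routing=True)

rng = np.random.RandomState(0)
X_train = rng.normal(loc=10.0, scale=5.0, size=(200, 3))
y_train = (X_train[:, 0] > 10.0).astype(int)
X_val = rng.normal(loc=10.0, scale=5.0, size=(50, 3))
y_val = (X_val[:, 0] > 10.0).astype(int)

pipe = Pipeline(
    [
        ("standardscaler", StandardScaler()),
        (
            "estimatorwithvalidationset",
            EstimatorWithValidationSet().set_fit_request(X_val=True, y_val=True),
        ),
    ],
    transform_input=["X_val"],
)
pipe.fit(X_train, y_train, X_val=X_val, y_val=y_val)

# The raw X_val columns have means around 10; after scaling they should be near 0.
print("Mean of X_val as seen by the final step:", pipe[-1].X_val_mean_)
print("Validation accuracy:", pipe[-1].val_score_)

sklearn.set_config(enable_metadata_routing=False)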

# %%
# Multiclass support for `LogisticRegression(solver="newton-cholesky")`
# ---------------------------------------------------------------------
#
# The `"newton-cholesky"` solver (originally introduced in scikit-learn version
# 1.2) was previously limited to binary
# :class:`~linear_model.LogisticRegression` and some other generalized linear
# regression estimators (namely :class:`~linear_model.PoissonRegressor`,
# :class:`~linear_model.GammaRegressor` and
# :class:`~linear_model.TweedieRegressor`).
#
# This new release includes support for multiclass (multinomial)
# :class:`~linear_model.LogisticRegression`.
#
# This solver is particularly useful when the number of features is small to
# medium. It has been empirically shown to converge more reliably and faster
# than other solvers on some medium-sized datasets with one-hot encoded
# categorical features, as can be seen in the `benchmark results of the
# pull request
# <https://github.com/scikit-learn/scikit-learn/pull/28840#issuecomment-2065368727>`_.
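# %%
# A minimal sketch of the new multiclass support, using the iris dataset and default
# hyperparameters as arbitrary choices for illustration: on this three-class problem,
# `coef_` holds one row of coefficients per class, and the predicted class
# probabilities of each sample sum to one.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X_iris, y_iris = load_iris(return_X_y=True)
multinomial_clf = LogisticRegression(solver="newton-cholesky").fit(X_iris, y_iris)

print("Classes:", multinomial_clf.classes_)
print("Coefficient shape (n_classes, n_features):", multinomial_clf.coef_.shape)
print("Training accuracy:", multinomial_clf.score(X_iris, y_iris))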

# %%
# Missing value support for Extra Trees
# -------------------------------------
#
# The classes :class:`ensemble.ExtraTreesClassifier` and
# :class:`ensemble.ExtraTreesRegressor` now support missing values. See the
# :ref:`User Guide <tree_missing_value_support>` for more details.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

X = np.array([0, 1, 6, np.nan]).reshape(-1, 1)
y = [0, 0, 1, 1]

forest = ExtraTreesClassifier(random_state=0).fit(X, y)
forest.predict(X)

# %%
# Download any dataset from the web
# ---------------------------------
#
# The function :func:`datasets.fetch_file` allows downloading a file from any given URL.
# This convenience function provides built-in local disk caching, an integrity check
# against an expected SHA256 digest, and an automatic retry mechanism on network errors.
#
# The goal is to provide the same convenience and reliability as the dataset fetchers
# while giving the flexibility to work with data from arbitrary online sources and file
# formats.
#
# The downloaded file can then be loaded with generic or domain-specific functions such
# as `pandas.read_csv`, `pandas.read_parquet`, etc.
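# %%
# A minimal sketch of how these pieces could fit together. The URL is a placeholder and
# `load_remote_csv` is a hypothetical helper, so it is only defined here, not called.
# The hypothetical `sha256_of` helper computes the hex digest of a local file, which is
# one way to obtain the value to pin via the `sha256` argument of
# :func:`datasets.fetch_file`.
import hashlib
import tempfile
from pathlib import Path

from sklearn.datasets import fetch_file


def sha256_of(path):
    """Return the hex SHA256 digest of a local file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    return digest.hexdigest()


def load_remote_csv(url, sha256=None):
    """Hypothetical helper: download `url` (or reuse the cached copy) and parse it."""
    import pandas as pd

    # fetch_file returns the path of the local copy; when `sha256` is given, the
    # downloaded file is checked against that digest.
    local_path = fetch_file(url, sha256=sha256)
    return pd.read_csv(local_path)


# Digest of a small local file, e.g. a copy of the data you have already inspected.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "sample.csv"
    sample_path.write_bytes(b"a,b\n1,2\n")
    sample_digest = sha256_of(sample_path)
print("SHA256 of the sample file:", sample_digest)

# Placeholder usage (requires network access):
# df = load_remote_csv("https://example.com/data.csv", sha256="<expected digest>")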

# %%
# Array API support
# -----------------
#
# Many more estimators and functions have been updated to support array API compatible
# inputs since version 1.5, in particular the meta-estimators for hyperparameter tuning
# from the :mod:`sklearn.model_selection` module and the metrics from the
# :mod:`sklearn.metrics` module.
#
# Please refer to the :ref:`array API support<array_api>` page for instructions to use
# scikit-learn with array API compatible libraries such as PyTorch or CuPy.

# %%
# Almost complete Metadata Routing support
# ----------------------------------------
#
# Support for routing metadata has been added to all remaining estimators and
# functions except AdaBoost. See the :ref:`Metadata Routing User Guide <metadata_routing>`
# for more details.
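# %%
# As a minimal sketch of the routing API (the model, the synthetic data and the random
# sample weights are arbitrary choices for illustration), each consumer declares which
# metadata it wants, and :func:`~model_selection.cross_validate` forwards
# `sample_weight` accordingly through its `params` argument:
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

sklearn.set_config(enable_metadata_routing=True)

X_rt, y_rt = make_classification(n_samples=300, random_state=0)
sample_weight = np.random.RandomState(0).uniform(0.5, 2.0, size=X_rt.shape[0])

# Request `sample_weight` both for fitting and for the default scoring, which calls
# `LogisticRegression.score`.
weighted_lr = (
    LogisticRegression()
    .set_fit_request(sample_weight=True)
    .set_score_request(sample_weight=True)
)
cv_results = cross_validate(
    weighted_lr, X_rt, y_rt, params={"sample_weight": sample_weight}, cv=3
)
print("Weighted accuracy per fold:", cv_results["test_score"])

sklearn.set_config(enable_metadata_routing=False)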

# %%
# Free-threaded CPython 3.13 support
# ----------------------------------
#
# scikit-learn has preliminary support for free-threaded CPython; in particular,
# free-threaded wheels are available for all of our supported platforms.
#
# Free-threaded (also known as nogil) CPython 3.13 is an experimental version of
# CPython 3.13 which aims at enabling efficient multi-threaded use cases by
# removing the Global Interpreter Lock (GIL).
#
# For more details about free-threaded CPython see `py-free-threading doc <https://py-free-threading.github.io>`_,
# in particular `how to install a free-threaded CPython <https://py-free-threading.github.io/installing_cpython/>`_
# and `Ecosystem compatibility tracking <https://py-free-threading.github.io/tracking/>`_.
#
# Feel free to try free-threaded CPython on your use case and report any issues!
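# %%
# Before timing a multi-threaded workload, it can help to confirm which interpreter is
# running. A minimal sketch using only the standard library: the `Py_GIL_DISABLED`
# build variable is set on free-threaded builds, and `sys._is_gil_enabled()` (a private
# helper available from CPython 3.13) reports whether the GIL is actually disabled at
# runtime, since it can be re-enabled, e.g. when importing an extension module that
# does not declare free-threading support.
import sys
import sysconfig

free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
# Interpreters older than 3.13 lack `_is_gil_enabled` and always run with the GIL.
gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()

print(f"Python {sys.version.split()[0]}")
print("Free-threaded build:", free_threaded_build)
print("GIL enabled at runtime:", gil_enabled)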

# %%
# Improvements to the developer API for third-party libraries
# -----------------------------------------------------------
#
# We have been working on improving the developer API for third-party libraries.
# This is still a work in progress, but a fair amount of work has been done in this
# release. This release includes:
#
# - :func:`sklearn.utils.validation.validate_data` is introduced and replaces the
#   previously private `BaseEstimator._validate_data` method. This function extends
#   :func:`~sklearn.utils.validation.check_array` and adds support for remembering
#   input feature counts and names.
# - Estimator tags have been revamped and are now part of the public API via
#   :class:`sklearn.utils.Tags`. Estimators should now override the
#   :meth:`BaseEstimator.__sklearn_tags__` method instead of implementing a `_more_tags`
#   method. If you'd like to support multiple scikit-learn versions, you can implement
#   both methods in your class.
# - As a consequence of developing a public tag API, we've removed the `_xfail_checks`
#   tag, and tests which are expected to fail are now passed directly to
#   :func:`~sklearn.utils.estimator_checks.check_estimator` and
#   :func:`~sklearn.utils.estimator_checks.parametrize_with_checks` via their
#   `expected_failed_checks` parameter. See their corresponding API docs for more
#   details.
# - Many tests in the common test suite have been updated and now raise more helpful
#   error messages. We've also added some new tests, which should help you more easily
#   fix potential issues with your estimators.
#
# An updated version of our :ref:`develop` is also available, which we recommend you
# check out.
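# %%
# As a minimal sketch of a third-party estimator written against these APIs (the
# `MeanRegressor` class is hypothetical, and `poor_score` is just one example of a tag
# an estimator might set): `validate_data` checks the input and records
# `n_features_in_`, `__sklearn_tags__` adjusts the default tags, and `_more_tags` is
# kept only so that scikit-learn versions before 1.6 see the same information.
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import get_tags
from sklearn.utils.validation import check_is_fitted, validate_data


class MeanRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical baseline regressor that always predicts the training mean."""

    def fit(self, X, y):
        # Validates X and y and remembers n_features_in_ (and feature_names_in_
        # when X is a DataFrame with string column names).
        X, y = validate_data(self, X, y)
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        check_is_fitted(self)
        # reset=False checks X against what was seen during fit instead of
        # overwriting n_features_in_.
        X = validate_data(self, X, reset=False)
        return np.full(X.shape[0], self.mean_)

    def __sklearn_tags__(self):
        # Public tag API (scikit-learn >= 1.6).
        tags = super().__sklearn_tags__()
        tags.regressor_tags.poor_score = True
        return tags

    def _more_tags(self):
        # Legacy tag API, only relevant for scikit-learn < 1.6.
        return {"poor_score": True}


rng = np.random.RandomState(0)
X_dev, y_dev = rng.normal(size=(20, 3)), rng.normal(size=20)
mean_reg = MeanRegressor().fit(X_dev, y_dev)
print("n_features_in_:", mean_reg.n_features_in_)
print("poor_score tag:", get_tags(mean_reg).regressor_tags.poor_score)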
