-Historically, scikit-learn's API has been divided into public and private. Public API is
-intended to be used by users, and private API is used internally in scikit-learn to
-develop new features and estimators. However, many of those functionalities have become
-essential to develop scikit-learn estimators by third parties who develop them outside
-the scikit-learn codebase.
+The scikit-learn project has always invested in education to build and nurture a strong, vibrant open-source community. The goal is straightforward: give everyone, everywhere, the tools they need to easily grasp, engage with, and meaningfully contribute to data science using open-source software. This mission is shared and actively supported by Probabl, a company that helps maintain scikit-learn by employing many of its core contributors and investing in its long-term sustainability. With their support and a deep commitment from the community, we continue building bridges between research, software, and education.

-When it comes to our public API, we have very strict and high standards on backward
-compatibility. The rule of thumb is that no change should cause a change in users'
-code unless we warn about it for two release cycles, which means we give users a year
-time to update their code.
+When the [Inria scikit-learn MOOC](https://inria.github.io/scikit-learn-mooc/) (Massive Open Online Course) first went live, our community got a front-row seat to the amazing impact of practical, accessible, and open learning. More than 40,000 people worldwide have jumped into these courses, clearly highlighting the demand for organized, hands-on resources that blend theory with real-world practice.

-On the other hand, we have no such guarantees or constraints on our private API. This
-brings an issue to third party developers who would like to use methods used by
-scikit-learn developers to develop their estimators. Constantly changing private API
-without prior warning brings certain challenges to third party developers which is not
-ideal.
+Today, [Probabl](https://probabl.ai/) is excited to introduce Skolar, a new, fully open-source educational initiative, built directly from your feedback and all the lessons we've learned along the way. Developed by the maintainers and core developers of scikit-learn, Skolar is designed specifically for data science practitioners, offering hands-on, high-quality learning resources grounded in real-world applications and open-source values.

-As a result, we've been working on creating a developer API which would sit somewhere
-between our public and private API in terms of backward compatibility. That means we
-intend to try to keep that API stable, and if needed, introduce changes with one release
-cycle warning.
+Skolar exists to boost our shared values: openness, teamwork, and practicality. It offers clear, interactive tutorials and structured courses carefully designed to match industry challenges and specialized use-cases. But even more importantly, it captures the true spirit of open source: encouraging collaboration, peer-to-peer learning, and guidance from experts.

-In the past few releases, we've slowly introduced more functionalities under this
-umbrella. `__sklearn_clone__` and `__sklearn_is_fitted__` are two examples.
+Right now, we’re just at the beginning. Today, you can dive into our Scikit-learn Associate Practitioner online course, adapted from the popular Inria MOOC but enhanced with new material on unsupervised learning, especially clustering.

-In the 1.6 release, we focused on the testing infrastructure and estimator tag system.
-Estimator tags used to be private, and we were not sure about their design. In the 1.6
-release, new tags are introduced and using them looks like the following:
+The next stages, professional and expert levels, will launch soon. We’ll also add more courses covering other open-source libraries such as skrub (for data wrangling), hazardous (for survival analysis), and fairlearn (for fairness). Additionally, our scikit-learn team is planning to create industry-specific modules tackling real-world needs in fields like healthcare, finance, medicine, and beyond.

-```python
-from sklearn.base import BaseEstimator, ClassifierMixin
+At its core, Skolar is about empowering people through education, driven entirely by our passion for openness and collaboration. We firmly believe that true open data science begins with community-built learning resources.
+We warmly welcome you, whether you're a contributor, learner, teacher, or just someone curious, to join us. Help shape Skolar’s future and support open-source education in data science.

-class MyEstimator(ClassifierMixin, BaseEstimator):
-
-    ...
-
-    def __sklearn_tags__(self):
-        tags = super().__sklearn_tags__()
-        # modify tags here
-        tags.non_deterministic = True
-        return tags
-```
-
-The new tags mostly follow the same structure as the old tags, but there are certain
-changes to them. The main change is that the old `_xfail_checks` is no longer present
-in the new tags. That tag was used to tell the common testing tools about the tests
-which are known to fail and are to be skipped. That information is now directly passed
-to the test functionalities. The old way of skipping a test was the following:
-
-```python
-from sklearn.base import BaseEstimator, ClassifierMixin
-
-class MyEstimator(ClassifierMixin, BaseEstimator):
-
-    ...
-
-    def _more_tags(self):
-        return {
-            "_xfail_checks": {
-                "check_to_skip_name": "this check is known to fail",
-                ...
-            }
-        }
-```
-
-And then when calling `check_estimator` or using `parametrize_with_checks` with `pytest`
-would automatically ignore those tests for the estimator.
-
-Instead, in this release, you pass that information directly to those methods:
-
-```python
-from sklearn.utils.estimator_checks import check_estimator, parametrize_with_checks
-
-CHECKS_EXPECTED_TO_FAIL = {
-    "check_to_skip_name": "this check is known to fail",