Commit c24fecf

Push the right content

Authored and committed by Pénélope Gittos
1 parent: ca03fbc

1 file changed: 14 additions, 108 deletions

_posts/2025-30-06-probabl-skolar.md

Lines changed: 14 additions & 108 deletions
@@ -2,8 +2,8 @@
 #### Blog Post Template ####
 
 #### Post Information ####
-title: "Changes and development of scikit-learn's developer API"
-date: December 12, 2024
+title: "Skolar: an open-source initiative to democratize open data science"
+date: June 30, 2025
 
 #### Post Category and Tags ####
 # Format in titlecase without dashes (Ex. "Open Source" instead of "open-source")
@@ -12,7 +12,6 @@ categories:
 tags:
 - Open Source
 - Machine Learning
-- License
 
 #### Featured Image ####
 featured-image: BSD_watermark.svg
@@ -21,121 +20,28 @@ featured-image: BSD_watermark.svg
 # Can accomodate multiple authors
 # Add SQUARE Author Image to /assets/images/author_images/ folder
 postauthors:
-- name: Adrin Jalali
-website: https://adrin.info/
-image: adrin-jalali.jpeg
+- name: Pénélope Gittos
+website: https://www.linkedin.com/in/gittospenelope-data-analyst-growth-bilingual/
 ---
 <div>
 <img src="/assets/images/posts_images/{{ page.featured-image }}" alt="">
 {% include postauthor.html %}
 </div>
+https://skolar.probabl.ai/
 
-Historically, scikit-learn's API has been divided into public and private. Public API is
-intended to be used by users, and private API is used internally in scikit-learn to
-develop new features and estimators. However, many of those functionalities have become
-essential to develop scikit-learn estimators by third parties who develop them outside
-the scikit-learn codebase.
+The scikit-learn project has always invested in education to build and nurture a strong, vibrant open-source community. The goal is straightforward: give everyone, everywhere, the tools they need to easily grasp, engage with, and meaningfully contribute to data science using open-source software. This mission is shared and actively supported by Probabl, a company that helps maintain scikit-learn by employing many of its core contributors and investing in its long-term sustainability. With Probabl's support and the community's deep commitment, we continue building bridges between research, software, and education.
 
-When it comes to our public API, we have very strict and high standards on backward
-compatibility. The rule of thumb is that no change should cause a change in users'
-code unless we warn about it for two release cycles, which means we give users a year
-time to update their code.
+When the [Inria scikit-learn MOOC](https://inria.github.io/scikit-learn-mooc/) (Massive Open Online Course) first went live, our community got a front-row seat to the amazing impact of practical, accessible and open learning. More than 40,000 people worldwide have jumped into these courses, clearly highlighting the demand for organized, hands-on resources that blend theory with real-world practice.
 
-On the other hand, we have no such guarantees or constraints on our private API. This
-brings an issue to third party developers who would like to use methods used by
-scikit-learn developers to develop their estimators. Constantly changing private API
-without prior warning brings certain challenges to third party developers which is not
-ideal.
+Today, [Probabl](https://probabl.ai/) is excited to introduce Skolar, a new, fully open-source educational initiative, built directly from your feedback and all the lessons we've learned along the way. Developed by the maintainers and core developers of scikit-learn, Skolar is designed specifically for data science practitioners, offering hands-on, high-quality learning resources grounded in real-world applications and open-source values.
 
-As a result, we've been working on creating a developer API which would sit somewhere
-between our public and private API in terms of backward compatibility. That means we
-intend to try to keep that API stable, and if needed, introduce changes with one release
-cycle warning.
+Skolar exists to promote our shared values: openness, teamwork, and practicality. It offers clear, interactive tutorials and structured courses carefully designed to match industry challenges and specialized use cases. Even more importantly, it captures the true spirit of open source: encouraging collaboration, peer-to-peer learning, and guidance from experts.
 
-In the past few releases, we've slowly introduced more functionalities under this
-umbrella. `__sklearn_clone__` and `__sklearn_is_fitted__` are two examples.
+Right now, we’re just at the beginning. Today, you can dive into our Scikit-learn Associate Practitioner online course, adapted from the popular Inria MOOC but enhanced with new material on unsupervised learning, especially clustering.
 
-In the 1.6 release, we focused on the testing infrastructure and estimator tag system.
-Estimator tags used to be private, and we were not sure about their design. In the 1.6
-release, new tags are introduced and using them looks like the following:
+The next stages, professional and expert levels, will launch soon. We’ll also add more courses covering other open-source libraries such as skrub (for data wrangling), hazardous (for survival analysis), and fairlearn (for fairness). Additionally, our scikit-learn team is planning to create industry-specific modules tackling real-world needs in fields like healthcare, finance, medicine, and beyond.
 
-```python
-from sklearn.base import BaseEstimator, ClassifierMixin
+At its core, Skolar is about empowering people through education, driven entirely by our passion for openness and collaboration. We firmly believe that true open data science begins with community-built learning resources.
+We warmly welcome you, whether you're a contributor, learner, teacher, or just someone curious, to join us. Help shape Skolar’s future and support open-source education in data science.
 
-class MyEstimator(ClassifierMixin, BaseEstimator):
-
-    ...
-
-    def __sklearn_tags__(self):
-        tags = super().__sklearn_tags__()
-        # modify tags here
-        tags.non_deterministic = True
-        return tags
-```
-
-The new tags mostly follow the same structure as the old tags, but there are certain
-changes to them. The main change is that the old `_xfail_checks` is no longer present
-in the new tags. That tag was used to tell the common testing tools about the tests
-which are known to fail and are to be skipped. That information is now directly passed
-to the test functionalities. The old way of skipping a test was the following:
-
-```python
-from sklearn.base import BaseEstimator, ClassifierMixin
-
-class MyEstimator(ClassifierMixin, BaseEstimator):
-
-    ...
-
-    def _more_tags(self):
-        return {
-            "_xfail_checks": {
-                "check_to_skip_name": "this check is known to fail",
-                ...
-            }
-        }
-```
-
-And then when calling `check_estimator` or using `parametrize_with_checks` with `pytest`
-would automatically ignore those tests for the estimator.
-
-Instead, in this release, you pass that information directly to those methods:
-
-```python
-from sklearn.utils.estimator_checks import check_estimator, parametrize_with_checks
-
-CHECKS_EXPECTED_TO_FAIL = {
-    "check_to_skip_name": "this check is known to fail",
-    ...
-}
-
-# Using check_estimator
-def test_with_check_estimator():
-    check_estimator(MyEstimator(), expected_failed_checks=CHECKS_EXPECTED_TO_FAIL)
-
-# Using parametrize_with_checks
-@parametrize_with_checks(
-    [MyEstimator()],
-    expected_failed_checks=lambda est: CHECKS_EXPECTED_TO_FAIL
-)
-def test_with_parametrize_with_checks(estimator, check):
-    check(estimator)
-```
-
-While working on the testing infrastructure, we have also been working on improving our
-tests and that means in this release we had a particularly high number of changes in
-their names and what they do. The changes will make it easier for developers to fix
-issues with their estimators. Note that you can now pass `legacy=False` to both
-`check_estimator` and `parametrize_with_checks` to include only strictly API related
-tests.
-
-The above changes mean developers need to update their estimators and depending on
-what they use, write scikit-learn version specific code to handle supporting multiple
-scikit-learn versions. To make that process easier, we've worked on a package called
-[`sklearn_compat`](https://github.com/sklearn-compat/sklearn-compat/). You can either
-depend on it as a package dependency, or vendor a single file inside your project. At
-the moment this project is in its infancy and might change in the future. But hopefully
-it helps developers out there.
-
-If you think there are missing functionalities in the developer API, please let us know
-and give us feedback on our [issue tracker](
-https://github.com/scikit-learn/scikit-learn/issues).
+Create your account on Skolar today: https://skolar.probabl.ai
