Push the right content

Pénélope Gittos · Pénélope Gittos · commit c24fecf44386 · 2025-06-30T17:15:53.000+02:00
diff --git a/_posts/2025-30-06-probabl-skolar.md b/_posts/2025-30-06-probabl-skolar.md
@@ -2,8 +2,8 @@
 #### Blog Post Template ####
 
 #### Post Information ####
-title: "Changes and development of scikit-learn's developer API"
-date: December 12, 2024
+title: "Skolar: an open-source initiative to democratize open data science"
+date: June 30, 2025
 
 #### Post Category and Tags ####
 # Format in titlecase without dashes (Ex. "Open Source" instead of "open-source")
@@ -12,7 +12,6 @@ categories:
 tags:
   - Open Source
   - Machine Learning
-  - License
 
 #### Featured Image ####
 featured-image: BSD_watermark.svg
@@ -21,121 +20,28 @@ featured-image: BSD_watermark.svg
 # Can accomodate multiple authors
 # Add SQUARE Author Image to /assets/images/author_images/ folder
 postauthors:
-  - name: Adrin Jalali
-    website: https://adrin.info/
-    image: adrin-jalali.jpeg
+  - name: Pénélope Gittos
+    website: https://www.linkedin.com/in/gittospenelope-data-analyst-growth-bilingual/
 ---
 <div>
   <img src="/assets/images/posts_images/{{ page.featured-image }}" alt="">
   {% include postauthor.html %}
 </div>
+https://skolar.probabl.ai/
 
-Historically, scikit-learn's API has been divided into public and private. Public API is
-intended to be used by users, and private API is used internally in scikit-learn to
-develop new features and estimators. However, many of those functionalities have become
-essential to develop scikit-learn estimators by third parties who develop them outside
-the scikit-learn codebase.
+The scikit-learn project always puts efforts on education to build and nurture a strong vibrant open-source community. The goal is straightforward: give everyone, everywhere, the tools they need to easily grasp, engage with, and meaningfully contribute to data science using open-source software. This mission is shared and actively supported by Probabl, a company that helps maintain scikit-learn by employing many of its core contributors and investing in its long-term sustainability. With their support and a deep commitment from the community, we continue building bridges between research, software, and education.
 
-When it comes to our public API, we have very strict and high standards on backward
-compatibility. The rule of thumb is that no change should cause a change in users'
-code unless we warn about it for two release cycles, which means we give users a year
-time to update their code.
+When the [Inria scikit-learn MOOC](https://inria.github.io/scikit-learn-mooc/)(Massive Open Online Course) first went live, our community got a front-row seat to the amazing impact of practical, accessible and open learning. More than 40,000 people worldwide have jumped into these courses, clearly highlighting the demand for organized, hands-on resources that blend theory with real-world practice.
 
-On the other hand, we have no such guarantees or constraints on our private API. This
-brings an issue to third party developers who would like to use methods used by
-scikit-learn developers to develop their estimators. Constantly changing private API
-without prior warning brings certain challenges to third party developers which is not
-ideal.
+Today, [Probabl](https://probabl.ai/) is excited to introduce Skolar, a new, fully open-source educational initiative, built directly from your feedback and all the lessons we've learned along the way. Developed by the maintainers and core developers of scikit-learn, Skolar is designed specifically for data science practitioners, offering hands-on, high-quality learning resources grounded in real-world applications and open-source values.
 
-As a result, we've been working on creating a developer API which would sit somewhere
-between our public and private API in terms of backward compatibility. That means we
-intend to try to keep that API stable, and if needed, introduce changes with one release
-cycle warning.
+Skolar exists to boost our shared values: openness, teamwork, and practicality. It offers clear, interactive tutorials and structured courses carefully designed to match industry challenges and specialized use-cases. But even more importantly, it captures the true spirit of open source: encouraging collaboration, peer-to-peer learning, and guidance from experts.
 
-In the past few releases, we've slowly introduced more functionalities under this
-umbrella. `__sklearn_clone__` and `__sklearn_is_fitted__` are two examples.
+Right now, we’re just at the beginning. Today, you can dive into our Scikit-learn Associate Practitioner online course, adapted from the popular Inria MOOC but enhanced with new material on unsupervised learning, especially clustering.
 
-In the 1.6 release, we focused on the testing infrastructure and estimator tag system.
-Estimator tags used to be private, and we were not sure about their design. In the 1.6
-release, new tags are introduced and using them looks like the following:
+The next stages, professional and expert levels, will launch soon. We’ll also add more courses covering other open-source libraries such as skrub (for data wrangling), hazardous (for survival analysis), and fairlearn (for fairness). Additionally, our scikit-learn team is planning to create  industry-specific modules tackling real-world needs in fields like healthcare, finance, medicine, and beyond.
 
-```python
-from sklearn.base import BaseEstimator, ClassifierMixin
+At its core, Skolar is about empowering people through education, driven entirely by our passion for openness and collaboration. We firmly believe that true open data science begins with community-built learning resources.
+We warmly welcome you, whether you're a contributor, learner, teacher, or just someone curious, to join us. Help shape Skolar’s future and support open-source education in data science.
 
-class MyEstimator(ClassifierMixin, BaseEstimator):
-
-  ...
-
-  def __sklearn_tags__(self):
-    tags = super().__sklearn_tags__()
-    # modify tags here
-    tags.non_deterministic = True
-    return tags
-```
-
-The new tags mostly follow the same structure as the old tags, but there are certain
-changes to them. The main change is that the old `_xfail_checks` is no longer present
-in the new tags. That tag was used to tell the common testing tools about the tests
-which are known to fail and are to be skipped. That information is now directly passed
-to the test functionalities. The old way of skipping a test was the following:
-
-```python
-from sklearn.base import BaseEstimator, ClassifierMixin
-
-class MyEstimator(ClassifierMixin, BaseEstimator):
-
-  ...
-
-  def _more_tags(self):
-    return {
-      "_xfail_checks": {
-        "check_to_skip_name": "this check is known to fail",
-        ...
-      }
-    }
-```
-
-And then when calling `check_estimator` or using `parametrize_with_checks` with `pytest`
-would automatically ignore those tests for the estimator.
-
-Instead, in this release, you pass that information directly to those methods:
-
-```python
-from sklearn.utils.estimator_checks import check_estimator, parametrize_with_checks
-
-CHECKS_EXPECTED_TO_FAIL = {
-  "check_to_skip_name": "this check is known to fail",
-  ...
-}
-
-# Using check_estimator
-def test_with_check_estimator():
-  check_estimator(MyEstimator(), expected_failed_checks=CHECKS_EXPECTED_TO_FAIL)
-
-# Using parametrize_with_checks
-@parametrize_with_checks(
-  [MyEstimator()],
-  expected_failed_checks=lambda est: CHECKS_EXPECTED_TO_FAIL
-)
-def test_with_parametrize_with_checks(estimator, check):
-  check(estimator)
-```
-
-While working on the testing infrastructure, we have also been working on improving our
-tests and that means in this release we had a particularly high number of changes in
-their names and what they do. The changes will make it easier for developers to fix
-issues with their estimators. Note that you can now pass `legacy=False` to both
-`check_estimator` and `parametrize_with_checks` to include only strictly API related
-tests.
-
-The above changes mean developers need to update their estimators and depending on
-what they use, write scikit-learn version specific code to handle supporting multiple
-scikit-learn versions. To make that process easier, we've worked on a package called
-[`sklearn_compat`](https://github.com/sklearn-compat/sklearn-compat/). You can either
-depend on it as a package dependency, or vendor a single file inside your project. At
-the moment this project is in its infancy and might change in the future. But hopefully
-it helps developers out there.
-
-If you think there are missing functionalities in the developer API, please let us know
-and give us feedback on our [issue tracker](
-https://github.com/scikit-learn/scikit-learn/issues).
+Create your account on Skolar today: https://skolar.probabl.ai