
Commit ceda2e9

Add blog post about Hugging Face and scikit-learn
1 parent 68026cd commit ceda2e9

5 files changed

+82
-0
lines changed

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
---
2+
title: "scikit-learn and Hugging Face join forces"
3+
date: October 13, 2022
4+
5+
categories:
6+
- Updates
7+
- Community
8+
tags:
9+
- Open Source
10+
11+
featured-image: HFxsklearn.png
12+
13+
postauthors:
14+
- name: Lysandre Debut
15+
16+
website: https://github.com/LysandreJik
17+
image: "lysandre_debut.jpg"
18+
- name: François Goupil
19+
20+
website: https://github.com/francoisgoupil
21+
image: "francois_goupil.jpeg"
22+
---
23+
<div>
24+
<img src="/assets/images/posts_images/{{ page.featured-image }}" alt="">
25+
{% include postauthor.html %}
26+
</div>
27+
28+
29+
[Hugging Face](https://hf.co) is happy to announce that we're partnering with [scikit-learn](https://scikit-learn.org/stable/index.html) to further our support of the machine learning tools and ecosystem.
30+
31+
At Hugging Face, we've been putting a lot of effort into supporting deep learning, but we believe that machine learning as a whole can benefit from the tools we release. With statistical machine learning being essential in this field and scikit-learn dominating statistical ML, we're excited to partner and move forward together.
32+
33+
As of September 2022, the Hugging Face Hub already hosts nearly 4,000 tabular classification and tabular regression model checkpoints, and we aim to keep that number growing.
34+
35+
<div>
36+
<video preload="auto" autoplay loop muted="muted" volume="0">
37+
<source src="/assets/videos/HFxsklearn.mp4" type="video/mp4">
38+
</video>
39+
</div>
40+
41+
## Support for the scikit-learn consortium
42+
43+
Since June 2022, Hugging Face has been an official sponsor of the scikit-learn consortium. Through this support, Hugging Face actively promotes the development and sustainability of scikit-learn. As a sponsor of the consortium, which is hosted at the Inria Foundation, we now participate in its technical committee.
44+
45+
## Development support
46+
To help sustain the development of the library, we're happy to welcome Adrin Jalali and Benjamin Bossan to the Hugging Face team. Adrin is a core developer of scikit-learn as well as [fairlearn](https://fairlearn.org), while Benjamin is the author of the [skorch](https://github.com/skorch-dev/skorch) library and is now a contributor to scikit-learn.
47+
48+
Hugging Face is happy to support the development of scikit-learn through code contributions, issues, pull requests, reviews, and discussions.
49+
50+
## Integration to and from the Hugging Face Hub
51+
52+
["Skops"](https://github.com/skops-dev/skops) is the name of the framework being actively developed as the link between the scikit-learn and the Hugging Face ecosystems. With Skops, we hope to facilitate essential workflows:
53+
54+
- The ability to push scikit-learn models to the Hugging Face Hub
55+
- The possibility to try out models directly in the browser
56+
- The automatic creation of model cards, to improve model documentation and understanding
57+
- The ability to collaborate with others on machine learning projects
58+
59+
### Snapshot of your work
60+
61+
Working at the intersection of scikit-learn and the Hub offers challenges linked to the two platforms. One of these challenges is secure persistence: the ability to serialize models in a secure, safe manner.
62+
63+
scikit-learn models (estimators, predictors, ...) are usually saved using pickle, which is notorious for not being a secure format. Sharing scikit-learn models in this format exposes receivers to potentially malicious data that can execute arbitrary code as soon as it is loaded.
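
To make the risk concrete, here is a minimal sketch using only the Python standard library. The `Payload` class is hypothetical and deliberately harmless: it asks pickle to call `eval` on a fixed arithmetic string at load time. A real attacker could substitute any callable and arguments.

```python
import pickle


class Payload:
    """Hypothetical object that controls what pickle calls when it is loaded."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and invokes it during loads().
        # A harmless expression is used here; an attacker could pick anything.
        return (eval, ("40 + 2",))


blob = pickle.dumps(Payload())

# Loading the bytes runs eval("40 + 2"); no Payload instance is ever rebuilt.
result = pickle.loads(blob)
print(type(result).__name__, result)
```

The receiver only called `pickle.loads`, yet code chosen by the sender was executed. This is why loading a pickle file from an untrusted source is unsafe.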
64+
65+
That's where secure persistence comes in: as the Hugging Face Hub aims to provide a platform for models, the ability to share safe, secure objects is essential. We've been working on adding secure persistence for scikit-learn models in [skops#128](https://github.com/skops-dev/skops/pull/128) and [skops#145](https://github.com/skops-dev/skops/pull/145) ([doc preview](https://skops--145.org.readthedocs.build/en/145/persistence.html)). Instead of serializing using pickle, the object's contents are put into a zip file with an accompanying schema JSON file.
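
The general idea of "a zip file plus a schema JSON file" can be illustrated with a toy sketch. This is not the Skops file format or API: `LinearModel`, `TRUSTED_TYPES`, `dump`, `load`, and the file names `schema.json` and `params.json` are all assumptions made for this example. The point it shows is that loading only parses data and rebuilds types from an explicit allowlist, so nothing in the file can choose which code runs.

```python
import io
import json
import zipfile
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Toy stand-in for a fitted estimator (hypothetical, not a scikit-learn class)."""

    coef: list
    intercept: float

    def predict(self, rows):
        return [sum(c * x for c, x in zip(self.coef, row)) + self.intercept for row in rows]


# Only types listed here may be rebuilt at load time.
TRUSTED_TYPES = {"LinearModel": LinearModel}


def dump(model, file):
    """Write the model's parameters and a schema describing them into a zip archive."""
    schema = {"type": type(model).__name__, "content": "params.json"}
    params = {"coef": model.coef, "intercept": model.intercept}
    with zipfile.ZipFile(file, "w") as archive:
        archive.writestr("schema.json", json.dumps(schema))
        archive.writestr("params.json", json.dumps(params))


def load(file):
    """Rebuild a model from the archive, refusing any type that is not allowlisted."""
    with zipfile.ZipFile(file) as archive:
        schema = json.loads(archive.read("schema.json"))
        if schema["type"] not in TRUSTED_TYPES:
            raise TypeError(f"untrusted type: {schema['type']}")
        params = json.loads(archive.read(schema["content"]))
    return TRUSTED_TYPES[schema["type"]](**params)


buffer = io.BytesIO()
dump(LinearModel(coef=[2.0, -1.0], intercept=0.5), buffer)
buffer.seek(0)
restored = load(buffer)
print(restored.predict([[1.0, 3.0]]))
```

Because the archive holds only JSON, a receiver can also inspect the schema before deciding whether to load the model at all.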
66+
67+
Read about the Skops library in the following blog post: [Introducing Skops](https://huggingface.co/blog/skops).
68+
69+
## Improving interoperability
70+
71+
Skops is an example of an integration of scikit-learn within our tools, but it is not the only example! We will strive to integrate with the rest of our ecosystem so that Hugging Face users may benefit from using scikit-learn tools and vice versa.
72+
73+
An example is the `evaluate` library, dedicated to efficiently evaluating machine learning models and datasets. We aim for this tool to natively support [scikit-learn metrics](https://github.com/huggingface/evaluate/issues/297) in its API.
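
One small gap such an integration has to bridge is calling convention: scikit-learn metric functions take `(y_true, y_pred)`, while `evaluate` metrics are computed with `predictions=` and `references=` keyword arguments. The sketch below is purely illustrative. `SklearnStyleMetric` is a hypothetical adapter, not part of either library, and the local `accuracy_score` is a stand-in that mirrors the argument order of `sklearn.metrics.accuracy_score` so that the example runs with the standard library alone.

```python
from typing import Callable, Dict, Sequence


class SklearnStyleMetric:
    """Hypothetical adapter; not part of `evaluate` or scikit-learn."""

    def __init__(self, name: str, metric_fn: Callable[[Sequence, Sequence], float]):
        self.name = name
        self.metric_fn = metric_fn

    def compute(self, predictions: Sequence, references: Sequence) -> Dict[str, float]:
        # scikit-learn metrics expect (y_true, y_pred), in that order.
        return {self.name: self.metric_fn(references, predictions)}


def accuracy_score(y_true: Sequence, y_pred: Sequence) -> float:
    """Local stand-in with the same argument order as sklearn.metrics.accuracy_score."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


metric = SklearnStyleMetric("accuracy", accuracy_score)
scores = metric.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(scores)
```

How the `evaluate` library ends up exposing scikit-learn metrics is tracked in the linked issue and may look quite different from this sketch.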
74+
75+
---
76+
77+
Through these efforts, we hope to kickstart a lasting relationship between the two ecosystems and provide simple, efficient bridges to lower the barrier to entry. We believe that educating and sharing models is the best way to foster inclusive machine learning from which all can benefit. We're excited to partner with scikit-learn for this endeavor.

assets/css/main.scss

Lines changed: 5 additions & 0 deletions
@@ -27,3 +27,8 @@ html {
2727
position: relative;
2828
font-size: 22px;
2929
}
30+
31+
video {
32+
width: 100% !important;
33+
height: auto !important;
34+
}
56.8 KB
87.4 KB

assets/videos/HFxsklearn.mp4

481 KB
Binary file not shown.
