_posts/2022-05-18-sprints-value.md

Lines changed: 40 additions & 10 deletions
@@ -20,7 +20,7 @@ postauthors:
{% include postauthor.html %}
</div>
-With contributions from: Gaël Varoquaux, Andreas Mueller, Olivier Grisel, Julien Jerphanion, Guillaume Lemaitre
+With contributions from: Gaël Varoquaux, Andreas Mueller, Olivier Grisel, Julien Jerphanion, Guillaume LeMaitre
## Top Line Summary
@@ -30,7 +30,7 @@ Sprints are **working sessions to contribute to an open source library**. The go
The [scikit-learn](https://scikit-learn.org/dev/index.html) project has a long and extraordinary legacy of open source sprints. Since 2010, when its [first public version](https://en.wikipedia.org/wiki/Scikit-learn) was released, there have been as many as [45 sprints organized](https://blog.scikit-learn.org/sprints/). The 45 number is a lower bound, since there are likely more sprints that have not been listed.
-To date, [scikit-learn](https://github.com/scikit-learn/scikit-learn) has **over 2300** contributors to the library. The number of contributors to scikit-learn exceeds those of other related libraries such as numpy, scipy and matplotlib, with the exception of the [pandas](https://github.com/pandas-dev/pandas), which has a greater number of contributors (See Appendix A).
+To date, more than 2300 people have contributed to [scikit-learn](https://github.com/scikit-learn/scikit-learn). The number of contributors to scikit-learn exceeds those of other related libraries such as numpy, scipy and matplotlib, with the exception of the [pandas](https://github.com/pandas-dev/pandas), which has a greater number of contributors (See Appendix A).
The public discourse on open source has expanded to explore topics of sustainability, funding models, and diversity and inclusion, to name a few. A *reasonable*, yet *”difficult to answer”* question that has been posed is:
>*<span style="background-color: #CAE9F5;">
@@ -39,7 +39,7 @@ What is the effectiveness of sprint models and what is the long-term engagement
-Due to technological limitations of GitHub, we do not hold precise data on how many scikit-learn contributors connected to the project via a sprint. We have no formal data collection process which records statistics on how many sprint participants are recurring or information on their contributions to other open source projects or other long term positive ripple effects. A scientific look at the correlation between the number of sprints and contributors is beyond the scope of this article. What we *will examine* in this article are the **objectives, results and aspirations** of running the scikit-learn open source sprints.
+Due to technological limitations of GitHub and privacy concerns, we do not hold precise data on how many scikit-learn contributors connected to the project via a sprint. We have no formal data collection process which records statistics on how many sprint participants are recurring or information on their contributions to other open source projects or other long term positive ripple effects. A scientific look at the correlation between the number of sprints and contributors is beyond the scope of this article. What we *will examine* in this article are the **objectives, results and aspirations** of running the scikit-learn sprints.
The queries from other open-source projects requesting guidance on sprints and diversity and inclusions have been increasing. We share these experiences and lessons learned with the community, potential funders and open source project maintainers, particularly those projects which are nascent in their quest to build community, sustainability and diversity and inclusion.
@@ -52,9 +52,9 @@ In this article we examine the following:
- What value do open source sprints bring to the project and community?
- What are the aspirations of the scikit-learn project, in terms of connecting with the community?
-## Definition of Sprint
+## Definition of a scikit-learn Sprint
-A sprint has traditionally been an event where contributors come together to work on issues in the scikit-learn repository. A sprint can be as short as a few hours, or last over several days, even a week or longer. They may be in-person, online, hybrid or asynchronous. Sprints may be organized by the developers of the library, community groups (such as Meetups), scheduled alongside scientific or Python conferences, or even at home with a few friends. They can more simply and less dauntingly be described as
+A scikit-learn sprint has traditionally been an event where contributors come together to work on issues in the scikit-learn repository. A sprint can be as short as a few hours, or last over several days, even a week or longer. They may be in-person, online, hybrid or partially asynchronous. Sprints may be organized by the developers of the library, community groups (such as Meetups), scheduled alongside scientific or Python conferences, or even at home with a few friends. They can more simply and less dauntingly be described as
<span style="background-color: #CAE9F5;">
working sessions to contribute to the open source library.
</span>
@@ -67,15 +67,17 @@ We distinguish between a Developer (Dev) and Community sprint because the goals
A Developer, or “dev”, sprint is one that is typically organized by the maintainers of the library. A dev sprint is one where the developers or maintainers of the library gather to work on issues and to discuss the resolution of ongoing complex issues. This also provides the team an opportunity to focus on tasks related to the long-term roadmap of the project.
-For scikit-learn, the early sprints were alongside the [SciPy conferences](https://conference.scipy.org) and the practice has continued for over a decade.
+For scikit-learn, the early Community sprints were alongside the [SciPy conferences](https://conference.scipy.org) and the practice has continued for over a decade.
+
+The first early Dev sprints were organized at Inria. The first [major Dev sprint](https://github.com/scikit-learn/scikit-learn/wiki/Past-sprints#granada-19th-21th-dec-2011) was held in Granada after the NIPS 2011 conference (now renamed NeurIPS). It was the first time that most of the team had met in real life after months or years of online collaboration, and over a dozen developers participated. Later, Dev sprints were often hosted in the offices of partnering tech companies, typically from 3 to 7 days, once a year, in pre-COVID times.
**Community Sprint**
A Community sprint can be a collaboration by individuals, by affinity communities such as Meetup Groups (Data Umbrella, PyLadies, etc.), by conferences (SciPy, PyData Global, JupyterCon, etc.). A Community sprint is one that is with the general public and it may be beginners, experts, or a combination of both.
At a Developer sprint, a contributor may work on a PR that has been ongoing for three months. Conversely, Community sprints require curated issues which newcomers can complete in a shorter period of time (such as 1 day, or 1 day with 1-2 months follow-up).
-The landscape of community sprints with other [scientific python](https://scientific-python.org/calendars/) libraries is unknown. It is possible that scikit-learn may have had community sprints earlier than other projects.
+The landscape of community sprints with other [scientific python](https://scientific-python.org/calendars/) libraries is unknown.
## Goals of the Sprints
@@ -94,7 +96,7 @@ The landscape of community sprints with other [scientific python](https://scient
- To onboard new contributors to scikit-learn and PyData generally
- To onboard new contributors who would become recurring contributors
- To collaborate with community groups to increase diversity of contributor base with intentional outreach
-- To increase the number of recurring contributors
+- To strengthen and support existing contributors in order to maintain recurring community contributors
## scikit-learn Team Members Who Connected to the Project Via a Sprint
@@ -135,7 +137,8 @@ In her PyConDE PyData Berlin keynote from April 2022, [5 Years, 10 Sprints, a s
[Juan Martín Loyola](https://github.com/jmloyola) started [contributing to scikit-learn](https://blog.scikit-learn.org/team/jml-interview/) as preparation for the [Data Umbrella Latin America, June 2021](https://blog.dataumbrella.org/data-umbrella-afme2-2021-scikit-learn-sprint-report) sprint. He continued to contribute prolifically after the sprint, and he was invited to join the team in December 2021. Given his location in Argentina, he will be providing support at the [2022 SciPy Latin America](https://www.scipy.lat/es/scipycon.html) sprint.
### Second Degree Impact
-[Lauren Burke](https://github.com/laurburke) joined the scikit-learn Communications Team in November 2021 at the recommendation of Reshama Shaikh, and this can be considered a network effect.
+[Lauren Burke](https://github.com/laurburke) joined the scikit-learn Communications Team in November 2021 at the recommendation of Reshama Shaikh, and this can be considered a network effect. This demonstrates that sprints can result in valuable contributions other than code.
+
## Sprints: Observed Impact and Lessons Learned
@@ -151,7 +154,7 @@ Sprint participants, whether one-time or recurring, become ambassadors for the p
**Open source workflow knowledge**
-Users learn about testing, control version system (i.e. git), documentation which they bring to their work. The sprint experience assists contributors in developing a [wider set of technical skills](https://academiccommons.columbia.edu/doi/10.7916/D89G70BS) that can be shared across projects, networking, on to jobs and more.
+Users learn a range of tools such as: virtual environment setup, version control systems (i.e. Git), testing (flake8, pytest, continuous integration) and unit tests. They also learn software development best practices. For many users of scikit-learn, the sprint is the first time they navigate through the codebase and structure of scikit-learn, dig into functions and learn about errors. They develop experience in collaborative open source workflow. For employers, letting their team contribute to open-source might be a plus as they learn how to collaborate properly and learn about the internals of the library. The sprint experience assists contributors in developing a [wider set of technical skills](https://academiccommons.columbia.edu/doi/10.7916/D89G70BS) that can be shared across projects, networking, on to jobs and more.
**Overcoming barriers to entry**
@@ -291,6 +294,33 @@ get some time but I currently have limited of it).
>Finally, I would also really treasure having in-person sprints [in Paris] with external (recurring)
contributors (with a specific expertise) on advanced subjects when it is possible in the future.
+## Conclusion
+
+### Connecting and Supporting scikit-learn
+
+To connect with the scikit-learn project, these are the most active social media platforms:
+It is most welcome for users to “star” the code repository on GitHub: [scikit-learn/scikit-learn](https://github.com/scikit-learn/scikit-learn)
+
+Our office hours, in addition to public developers and triage meetings are all posted on our [Community Calendar](https://blog.scikit-learn.org/calendar/).
+
+The next Community sprint may be held at [EuroScipy 2022](https://www.euroscipy.org/2022/index.html) in Basel Switzerland in early September. Information on past and [upcoming sprints](https://blog.scikit-learn.org/sprints/) are shared on our community site.
+
+
+### Contributing to scikit-learn
+
+To contribute to scikit-learn, we have resources available here: