_posts/2022-05-18-sprints-value.md
28 additions & 28 deletions
@@ -28,11 +28,11 @@ Sprints are **working sessions to contribute to an open source library**. The go
28
28
29
29
## Introduction
30
30
31
-
The [scikit-learn](https://scikit-learn.org/dev/index.html) project has a long and extraordinary legacy of open source sprints. Since 2010, when its [first public version](https://en.wikipedia.org/wiki/Scikit-learn) was released, there have been as many as [45 sprints organized](https://blog.scikit-learn.org/sprints/). The 45 number is a lower bound, since there are likely more sprints that have not been listed.
31
+
The [scikit-learn](https://scikit-learn.org/dev/index.html) project has a long and extraordinary legacy of open source sprints. Since 2010, when its [first public version](https://en.wikipedia.org/wiki/Scikit-learn) was released, there have been as many as [45 sprints organized](https://blog.scikit-learn.org/sprints/). The number 45 is a lower bound, since there are likely more sprints that have not been listed.
32
32
33
33
To date, more than 2300 people have contributed to [scikit-learn](https://github.com/scikit-learn/scikit-learn). The number of contributors to scikit-learn exceeds that of other related libraries such as numpy, scipy and matplotlib, with the exception of [pandas](https://github.com/pandas-dev/pandas), which has a greater number of contributors (see Appendix A).
34
34
35
-
The public discourse on open source has expanded to explore topics of sustainability, funding models, and diversity and inclusion, to name a few. A *reasonable*, yet *”difficult to answer”* question that has been posed is:
35
+
The public discourse on open source has expanded to explore topics of sustainability, funding models, and diversity and inclusion, to name a few. A *reasonable*, yet ”difficult to answer” question that has been posed is:
36
36
>*<span style="background-color: #CAE9F5;">
37
37
What is the effectiveness of sprint models and what is the long-term engagement as a result of these sprints?
38
38
</span>*
@@ -41,7 +41,7 @@ What is the effectiveness of sprint models and what is the long-term engagement
41
41
42
42
Due to technological limitations of GitHub and privacy concerns, we do not hold precise data on how many scikit-learn contributors connected to the project via a sprint. We have no formal data collection process which records statistics on how many sprint participants are recurring or information on their contributions to other open source projects or other long term positive ripple effects. A scientific look at the correlation between the number of sprints and contributors is beyond the scope of this article. What we *will examine* in this article are the **objectives, results and aspirations** of running the scikit-learn sprints.
43
43
44
-
The queries from other open-source projects requesting guidance on sprints and diversity and inclusions have been increasing. We share these experiences and lessons learned with the community, potential funders and open source project maintainers, particularly those projects which are nascent in their quest to build community, sustainability and diversity and inclusion.
44
+
<span style="background-color: #CAE9F5;">The queries from other open-source projects requesting guidance on sprints and diversity and inclusions have been increasing.</span> We share these experiences and lessons learned with the community, potential funders and open source project maintainers, particularly those projects which are nascent in their quest to build community, sustainability and diversity and inclusion.
45
45
46
46
## Outline
47
47
@@ -65,42 +65,42 @@ We distinguish between a Developer (Dev) and Community sprint because the goals
65
65
66
66
**Developer (Dev) Sprint**
67
67
68
-
A Developer, or “dev”, sprint is one that is typically organized by the maintainers of the library. A dev sprint is one where the developers or maintainers of the library gather to work on issues and to discuss the resolution of ongoing complex issues. This also provides the team an opportunity to focus on tasks related to the long-term roadmap of the project.
69
-
70
-
For scikit-learn, the early Community sprints were alongside the [SciPy conferences](https://conference.scipy.org) and the practice has continued for over a decade.
68
+
A Developer, or “Dev”, sprint is one that is typically organized by the maintainers of the library. A Dev sprint is one where the developers or maintainers of the library gather to work on issues and to discuss the resolution of ongoing complex issues. This also provides the team an opportunity to focus on tasks related to the long-term roadmap of the project.
71
69
72
70
The earliest Dev sprints were organized at Inria. The first [major Dev sprint](https://github.com/scikit-learn/scikit-learn/wiki/Past-sprints#granada-19th-21th-dec-2011) was held in Granada after the NIPS 2011 conference (now renamed NeurIPS). It was the first time that most of the team had met in real life after months or years of online collaboration, and over a dozen developers participated. Later, Dev sprints were often hosted in the offices of partnering tech companies, typically for 3 to 7 days, once a year, in pre-COVID times.
73
71
74
72
**Community Sprint**
75
73
76
-
A Community sprint can be a collaboration by individuals, by affinity communities such as Meetup Groups (Data Umbrella, PyLadies, etc.), by conferences (SciPy, PyData Global, JupyterCon, etc.). A Community sprint is one that is with the general public and it may be beginners, experts, or a combination of both.
74
+
A Community sprint can be a collaboration by individuals, by affinity communities such as Meetup Groups (Data Umbrella, PyLadies, etc.), by conferences (SciPy, PyCon, PyData Global, JupyterCon, etc.). A Community sprint is one that is with the general public and it may be beginners, experts, or a combination of both.
75
+
76
+
For scikit-learn, the early Community sprints were alongside the [SciPy conferences](https://conference.scipy.org) and the practice has continued for over a decade.
77
77
78
78
At a Developer sprint, a contributor may work on a PR that has been ongoing for three months. Conversely, Community sprints require curated issues which newcomers can complete in a shorter period of time (such as 1 day, or 1 day with 1-2 months follow-up).
79
79
80
-
The landscape of community sprints with other [scientific python](https://scientific-python.org/calendars/) libraries is unknown.
80
+
The landscape of Dev and Community sprints with other [scientific python](https://scientific-python.org/calendars/) libraries is unknown.
81
81
82
82
## Goals of the Sprints
83
83
84
84
### Goals of Dev Sprints
85
85
- To get maintainers in one room to efficiently discuss open issues and pull requests
86
86
- To move along contributions in a synchronous fashion
87
87
- To foster existing collaborations with external developers synchronously
88
-
- To building rapport: Maintainers reside in various continents and the in-person sprints build rapport within the team. Social interactions are critical in having a productive team
88
+
- To build rapport: Maintainers reside in various continents and the in-person sprints build rapport within the team. Social interactions are critical in having a productive team.
89
89
- To foster collaborations with the project’s corporate sponsors (members of the [scikit-learn Consortium](https://scikit-learn.org/stable/about.html#funding))
90
90
91
91
### Goals of Community & Beginner Sprints
92
92
93
93
- To broaden the project’s contributor base
94
94
- To build community and connect the project maintainers with its users
95
-
- To get interactive feedback from new scikit-learn users and contributors
95
+
- To obtain interactive feedback from new scikit-learn users and contributors
96
96
- To onboard new contributors to scikit-learn and PyData generally
97
97
- To onboard new contributors who would become recurring contributors
98
98
- To collaborate with community groups to increase the diversity of the contributor base through intentional outreach
99
99
- To strengthen and support existing contributors in order to maintain recurring community contributors
100
100
101
101
## scikit-learn Team Members Who Connected to the Project Via a Sprint
102
102
103
-
It is notable that a number of the current maintainers of the library found their way to the project via a sprint. Additionally, some members of the Contributor Experience Team also connected to the scikit-learn project via the sprints.
103
+
It is notable that a number of the current maintainers of the library found their way to the project via a sprint. Additionally, some members of the Contributor Experience Team connected to the scikit-learn project via the sprints.
104
104
105
105
### Olivier Grisel
106
106
@@ -126,7 +126,7 @@ Olivier shares:
126
126
He has contributed code, reviews, and documentation since March 2021, joined Inria in April 2021, and in October 2021 became a core developer.
127
127
128
128
### Other Maintainers
129
-
There are [other maintainers](https://scikit-learn.org/dev/about.html#people) and emeritus contributors who had participated in a developer or community sprint along their journey with the scikit-learn team, such as Vlad Nicolae (current maintainer), Gilles Loupe (Emeritus), Thouis (Ray) Jones (Emeritus).
129
+
There are [other maintainers](https://scikit-learn.org/dev/about.html#people) and emeritus contributors who had participated in a Developer or Community sprint along their journey with the scikit-learn team, such as Vlad Nicolae (current maintainer), Gilles Loupe (Emeritus), Thouis (Ray) Jones (Emeritus).
130
130
131
131
### Reshama Shaikh
132
132
[Reshama Shaikh](https://github.com/reshamas) has organized nine scikit-learn [community sprints](https://www.dataumbrella.org/sprints) from 2017 to 2021. She first contributed code and documentation fixes to scikit-learn in September 2018. In September 2020, she was invited to join the scikit-learn team.
@@ -159,7 +159,7 @@ Users learn a range of tools such as: virtual environment setup, version control
159
159
160
160
**Overcoming barriers to entry**
161
161
162
-
The sprints, as a “hands-on working session”, provides an avenue for potential contributors to overcome common barriers to entry, particularly “getting started”, and moving from the *possibility* to an *actuality* stage.
162
+
The sprints, as a “hands-on working session”, provide an avenue for potential contributors to overcome common barriers to entry, particularly “getting started”, and moving from the *possibility* to an *actuality* stage.
163
163
164
164
**Providing an avenue for advanced contributions**
165
165
@@ -192,41 +192,41 @@ These have been the observed benefits of the online sprints, which began in 2020
192
192
193
193
**Networking**
194
194
195
-
Sprints make it easier to meet new people with different backgrounds, and in particular, online sprints help break geographical barriers.
195
+
Online sprints make it easier to meet new people with different backgrounds.
196
196
197
197
**International collaboration**
198
198
199
-
Collaborating with affinity communities can attract more candidates from various backgrounds.
199
+
Collaborating with affinity communities can attract more candidates from various backgrounds. In particular, online sprints help break geographical barriers.
200
200
201
201
**Pair programming**
202
202
203
-
The pairing of contributors seems to work well. Pair programming was consistently ranked as a positive experience by online sprint participants
203
+
The pairing of contributors seems to work well. Pair programming was consistently ranked as a positive experience by online sprint participants.
204
204
205
205
**Increases accessibility**
206
206
207
-
The use of online tools in particular makes it possible to interact with people
208
-
who would not have joined traditional community events organized in
207
+
The use of online tools makes it possible to interact with people
208
+
who would not have joined community events traditionally organized in
209
209
North America or western Europe e.g. because of the travel costs and
210
-
complexity to get a visa in time. Attending those online events is probably also less disruptive for people with young children.
210
+
complexity of obtaining a visa in time. Attending the online events is probably also less disruptive for people with young children.
211
211
212
212
For the scikit-learn project itself, it made it possible to "recruit" a couple of new recurring contributors who attend regular office hours after the original sprints.
213
213
214
214
**Office Hours**
215
215
216
-
Actually the fact that we now have community office hours on discord is probably a consequence of us attending the Data Umbrella online sprints.
216
+
Actually the fact that we now have community office hours on Discord is probably a consequence of us attending the Data Umbrella online sprints.
217
217
218
218
Olivier shares:
219
219
>I think they [the sprints] were the most interesting online events I attended during
220
220
the COVID-19 crisis when all traditional on-site tech events were canceled. In particular the active planning by the Data Umbrella team for participants to work in pairs with audio rooms on Discord + a central help desk audio room worked really well.
221
221
222
222
>The pre-sprint and post-sprint office hours also made it possible to limit the time spent on helping fix setup issues compared to what we experience in traditional sprints. They also forced us as maintainers to review and fix our documentation before the event.
223
223
224
-
**Creation of supplementary resources in various medium forms**
224
+
**Creation of supplementary resources in different media types**
225
225
226
-
Data Umbrella coordinated the creation of a series of videos and transcripts that provided learning materials for the community to prepare for the sprint. These resources were available to the public and have a wide reach:
226
+
Data Umbrella coordinated the creation of a series of videos and transcripts that provided learning materials for the community to prepare for the sprint. These resources are available to the public and have a wide reach:
227
227
228
228
This is the [Contributing to scikit-learn](https://www.youtube.com/playlist?list=PLBKcU7Ik-ir-b1fwjNabO3b8ebs9ez5ga
229
-
) list of videos that were created for the sprints.
229
+
) list of videos that were created for the sprints:
230
230
- Andreas Mueller: [Crash Course in Contributing to scikit-learn](https://youtu.be/5OL8XoMMOfA)
231
231
- Reshama Shaikh: [Example of scikit-learn Pull Request](https://youtu.be/PU1WyDPGePI)
232
232
- Andreas Mueller: [Sprint FAQs](https://youtu.be/p_2Uw2BxdhA)
@@ -246,7 +246,7 @@ This is the [Contributing to scikit-learn](https://www.youtube.com/playlist?list
246
246
<span style="background-color: #CAE9F5;">
247
247
One of the primary goals of the Community sprints was to onboard new contributors who would become recurring contributors. This goal has generally not been realized. scikit-learn is a complex and advanced project, and a one-time sprint does not provide sufficient opportunity and support to sprint participants to become recurring contributors.</span> A few sprint participants have progressed to become returning contributors, but this is a very small number relative to the number of sprint participants.
248
248
249
-
Onboarding a first-time contributor takes time. People who are contributing for the first time need to go through a lot of information simultaneously regarding both technical and organizational aspects of contributions. People may run into unexpected issues at the really start depending on their
249
+
Onboarding a first-time contributor takes time. People who are contributing for the first time need to go through a lot of information simultaneously regarding both technical and organizational aspects of contributions. People may run into unexpected issues at the start depending on their
250
250
setup and experience, might get frustrated and/or discouraged, and might not
251
251
report the problem they are having (thinking it is their fault). Pre-event office hours have been successful at alleviating some of these roadblocks, for those sprint participants who have completed their pre-work.
252
252
@@ -259,7 +259,7 @@ Here are some adjustments that can be made in the future to reach the goal of re
259
259
- Have smaller sprint events
260
260
261
261
**Mentoring**
262
-
Sprints may not be sufficient for onboarding people. Mentoring is needed to take to the next level. Mentoring relationships can be established during sprint events.
262
+
Sprints may not be sufficient for onboarding people. Mentoring is needed to take to the next level, and mentoring relationships can be established during sprint events.
263
263
264
264
**Improve the onboarding process**
265
265
@@ -332,8 +332,8 @@ A comparison of the contributor base to other related libraries in the same spac
332
332
333
333
## References
334
334
335
-
- [Interview with Maren Westermann: Extending the Impact of the scikit-learn Sprints to the Community](https://blog.dataumbrella.org/mwestermann-sprints-experience)
336
-
- [Interview with scikit-learn Triage Team Member: Juan Martín Loyola](https://blog.dataumbrella.org/jmloyola-opensource-experience)
335
+
- [Behind the Scenes: What It Takes to Run Data Umbrella’s scikit-learn Open Source Sprints](https://eventfund.codeforscience.org/behind-the-scenes-what-it-takes-to-run-data-umbrellas-scikit-learn-open-source-sprints/)
337
336
- Data Umbrella [sprint reports](https://blog.dataumbrella.org/tags/#sprint-report)
338
337
- Data Umbrella community [sprint blogs](https://blog.dataumbrella.org/tags/#sprint-blog)
339
-
338
+
- [Interview with Maren Westermann: Extending the Impact of the scikit-learn Sprints to the Community](https://blog.dataumbrella.org/mwestermann-sprints-experience)
339
+
- [Interview with scikit-learn Triage Team Member: Juan Martín Loyola](https://blog.dataumbrella.org/jmloyola-opensource-experience)