Commit 52dcc13

Merge pull request #123 from reshamas/main
incorporate feedback from reviewers
2 parents b99ff92 + ebd577a commit 52dcc13

1 file changed: +40 -10 lines changed

_posts/2022-05-18-sprints-value.md

Lines changed: 40 additions & 10 deletions
@@ -20,7 +20,7 @@ postauthors:
 {% include postauthor.html %}
 </div>
 
-With contributions from: Gaël Varoquaux, Andreas Mueller, Olivier Grisel, Julien Jerphanion, Guillaume Lemaitre
+With contributions from: Gaël Varoquaux, Andreas Mueller, Olivier Grisel, Julien Jerphanion, Guillaume LeMaitre
 
 ## Top Line Summary
 
@@ -30,7 +30,7 @@ Sprints are **working sessions to contribute to an open source library**. The go
 
 The [scikit-learn](https://scikit-learn.org/dev/index.html) project has a long and extraordinary legacy of open source sprints. Since 2010, when its [first public version](https://en.wikipedia.org/wiki/Scikit-learn) was released, there have been as many as [45 sprints organized](https://blog.scikit-learn.org/sprints/). The 45 number is a lower bound, since there are likely more sprints that have not been listed.
 
-To date, [scikit-learn](https://github.com/scikit-learn/scikit-learn) has **over 2300** contributors to the library. The number of contributors to scikit-learn exceeds those of other related libraries such as numpy, scipy and matplotlib, with the exception of the [pandas](https://github.com/pandas-dev/pandas), which has a greater number of contributors (See Appendix A).
+To date, more than 2300 people have contributed to [scikit-learn](https://github.com/scikit-learn/scikit-learn). The number of contributors to scikit-learn exceeds those of other related libraries such as numpy, scipy and matplotlib, with the exception of the [pandas](https://github.com/pandas-dev/pandas), which has a greater number of contributors (See Appendix A).
 
 The public discourse on open source has expanded to explore topics of sustainability, funding models, and diversity and inclusion, to name a few. A *reasonable*, yet *”difficult to answer”* question that has been posed is:
 >*<span style="background-color: #CAE9F5;">
@@ -39,7 +39,7 @@ What is the effectiveness of sprint models and what is the long-term engagement
 
 
 
-Due to technological limitations of GitHub, we do not hold precise data on how many scikit-learn contributors connected to the project via a sprint. We have no formal data collection process which records statistics on how many sprint participants are recurring or information on their contributions to other open source projects or other long term positive ripple effects. A scientific look at the correlation between the number of sprints and contributors is beyond the scope of this article. What we *will examine* in this article are the **objectives, results and aspirations** of running the scikit-learn open source sprints.
+Due to technological limitations of GitHub and privacy concerns, we do not hold precise data on how many scikit-learn contributors connected to the project via a sprint. We have no formal data collection process which records statistics on how many sprint participants are recurring or information on their contributions to other open source projects or other long term positive ripple effects. A scientific look at the correlation between the number of sprints and contributors is beyond the scope of this article. What we *will examine* in this article are the **objectives, results and aspirations** of running the scikit-learn sprints.
 
 The queries from other open-source projects requesting guidance on sprints and diversity and inclusions have been increasing. We share these experiences and lessons learned with the community, potential funders and open source project maintainers, particularly those projects which are nascent in their quest to build community, sustainability and diversity and inclusion.
 
@@ -52,9 +52,9 @@ In this article we examine the following:
 - What value do open source sprints bring to the project and community?
 - What are the aspirations of the scikit-learn project, in terms of connecting with the community?
 
-## Definition of Sprint
+## Definition of a scikit-learn Sprint
 
-A sprint has traditionally been an event where contributors come together to work on issues in the scikit-learn repository. A sprint can be as short as a few hours, or last over several days, even a week or longer. They may be in-person, online, hybrid or asynchronous. Sprints may be organized by the developers of the library, community groups (such as Meetups), scheduled alongside scientific or Python conferences, or even at home with a few friends. They can more simply and less dauntingly be described as
+A scikit-learn sprint has traditionally been an event where contributors come together to work on issues in the scikit-learn repository. A sprint can be as short as a few hours, or last over several days, even a week or longer. They may be in-person, online, hybrid or partially asynchronous. Sprints may be organized by the developers of the library, community groups (such as Meetups), scheduled alongside scientific or Python conferences, or even at home with a few friends. They can more simply and less dauntingly be described as
 <span style="background-color: #CAE9F5;">
 working sessions to contribute to the open source library.
 </span>
@@ -67,15 +67,17 @@ We distinguish between a Developer (Dev) and Community sprint because the goals
 
 A Developer, or “dev”, sprint is one that is typically organized by the maintainers of the library. A dev sprint is one where the developers or maintainers of the library gather to work on issues and to discuss the resolution of ongoing complex issues. This also provides the team an opportunity to focus on tasks related to the long-term roadmap of the project.
 
-For scikit-learn, the early sprints were alongside the [SciPy conferences](https://conference.scipy.org) and the practice has continued for over a decade.
+For scikit-learn, the early Community sprints were alongside the [SciPy conferences](https://conference.scipy.org) and the practice has continued for over a decade.
+
+The first early Dev sprints were organized at Inria. The first [major Dev sprint](https://github.com/scikit-learn/scikit-learn/wiki/Past-sprints#granada-19th-21th-dec-2011) was held in Granada after the NIPS 2011 conference (now renamed NeurIPS). It was the first time that most of the team had met in real life after months or years of online collaboration, and over a dozen developers participated. Later, Dev sprints were often hosted in the offices of partnering tech companies, typically from 3 to 7 days, once a year, in pre-COVID times.
 
 **Community Sprint**
 
 A Community sprint can be a collaboration by individuals, by affinity communities such as Meetup Groups (Data Umbrella, PyLadies, etc.), by conferences (SciPy, PyData Global, JupyterCon, etc.). A Community sprint is one that is with the general public and it may be beginners, experts, or a combination of both.
 
 At a Developer sprint, a contributor may work on a PR that has been ongoing for three months. Conversely, Community sprints require curated issues which newcomers can complete in a shorter period of time (such as 1 day, or 1 day with 1-2 months follow-up).
 
-The landscape of community sprints with other [scientific python](https://scientific-python.org/calendars/) libraries is unknown. It is possible that scikit-learn may have had community sprints earlier than other projects.
+The landscape of community sprints with other [scientific python](https://scientific-python.org/calendars/) libraries is unknown.
 
 ## Goals of the Sprints
 
@@ -94,7 +96,7 @@ The landscape of community sprints with other [scientific python](https://scient
 - To onboard new contributors to scikit-learn and PyData generally
 - To onboard new contributors who would become recurring contributors
 - To collaborate with community groups to increase diversity of contributor base with intentional outreach
-- To increase the number of recurring contributors
+- To strengthen and support existing contributors in order to maintain recurring community contributors
 
 ## scikit-learn Team Members Who Connected to the Project Via a Sprint
 
@@ -135,7 +137,8 @@ In her PyConDE PyData Berlin keynote from April 2022, [5 Years, 10 Sprints, a s
 [Juan Martín Loyola](https://github.com/jmloyola) started [contributing to scikit-learn](https://blog.scikit-learn.org/team/jml-interview/) as preparation for the [Data Umbrella Latin America, June 2021](https://blog.dataumbrella.org/data-umbrella-afme2-2021-scikit-learn-sprint-report ) sprint. He continued to contribute prolifically after the sprint, and he was invited to join the team in December 2021. Given his location in Argentina, he will be providing support at the [2022 SciPy Latin America](https://www.scipy.lat/es/scipycon.html) sprint.
 
 ### Second Degree Impact
-[Lauren Burke](https://github.com/laurburke) joined the scikit-learn Communications Team in November 2021 at the recommendation of Reshama Shaikh, and this can be considered a network effect.
+[Lauren Burke](https://github.com/laurburke) joined the scikit-learn Communications Team in November 2021 at the recommendation of Reshama Shaikh, and this can be considered a network effect. This demonstrates that sprints can result in valuable contributions other than code.
+
 
 ## Sprints: Observed Impact and Lessons Learned
 
@@ -151,7 +154,7 @@ Sprint participants, whether one-time or recurring, become ambassadors for the p
 
 **Open source workflow knowledge**
 
-Users learn about testing, control version system (i.e. git), documentation which they bring to their work. The sprint experience assists contributors in developing a [wider set of technical skills](https://academiccommons.columbia.edu/doi/10.7916/D89G70BS) that can be shared across projects, networking, on to jobs and more.
+Users learn a range of tools such as: virtual environment setup, version control systems (i.e. Git), testing (flake8, pytest, continuous integration) and unit tests. They also learn software development best practices. For many users of scikit-learn, the sprint is the first time they navigate through the codebase and structure of scikit-learn, dig into functions and learn about errors. They develop experience in collaborative open source workflow. For employers, letting their team contribute to open-source might be a plus as they learn how to collaborate properly and learn about the internals of the library. The sprint experience assists contributors in developing a [wider set of technical skills](https://academiccommons.columbia.edu/doi/10.7916/D89G70BS) that can be shared across projects, networking, on to jobs and more.
 
 
 **Overcoming barriers to entry**
@@ -291,6 +294,33 @@ get some time but I currently have limited of it).
 >Finally, I would also really treasure having in-person sprints [in Paris] with external (recurring)
 contributors (with a specific expertise) on advanced subjects when it is possible in the future.
 
+## Conclusion
+
+### Connecting and Supporting scikit-learn
+
+To connect with the scikit-learn project, these are the most active social media platforms:
+- Twitter: [@scikit_learn](https://twitter.com/scikit_learn)
+- LinkedIn: [@scikit-learn](https://www.linkedin.com/company/scikit-learn/)
+
+It is most welcome for users to “star” the code repository on GitHub: [scikit-learn/scikit-learn](https://github.com/scikit-learn/scikit-learn)
+
+Our office hours, in addition to public developers and triage meetings are all posted on our [Community Calendar](https://blog.scikit-learn.org/calendar/).
+
+The next Community sprint may be held at [EuroScipy 2022](https://www.euroscipy.org/2022/index.html) in Basel Switzerland in early September. Information on past and [upcoming sprints](https://blog.scikit-learn.org/sprints/) are shared on our community site.
+
+
+### Contributing to scikit-learn
+
+To contribute to scikit-learn, we have resources available here:
+- [English](https://scikit-learn.org/dev/developers/contributing.html)
+- [Spanish](https://qu4nt.github.io/sklearn-doc-es/)
+
+There are additional resources for contributing:
+- [Contributing Videos](https://www.youtube.com/playlist?list=PLM-1QqX7UksT6tREbR-n9Mhup0OoRBU34)
+- [English, Spanish and some Portuguese language transcripts](https://github.com/data-umbrella/data-umbrella-scikit-learn-sprint)
+
+
+
 ## Appendix A: GitHub Contributors Comparison of Libraries
 
 A comparison of the contributor base to other related libraries in the same space (May 2022):
