
Commit 7bc9335

Merge branch 'main' into fix-ci-again

2 parents: ad047ef + 25ac577

13 files changed (+778 / -15 lines)

.github/workflows/build-site.yml

Lines changed: 2 additions & 1 deletion

```diff
@@ -38,4 +38,5 @@ jobs:
     with:
       directory: '_site'
       arguments: |
-        --ignore-urls "https://fonts.googleapis.com,https://fonts.gstatic.com"
+        --ignore-urls "https://fonts.googleapis.com,https://fonts.gstatic.com,_site/_posts/README/index.html"
+        --ignore-files "/.+\/_posts\/README.md"
```
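
The two arguments tell the link checker what to skip: `--ignore-urls` takes a comma-separated list of URLs, and `--ignore-files` is given a regex-style path pattern. The sketch below is a hypothetical Python model of that filtering, not the checker's own code. It assumes URL entries are compared exactly and that the leading `/` of the file pattern marks it as a regular expression. The names `should_check_url` and `should_check_file` are made up for illustration.

```python
import re

# Hypothetical model of the link checker's skip rules (not the tool's actual code).
# Assumption: --ignore-urls entries are split on commas and compared exactly.
IGNORE_URLS_ARG = (
    "https://fonts.googleapis.com,https://fonts.gstatic.com,"
    "_site/_posts/README/index.html"
)
IGNORE_URLS = set(IGNORE_URLS_ARG.split(","))

# Assumption: the leading "/" in "/.+\/_posts\/README.md" marks a regex, so it is dropped.
# In Python's re, "\/" is just an escaped "/", and the unescaped "." before "md"
# matches any single character.
IGNORE_FILES = re.compile(r".+\/_posts\/README.md")


def should_check_url(url: str) -> bool:
    """Return False when the URL exactly matches an ignore-list entry."""
    return url not in IGNORE_URLS


def should_check_file(path: str) -> bool:
    """Return False when the file path matches the ignore-files pattern."""
    return IGNORE_FILES.search(path) is None


if __name__ == "__main__":
    for path in ["_site/_posts/README.md", "_site/blog/index.html"]:
        print(path, "checked" if should_check_file(path) else "skipped")
```

Under the exact-match assumption, a longer Google Fonts URL with a path or query string would still be checked; whether the real tool treats these entries as exact strings or prefixes is worth confirming in its documentation.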

LICENSE

Lines changed: 438 additions & 0 deletions
Large diffs are not rendered by default.

_data/packages.yml

Lines changed: 3 additions & 3 deletions

```diff
@@ -46,10 +46,10 @@
   description: 'Physcraper: a Python package for continually updated phylogenetic trees using the Open Tree of Life'
   maintainer: ["Luna Luisa Sánchez-Reyes", "Martha Kandziora", "Emily Jane McTavish"]
   link: "https://github.com/McTavishLab/physcraper"
-  date-accepted:
+  date-accepted: 2021-09-14
   highlight:
-  docs-url:
-  citation-link:
+  docs-url: "https://physcraper.readthedocs.io/en/main/"
+  citation-link: "https://zenodo.org/badge/latestdoi/41294748"
 - package-name: pyrolite
   description: 'Tools for getting the most from your geochemical data'
   maintainer: ["Morgan Williams"]
```

_includes/archive-cards.html

Lines changed: 20 additions & 0 deletions

```diff
@@ -0,0 +1,20 @@
+<!-- This is a layout for 3 cards - it is a flex box -->
+<!-- -->
+<div class="feature__item">
+<a href="{{ post.url}}">
+<img src="{{ post.feature_img }}" alt="{{ post.feature_alt }}">
+<div class="card">
+
+<div class="archive__item">
+<div class="archive__item-body">
+<h3 class="card-title">{{ post.title | markdownify | strip_html | truncate: 60 }}</h3>
+<div class="card-excerpt">
+<p>{{ post.excerpt | truncate: 100}}</p>
+</div>
+</div>
+</div>
+</a>
+</div>
+</div>
+<!-- -->
+
```
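
The card template shortens titles with `truncate: 60` and excerpts with `truncate: 100`. Liquid's documentation states that the appended ellipsis is included in that character count, so a truncated title is at most 60 characters including the `...`. The Python function below is a sketch of that rule; `liquid_truncate` is a hypothetical name, not part of Jekyll or Liquid.

```python
def liquid_truncate(text: str, length: int = 50, ellipsis: str = "...") -> str:
    """Sketch of Liquid's `truncate`: the ellipsis counts toward `length`."""
    if len(text) <= length:
        return text  # short enough: returned unchanged, no ellipsis added
    keep = max(length - len(ellipsis), 0)  # characters of `text` that still fit
    return text[:keep] + ellipsis


title = "Why should Python open source package health matter to scientists? (and to you!)"
print(liquid_truncate(title, 60))
```

Under this sketch, the 80-character title of the post added in this commit comes out as `Why should Python open source package health matter to sc...`, cut mid-word. The title is also piped through `strip_html` before truncation while the excerpt is not, so if an excerpt arrives as rendered HTML, its markup may count toward the 100 characters.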

_layouts/posts_gallery.html

Lines changed: 32 additions & 0 deletions

```diff
@@ -0,0 +1,32 @@
+---
+layout: archive
+---
+
+{{ content }}
+
+
+<ul class="taxonomy__index">
+{% assign postsInYear = site.posts | where_exp: "item", "item.hidden != true" | group_by_exp: 'post', 'post.date | date: "%Y"' %}
+{% for year in postsInYear %}
+<li>
+<a href="#{{ year.name }}">
+<strong>{{ year.name }}</strong> <span class="taxonomy__count">{{ year.items | size }}</span>
+</a>
+</li>
+{% endfor %}
+</ul>
+
+{% assign entries_layout = page.entries_layout | default: 'list' %}
+{% assign postsByYear = site.posts | where_exp: "item", "item.hidden != true" | group_by_exp: 'post', 'post.date | date: "%Y"' %}
+{% for year in postsByYear %}
+<section id="{{ year.name }}" class="taxonomy__section">
+<h2 class="archive__subtitle">{{ year.name }}</h2>
+<div class="entries-{{ entries_layout }}">
+{% for post in year.items %}
+{% comment %}{% include archive-single.html type=entries_layout %}{% endcomment %}
+{% include archive-cards.html %}
+{% endfor %}
+</div>
+<a href="#page-title" class="back-to-top">{{ site.data.ui-text[site.locale].back_to_top | default: 'Back to Top' }} &uarr;</a>
+</section>
+{% endfor %}
```
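
The gallery layout drops posts marked `hidden: true` and then groups the rest by year with `group_by_exp`. The Python sketch below models that pipeline. The `Post` dataclass and `group_posts_by_year` function are hypothetical stand-ins for Jekyll's post objects and filters; the `name` and `items` keys mirror the fields the template reads (`year.name`, `year.items`).

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    """Hypothetical stand-in for a Jekyll post: only the fields the layout reads."""
    title: str
    published: date  # plays the role of Jekyll's post.date
    hidden: bool = False


def group_posts_by_year(posts: list[Post]) -> list[dict]:
    """Model of `where_exp: "item.hidden != true"` followed by
    `group_by_exp: 'post', 'post.date | date: "%Y"'`."""
    groups: dict[str, list[Post]] = {}
    for post in posts:
        if post.hidden:  # where_exp drops posts whose front matter sets hidden: true
            continue
        year = post.published.strftime("%Y")  # the date filter yields a string like "2019"
        # dicts keep insertion order, so years appear in the order posts are listed
        groups.setdefault(year, []).append(post)
    return [{"name": year, "items": items} for year, items in groups.items()]


if __name__ == "__main__":
    posts = [
        Post("health", date(2022, 10, 20)),
        Post("agu", date(2019, 12, 3)),
        Post("pandera", date(2019, 11, 18)),
    ]
    for group in group_posts_by_year(posts):
        print(group["name"], len(group["items"]))
```

Because Jekyll lists `site.posts` newest first, the year sections come out newest first as well. Note that the layout evaluates the same `where_exp`/`group_by_exp` expression twice (`postsInYear` for the year index, `postsByYear` for the sections); since the two are identical, a single assignment could likely serve both loops.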

_pages/blog.md

Lines changed: 4 additions & 3 deletions

```diff
@@ -1,6 +1,7 @@
 ---
-layout: archive
+layout: posts_gallery
 permalink: /blog/
+classes: wide
 title: "pyOpenSci Blog"
 excerpt: "Here we will both post updates about pyOpenSci and also highlight contributors. We will also highlight new packages that have been reviewed and accepted into the pyOpenSci ecosystem."
 header:
@@ -10,7 +11,7 @@ author_profile: true
 ---
 
 ## Recent pyOpenSci Posts!
-
+<!--
 {% comment %}
 {% include base_path %}
 {% include group-by-array collection=site.posts field="categories" %}
@@ -28,4 +29,4 @@ author_profile: true
 
 {% for post in site.posts %}
 {% include archive-single.html %}
-{% endfor %}
+{% endfor %} -->
```

_posts/2019-11-18-pandera-dataframe-validation.md

Lines changed: 1 addition & 1 deletion

```diff
@@ -32,7 +32,7 @@ users make statistical assertions about pandas data structures.
 ## A Statistical Data Validation Toolkit for Pandas
 
 <img src="https://github.com/unionai-oss/pandera/tree/master/docs/source/_static/pandera-logo.png"
-width="250px">
+width="250px" alt="Image showing pandera package logo.">
 
 To illustrate `pandera`'s capabilities let's use a small toy example. Suppose
 you're analyzing data for some insights in the context of a mission-critical
```

_posts/2019-12-03-agu-2019-pyopensci-events.md

Lines changed: 2 additions & 4 deletions

```diff
@@ -29,7 +29,7 @@ Leonardo Uieda and Lindsey Heagy.
 
 ## pyOpenSci Town Hall: Wed Dec 11, 2019 12:30-1:30pm in Moscone West - 2002
 
-Please join us for a <a href="https://www.agu.org/Fall-Meeting/Events/Data-TH33F" target="_blank">town hall dedicated to pyOpenSci: Data FAIR: pyOpenSci: Building a Community Around Open Source Python Software for Science</a>.
+Please join us for a town hall dedicated to pyOpenSci: Data FAIR: pyOpenSci: Building a Community Around Open Source Python Software for Science.
 We expect a strong presence from the rOpenSci community at this event as well!
 pyOpenSci is being modeled after rOpenSci, but focused on the Python
 programming language!
@@ -38,9 +38,7 @@ programming language!
 ### AGU Event Resources - Sign Up To Be A Reviewer & Townhall Presentation
 
 Here are a few resources shared at the town hall:
-1. The <a href="https://docs.google.com/presentation/d/14UN-AD8p2S8q_gUX1SwvSjT7SCVBGID9kQ74_ZexS9s/edit?usp=sharing" target="_blank">pyopensci ignite talk </a>from the open source ignite session
-2. <a href="https://forms.gle/wvwLaLQre58YLHpD6" target="_blank">Sign up to stay involved with pyopensci using this google form. </a>
-</div>
+1. The <a href="https://docs.google.com/presentation/d/1emWah0WC9Or5uSH5IYyf0FfiPinLgqCr/edit?usp=sharing&ouid=112367566823345023071&rtpof=true&sd=true" target="_blank">pyopensci ignite talk </a>from the open source ignite session
 
 ### Signing Off For Now
 I look forward to meeting you at AGU 2019!! Connect with my on twitter if you have questions at AGU! <a href="https://twitter.com/leahawasser" target="_blank">@leahawasser</a>. I will be tweeting during the event!
```

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@

```yaml
---
layout: single
title: "Why should Python open source package health matter to scientists? (and to you!)"
excerpt: "Free and open source software tools are the foundation for thousands if not millions of scientific workflows. Yet, it is rare that users fully understand its importance in moving science forward. Here, I discuss the value of free and open source software for science, why you as a scientist should care, and what pyOpenSci is doing to try to support Python scientific tools for science."
author: "Leah Wasser"
permalink: /blog/why-python-open-source-software-matters-for-scientists
header:
  overlay_color: "#666"
  overlay_filter: 0.6
categories:
  - blog-post
  - highlight
  - python-packaging
  - peer-review
toc: false
---
```

## Why Python open source package health should matter to you as a scientist

If you are a scientist, the health of a scientific Python package may not be something that
you care about. What might seem more important is doing your science and processing
your data to get to that coveted scientific inquiry and exploration stage.

In actuality, package health is incredibly important to science, especially open,
data-intensive science. It should be important to you too!

Why? Let me provide a few reasons below:

#### 1. Free and open source software (FOSS) tools make your code simpler.

Free and open source tools provide commonly needed functionality wrapped up in simple, tested functions and objects that you don't need to recreate yourself.

If you are creating open workflows to process your data, you are likely using software tools that are free to download and use. These tools make it easier for you to access, open, process and visualize your data. They
allow you to write less (and less complicated) code to process your data: code that
someone else (a package maintainer) maintains, often in their spare time (but we'll
get to that in another blog post!).

#### 2. Open source software provides centralized maintenance of commonly used workflows

Imagine 1,000 scientists accessing climate data. They all need to download the
data and plot it. However, they may download different types of climate data:
different models, different variables. The base code that all 1,000 scientists
write to download and open the data is the same. Similarly, the base code for
plotting is also the same.

Isn't it better that one person writes great code and
updates it as things like the download API change, or as the plotting
functionality needs updating?

* This central maintenance avoids you and many other people needing to write code that
does the same thing. It makes it easier for you to process your climate data and get
to the science. Your code is simpler.

* This centralization of tools that perform tasks that many
people need to do avoids the problem of 1,000 people trying to create the same workflow and creating different and potentially problematic code.

* This avoids everyone reinventing the scientific wheel.

#### 3. Free and open source software reduces the barrier of needing a paid license to build upon your work.

Free and open source software removes the barrier of needing a paid license to run your code. This makes your work more accessible.

There is a lot more to say about the value of open source here, but I'll leave
that to another blog post.

<figure>
<a href="/images/foss-free-open-source-software/why-scientists-should-care-open-source.png">
<img src="/images/foss-free-open-source-software/why-scientists-should-care-open-source.png" style="max-width:70%" alt="Image with a blue computer keyboard background with the text: why scientists should care about open source software on top.">
</a>
<figcaption>If you are a scientist using open science approaches, then your workflow likely depends upon open source tools. These tools are critical to our work and yet are often not supported or considered
integral components of open science.
</figcaption>
</figure>

## Creating and maintaining open source software is hard work

Creating these open source tools to work with data is not a trivial task. Often
the people (who may be developers or scientists) who develop the tools:

* Aren't paid to do the work
* Burn out from all of the effort associated with supporting the tools
* Aren't acknowledged for their effort
* Have to deal with users who are frustrated by bugs but aren't able to communicate that frustration kindly or effectively, in a way that helps the maintainer get the bug fixed while also acknowledging their (often volunteer) effort

Maintainers also get new jobs and need to step away from maintaining their tools.

All of the above can leave once-maintained tools
unmaintained and vulnerable to new bugs as the packages those tools depend on
are updated, or as the Python language itself is updated.

### Package usability is also important but not always considered by maintainers

Not all developers focus on usability when designing a new tool. Some do.
Maintainers often start,
as expected, by trying to get a job done. Documenting a package well enough
that a beginner can get started using it is not always an immediate goal.

Yet usability is critical to developing a user base and to making a tool accessible
to more people, which could in turn help grow a community around it that
supports that tool.

This can be frustrating for scientists who are trying to find the right tool
to support their analysis. As such, it is another area that we definitely
want to consider when building pyOpenSci.


## Maintainers do a lot of hard work and rarely get credit for it

Before I say anything else here:

> Please cite software in your work if you use it! And if you need to report a bug, please do so in a kind and thoughtful way!

Maintainers work hard on their packages. One package may be the foundation for
data processing and analysis across hundreds to thousands (or more) of scientific
papers. But unlike scientific papers, work on the package continues long after a
paper is published. A package is a living thing that needs continual work and love.

So, what happens to that package that you are using in your workflow
when the developer gets a new job or finds they no longer have the time to
maintain it? What happens when you need to update your workflow to support a paper review, OR when you want to
build upon it for another analysis, if that tool is no longer
maintained?

<figure>
<a href="/images/foss-free-open-source-software/orphan-python-open-source-packages.png">
<img src="/images/foss-free-open-source-software/orphan-python-open-source-packages.png" style="max-width:70%" alt="Image showing a girl crying with the text: orphan Python packages are breaking my open workflows.">
</a>
<figcaption>Your workflow likely depends upon tools that are being developed
by volunteers. Supporting these tools is critical to supporting open science. And
open science is critical to accelerating scientific discovery. Source: Meme created by yours truly. :) </figcaption>
</figure>

This, my friend, is why you should care about, and support, open source software!

## pyOpenSci is designed to create a diverse community around, and to support, the open source Python tools that you as a scientist are using in your workflows.

The issues discussed above around maintenance, usability and quality of software
are the types of issues that pyOpenSci will address.

<figure>
<a href="/images/foss-free-open-source-software/xkcd-open-source-dependency.png">
<img src="/images/foss-free-open-source-software/xkcd-open-source-dependency.png" style="max-width:70%" alt="Image showing xkcd comic with a robotic type of image representing a scientific workflow and pointing to the small open source package maintained by one person that is a major dependency of the workflow.">
</a>
<figcaption>Critical scientific workflows and projects often have dependencies
that are maintained by volunteers. Source: XKCD </figcaption>
</figure>

But we (pyOpenSci) need to track package use and maintenance, collect data, and
quantify outcomes to determine if we are making the impact that we want to:
whether we are truly helping you, as a scientist, select tools that
will help your workflow, will be maintained over time, and are documented well
enough that you can get started using them quickly.

We also want to ensure that we are supporting maintainers, helping them
with the hard job of showing up each day to maintain a package that hundreds
to thousands of scientists may be using.

### pyOpenSci needs to collect data around metrics to track all of these issues

pyOpenSci needs to do more than just open peer review of scientific Python
packages. We need to collect data to better understand the issues and how we
impact those issues over time.

A few package-related goals of pyOpenSci include:

* Ensure that package quality is better after the review than before
* Inspire maintainers to develop more robust package infrastructure, including testing
* Improve the usability of packages through documentation and vignettes (short tutorials showing users how to get started with the package)
* Ensure that packages are maintained over time; if they aren't maintained, ensure that they are archived or sunsetted in a way that lets users know they are no longer maintained (no more dark orphan repositories!)

To make sure we reach our goals, we have to collect metrics on packages
submitted to our open peer review process to track quality and health over time.
And hopefully, through our review process and support of
maintainers, we will help to improve the overall quality of packages being created
to support scientific workflows.

> We want to help the community.

## How pyOpenSci hopes to improve the usability and quality of smaller open source software packages that support science

These, my friend, are lofty goals. But our mission is to help
scientists build better software, and to ensure that the community understands the
maintenance level of that software before adopting it.

We also want scientists to understand how hard maintainers work to create the
tools that they use, and to cite that work when they use the tools, in the same
way they might cite a peer-reviewed article. But that is another blog post to be written.

So how do we track open source tool health (for science)?

### Peer review is actually the second step in our process.

We won't begin to review a package [without bare minimum checks](https://www.pyopensci.org/contributing-guide/open-source-software-submissions/author-guide.html#pyopensci-review-guide-for-python-open-source-package-authors).
We hope that these bare minimum checks help maintainers as they try to decide
what is good enough infrastructure for their package.

We hope that these checks will also help new maintainers who are creating
new packages, even if they never submit their package to us for peer review.

## Goals for package metrics

These metrics will help us quantify several of our goals.

We hope that:

* Peer review improves Python package structure and usability.
* Peer review in some way supports maintenance and/or responsible archiving when a package reaches the end of its life.
* Over time, the package is improved and maintained, possibly with contributions from people other than the maintainer.

We need metrics to understand things like:

* Community adoption of the package (are scientists using it?)
* Maintenance level of the package (are maintainers still working on it and fixing bugs?)
* Infrastructure (are tests set up to help identify whether contributions break things?)
* Usability (is the package documented in a way that helps users quickly get started?)

### A discussion about package health on Twitter

A few weeks ago, I posted on Twitter to see what the community
thought about "*what constitutes package health*".

<blockquote class="twitter-tweet" data-partner="tweetdeck"><p lang="en" dir="ltr">controversial topic: How do we measure the &quot;health&quot; of a <a href="https://twitter.com/hashtag/science?src=hash&amp;ref_src=twsrc%5Etfw">#science</a> <a href="https://twitter.com/hashtag/python?src=hash&amp;ref_src=twsrc%5Etfw">#python</a> package? GitHub stars? downloads, date of latest commit? # of commits a month / quarter? Spread of commits? Thoughts? <a href="https://twitter.com/hashtag/opensource?src=hash&amp;ref_src=twsrc%5Etfw">#opensource</a> <a href="https://twitter.com/hashtag/OpenScience?src=hash&amp;ref_src=twsrc%5Etfw">#OpenScience</a> <a href="https://twitter.com/pyOpenSci?ref_src=twsrc%5Etfw">@pyOpenSci</a></p>&mdash; Leah Wasser 🦉 (@LeahAWasser) <a href="https://twitter.com/LeahAWasser/status/1577730887818498049?ref_src=twsrc%5Etfw">October 5, 2022</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>

The Twitter convo made me realize that there are
many different perspectives that we can consider when addressing this question.

More specifically, pyOpenSci is interested in the health of packages that
support science. So we may need to build upon already existing
efforts that determine metrics and customize them to our needs.
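
To make a few of the signals raised in that thread more concrete, the minimal sketch below summarizes a list of commit records: days since the latest commit, commits per month, and how concentrated commits are among authors (one possible reading of "spread of commits"). The `Commit` record and `health_snapshot` function are hypothetical and are not pyOpenSci's metrics; GitHub stars and downloads are left out because they would need data from GitHub or PyPI.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Commit:
    """Hypothetical commit record, e.g. parsed from `git log` output."""
    author: str
    when: datetime  # timezone-aware timestamp


def health_snapshot(commits: list[Commit], now: datetime) -> dict:
    """Summarize a few activity signals; none of these is a verdict on its own."""
    if not commits:
        raise ValueError("need at least one commit")
    latest = max(c.when for c in commits)
    per_month = Counter(c.when.strftime("%Y-%m") for c in commits)
    per_author = Counter(c.author for c in commits)
    # Share of all commits made by the most active author: values near 1.0
    # suggest the package leans on a single person.
    top_share = per_author.most_common(1)[0][1] / len(commits)
    return {
        "days_since_last_commit": (now - latest).days,
        "commits_per_month": dict(sorted(per_month.items())),
        "distinct_authors": len(per_author),
        "top_author_share": round(top_share, 2),
    }


if __name__ == "__main__":
    utc = timezone.utc
    history = [
        Commit("ana", datetime(2022, 9, 1, tzinfo=utc)),
        Commit("ana", datetime(2022, 9, 15, tzinfo=utc)),
        Commit("ben", datetime(2022, 10, 1, tzinfo=utc)),
    ]
    print(health_snapshot(history, now=datetime(2022, 10, 11, tzinfo=utc)))
```

Numbers like these only describe activity; a quiet but stable package can be perfectly healthy, which is part of why the metrics need to be customized for scientific packages.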

[In the next post, I will recap that convo on Twitter.](/blog/what-makes-a-python-package-healthy)

## Feedback? Leave it below

If you have any thoughts or questions about pyOpenSci metrics and goals, please
leave them in the comments below!
