---
layout: single
title: "Navigating LLMs in Open Source: pyOpenSci's New Peer Review Policy"
excerpt: "Generative AI products are reducing the effort and skill necessary to generate large amounts of code, which in some cases is causing a strain on volunteer peer review programs like ours. Learn about pyOpenSci's policy on generative AI in peer review in this blog post."
author: "pyopensci"
permalink: /blog/generative-ai-peer-review-policy.html
header:
  overlay_image: images/headers/pyopensci-floral.png
categories:
  - blog-post
  - community
classes: wide
toc: true
comments: true
last_modified: 2025-09-16
---

authors: Leah Wasser, Mandy Moore

## Generative AI meets scientific open source

Some developers report that AI tools make them dramatically more efficient; recent studies suggest the perceived gains can exceed the real ones (more on that in the challenges below). But in open source scientific software, speed isn't everything: transparency, quality, and community trust matter just as much. So do the ethical questions these tools raise.

**Edit this.** Whatever breakout content we want here.... needs to be all on a single line.
{: .notice--success}

## Why we need guidelines

At [pyOpenSci](https://www.pyopensci.org/), we’ve drafted a new policy for our peer review process to set clear expectations around disclosing use of LLMs in scientific software packages.

This is not about banning AI tools. We recognize their value to some contributors. Instead, our goal is transparency. We want maintainers to **disclose when and how they’ve used LLMs** so editors and reviewers can fairly and efficiently evaluate submissions.

## Our approach: transparency and disclosure

We know that people will continue to use LLMs. For some contributors, these tools are perceived to meaningfully increase productivity and lower barriers to contribution. We also know that significant ethical, societal, and other challenges come with the development and use of LLMs.

Our community’s expectation is simple: **be open about and disclose any generative AI use in your package**.

* Disclose LLM use in your README and at the top of relevant modules (see the sketch below).
* Describe how the tools were used.
* Be clear about what human review you performed.
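
For example, a module-level disclosure might look like the following minimal sketch; the package name, helper function, and wording are hypothetical, not a required format:

```python
"""Data-cleaning utilities for the hypothetical plumeria package.

AI disclosure: parts of this module were drafted with an LLM-based coding
assistant. All generated code was reviewed, edited, and tested by a
maintainer before submission; see the project README for details on how
the tools were used.
"""


def drop_empty_rows(rows: list[dict]) -> list[dict]:
    """Return only the rows that contain at least one non-empty value."""
    return [row for row in rows if any(v not in (None, "") for v in row.values())]
```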

Transparency helps reviewers understand context, trace decisions, and focus their time where it matters most.

### Human oversight

LLM-assisted code must be **reviewed, edited, and tested by humans** before submission.

* Run tests and confirm correctness (a sketch of one such test follows this list).
* Check for security and quality issues.
* Ensure style, readability, and clear, concise docstrings; generated docstrings are often verbose without adding meaningful understanding.
* Explain your review process in your software submission to pyOpenSci.
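
As one illustration, a human-written test for the hypothetical `drop_empty_rows` helper from the earlier sketch might look like this:

```python
from plumeria.cleaning import drop_empty_rows  # hypothetical module from the sketch above


def test_drop_empty_rows_keeps_only_populated_rows():
    rows = [
        {"name": "ruff", "stars": 1000},  # has real values: kept
        {"name": "", "stars": None},      # entirely empty: dropped
    ]
    assert drop_empty_rows(rows) == [{"name": "ruff", "stars": 1000}]
```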

Please don’t offload vetting to volunteer reviewers. Arrive with human-reviewed code that you understand, have tested, and can maintain.

### Licensing awareness

LLMs are typically trained on mixed-license corpora, and even permissive licenses (MIT, BSD-3) require attribution. Outputs that reproduce licensed code without preserving its copyright notice can therefore raise **license compliance questions**, no matter which license your own package uses.

* Acknowledge potential license ambiguity in your disclosure.
* Avoid pasting verbatim outputs; before accepting and editing a multi-line completion, consider searching public code for close matches (one remedy for matches is sketched after this list).
* Prefer human-edited, transformative outputs you fully understand.
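
If you do discover that an output closely matches existing licensed code, one remedy is to add the missing attribution. A hypothetical sketch, where the upstream project, URL, and license path are all invented:

```python
# Adapted from the hypothetical project "fastparse"
# (https://example.com/fastparse), BSD-3-Clause.
# Copyright (c) 2021 Fastparse Developers. The full license text and
# copyright notice are preserved in LICENSES/fastparse.txt.
def tokenize(text: str) -> list[str]:
    """Split text on whitespace into a list of tokens."""
    return text.split()
```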

We can’t control upstream model training data, but we can be cautious, explicit, and critical about our usage.

### Ethics and inclusion

LLM outputs can reflect and amplify bias in training data. In documentation and tutorials, that bias can harm the very communities we want to support.

* Review AI-generated text for stereotypes or exclusionary language.
* Prefer plain, inclusive language.
* Invite feedback and review from diverse contributors.

Inclusion is part of quality. Treat AI-generated text with the same care as code.

## Supporting volunteer peer review

Peer review runs on **volunteer time**. Rapid, AI-assisted submissions can overwhelm reviewers, especially when code hasn’t been vetted.

* Submit smaller PRs with clear scopes.
* Summarize changes and provide test evidence.
* Flag AI-assisted sections so reviewers know where to look closely (see the sketch after this list).
* Be responsive to feedback, especially on AI-generated code.
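
A lightweight way to flag an AI-assisted section is an inline comment; here is a minimal sketch with hypothetical wording and code:

```python
# AI-assisted: this parser was first drafted with an LLM, then reviewed,
# simplified, and tested by a maintainer (see the PR description).
def parse_header(line: str) -> tuple[str, str]:
    """Split a "Key: value" header line into a (key, value) pair."""
    key, _, value = line.partition(":")
    return key.strip(), value.strip()
```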

These safeguards protect human capacity so high-quality packages can move through review efficiently.

## Benefits and opportunities

LLMs are already helping developers:

* Explaining complex codebases
* Generating unit tests and docstrings
* In some cases, lowering language barriers for open source contributors around the world
* Speeding up everyday workflows

For some contributors, these tools make open source more accessible.

## Challenges we must address

### Incorrect output and misleading time benefits

Although it is commonly claimed that LLMs improve developer productivity, recent studies complicate that picture: a [METR study](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) found that experienced open source developers were actually slowed down by AI tools even while perceiving a speedup, and LLM responses to more complex coding tasks are [frequently incorrect](https://arxiv.org/html/2407.06153v1). If an LLM helps produce your code, evaluate its correctness independently of the LLM.

### Overloaded peer review

Peer review relies on volunteers. LLMs can produce large volumes of code quickly, increasing submissions with content that may not have been carefully reviewed by a human before reaching our review system.

### Ethical and legal complexities

LLMs are often trained on copyrighted or licensed material. Outputs that reproduce it without attribution can violate those licenses, even when the licenses involved are compatible. LLMs can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities.

### Bias and equity concerns

AI-generated text can perpetuate bias. When it appears in documentation or tutorials, it can alienate the very groups open source most needs to welcome.

### Environmental impacts

Training and running LLMs [requires massive energy consumption](https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/), and training is only one component: large-scale inference adds to the footprint. These sustainability concerns sit uncomfortably alongside much of the scientific research our community supports.

### Impact on learning

Heavy reliance on LLMs risks producing developers who can prompt but not debug or maintain code, undermining long-term project sustainability and growth.

## What you can do now

* **Be transparent.** Disclose LLM use in your README and modules.
* **Be accountable.** Thoroughly review, test, and edit AI-assisted code.
* **Be license-aware.** Note uncertainties and avoid verbatim look-alikes.
* **Be inclusive.** Check AI-generated docs for bias and clarity.
* **Be considerate.** Respect volunteer reviewers’ time.

<div class="notice" markdown="1">
## Join the conversation

This policy is just the beginning. As AI continues to evolve, so will our practices. We invite you to:

👉 Read the full draft policy
👉 Share your feedback and help us shape how the scientific Python community approaches AI in open source.

The conversation is only starting, and your voice matters.
</div>