
Conversation


@lwasser lwasser commented Sep 16, 2025

This blog post outlines pyOpenSci's new peer review policy regarding the use of generative AI tools in scientific software, emphasizing transparency, ethical considerations, and the importance of human oversight in the review process.

It is co-developed by the pyOpenSci community and relates to a discussion here:

pyOpenSci/software-peer-review#331


For some contributors, these tools make open source more accessible.

## Challenges we must address

I know you mention it above in the human oversight section, but maybe it's important to add another section here explaining that LLMs frequently perform programming tasks incorrectly (especially those that are slightly more complex!).


This was an interesting report from last year on this https://arxiv.org/html/2407.06153v1

@crhea93 crhea93 Sep 16, 2025

And this is a great study on how LLMs can actually slow down developers https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/

Full paper here https://arxiv.org/abs/2507.09089

Member Author

@crhea93, I am very open to adding a new section, and if you'd like to suggest the changes or write a few sentences/paragraph with links and resources, I welcome that too 👐🏻 This is up to you, but it's a great suggestion. They are definitely frequently wrong in their suggestions, will use dated dependencies, dated and/or wrong approaches, etc.


I knew suggesting this could be dangerous!! 😝😝😝

I'll write up a proposed section :)


Incorrectness of LLMs and Misleading Time Benefits

Although it is commonly stated that LLMs help improve the productivity of high-level developers, recent scientific explorations of this hypothesis indicate the contrary (see https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ for an excellent discussion on this). What's more, the responses of LLMs for complex coding tasks tend to be incorrect (e.g., https://arxiv.org/html/2407.06153v1). Therefore, it is crucial that, if an LLM is used to help produce code, the correctness of the code is evaluated separately from the LLM.
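To make that last point concrete, one way to evaluate correctness separately from the LLM is to test the generated code against independently derived expectations. A minimal sketch (the package, function, and expected values below are hypothetical):

```python
# Hypothetical sketch: check LLM-assisted code against expectations derived
# independently (by hand or from a trusted reference), not against the LLM's
# own description of what it wrote.
import pytest

from mypackage.stats import rolling_mean  # hypothetical LLM-assisted function


def test_rolling_mean_matches_hand_computed_values():
    # Expected values worked out by hand, not copied from the LLM output.
    assert rolling_mean([1.0, 2.0, 3.0, 4.0], window=2) == [1.5, 2.5, 3.5]


def test_rolling_mean_rejects_invalid_window():
    # Edge cases are where generated code most often goes wrong.
    with pytest.raises(ValueError):
        rolling_mean([1.0, 2.0], window=0)
```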

This blog post outlines pyOpenSci's new peer review policy regarding the use of generative AI tools in scientific software, emphasizing transparency, ethical considerations, and the importance of human oversight in the review process.

## Generative AI meets scientific open source

It has been suggested that for some developers, using AI tools for tasks can increase efficiency by as much as 55%. But in open source scientific software, speed isn't everything—transparency, quality, and community trust matter just as much. So do the ethical questions these tools raise.


I wouldn't air such conjecture without citation (and believing in the integrity of that work). It could say that studies and perception are mixed and perhaps that perception of efficacy appears to exceed reality (citing the METR study).

Member Author

Totally - I found some articles - it's not at all citable. I've reworded below to talk more about how some people think these tools are great, while others do not.


## Our Approach: Transparency and Disclosure

We know that people will continue to use LLMs. We also know they can meaningfully increase productivity and lower barriers to contribution for some. We also know that there are significant ethical, societal and other challenges that come with the development and use of LLM’s.


Conjectures about the future depend greatly on legal outcomes and how society processes this moment. I would not say it's inevitable, but perhaps that pyOpenSci's policy will not on its own change the behavior of the community, especially those who aren't thinking about pyOpenSci.

Member Author

Sure - I was thinking about right now: people who use LLMs are likely to continue using them for the time being. My goal is really to avoid shaming people. I want to avoid divides, Jed, but I also want us to be able to talk openly and thoughtfully disagree if we need to, if that makes sense.

I wonder if we could modify the language. The idea is that we don't expect our policy to change how people are working. We are seeking transparency (from a program operations perspective).


Suggested change
We know that people will continue to use LLMs. We also know they can meaningfully increase productivity and lower barriers to contribution for some. We also know that there are significant ethical, societal and other challenges that come with the development and use of LLM’s.
We acknowledge that social and ethical norms and concern for environmental and societal externalities vary greatly across the community, and yet few members of the community will look to pyOpenSci for guidance on whether to use LLMs in their own work. Our focus thus centers on assisting with informed decision-making and consent with respect to LLM use in the submission, reviewing, and editorial process.

I don't know if it's really true that pyOpenSci does not have the capability to influence decisions, in which case it could be worded that pyOpenSci advisors recommend against taking a position about whether LLM-generated content can be allowed in submissions. Or even a stronger stance that although pyOpenSci finds many uses of LLMs to run contrary to our community values, we believe that attempting to enforce a ban would incentivize dishonesty and erode community trust so severely that it outweighs epistemic benefits of such a policy, and therefore we seek only informed consent.


### Licensing awareness

LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3).


LLM output does not comply with the license of the input package, even when the input is permissively licensed (MIT, CC-BY), because it fails to comply with the attribution requirement of the license. The license of the package incorporating LLM output does not matter.

License compatibility only matters after an egregious violation is discovered: if the licenses are compatible, one could become compliant merely by adding attribution.

LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3).

* Acknowledge potential license ambiguity in your disclosure.
* Avoid pasting verbatim outputs that resemble known copyrighted code.


How would someone determine this? Due diligence is to never use the output of an LLM directly, but that isn't how LLM-based coding products are marketed or used.

Member Author

Good question. I'm not sure, actually, so we should rephrase. Please see suggested edit below and let me know if you agree / have suggestions!


I honestly don't know either. In my opinion, a minimum of due diligence would be that any time a multi-line completion is accepted (and before editing it, since editing copied code would mean it is a derivative work), one runs a code search to see if there are any public results that are similar to the output. This is incredibly unrealistic because the interfaces of all coding LLM products are built to promote "don't know, don't care".

But to endorse the premise that "dunno; the LLM generated that" is a watertight defense with no ethical implications is to say that there is no ethical transgression other than not sufficiently covering one's tracks.


### Ethical and legal complexities

LLMs are often trained on copyrighted or licensed material. Outputs may create conflicts when used in projects under different licenses. They can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities.


Suggested change
LLMs are often trained on copyrighted or licensed material. Outputs may create conflicts when used in projects under different licenses. They can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities.
LLMs are often trained on copyrighted material with varying (or no) licenses. Outputs may constitute copyright infringement and/or ethical violations such as plagiarism. They can also reflect extractive practices, like data colonialism, and disproportionately harm underserved communities.

The licenses do not need to be different for it to be a license violation (and copyright infringement and/or plagiarism).


### Environmental impacts

Training and running LLMs [requires massive energy consumption](https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/), raising sustainability concerns that sit uncomfortably alongside much of the scientific research our community supports.


The numbers are way higher now, and training is only one component.

## What you can do now

* **Be transparent.** Disclose LLM use in your README and modules.
* **Be accountable.** Thoroughly review, test, and edit AI-assisted code.
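As one hypothetical illustration of what such a disclosure could look like (the wording and placement are illustrative, not prescribed by the policy), a short note in a module docstring might read:

```python
"""Utilities for reading sensor data.

Portions of this module were drafted with the assistance of a generative AI
coding tool. All AI-assisted code was reviewed, tested, and edited by the
package maintainers.
"""
```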


What does it mean to be "accountable"? What sort of lapses would constitute misconduct and what are the consequences? (E.g., a lawyer can lose their job and be disbarred when their use of LLMs undermine the integrity of the court.)


We know that people will continue to use LLMs. We also know they can meaningfully increase productivity and lower barriers to contribution for some. We also know that there are significant ethical, societal and other challenges that come with the development and use of LLM’s.

Our community’s expectation is simple: **be open about it**.


Our community’s expectation is simple: be open about any AI usage.

Member Author

Excellent suggestion @elliesch !

Member Author

@all-contributors please add @elliesch for review, blog


* Run tests and confirm correctness.
* Check for security and quality issues.
* Ensure style, readability, and clear docstrings.


Ensure style, readability, and clear, concise docstrings.

Depending on the AI tool, generated docstrings can sometimes be overly verbose without adding meaningful understanding.
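For instance, a concise numpydoc-style docstring that keeps only what adds understanding might look like this (the function is a made-up illustration):

```python
def to_celsius(fahrenheit: float) -> float:
    """Convert a temperature from Fahrenheit to Celsius.

    Parameters
    ----------
    fahrenheit : float
        Temperature in degrees Fahrenheit.

    Returns
    -------
    float
        Temperature in degrees Celsius.
    """
    return (fahrenheit - 32.0) * 5.0 / 9.0
```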


## Generative AI meets scientific open source

It has been suggested that for some developers, using AI tools for tasks can increase efficiency by as much as 55%. But in open source scientific software, speed isn't everything—transparency, quality, and community trust matter just as much. So do the ethical questions these tools raise.
Member Author

Suggested change
It has been suggested that for some developers, using AI tools for tasks can increase efficiency by as much as 55%. But in open source scientific software, speed isn't everything—transparency, quality, and community trust matter just as much. So do the ethical questions these tools raise.
Some developers believe that using AI products increases efficiency. However, in scientific open-source, speed isn't everything—transparency, quality, and community trust are just as important. Similarly, the ethical questions that these tools raise are also a concern.

Member Author

lwasser commented Sep 23, 2025

@all-contributors please add @elliesch for review, blog

Contributor

@lwasser

I've put up a pull request to add @elliesch! 🎉


### Licensing awareness

LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3).
Member Author

Suggested change
LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3).
LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3). For example, content that is licensed using BSD-3 requires attribution when using the code for derivative works. Currently, LLMs don't provide that attribution.

Member Author

@jedbrown is this better, to have an example? We don't need to get into the weeds of licensing for this post, but I do want to get the big picture correct. I also thought MIT didn't require attribution (my understanding might be totally wrong!), so I provided an example using BSD-3. Let me know what you think, as you seem to have deep expertise on licenses!


MIT also requires attribution (emphasis added)

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

The point I was trying to make was that even if all sources for a given output have the same license, it's still a license violation because the copyright notice (attribution) is not preserved. Here's a crack at a more comprehensive description of the situation and possible remedies.

Suggested change
LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3).
LLMs are trained on source code and documents with many licenses, most of which require attribution/preservation of a copyright notice (possibly in addition to other terms). LLM outputs sometimes produce verbatim or near-verbatim copies of [code](https://githubcopilotlitigation.com/case-updates.html) or [prose](https://arxiv.org/abs/2505.12546) from the training data, but with attribution stripped. Without attribution, such instances constitute a derivative work that violates the license, thus are likely to be copyright infringement and are certainly plagiarism. Copyright infringement and plagiarism are issues of process, not merely of the final artifact, so it is difficult to prescribe a reliable procedure for due diligence when working with LLM output, short of assuming that such output is always tainted and thus the generated code or derivative works can never come into the code base. We recognize that many users of LLM products for software development would consider such diligence impractical.
If similarities with existing software are detected **and** the licenses are compatible, one can come into compliance with the license by complying with its terms, such as by adding attribution. When the source package has an [incompatible license](https://dwheeler.com/essays/floss-license-slide.html), there is no simple fix. For example, if LGPL-2.1 code is emitted by an LLM into an Apache-2.0 project, no amount of attribution or license changes can bring the project into compliance. The Apache-2.0 project cannot even relicense to LGPL-2.1 without consent from every contributor (or their copyright holder). In such cases, the project would be responsible for deleting all implicated code and derivative works, and rewriting it all using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design).
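To make the remedy concrete: when similar code is found **and** the licenses are compatible, coming into compliance could look like preserving the upstream copyright notice alongside the adapted code. The project, URL, and notice in this sketch are entirely hypothetical:

```python
# Adapted from the (hypothetical) "fastgrid" project,
# https://example.org/fastgrid, distributed under the BSD-3-Clause license.
# The original copyright notice is preserved as the license requires:
#
#   Copyright (c) 2021, Fastgrid Developers
#   Redistribution and use in source and binary forms, with or without
#   modification, are permitted provided that the conditions of the
#   BSD-3-Clause license are met.


def snap_to_grid(value: float, spacing: float) -> float:
    """Round ``value`` to the nearest multiple of ``spacing``."""
    return round(value / spacing) * spacing
```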

Comment on lines +61 to +62
* Acknowledge potential license ambiguity in your disclosure.
* Avoid pasting verbatim outputs that resemble known copyrighted code.
Member Author

Suggested change
* Acknowledge potential license ambiguity in your disclosure.
* Avoid pasting verbatim outputs that resemble known copyrighted code.
* Be aware that when you directly use content developed by an LLM, there will be inherent license conflicts.
* Be aware that LLM products can return copyrighted code verbatim in some cases. Avoid pasting verbatim outputs from an LLM into your package. Rather, if you use LLMs in your work, carefully review, edit, and modify the content.

lwasser and others added 2 commits September 23, 2025 16:55
Comment on lines +98 to +99

## Challenges we must address
Member Author

Suggested change
## Challenges we must address
### Incorrectness of LLMs and misleading time benefits
Although it is commonly stated that LLMs help improve the productivity of high-level developers, recent scientific explorations of this hypothesis indicate the contrary (see https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/ for an excellent discussion on this). What's more, the responses of LLMs for complex coding tasks tend to be incorrect (e.g., https://arxiv.org/html/2407.06153v1). Therefore, it is crucial that, if an LLM is used to help produce code, the correctness of the code is evaluated separately from the LLM.
## Challenges we must address

Member Author

@crhea93 thank you for this!! It's also great because it addresses some of the comments above that @jedbrown made related to time benefits (we adjusted above to say "perceived" rather than stating it as a fact). I'll go back and reread the entire thing, but what you added is an important missing section!!
