Commit 35ad468

lwasser and jedbrown authored
Update _posts/2025-09-16-generative-ai-peer-review.md
Co-authored-by: Jed Brown <[email protected]>
1 parent 82f3399 commit 35ad468

1 file changed: +3 -1 lines changed

_posts/2025-09-16-generative-ai-peer-review.md

Lines changed: 3 additions & 1 deletion
@@ -56,7 +56,9 @@ Please don’t offload vetting to volunteer reviewers. Arrive with human-reviewe
 
 ### Licensing awareness
 
-LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3).
+LLMs are trained on source code and documents with many licenses, most of which require attribution/preservation of a copyright notice (possibly in addition to other terms). LLM outputs sometimes produce verbatim or near-verbatim copies of [code](https://githubcopilotlitigation.com/case-updates.html) or [prose](https://arxiv.org/abs/2505.12546) from the training data, but with attribution stripped. Without attribution, such instances constitute a derivative work that violates the license, thus are likely to be copyright infringement and are certainly plagiarism. Copyright infringement and plagiarism are issues of process, not merely of the final artifact, so it is difficult to prescribe a reliable procedure for due diligence when working with LLM output, short of assuming that such output is always tainted and thus the generated code or derivative works can never come into the code base. We recognize that many users of LLM products for software development would consider such diligence impractical.
+
+If similarities with existing software are detected **and** the licenses are compatible, one can come into compliance with the license by complying with its terms, such as by adding attribution. When the source package has an [incompatible license](https://dwheeler.com/essays/floss-license-slide.html), there is no simple fix. For example, if LGPL-2.1 code is emitted by an LLM into an Apache-2.0 project, no amount of attribution or license changes can bring the project into compliance. The Apache-2.0 project cannot even relicense to LGPL-2.1 without consent from every contributor (or their copyright holder). In such cases, the project would be responsible for deleting all implicated code and derivative works, and rewriting it all using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design).
 
 * Acknowledge potential license ambiguity in your disclosure.
 * Avoid pasting verbatim outputs that resemble known copyrighted code.
