Update _posts/2025-09-16-generative-ai-peer-review.md

lwasser · jedbrown · web-flow · commit 35ad468f5d93 · 2025-11-18T10:40:55.000-07:00
Co-authored-by: Jed Brown <jed@jedbrown.org>
diff --git a/_posts/2025-09-16-generative-ai-peer-review.md b/_posts/2025-09-16-generative-ai-peer-review.md
@@ -56,7 +56,9 @@ Please don’t offload vetting to volunteer reviewers. Arrive with human-reviewe
 
 ### Licensing awareness
 
-LLMs may be trained on mixed-license corpora. Outputs can create **license compatibility questions**, especially when your package uses a permissive license (MIT/BSD-3).
+LLMs are trained on source code and documents under many licenses, most of which require attribution or preservation of a copyright notice (possibly in addition to other terms). LLMs sometimes output verbatim or near-verbatim copies of [code](https://githubcopilotlitigation.com/case-updates.html) or [prose](https://arxiv.org/abs/2505.12546) from the training data, but with attribution stripped. Without attribution, such output is a derivative work that violates the license, so it is likely to be copyright infringement and is certainly plagiarism. Copyright infringement and plagiarism are issues of process, not merely of the final artifact, so it is difficult to prescribe a reliable procedure for due diligence when working with LLM output, short of assuming that such output is always tainted and that the generated code and its derivative works can never come into the code base. We recognize that many users of LLM products for software development would consider such diligence impractical.
+
+If similarities with existing software are detected **and** the licenses are compatible, the project can come into compliance by meeting the license's terms, such as by adding attribution. When the source package has an [incompatible license](https://dwheeler.com/essays/floss-license-slide.html), there is no simple fix. For example, if LGPL-2.1 code is emitted by an LLM into an Apache-2.0 project, no amount of attribution or license changes can bring the project into compliance. The Apache-2.0 project cannot even relicense to LGPL-2.1 without consent from every contributor (or their copyright holder). In such cases, the project would be responsible for deleting all implicated code and derivative works, and rewriting it all using [clean-room techniques](https://en.wikipedia.org/wiki/Clean-room_design).
 
 * Acknowledge potential license ambiguity in your disclosure.  
 * Avoid pasting verbatim outputs that resemble known copyrighted code.