Commit bdda2ff

section on ai code
1 parent cf70813 commit bdda2ff

1 file changed

+17
-1
lines changed

content/software-licensing.md

Lines changed: 17 additions & 1 deletion
@@ -111,7 +111,7 @@ Containers are a bit more tricky when it comes to licenses.
111111
- Distribution of container recipes: it's like distributing source code
112112
- Distribution of container images: it can be considered like distributing a binary compiled software
113113

114-
The latter case is a bit more nuanced and the interested reader should read more about "Mere Aggregation" at [GPL-FAQ](https://www.gnu.org/licenses/gpl-faq.html#MereAggregation). Briefly, if the conatiner image just bundles separate programs that talk through normal system interfaces, it is an **aggregate** and each keeps its own license (like a CD-ROM with various packages). If the components are tightly integrated into one program (e.g. a pipeline with various parts that the container can run as a single program), the image may be treated as a **derivative work**, and stricter license obligations (e.g. GPL copyleft) can apply.
114+
The latter case is a bit more nuanced and the interested reader should read more about "Mere Aggregation" at [GPL-FAQ](https://www.gnu.org/licenses/gpl-faq.html#MereAggregation). Briefly, if the container image just bundles separate programs that talk through normal system interfaces, it is an **aggregate** and each keeps its own license (like a CD-ROM with various packages). If the components are tightly integrated into one program (e.g. a pipeline with various parts that the container can run as a single program), the image may be treated as a **derivative work**, and stricter license obligations (e.g. GPL copyleft) can apply.
115115

116116
---
117117

@@ -291,6 +291,22 @@ Practical steps:
291291
files and they only have one `LICENSE` file and that is
292292
OK for really small projects.
293293

294+
295+
296+
```{admonition} Licensing code produced by generative AI systems
297+
298+
With generative AI tools for coding such as GitHub Copilot, Cursor, or even basic chat interfaces (ChatGPT, Claude, Grok, ...), responsibility lies fully with the person who uses (and publishes) the generated code. You can never blame the autopilot or the company that built it, only the driver (you!).
299+
300+
Using AI-generated code carries various risks (this is not an exhaustive taxonomy). A few examples:
301+
302+
- Risks for the derivative work: you think the code does what you asked, but you did not review it and your results are wrong
303+
- Risks for the system in use: the generated code has security issues, e.g. an import is a *typosquat* of an actual library (e.g. "microsoft" spelled "rnicrosoft", which you might miss entirely depending on the font)
304+
- Risks related to licenses/IPR: the generated code is actually a verbatim copy of fully copyrighted code, or of code that requires a strict copyleft license. Plagiarism (ethics) also applies.
305+
306+
If we focus on the last one, a recent paper ([ref](https://arxiv.org/html/2408.02487v1)) estimates that around 2% of AI-generated code is "strikingly similar to existing open-source implementations". Generative AI tools typically cannot provide an exact reference to where pieces of generated code were copied from, so it is the researcher's responsibility to verify that the produced code cites its sources and respects the licenses of the published software it reuses. Future AI systems for code generation could possibly be trained only on code that shares the same set of licenses (e.g. only MIT) to mitigate these risks.
307+
308+
```
309+
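The typosquat risk listed in the admonition can be partly screened for mechanically. The following is a minimal sketch, not a complete defence: it parses generated Python code, collects the imported top-level module names, and flags any name that is not on your allowlist but closely resembles one that is. The names `KNOWN_PACKAGES`, `imported_modules`, and `suspicious_imports`, the package list, and the 0.8 similarity cutoff are all assumptions made for illustration.

```python
import ast
import difflib

# Hypothetical allowlist: the packages you actually intend to depend on.
KNOWN_PACKAGES = {"numpy", "pandas", "requests", "scipy", "matplotlib"}


def imported_modules(source):
    """Return the set of top-level module names imported by `source`."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Skip relative imports (level > 0): they refer to local code.
            modules.add(node.module.split(".")[0])
    return modules


def suspicious_imports(source, known=KNOWN_PACKAGES, cutoff=0.8):
    """Map each import missing from `known` to the known name it resembles."""
    flagged = {}
    for name in sorted(imported_modules(source) - set(known)):
        # The cutoff is an arbitrary assumption; tune it for your own list.
        close = difflib.get_close_matches(name, list(known), n=1, cutoff=cutoff)
        if close:
            flagged[name] = close[0]
    return flagged


# Example "generated" snippet with two misspelled imports.
generated = (
    "import numpy as np\n"
    "import reqeusts\n"
    "from pandsa import DataFrame\n"
    "import os\n"
)
print(suspicious_imports(generated))
```

A name flagged here is only a hint to look closer. Standard-library modules and legitimate packages missing from the allowlist are silently ignored unless they happen to resemble a listed name, so the check complements a manual review of every dependency rather than replacing it.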
294310
---
295311

296312
