Add the Bevy Org's AI Policy #2204
Conversation
Good. I think this lays out a clear position, and explains why we feel this is necessary without excessive moralizing or encouraging us to police problems that cannot be detected.
Co-authored-by: François Mockers <[email protected]>
Co-authored-by: François Mockers <[email protected]>
Co-authored-by: François Mockers <[email protected]>
Co-authored-by: Alice Cecile <[email protected]>
I did try rendering this locally, and the footnote rendering we have on docs pages leaves quite a bit to be desired. Might want to follow this PR up with a style update to make it look nicer.
The unsolicited use of automated systems to communicate issues, bugs, or security vulnerabilities
about Bevy Organization projects under the guise of a human is considered unacceptable
and a Code of Conduct violation. Any individual contributor, operator of automated systems,
Does this apply to communication that is clearly marked as AI-generated?
If unsolicited, yes. I think we should cut "under the guise of".
The "under the guise of" was explicitly included to avoid capturing automated systems that do not pretend to be humans. There are systems like GitGuardian that does a useful service but makes zero effort to hide that it's a crawler bot.
Under the proposed situation, so long as the fuzzer results are presented either by a human, or by a clearly demarked bot account, it shouldn't be an issue. It's only when it files issues or PRs as if it were a human, and then subsequently wastes valuable volunteer time with the expectations that said bot would continue to engage as if it were a human where it becomes a major problem.
## AI Generated Communications

The unsolicited use of automated systems to communicate issues, bugs, or security vulnerabilities |
"automated systems" is not a good phrase here IMO, it excludes things like fuzzers and linters. i would explicitly say "generative AI" or "LLMs" throughout.
(the other two places you say "automated systems" are further qualified and probably don't need changes, but this one is unqualified.)
Interestingly, I think that "automated systems" is fine here: the key term is "unsolicited".
Oh interesting - so you wouldn't want, e.g., an academic lab to post the output of a fancy dynamic fuzzer they built? That's reasonable, I suppose.
Not without asking us first, no :) We'd say yes, but asking first is much more polite!
This reasoning clashes with what @james7132 says above:

> The "under the guise of" was explicitly included to avoid capturing automated systems that do not pretend to be humans. There are systems like GitGuardian that provide a useful service but make zero effort to hide that they're crawler bots.
>
> Under the proposed policy, so long as the fuzzer results are presented either by a human or by a clearly marked bot account, it shouldn't be an issue. It only becomes a major problem when a bot files issues or PRs as if it were a human, and then subsequently wastes valuable volunteer time with the expectation that it will continue to engage as if it were a human.

Looks like James would find it okay to have a lab post the results of their academic fuzzer without asking, as long as they clearly state what's happening?
Yeah, I slightly disagree with James here, but it's not critical. Automated bots that are clearly marked as such and are trying to be helpful are both rare and not very obnoxious.
internationally. In the case that AI generated works are not copyrightable, those same works
cannot be licensed to others, and thus AI-generated code and assets would not legally be
licensed under MIT or Apache.
The way this is worded appears wrong to me. A work that isn't copyrightable (and isn't an infringement of copyright) is public domain. I'd assume public domain code contributions would not generally be excluded from being contributed to Bevy (there is code that is effectively public domain, typically code with an explicit copyright waiver or under 0BSD/CC0). Yes, you cannot apply a license to them, but you also do not need to apply a license to them to distribute them.
If the answer to the question is "No, they aren't copyrightable (and aren't derivative of the input dataset)", then there should be no issue. The issue arises if they are copyrightable (they'd probably be the copyright of the creator of the model), or if they are derivative of the input set (which would infringe third-party copyright). I think this should be better worded to reflect those cases rather than calling out the one case that would be legally fine.
Note that I am not taking a stance on what the legal outcome is, and I agree with erring on the side of caution, but I think the policy should refer to the actual legally problematic case, and I don't think it currently does.
Very fair: we can be clearer here. The concern is not that they are public domain (which is compatible with our licensing). The concern is that they are infringing on the works the model was trained on.
Co-authored-by: Niklas Eicker <[email protected]>
### Policy is Well-Scoped

I recently read Asahi Linux's Generative AI policy, and I think this policy does a good job of avoiding the pitfalls that IMHO they fell into:

### Reasoning Seems Incorrect

I very much agree with

### Comparing Risk Levels

One thing that's interesting about the comparison with Asahi's policy is that they make the (compelling-to-me) case that, because their problem space is so esoteric and highly specific, LLMs may be more likely to reproduce the scarce, copyrighted training data that relates to the publicly undocumented hardware they write software for. FWIW, Asahi's reasoning seems to apply somewhat less readily to Bevy. Still, the three reports on AI by the US Copyright Office do indeed raise concerns for any non-trivial use of LLMs to author copyrighted content, concerns that they do not resolve.

### Unclear Cases

I think the current draft is a bit unclear on whether the following use cases would be forbidden or allowed:

These feel like they fall somewhere between "autocomplete" and "generating entire function blocks". They're not equivalent to pre-LLM functionality, but neither are they Jesus-take-the-wheel-style vibe coding. I think the intention was to forbid these cases, though my own (perhaps less risk-averse) assessment would lean the other way, so it might be worth being a bit clearer. These cases would also be really hard to detect.

### Futility?

Finally, I'd be surprised if non-trivial LLM-generated code hasn't already made it into Bevy or any widely-contributed-to open source project. Not sure what to do about that, but I guess if the intent is to err on the side of caution, a good-faith attempt to forbid the practice would be all that's practical to do.
The recent feedback about "why is this a problem" being legally suspect has convinced me: that needs to be resolved before we can move forward.
Co-authored-by: Alice Cecile <[email protected]>
I don't remember such comments? I agree with your concern, but did you mean @chorman0773?
[stated publicly][us-copyright-office-response] that "human authorship is a
pre-requisite to copyright protection". A [more recent report][us-copyright-office-report]
from the same institution shows a much more contested legal space, both within the US and
internationally. In the case that AI generated works are protected under copyright, those works
IANAL, but I believe this suggests there is no problem with a non-derivative ruling, which I don't think is true.
The output could also be copyrighted and not derivative.
In that case, if the copyright is with the creator of the LLM, it cannot be open sourced unless the creator themselves waives the copyright.
If the copyright is with the authors of the training data, there's no meaningful way to license the output (AFAIK?).
It could also be copyrighted by the user writing the prompt and not derivative, in which case there is no legal issue at all.
Maybe reword this to say that it would be bad if the output were copyrighted by someone other than the user and/or were derivative?
Oops yeah that’s who I meant. 😅 #2204 (comment).
Some other interesting discussion of policies around AI-generated code in the Linux Kernel.
RENDERED
Added a new Policies section to the Contributing Guide and incorporated the above draft under that section. The triage reference has also been updated so that the guidance for `S-Nominated-to-Close` is aligned with the new AI policy.