Conversation

james7132
Member

RENDERED

Added a new Policies section to the Contributing Guide and incorporated the above draft under that section. The triage reference has also been updated to align its guidance for S-Nominated-to-Close with the new AI policy.

@james7132 added the C-Content, X-Controversial, and A-Contributing-Guide labels on Aug 12, 2025
Member

@alice-i-cecile left a comment

Good. I think this lays out a clear position, and explains why we feel this is necessary without excessive moralizing or encouraging us to police problems that cannot be detected.

alice-i-cecile and others added 3 commits August 12, 2025 13:07
@james7132
Member Author

I did try a render of this locally, and the footnote rendering we have on docs pages leaves quite a bit to be desired. We might want to follow this PR up with a style update to make it look nicer.

Comment on lines +27 to +29
The unsolicited use of automated systems to communicate issues, bugs, or security vulnerabilities
about Bevy Organization projects under the guise of a human is considered unacceptable
and a Code of Conduct violation. Any individual contributor, operator of automated systems,

does this apply to communication that is clearly marked as AI-generated?

Member

If unsolicited, yes. I think we should cut "under the guise of".

Member Author

The "under the guise of" was explicitly included to avoid capturing automated systems that do not pretend to be humans. There are systems like GitGuardian that does a useful service but makes zero effort to hide that it's a crawler bot.

Under the proposed wording, as long as the fuzzer results are presented either by a human or by a clearly marked bot account, it shouldn't be an issue. It only becomes a major problem when the system files issues or PRs as if it were a human, and then wastes valuable volunteer time with the expectation that said bot will continue to engage as if it were a human.


## AI Generated Communications

The unsolicited use of automated systems to communicate issues, bugs, or security vulnerabilities

"automated systems" is not a good phrase here IMO, it excludes things like fuzzers and linters. i would explicitly say "generative AI" or "LLMs" throughout.

(The other two places you say "automated systems" are further qualified and probably don't need changes, but this one is unqualified.)

Member

Interestingly, I think "automated systems" is fine here: the key term is "unsolicited".


Oh, interesting. So you wouldn't want, e.g., an academic lab to post the output of a fancy dynamic fuzzer they built? That's reasonable, I suppose.

Member

Not without asking us first, no :) We'd say yes, but asking first is much more polite!

Member

@janhohenheim Aug 13, 2025

This reasoning clashes with what @james7132 says above:

The "under the guise of" was explicitly included to avoid capturing automated systems that do not pretend to be humans. There are systems like GitGuardian that does a useful service but makes zero effort to hide that it's a crawler bot.

Under the proposed situation, so long as the fuzzer results are presented either by a human, or by a clearly demarked bot account, it shouldn't be an issue. It's only when it files issues or PRs as if it were a human, and then subsequently wastes valuable volunteer time with the expectations that said bot would continue to engage as if it were a human where it becomes a major problem.

Looks like James would find it okay for a lab to post the results of their academic fuzzer without asking, as long as they clearly state what's happening?

Member

Yeah, I slightly disagree with James here, but it's not critical. Automated bots that are clearly marked as such and are trying to be helpful are both rare and not very obnoxious.

Comment on lines 48 to 50
internationally. In the case that AI generated works are not copyrightable, those same works
cannot be licensed to others, and thus AI-generated code and assets would not legally be
licensed under MIT or Apache.

The way this is worded appears wrong to me. A work that isn't copyrightable (and isn't an infringement of copyright) is in the public domain. I'd assume public-domain code contributions would not generally be excluded from being contributed to Bevy (there is code that is public domain, typically code with an explicit copyright waiver or under 0BSD/CC0). Yes, you cannot apply a license to such works, but you also do not need to apply a license to them in order to distribute them.
If the answer to the question is "no, they aren't copyrightable (and aren't derivative of the input dataset)", then there should be no issue. The issue arises if they are copyrightable (they'd probably be the copyright of the creator of the model), or if they are derivative of the input set (which would infringe third-party copyright). I think this should be worded to reflect those cases rather than calling out the one case that would be legally fine.
Note that I am not taking a stance on what the legal outcome is, and I agree with erring on the side of caution, but I think this should refer to the actually legally problematic cases, and I don't think it currently does.

Member

Very fair: we can be clearer here. The concern is not that such works are public domain (which is compatible with our licensing); the concern is that they may infringe on the works the model was trained on.

@ariofrio commented Aug 15, 2025

Disclaimer: My job is to write software that incorporates LLMs, with the assistance of LLMs, but not for an LLM provider. These are just my personal opinions though.

Policy is Well-Scoped

I recently read Asahi Linux's Generative AI policy, and I think this policy does a good job of avoiding the pitfalls that IMHO they fell into:

  • It doesn't attempt to predict or assess the limitations of LLMs as they develop into the future.
  • It doesn't make unsupported philosophical claims about what "thought" or "reasoning" means.
  • As @alice-i-cecile mentioned, it avoids excessive moralizing.

Reasoning Seems Incorrect

I very much agree with ~~@cart's~~ @chorman0773's concerns about the current draft's reasoning for forbidding non-trivial AI-generated output from contributions.

  • If the output is not copyrightable, then it is in the public domain (source) and can therefore be freely included in any work. The scenario mentioned in the current draft is not actually a problem.
  • The US Copyright Office's second report seems to rule out copyrightability by the model's developer: "Copyright does not extend to purely AI-generated material" (source). LLM providers like OpenAI and Anthropic make doubly sure by assigning any rights they might have in the output to the user.
  • The actual problem only occurs if the output is copyrightable, but under a pre-existing copyright—that of the original author(s).

Comparing Risk Levels

One thing that's interesting about the comparison with Asahi's policy is that they make the (compelling-to-me) case that because their problem space is so esoteric and highly specific, LLMs may be more likely to reproduce the scarce, copyrighted training data that relates to the publicly undocumented hardware they write software for.

FWIW, Asahi's reasoning seems to apply somewhat less readily to Bevy. Still, the three reports on AI by the US Copyright Office do indeed raise concerns for any non-trivial use of LLMs to author copyrighted content, concerns that they do not resolve.

Unclear Cases

I think the current draft is a bit unclear on whether the following use cases would be forbidden or allowed:

  • Write most of a feature, all models and interfaces/signatures, and have an LLM fill in the gaps and resolve compiler errors by running the compiler iteratively ("compiler-driven development" with AI).
  • Have an LLM write a first draft of unit tests, and then review and rewrite as needed.

These feel like they fall somewhere between "autocomplete" and "generating entire function blocks". They're not equivalent to pre-LLM functionality, but neither are they Jesus-take-the-wheel-style vibe coding. I think the intention was to forbid these cases, though my own (perhaps less risk-averse) assessment would lean the other way, so it might be worth being a bit clearer. These cases would also be really hard to detect.
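
For concreteness, here's a minimal Rust sketch of what I mean (hypothetical code, purely illustrative, not from Bevy): in the first case a human writes all the types, signatures, and doc contracts, and the LLM is asked only to fill in the `todo!()` bodies by iterating against compiler errors; in the second case the LLM drafts the test module, which a human then reviews and rewrites.

```rust
// Hypothetical, illustrative types; none of this is Bevy code.
pub struct Inventory {
    items: Vec<Item>,
    capacity: usize,
}

pub struct Item {
    pub name: String,
    pub weight: u32,
}

impl Inventory {
    // Human-written constructor.
    pub fn new(capacity: usize) -> Self {
        Self { items: Vec::new(), capacity }
    }

    /// Human-written signature and contract: adding an item past `capacity`
    /// returns it back to the caller. The LLM only replaces the `todo!()`
    /// body, iterating until the compiler (and the reviewer) are satisfied.
    pub fn try_add(&mut self, item: Item) -> Result<(), Item> {
        todo!("gap to be filled in by the LLM, then human-reviewed")
    }

    pub fn total_weight(&self) -> u32 {
        todo!("gap to be filled in by the LLM, then human-reviewed")
    }
}

// The second case: the LLM drafts tests like this one, and a human
// reviews and rewrites them as needed before submission.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn new_inventory_starts_empty() {
        let inv = Inventory::new(10);
        assert_eq!(inv.items.len(), 0);
    }
}
```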

Futility?

Finally, I'd be surprised if non-trivial LLM-generated code hasn't already made it into Bevy or any other widely-contributed-to open source project. I'm not sure what to do about that, but I guess if the intent is to err on the side of caution, a good-faith attempt to forbid the practice is all that's practical to do.

Member

@alice-i-cecile left a comment

The recent feedback about "why is this a problem" being legally suspect has convinced me: that needs to be resolved before we can move forward.

@alice-i-cecile
Member

> I very much agree with @cart's concerns about the current draft's reasoning for forbidding non-trivial AI-generated output from contributions.

I don't remember such comments? I agree with your concern, but did you mean @chorman0773?

[stated publicly][us-copyright-office-response] that "human authorship is a
pre-requisite to copyright protection". A [more recent report][us-copyright-office-report]
from the same institution shows a much more contested legal space, both within the US and
internationally. In the case that AI generated works are protected under copyright, those works
Member

@janhohenheim Aug 15, 2025

IANAL, but I believe this wording suggests there is no problem as long as the output is ruled non-derivative, which I don't think is true. The output could also be copyrighted and not derivative:

  • If the copyright lies with the creator of the LLM, the output cannot be open sourced unless the creator themselves waives the copyright.
  • If the copyright lies with the authors of the training data, there's no meaningful way to license the output (AFAIK?).
  • If the copyright lies with the user writing the prompt and the output is not derivative, there is no legal issue at all.

Maybe reword this to say that it would be bad if the output were copyrighted by someone other than the user and/or it were derivative?

@ariofrio

> I very much agree with @cart's concerns about the current draft's reasoning for forbidding non-trivial AI-generated output from contributions.

> I don't remember such comments? I agree with your concern, but did you mean @chorman0773?

Oops yeah that’s who I meant. 😅 #2204 (comment).

@alice-i-cecile
Member

Some other interesting discussion of policies around AI-generated code in the Linux Kernel.
