| 1 | +# Extractive Contribution Policy |
| 2 | + |
| 3 | +LLVM's policy on AI-assisted tooling is fundamentally liberal -- we want to |
| 4 | +enable contributors to use the latest and greatest tools available. Our policy |
| 5 | +is guided by two major concerns, the latter of which is the more important: |
| 6 | + |
| 7 | +1. Ensuring that contributions do not contain copyrighted content. |
| 8 | +2. Ensuring that contributions are not extractive and meet our |
| 9 | + [quality](#quality) bar. |
| 10 | + |
| 11 | +This policy covers, but is not limited to, the following kinds of |
| 12 | +contributions: |
| 13 | + |
| 14 | +- Code, usually in the form of a pull request |
| 15 | +- RFCs or design proposals |
| 16 | +- Issues or security vulnerabilities |
| 17 | +- Comments and feedback on pull requests |
| 18 | + |
| 19 | +## Copyright |
| 20 | + |
| 21 | +Artificial intelligence systems raise many questions around copyright that have |
| 22 | +yet to be answered. Our policy on AI tools is similar to our copyright policy: |
| 23 | +Contributors are responsible for ensuring that they have the right to |
| 24 | +contribute code under the terms of our license, typically meaning that either |
| 25 | +they, their employer, or their collaborators hold the copyright. Using AI tools |
| 26 | +to regenerate copyrighted material does not remove the copyright, and |
| 27 | +contributors are responsible for ensuring that such material does not appear in |
| 28 | +their contributions. Contributions found to violate this policy will be removed |
| 29 | +just like any other offending contribution. |
| 30 | + |
| 31 | +## Quality |
| 32 | + |
| 33 | +Sending patches, PRs, RFCs, comments, etc. to LLVM is not free -- it takes a |
| 34 | +lot of maintainer time and energy to review those contributions! Recent |
| 35 | +improvements in AI-assisted tooling have made it easy to generate large volumes |
| 36 | +of code and text with little effort on the part of the contributor. This has |
| 37 | +increased the asymmetry between the work of producing a contribution and the |
| 38 | +work of reviewing the contribution. Our **golden rule** is that a contribution |
| 39 | +should be worth more to the project than the time it takes to review it. These |
| 40 | +ideas are captured by this quote from the book [Working in Public][1] by |
| 41 | +Nadia Eghbal: |
| 42 | + |
| 43 | +[1]: https://press.stripe.com/working-in-public |
| 44 | + |
| 45 | +> "When attention is being appropriated, producers need to weigh the costs and |
| 46 | +> benefits of the transaction. To assess whether the appropriation of attention |
| 47 | +> is net-positive, it's useful to distinguish between *extractive* and |
| 48 | +> *non-extractive* contributions. Extractive contributions are those where the |
| 49 | +> marginal cost of reviewing and merging that contribution is greater than the |
| 50 | +> marginal benefit to the project's producers. In the case of a code |
| 51 | +> contribution, it might be a pull request that's too complex or unwieldy to |
| 52 | +> review, given the potential upside." -- Nadia Eghbal |
| 53 | + |
| 54 | +We encourage contributions that help sustain the project. We want the LLVM |
| 55 | +project to be welcoming and open to aspiring compiler engineers who are willing |
| 56 | +to invest time and effort to learn and grow, because growing our contributor |
| 57 | +base and recruiting new maintainers helps sustain the project over the long |
| 58 | +term. We therefore automatically post a greeting comment to pull requests from |
| 59 | +new contributors and encourage maintainers to spend time helping new |
| 60 | +contributors learn. |
| 61 | + |
| 62 | +However, we expect to see a growth pattern in the quality of a contributor's |
| 63 | +work over time. Maintainers are empowered to push back against *extractive* |
| 64 | +contributions and explain why they believe a contribution is overly burdensome |
| 65 | +or not aligned with the project goals. |
| 66 | + |
| 67 | +If a maintainer judges that a contribution is extractive (i.e. it is generated |
| 68 | +with tool assistance and is not valuable), they should copy-paste the following |
| 69 | +response, add the `extractive` label if applicable, and refrain from further |
| 70 | +engagement: |
| 71 | + |
| 72 | + This PR appears to be extractive, and requires additional justification for |
| 73 | + why it is valuable enough to the project for us to review it. Please see |
| 74 | + our developer policy on extractive contributions: |
| 75 | + http://llvm.org/docs/ExtractiveContributions.html |
| 76 | + |
| 77 | +Other reviewers should use the label to prioritize their review time. |
| 78 | + |
| 79 | +Contributors are welcome to improve their work or make the case for why it has |
| 80 | +value for the community, but they should keep in mind that they may be |
| 81 | +moderated for excessive extractive communications. |
| 82 | + |
| 83 | +While our quality policy is subjective at its core, here are some guidelines |
| 84 | +that can be used to assess the quality of a contribution: |
| 85 | + |
| 86 | +- Contribution size: Larger contributions require more time to read and review. |
| 87 | + RFCs and issues should be clear and concise, and pull requests should not |
| 88 | + change unrelated code. |
| 89 | +- Potential user base: Contributions that benefit more users are inherently more valuable. |
| 90 | +- Code must adhere to the [LLVM Coding Standards](CodingStandards.html). |
| 91 | +- Pull requests should build and pass premerge checks. For first-time |
| 92 | + contributors, this will require an initial cursory review to run the |
| 93 | + checks. |
| 94 | + |
| 95 | +The best ways to make a change less extractive and more valuable are to reduce |
| 96 | +its size or complexity or to increase its usefulness to the community. These |
| 97 | +factors are impossible to weigh objectively, and our project policy leaves this |
| 98 | +determination up to the maintainers of the project, i.e. those who are doing |
| 99 | +the work of sustaining the project. |
| 100 | + |
| 101 | +We encourage, but do not require, contributors making large changes to document |
| 102 | +the tools that they used as part of the rationale for why they believe their |
| 103 | +contribution has merit. This is similar in spirit to including a sed or Python |
| 104 | +script in the commit message when making large-scale changes to the project, |
| 105 | +such as updating the LLVM IR textual syntax. |
| 106 | + |
| 107 | +## Examples |
| 108 | + |
| 109 | +Here are some examples of contributions that demonstrate how to apply |
| 110 | +the principles of this policy: |
| 111 | + |
| 112 | +- [This PR](https://github.com/llvm/llvm-project/pull/142869) contains a |
| 113 | + proof from Alive2, which is a strong signal of value and correctness. |
| 114 | +- This [generated |
| 115 | + documentation](https://discourse.llvm.org/t/searching-for-gsym-documentation/85185/2) |
| 116 | + was reviewed for correctness by a human before being posted. |
| 117 | + |
| 118 | +## References |
| 119 | + |
| 120 | +Our policy was informed by experiences in other communities: |
| 121 | + |
| 122 | +- [Rust policy on burdensome |
| 123 | + PRs](https://github.com/rust-lang/compiler-team/issues/893) |
| 124 | +- [Seth Larson's post](https://sethmlarson.dev/slop-security-reports) |
| 125 | + on slop security reports in the Python ecosystem |
| 126 | +- The METR paper [Measuring the Impact of Early-2025 AI on Experienced |
| 127 | + Open-Source Developer |
| 128 | +  Productivity](https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/) |
| 129 | +- [QEMU bans use of AI content |
| 130 | + generators](https://www.qemu.org/docs/master/devel/code-provenance.html#use-of-ai-content-generators) |