-
Notifications
You must be signed in to change notification settings - Fork 737
ci(lint): run jscpd in CI, compare to static threshold. #5833
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
068283f to
91860e4
Compare
.github/workflows/node.js.yml
Outdated
| - run: npm run lint | ||
|
|
||
| jscpd: | ||
| # needs: lint-commits |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like we just need to remove this an its probably good to merge?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, still exploring adding some more functionality, but that can be follow up. Seems pretty low-risk to test this simple version in the meantime.
justinmk3
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Co-authored-by: Justin M. Keyes <[email protected]>
## Problem Follow up to: #5833 A static threshold is useful, but if the threshold fails, we want to be able to see what the new duplicated code is. ## Solution Run `jscpd` on the current branch, and if it is above the set threshold, we fail (same as before). Now, we also filter the resulting report to contain only clones from the files changed with respect to the base branch. Then, if the filtered report is non-empty, i.e. one of the changed files has a clone, fail and display the clones in CI. Adding this extra step allows us to know when we are adding a clone that was not otherwise in the codebase. It also allows us to see this clone in CI, before the PR is merged, giving the developer a chance to remove it. As an example, see commit (a8e4860) where a copy-pasted function is added, and notice that CI fails and displays only the section of the report relevant to this clone. Then, once we remove this work in the following commit (53d3973), the check passes. Included in this PR, is that it runs only on `pull_request` and no longer on `push`. ## Notes - In developing this, it became apparent that `jscpd` is not perfect and was not able to detect the clone added in (e7357c0). Will need to dig deeper into their algorithm (based on https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm) and implementation (https://github.com/kucherenko/jscpdto) to understand why this is missed, and if it presents a larger problem. - Also, if a clone already exists in a file that is changed in the PR, this check will fail. In this case, the options are to ignore the check, mark the case as an exception, or fix the clone. To encourage developers to not repeatedly ignore the check, we still employ a static threshold via #5833. - If the branch hasn't been rebased up to date, it can pick up clones from unrelated changes. ## Future Work - We want to lower the threshold as much as possible by removing duplicates and marking exceptions when necessary. - More experience using this tool will help us find a `minLines` argument that provides the desired sensitivity. This argument allows us to ignore clones under `minLines` in length. Right now it is 3, which is likely too granular, but better to start there and loosen as needed. - Right now the CI output of the clones is very verbose. Digging through this could be time consuming, especially when many clones are added. Could be worth filtering it to provide a more ergonomic experience.
Problem
New contributors often copy-paste logic and test cases, which creates a maintenance cost for little gain.
Examples
We call these repeated segments of code "clones".
Solution
implement a static threshold.
Run
jscpdin CI and compare the % duplicated to a fixed threshold of 1.34%, which is the current duplication rate in the repo.The main issue with this solution is that it fails to provide easy debugging options when new duplicates are found. Currently exploring adding this functionality in a follow-up.
Notes
License: I confirm that my contribution is made under the terms of the Apache 2.0 license.