Skip to content

Commit 5e0ff12

Browse files
authored
ci(lint): expand jscpd to use branch comparison aws#5862
## Problem Follow up to: aws#5833 A static threshold is useful, but if the threshold fails, we want to be able to see what the new duplicated code is. ## Solution Run `jscpd` on the current branch, and if it is above the set threshold, we fail (same as before). Now, we also filter the resulting report to contain only clones from the files changed with respect to the base branch. Then, if the filtered report is non-empty, i.e. one of the changed files has a clone, fail and display the clones in CI. Adding this extra step allows us to know when we are adding a clone that was not otherwise in the codebase. It also allows us to see this clone in CI, before the PR is merged, giving the developer a chance to remove it. As an example, see commit (aws@a8e4860) where a copy-pasted function is added, and notice that CI fails and displays only the section of the report relevant to this clone. Then, once we remove this work in the following commit (aws@53d3973), the check passes. Included in this PR, is that it runs only on `pull_request` and no longer on `push`. ## Notes - In developing this, it became apparent that `jscpd` is not perfect and was not able to detect the clone added in (aws@e7357c0). Will need to dig deeper into their algorithm (based on https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm) and implementation (https://github.com/kucherenko/jscpdto) to understand why this is missed, and if it presents a larger problem. - Also, if a clone already exists in a file that is changed in the PR, this check will fail. In this case, the options are to ignore the check, mark the case as an exception, or fix the clone. To encourage developers to not repeatedly ignore the check, we still employ a static threshold via aws#5833. - If the branch hasn't been rebased up to date, it can pick up clones from unrelated changes. ## Future Work - We want to lower the threshold as much as possible by removing duplicates and marking exceptions when necessary. - More experience using this tool will help us find a `minLines` argument that provides the desired sensitivity. This argument allows us to ignore clones under `minLines` in length. Right now it is 3, which is likely too granular, but better to start there and loosen as needed. - Right now the CI output of the clones is very verbose. Digging through this could be time consuming, especially when many clones are added. Could be worth filtering it to provide a more ergonomic experience.
1 parent a4662aa commit 5e0ff12

File tree

3 files changed

+91
-21
lines changed

3 files changed

+91
-21
lines changed
Lines changed: 86 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,86 @@
1+
# github actions: https://docs.github.com/en/actions/use-cases-and-examples/building-and-testing/building-and-testing-nodejs
2+
# setup-node: https://github.com/actions/setup-node
3+
4+
name: Copy-Paste Detection
5+
6+
on:
7+
pull_request:
8+
branches: [master, feature/*, staging]
9+
10+
jobs:
11+
jscpd:
12+
runs-on: ubuntu-latest
13+
strategy:
14+
matrix:
15+
node-version: [18.x]
16+
env:
17+
NODE_OPTIONS: '--max-old-space-size=8192'
18+
19+
steps:
20+
- uses: actions/checkout@v4
21+
with:
22+
fetch-depth: 0
23+
24+
- name: Use Node.js ${{ matrix.node-version }}
25+
uses: actions/setup-node@v4
26+
with:
27+
node-version: ${{ matrix.node-version }}
28+
29+
- name: Fetch fork upstream
30+
run: |
31+
git remote add forkUpstream https://github.com/${{ github.event.pull_request.head.repo.full_name }} # URL of the fork
32+
git fetch forkUpstream # Fetch fork
33+
34+
- name: Determine base and target branches for comparison.
35+
run: |
36+
echo "CURRENT_BRANCH=${{ github.head_ref }}" >> $GITHUB_ENV
37+
echo "TARGET_BRANCH=${{ github.event.pull_request.base.ref }}" >> $GITHUB_ENV
38+
- run: git diff --name-only origin/$TARGET_BRANCH forkUpstream/$CURRENT_BRANCH > diff_output.txt
39+
- run: |
40+
npm install -g jscpd
41+
42+
- run: jscpd --config "$GITHUB_WORKSPACE/.github/workflows/jscpd.json"
43+
44+
- if: always()
45+
uses: actions/upload-artifact@v4
46+
with:
47+
name: unfiltered-jscpd-report
48+
path: ./jscpd-report.json
49+
50+
- name: Filter jscpd report for changed files
51+
run: |
52+
if [ ! -f ./jscpd-report.json ]; then
53+
echo "jscpd-report.json not found"
54+
exit 1
55+
fi
56+
echo "Filtering jscpd report for changed files..."
57+
CHANGED_FILES=$(jq -R -s -c 'split("\n")[:-1]' diff_output.txt)
58+
echo "Changed files: $CHANGED_FILES"
59+
jq --argjson changed_files "$CHANGED_FILES" '
60+
.duplicates | map(select(
61+
(.firstFile?.name as $fname | $changed_files | any(. == $fname)) or
62+
(.secondFile?.name as $sname | $changed_files | any(. == $sname))
63+
))
64+
' ./jscpd-report.json > filtered-jscpd-report.json
65+
cat filtered-jscpd-report.json
66+
67+
- name: Check for duplicates
68+
run: |
69+
if [ $(wc -l < ./filtered-jscpd-report.json) -gt 1 ]; then
70+
echo "filtered_report_exists=true" >> $GITHUB_ENV
71+
else
72+
echo "filtered_report_exists=false" >> $GITHUB_ENV
73+
fi
74+
- name: upload filtered report (if applicable)
75+
if: env.filtered_report_exists == 'true'
76+
uses: actions/upload-artifact@v4
77+
with:
78+
name: filtered-jscpd-report
79+
path: ./filtered-jscpd-report.json
80+
81+
- name: Fail and log found duplicates.
82+
if: env.filtered_report_exists == 'true'
83+
run: |
84+
cat ./filtered-jscpd-report.json
85+
echo "Duplications found, failing the check."
86+
exit 1

.github/workflows/jscpd.json

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,9 @@
11
{
22
"pattern": "packages/**/*.ts",
3-
"ignore": ["**node_modules**", "**dist**"],
3+
"ignore": ["**node_modules**", "**dist**", "*/scripts/*"],
44
"gitignore": true,
5-
"threshold": 1.34,
6-
"minLines": 15
5+
"threshold": 3.0,
6+
"minLines": 3,
7+
"output": "./",
8+
"reporters": ["json"]
79
}

.github/workflows/node.js.yml

Lines changed: 0 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -55,24 +55,6 @@ jobs:
5555
- run: npm run testCompile
5656
- run: npm run lint
5757

58-
jscpd:
59-
needs: lint-commits
60-
runs-on: ubuntu-latest
61-
strategy:
62-
matrix:
63-
node-version: [18.x]
64-
env:
65-
NODE_OPTIONS: '--max-old-space-size=8192'
66-
steps:
67-
- uses: actions/checkout@v4
68-
- name: Use Node.js ${{ matrix.node-version }}
69-
uses: actions/setup-node@v4
70-
with:
71-
node-version: ${{ matrix.node-version }}
72-
- run: npm install jscpd
73-
- name: Run jscpd
74-
run: npx jscpd --config "$GITHUB_WORKSPACE/.github/workflows/jscpd.json"
75-
7658
macos:
7759
needs: lint-commits
7860
name: test macOS

0 commit comments

Comments
 (0)