Skip to content

Conversation

shreyasNaik0101
Copy link
Contributor

This pull request addresses an issue related to automatic target validation during chunking. Currently, the system performs a diff between the head and the base for chunking purposes. This approach causes incorrect target selection when the head is out of sync with the base.

The proposed fix changes the diff target to be the head against the latest upstream match (i.e., the default pull request diff) instead of the base. This correction ensures accurate target selection even when the head and base are not synchronized.

This behavior and the issue can be referenced in detail under issue #2568.

dollaransh17 added a commit to dollaransh17/sherlock that referenced this pull request Oct 4, 2025
The validation workflow was comparing changes against the wrong base,
causing incorrect target detection when PR branches get out of sync.

Now it uses git merge-base to find where the branch actually split off,
so it only validates the targets that were actually changed in the PR.

This is the same fix as sherlock-project#2588 in the upstream repo.
@dollaransh17
Copy link
Contributor

Hey @shreyasNaik0101! I tested this merge-base approach in my fork and can confirm it fixes the incorrect target detection when branches are out of sync. Great solution!

The commit reference that appeared was just me testing your fix locally - I'm not submitting a competing PR. Looking forward to this landing!

@shreyasNaik0101
Copy link
Contributor Author

Thanks for verifying it! Glad to hear it worked as expected. Appreciate you testing it out!

@ppfeister
Copy link
Member

ppfeister commented Oct 4, 2025

There's one issue with this method.

This workflow runs automatically, even for those who haven't contributed to the repository in the past (notice on: pull_request_target instead of on: pull_request). Allowing untrusted code to run without any gates whatsoever can pose several security concerns. By checking out the repository base, we know that only trusted (upstream) code is being ran, and only the manifest itself is imported - allowing on-the-fly reviews of the most frequent new contributions.

So the correct approach would require corrections to the diff in the Discover modified targets job, leaving the checked-out repository in place

@shreyasNaik0101
Copy link
Contributor Author

Thanks again for the detailed feedback! I've pushed up a new version that uses the secure method you recommended. The diff logic should be correct now.

Let me know if there's anything else.

Copy link
Member

@ppfeister ppfeister left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, where is the new data.json being referenced for the test itself?

Comment on lines 40 to 44
git fetch origin pull/${{ github.event.pull_request.number }}/head:pr
# The initial checkout may be shallow. To find a merge-base,
# we need more history. We can 'unshallow' the repository if needed.
git fetch --unshallow || true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't this be just a single command, rather than making two net calls?

If there's a good reason not to, all ears ofc

@shreyasNaik0101
Copy link
Contributor Author

@ppfeister I've pushed an update that should address both points. Let me know what you think.

@ppfeister
Copy link
Member

Looks good so far... will revert if there are any issues, so keep this branch open temporarily ---- will provide an update here shortly after a few PRs are successfully screened.

@ppfeister ppfeister merged commit b9e28b9 into sherlock-project:master Oct 5, 2025
@ppfeister ppfeister self-assigned this Oct 5, 2025
@ppfeister
Copy link
Member

Further validated in #2611, using an old head. Should be good to delete your branch!

echo -e ">>> Changed targets: \n$(echo $CHANGED | tr ',' '\n')"
echo "changed_targets=$CHANGED" >> "$GITHUB_OUTPUT"
# --- The rest of the steps below are unchanged ---
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick retrospective tip -

These lines are unnecessary as the diff itself shows what was changed and what wasn't --- so really all it does it add clutter to the code.

Not important, just a friendly note as you continue your development work!

Cheers! And thanks again! Your changes saved me quite a bit of time here. And it resolved what was quite an annoying issue on PRs with stale merge bases.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip and for all the help, @ppfeister! I'm glad I could contribute.

@shreyasNaik0101 shreyasNaik0101 deleted the fix/correct-ci-diff branch October 7, 2025 05:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants