Problem
Taskgraph currently runs some extra logic to determine a more accurate base revision, which is then used to compute the set of files changed for the graph:
taskgraph/src/taskgraph/decision.py, line 202 (at commit 99840c4):

    parameters["base_rev"] = _determine_more_accurate_base_rev(
This logic works around several deficiencies in GitHub events:
- For force pushes, event.before is the newly orphaned head rather than the actual base revision.
- For pushes to new branches, event.before is the null revision.
- For pull requests, the pull_request.base.sha property contains the revision that the base reference currently points to, not the revision that the PR is based on.
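As a rough illustration, all three cases can be recognized from the webhook payload itself. This sketch assumes GitHub's push payload fields before and forced and the pull_request event name; the helper naive_base_rev_problem is hypothetical and not part of Taskgraph.

```python
from typing import Any, Mapping, Optional

# Git's all-zeros object ID, which GitHub reports as "before" when a branch is created.
NULL_REVISION = "0" * 40


def naive_base_rev_problem(event_name: str, payload: Mapping[str, Any]) -> Optional[str]:
    """Describe why the event's own base revision is unreliable (hypothetical helper).

    Returns None when the event's base revision can be used as-is.
    """
    if event_name == "pull_request":
        # base.sha is where the base branch points now, not where the PR forked off.
        return "pull request: base.sha is the current tip of the base branch"
    if event_name == "push":
        if payload.get("before") == NULL_REVISION:
            return "new branch: before is the null revision"
        if payload.get("forced"):
            return "force push: before is the newly orphaned head"
    return None
```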
So instead, we essentially run git merge-base <default branch> <head rev> to find the common ancestor of the two, use that as the base revision, and then compute files-changed from it.
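A minimal sketch of the merge-base approach, assuming git is on PATH and the code runs inside the checkout. The function names are hypothetical; this is not Taskgraph's actual _determine_more_accurate_base_rev. The git runner is passed in so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List, Sequence

# A runner takes git arguments and returns stdout; injectable for testing.
Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    """Run `git <args>` in the current directory and return stripped stdout."""
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout.strip()


def files_changed_since_merge_base(
    default_branch: str, head_rev: str, git: Runner = run_git
) -> List[str]:
    # Common ancestor of the default branch and the head revision.
    base_rev = git(["merge-base", default_branch, head_rev])
    # Paths that differ between that ancestor and the head revision.
    output = git(["diff", "--name-only", base_rev, head_rev])
    return [line for line in output.splitlines() if line]
```

Note that git diff <default branch>...<head rev> (three dots) performs the same merge-base lookup internally, so it needs the same shared history.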
The problem is that we're now starting to use shallow clones, and git merge-base doesn't work in a shallow clone unless we progressively deepen the repository until the common ancestor is found, which negates the benefit of shallow clones.
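To make the cost concrete, here is a sketch of what progressive deepening could look like. It uses git fetch --deepen=<n> to extend the shallow history by n commits per round, and it assumes that deepening the default branch from the given remote is enough for merge-base to eventually succeed. The function name and the step and max_rounds parameters are hypothetical.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple

# A runner returns (exit code, stripped stdout) for `git <args>`; injectable for testing.
Runner = Callable[[Sequence[str]], Tuple[int, str]]


def run_git(args: Sequence[str]) -> Tuple[int, str]:
    result = subprocess.run(["git", *args], capture_output=True, text=True)
    return result.returncode, result.stdout.strip()


def merge_base_with_deepening(
    remote: str,
    default_branch: str,
    head_rev: str,
    git: Runner = run_git,
    step: int = 50,
    max_rounds: int = 20,
) -> Optional[str]:
    for _ in range(max_rounds):
        # merge-base exits non-zero when no common ancestor is reachable yet.
        code, out = git(["merge-base", default_branch, head_rev])
        if code == 0 and out:
            return out
        # Pull `step` more commits beyond the current shallow boundary, then retry.
        code, _ = git(["fetch", f"--deepen={step}", remote, default_branch])
        if code != 0:
            break
    return None
```

Every failed round costs another network fetch, and in the worst case the loop downloads most of the history the shallow clone was meant to skip.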
So a new solution is needed!
Possible Solutions
Simply stop using git merge-base
In this solution, we would simply run git diff <base> <head> to get the files modified between the two trees. We'd be consciously trading occasionally inaccurate files-changed for simplicity and clone performance. The scenarios above would roughly shake out like this:
- Force pushes: files-changed would include files modified by the orphaned commits under <base>, even if <head> didn't touch them. This would cause us to run more tasks than necessary.
- New branches: we'd probably have to special-case this, perhaps by only looking at the files touched by <head>.
- Pull requests: files-changed would be derived from all commits on the base branch that the PR hasn't been rebased onto yet, potentially running many more tasks than expected.
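A sketch of the direct-diff option, including one possible special case for new branches (listing only the files touched by <head> when <base> is the null revision). It assumes both commits are already present locally (for example, fetched explicitly) and, for the new-branch case, that <head> is an ordinary non-merge commit whose parent is in the clone. The function name is hypothetical.

```python
import subprocess
from typing import Callable, List, Sequence

NULL_REVISION = "0" * 40  # what GitHub reports as "before" for a new branch
Runner = Callable[[Sequence[str]], str]


def run_git(args: Sequence[str]) -> str:
    result = subprocess.run(["git", *args], check=True, capture_output=True, text=True)
    return result.stdout


def files_changed_direct(base_rev: str, head_rev: str, git: Runner = run_git) -> List[str]:
    if base_rev == NULL_REVISION:
        # New branch: no usable base, so list only the files touched by head itself
        # (compared against its parent, which must exist in the clone).
        args = ["diff-tree", "--no-commit-id", "--name-only", "-r", head_rev]
    else:
        # Plain tree-to-tree diff: only the two commits' trees are needed,
        # not the history between them, so shallow clones are fine.
        args = ["diff", "--name-only", base_rev, head_rev]
    return [line for line in git(args).splitlines() if line]
```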
Use the GitHub API
It should be possible to get the files modified by a PR or force push using the GitHub API. This would be fast and simple to implement, though we'd need to start worrying about rate limits and tokens, and it isn't portable if we ever want to support non-GitHub repos.
In this case, we'd likely still want to fall back to the merge-base solution for non-GitHub repos or when we hit rate limits.
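A rough sketch of the API route with a git fallback. It uses GitHub's REST endpoint GET /repos/{owner}/{repo}/pulls/{pull_number}/files, which is paginated (at most 100 entries per page) and documented to return at most 3000 files, so very large PRs might need the fallback too. Treating HTTP 429, or 403 with x-ratelimit-remaining set to 0, as rate limiting is an assumption of this sketch, and all function names are hypothetical. Force pushes would need a different endpoint, which is not shown.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Optional

API_ROOT = "https://api.github.com"


def github_pr_files(owner: str, repo: str, number: int, token: Optional[str] = None) -> List[str]:
    """Return the paths changed by a pull request, following pagination."""
    files: List[str] = []
    page = 1
    while True:
        url = f"{API_ROOT}/repos/{owner}/{repo}/pulls/{number}/files?per_page=100&page={page}"
        request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
        if token:
            request.add_header("Authorization", f"Bearer {token}")
        with urllib.request.urlopen(request) as response:
            batch = json.load(response)
        files.extend(entry["filename"] for entry in batch)
        if len(batch) < 100:  # a short page means this was the last one
            return files
        page += 1


def is_rate_limited(error: urllib.error.HTTPError) -> bool:
    # Assumption: 429, or 403 with no remaining quota, indicates rate limiting.
    if error.code == 429:
        return True
    return error.code == 403 and error.headers.get("x-ratelimit-remaining") == "0"


def files_changed(
    is_github: bool,
    from_api: Callable[[], List[str]],
    from_git: Callable[[], List[str]],
) -> List[str]:
    """Prefer the API on GitHub; use git (e.g. merge-base) elsewhere or when rate limited."""
    if not is_github:
        return from_git()
    try:
        return from_api()
    except urllib.error.HTTPError as error:
        if is_rate_limited(error):
            return from_git()
        raise  # other HTTP errors are surfaced rather than silently masked
```

Passing the two strategies in as callables keeps the fallback policy separate from how each list of files is actually computed.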