builtin/remote: rework how remote refs get renamed
It was recently reported [1] that renaming a remote that has dangling
symrefs is broken. This issue can be trivially reproduced:
    $ git init repo
    Initialized empty Git repository in /tmp/repo/.git/
    $ cd repo/
    $ git remote add origin /dev/null
    $ git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master
    $ git remote rename origin renamed
    $ git symbolic-ref refs/remotes/origin/HEAD
    refs/remotes/origin/master
    $ git symbolic-ref refs/remotes/renamed/HEAD
    fatal: ref refs/remotes/renamed/HEAD is not a symbolic ref
As one can see, the "HEAD" reference did not get renamed but stays in
the same place. There are two issues here:
  - We use `refs_resolve_ref_unsafe()` to resolve references, but we
    don't pass the `RESOLVE_REF_NO_RECURSE` flag. Consequently, if the
    reference does not resolve, the function will fail and we thus
    ignore this branch.
  - We use `refs_for_each_ref()` to iterate through the old remote's
    references, but that function ignores broken references.
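The difference between reading a symref one level deep and resolving it fully can be observed from the command line. The following is only a user-level analogue of the first issue, not the C code path itself; the temporary repository is created purely for illustration:

```shell
set -e
repo=$(mktemp -d)   # throwaway repository, assumed for this illustration
git init -q "$repo"
cd "$repo"
git remote add origin /dev/null
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/master

# Reading the symref one level deep works even though its target is missing.
target=$(git symbolic-ref refs/remotes/origin/HEAD)
echo "symref target: $target"

# Resolving it all the way down to an object ID fails, because
# refs/remotes/origin/master does not exist.
if git rev-parse --verify -q refs/remotes/origin/HEAD >/dev/null 2>&1; then
    resolves=yes
else
    resolves=no
fi
echo "resolves to an object: $resolves"
```

A rename that only considers references which resolve fully would therefore skip this "HEAD".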
Both of these issues are easy to fix. But having a closer look at the
logic that renames remote references surfaces that it leaves a lot to be
desired overall.
The problem is that we're using O(|refs| + |symrefs| * 2) many reference
transactions to perform the renames. We first delete all symrefs, then
individually rename every direct reference and finally we recreate the
symrefs. On the one hand this isn't even remotely an atomic operation,
so if we hit any error we'll already have deleted all references.
But more importantly it is also extremely inefficient. The number of
transactions for symrefs doesn't really bother us too much, as there
should generally only be a single symref anyway ("HEAD"). But the
renames are very expensive:
  - For the "reftable" backend we perform auto-compaction after every
    single rename, which does add up.
  - For the "files" backend we potentially have to rewrite the
    "packed-refs" file on every single rename in case they are packed.
The consequence here is quadratic runtime performance. Renaming
100k references takes hours to complete.
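A back-of-the-envelope model makes the quadratic growth concrete. It assumes the worst case for the "files" backend, where all n references are packed and every rename rewrites the whole "packed-refs" file; the variable names are hypothetical and the numbers are counts of entries written, not measurements:

```shell
# Hypothetical cost model: count "packed-refs" entries written.
n=100000
per_rename=$((n * n))   # n renames, each rewriting a file of n entries
single_txn=$n           # one transaction rewriting the file once
echo "one transaction per rename: $per_rename entries written"
echo "single transaction:         $single_txn entries written"
echo "ratio:                      $((per_rename / single_txn))x"
```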
Refactor the code to use a single transaction to perform all the
reference updates atomically, which speeds up the transaction quite
significantly:
    Benchmark 1: rename remote (refformat = files, revision = HEAD~)
      Time (mean ± σ):     238.770 s ± 13.857 s    [User: 91.473 s, System: 143.793 s]
      Range (min … max):   204.863 s … 247.699 s    10 runs
    Benchmark 2: rename remote (refformat = files, revision = HEAD)
      Time (mean ± σ):      2.103 s ±  0.036 s    [User: 0.360 s, System: 1.313 s]
      Range (min … max):    2.011 s …  2.141 s    10 runs
    Summary
      rename remote (refformat = files, revision = HEAD) ran
      113.53 ± 6.87 times faster than rename remote (refformat = files, revision = HEAD~)
For the "reftable" backend we see a significant speedup, as well, but
given that we don't have quadratic runtime behaviour there it's way less
extreme:
    Benchmark 1: rename remote (refformat = reftable, revision = HEAD~)
      Time (mean ± σ):      8.604 s ±  0.539 s    [User: 4.985 s, System: 2.368 s]
      Range (min … max):    7.880 s …  9.556 s    10 runs
    Benchmark 2: rename remote (refformat = reftable, revision = HEAD)
      Time (mean ± σ):      1.177 s ±  0.103 s    [User: 0.446 s, System: 0.270 s]
      Range (min … max):    1.023 s …  1.410 s    10 runs
    Summary
      rename remote (refformat = reftable, revision = HEAD) ran
      7.31 ± 0.79 times faster than rename remote (refformat = reftable, revision = HEAD~)
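The idea of a single all-or-nothing transaction can be approximated at the user level with `git update-ref --stdin`. This is a rough sketch and not what `git remote rename` does internally: it handles only direct references (symrefs and reflogs are ignored), and the repository, the ref names "main" and "topic" and the identity passed to `git commit` are made up for illustration:

```shell
set -e
old=origin
new=renamed

repo=$(mktemp -d)   # throwaway repository, assumed for this illustration
git init -q "$repo"
cd "$repo"
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m init
oid=$(git rev-parse HEAD)
git update-ref "refs/remotes/$old/main" "$oid"
git update-ref "refs/remotes/$old/topic" "$oid"

# Queue every create/delete pair into one transaction. Nothing becomes
# visible until "commit"; if "prepare" fails, no reference has changed.
{
    echo "start"
    git for-each-ref --format='%(refname) %(objectname)' "refs/remotes/$old/" |
    while read -r ref ref_oid; do
        echo "create refs/remotes/$new/${ref#refs/remotes/$old/} $ref_oid"
        echo "delete $ref $ref_oid"
    done
    echo "prepare"
    echo "commit"
} | git update-ref --stdin

git for-each-ref --format='%(refname)' refs/remotes/
```

Because all updates are queued up front, the backend can write the result once instead of once per reference.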
There is one issue though with using atomic transactions: when nesting a
remote into itself it can happen that renamed references conflict with
the old references. For example, when we have a reference
"refs/remotes/origin/foo" and we rename "origin" to "origin/foo", then
we'll end up with an F/D conflict when we try to create the renamed
reference "refs/remotes/origin/foo/foo".
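The underlying F/D conflict can be reproduced in isolation with plain `git update-ref`. This is a minimal sketch that does not involve `git remote rename`; the throwaway repository and the identity passed to `git commit` are assumptions for illustration:

```shell
set -e
repo=$(mktemp -d)   # throwaway repository, assumed for this illustration
git init -q "$repo"
cd "$repo"
git -c user.name=Example -c user.email=example@example.com \
    commit -q --allow-empty -m init
oid=$(git rev-parse HEAD)

# "refs/remotes/origin/foo" exists as a reference ...
git update-ref refs/remotes/origin/foo "$oid"

# ... so it cannot also act as a directory holding "foo/foo".
if git update-ref refs/remotes/origin/foo/foo "$oid" 2>/dev/null; then
    conflict=no
else
    conflict=yes
fi
echo "F/D conflict: $conflict"
```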
This situation is overall quite unlikely to happen: people tend to not
use nested remotes, and if they do they must at the same time also have
a conflicting refname. But the end result would be that the old remote
references stay intact whereas all the other parts of the repository
have been adjusted for the new remote name.
Address this by queueing and preparing the reference update before we
touch any other part of the repository. Like this we can make sure that
the reference update will go through before rewriting the configuration.
Otherwise, if the transaction fails to prepare we can gracefully abort
the whole operation without any changes having been performed in the
repository yet. Furthermore, we can detect the conflict and print some
helpful advice for how the user can resolve this situation. So overall,
the tradeoff is that:
  - Reference transactions are now all-or-nothing. This is a significant
    improvement over the previous state where we may have ended up with
    partially-renamed references.
  - Rewriting references is now significantly faster.
  - We only rewrite the configuration in case we know that all
    references can be updated.
  - But we may refuse to rename a remote in case references conflict.
Overall this seems like an acceptable tradeoff.
While at it, fix the handling of symbolic/broken references by using
`refs_for_each_rawref()`. Add tests that cover both the reported issue
and the nesting of remotes.
One thing to note: with this change we can no longer provide a proper
progress monitor, as we queue the references into the transaction as we
iterate through them. Consequently, as we don't yet know how many refs
there are in total, we cannot report what percentage of the operation
is done. But that's a small price to pay considering that the progress
monitor should no longer be needed in most situations.
[1]: <CANrWfmQWa=RJnm7d3C7ogRX6Tth2eeuGwvwrNmzS2gr+eP0OpA@mail.gmail.com>
Reported-by: Han Jiang <[email protected]>
Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>