Commit 9e39a96

pks-t authored and gitster committed
refs: stop re-verifying common prefixes for availability
One of the checks done by `refs_verify_refnames_available()` is whether
any of the prefixes of a reference already exists. For example, given a
reference "refs/heads/main", we'd check whether "refs/heads" or "refs"
already exist, and if so we'd abort the transaction.

When updating multiple references at once, this check is performed for
each of the references individually. Consequently, because references
tend to have common prefixes like "refs/heads/" or "refs/tags/", we
evaluate the availability of these prefixes repeatedly. Naturally this
is a waste of compute, as the availability of those prefixes should in
general not change in the middle of a transaction. And if it did,
backends would notice at a later point in time.

Optimize this pattern by storing prefixes in a `strset` so that we can
trivially track those prefixes that we have already checked. This leads
to a significant speedup with the "reftable" backend when creating many
references that all share a common prefix:

    Benchmark 1: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):      63.1 ms ±   1.8 ms    [User: 41.0 ms, System: 21.6 ms]
      Range (min … max):    60.6 ms …  69.5 ms    38 runs

    Benchmark 2: update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):      40.0 ms ±   1.3 ms    [User: 29.3 ms, System: 10.3 ms]
      Range (min … max):    38.1 ms …  47.3 ms    61 runs

    Summary
      update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.58 ± 0.07 times faster than update-ref: create many refs (refformat = reftable, preexisting = 100000, new = 10000, revision = HEAD~)

For the "files" backend we see an improvement, but a much smaller one:

    Benchmark 1: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)
      Time (mean ± σ):     395.8 ms ±   5.3 ms    [User: 63.6 ms, System: 330.5 ms]
      Range (min … max):   387.0 ms … 404.6 ms    10 runs

    Benchmark 2: update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD)
      Time (mean ± σ):     386.0 ms ±   4.0 ms    [User: 51.5 ms, System: 332.8 ms]
      Range (min … max):   380.8 ms … 392.6 ms    10 runs

    Summary
      update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD) ran
        1.03 ± 0.02 times faster than update-ref: create many refs (refformat = files, preexisting = 100000, new = 10000, revision = HEAD~)

This change also leads to a modest improvement when writing references
with "initial" semantics, for example when migrating references. The
following benchmarks are migrating 1m references from the "reftable" to
the "files" backend:

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     836.6 ms ±   5.6 ms    [User: 645.2 ms, System: 185.2 ms]
      Range (min … max):   829.6 ms … 845.9 ms    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     759.8 ms ±   5.1 ms    [User: 574.9 ms, System: 178.9 ms]
      Range (min … max):   753.1 ms … 768.8 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = HEAD) ran
        1.10 ± 0.01 times faster than migrate reftable:files (refcount = 1000000, revision = HEAD~)

And vice versa:

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = HEAD~)
      Time (mean ± σ):     870.7 ms ±   5.7 ms    [User: 735.2 ms, System: 127.4 ms]
      Range (min … max):   861.6 ms … 883.2 ms    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = HEAD)
      Time (mean ± σ):     799.1 ms ±   8.5 ms    [User: 661.1 ms, System: 130.2 ms]
      Range (min … max):   787.5 ms … 812.6 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = HEAD) ran
        1.09 ± 0.01 times faster than migrate files:reftable (refcount = 1000000, revision = HEAD~)

The impact here is significantly smaller given that we don't perform any
reference reads with "initial" semantics, so the speedup only comes from
us doing fewer string list lookups.

Signed-off-by: Patrick Steinhardt <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 268ea85 commit 9e39a96

1 file changed

+12
-0
lines changed

refs.c

Lines changed: 12 additions & 0 deletions
@@ -2485,6 +2485,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 	struct strbuf dirname = STRBUF_INIT;
 	struct strbuf referent = STRBUF_INIT;
 	struct string_list_item *item;
+	struct strset dirnames;
 	int ret = -1;
 
 	/*
@@ -2494,6 +2495,8 @@ int refs_verify_refnames_available(struct ref_store *refs,
 
 	assert(err);
 
+	strset_init(&dirnames);
+
 	for_each_string_list_item(item, refnames) {
 		const char *refname = item->string;
 		const char *extra_refname;
@@ -2523,6 +2526,14 @@ int refs_verify_refnames_available(struct ref_store *refs,
 			if (skip && string_list_has_string(skip, dirname.buf))
 				continue;
 
+			/*
+			 * If we've already seen the directory we don't need to
+			 * process it again. Skip it to avoid checking
+			 * common prefixes like "refs/heads/" repeatedly.
+			 */
+			if (!strset_add(&dirnames, dirname.buf))
+				continue;
+
 			if (!initial_transaction &&
 			    !refs_read_raw_ref(refs, dirname.buf, &oid, &referent,
 					       &type, &ignore_errno)) {
@@ -2583,6 +2594,7 @@ int refs_verify_refnames_available(struct ref_store *refs,
 cleanup:
 	strbuf_release(&referent);
 	strbuf_release(&dirname);
+	strset_clear(&dirnames);
 	return ret;
 }
 
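To illustrate why remembering already-checked prefixes pays off, here is a minimal standalone sketch of the same idea. It is not git's code: `seen_set` and `count_prefix_checks` are hypothetical names, the set is a plain linear-scan array rather than git's hash-based `strset`, and a counter stands in for the expensive per-prefix lookup (`refs_read_raw_ref()` in the patch). Like `strset_add()` in the diff, the sketch's `seen_set_add()` is assumed to return nonzero only when the string was newly inserted.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's strset: a growable array of owned strings. */
struct seen_set {
	char **items;
	size_t nr, alloc;
};

/* Returns 1 if `s` was newly added, 0 if it was already in the set. */
static int seen_set_add(struct seen_set *set, const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy;

	for (size_t i = 0; i < set->nr; i++)
		if (!strcmp(set->items[i], s))
			return 0;

	if (set->nr == set->alloc) {
		set->alloc = set->alloc ? 2 * set->alloc : 8;
		set->items = realloc(set->items, set->alloc * sizeof(*set->items));
		assert(set->items);
	}

	copy = malloc(len);
	assert(copy);
	memcpy(copy, s, len);
	set->items[set->nr++] = copy;
	return 1;
}

static void seen_set_clear(struct seen_set *set)
{
	for (size_t i = 0; i < set->nr; i++)
		free(set->items[i]);
	free(set->items);
	memset(set, 0, sizeof(*set));
}

/*
 * Walk every leading directory of every refname and count how often the
 * (simulated) expensive availability check would run. With `dedup` set,
 * a prefix that has been seen before is skipped.
 */
size_t count_prefix_checks(const char **refnames, size_t nr, int dedup)
{
	struct seen_set seen = { 0 };
	size_t checks = 0;
	char dirname[256];

	for (size_t i = 0; i < nr; i++) {
		const char *refname = refnames[i];
		const char *slash;

		for (slash = strchr(refname, '/'); slash;
		     slash = strchr(slash + 1, '/')) {
			size_t len = (size_t)(slash - refname);

			/* Copy the prefix without its trailing slash. */
			assert(len < sizeof(dirname));
			memcpy(dirname, refname, len);
			dirname[len] = '\0';

			if (dedup && !seen_set_add(&seen, dirname))
				continue;

			checks++; /* the real code would read the ref here */
		}
	}

	seen_set_clear(&seen);
	return checks;
}
```

For the three names "refs/heads/a", "refs/heads/b" and "refs/tags/v1", the sketch counts 6 checks without deduplication and 3 with it ("refs", "refs/heads", "refs/tags"). The saving grows with the number of references that share a prefix, which matches the shape of the benchmarks in the commit message.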