Commit 7903efb

derrickstolee authored and gitster committed
bundle-uri: download in creationToken order
The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised. The bundles in the list
are sorted by their advertised creationToken values, then downloaded in
decreasing order. This avoids the previous strategy of downloading
bundles in an arbitrary order and attempting to apply them (likely
failing in the case of required commits) until discovering the order
through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. The cost of testing an unbundle is quite low, and instead
the chosen order is optimizing for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
unbundle without bundle-2. Thus, the algorithm needs to keep this in
mind.

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the client seeks to download from an identical
bundle list since its previous download, it will download the
most-recent bundle then stop since its required commits are already in
the object store.

Add tests that exercise this behavior, but we will expand upon these
tests when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
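The back-and-forth walk described in the commit message can be traced with a small standalone simulation. This is only a sketch, not Git's code: `struct sim_bundle`, `t5558_list()` and `walk_by_token()` are hypothetical names, commits are modeled as bits in a mask, and "download" and "unbundle" are reduced to flag updates. It assumes the dependency shape of the t5558 example: bundle-2 and bundle-3 each require commit 1, and bundle-4 requires commits 2 and 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a bundle; not Git's struct remote_bundle_info. */
struct sim_bundle {
	const char *name;
	unsigned token;     /* creationToken */
	unsigned requires;  /* commits (as bits) that must already be present */
	unsigned provides;  /* commits (as bits) added by a successful unbundle */
	bool downloaded;
	bool unbundled;
};

/* Commits 1..4 from the t5558 diagram, one bit each. */
enum { C1 = 1u << 0, C2 = 1u << 1, C3 = 1u << 2, C4 = 1u << 3 };

/* Fill 'b' with the t5558 list, already sorted by decreasing token. */
static void t5558_list(struct sim_bundle b[4])
{
	b[0] = (struct sim_bundle){ "bundle-4", 4, C2 | C3, C4, false, false };
	b[1] = (struct sim_bundle){ "bundle-3", 3, C1, C3, false, false };
	b[2] = (struct sim_bundle){ "bundle-2", 2, C1, C2, false, false };
	b[3] = (struct sim_bundle){ "bundle-1", 1, 0, C1, false, false };
}

/*
 * Walk a fresh list (nothing downloaded or unbundled yet) in the way the
 * commit describes. '*have' is the simulated object store and
 * '*downloads' counts simulated downloads.
 *
 * Returns 0 when the walk backs out past the newest bundle (success)
 * and 1 when it runs off the end of the list (failure).
 */
static int walk_by_token(struct sim_bundle *b, size_t nr,
			 unsigned *have, size_t *downloads)
{
	long cur = 0;
	int move_direction = 0;

	/* Precondition: newest bundle first. */
	for (size_t i = 1; i < nr; i++)
		assert(b[i - 1].token >= b[i].token);

	while (cur >= 0 && (size_t)cur < nr) {
		struct sim_bundle *x = &b[cur];

		if (!x->downloaded) {
			x->downloaded = true;
			(*downloads)++;
		}

		if (!x->unbundled) {
			if ((*have & x->requires) == x->requires) {
				/* Applied: go back and retry newer bundles. */
				*have |= x->provides;
				x->unbundled = true;
				move_direction = -1;
			} else {
				/* Missing prerequisites: look at older bundles. */
				move_direction = 1;
			}
		}
		/* Already applied: keep moving in the previous direction. */

		cur += move_direction;
	}
	return cur >= 0;
}
```

Starting from a store that already holds commit 1 (`have = C1`), the walk downloads bundle-4, bundle-3 and bundle-2, never touches bundle-1, and returns 0. Note that it revisits bundle-4 after bundle-3 applies, fails again, and only succeeds once bundle-2 has been applied too, which is the case the naive approach misses. Starting from an empty store, it downloads all four bundles in decreasing token order and then applies them in increasing order.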
1 parent 512fccf commit 7903efb

3 files changed: +233 -9 lines changed

bundle-uri.c

Lines changed: 154 additions & 2 deletions
@@ -447,6 +447,139 @@ static int download_bundle_to_file(struct remote_bundle_info *bundle, void *data
 	return 0;
 }
 
+struct bundles_for_sorting {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundles_for_sorting *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int move_direction = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct bundles_for_sorting bundles = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+	for_all_bundles_in_list(list, append_bundle, &bundles);
+
+	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+	/*
+	 * Attempt to download and unbundle the minimum number of bundles by
+	 * creationToken in decreasing order. If we fail to unbundle (after
+	 * a successful download) then move to the next non-downloaded bundle
+	 * and attempt downloading. Once we succeed in applying a bundle,
+	 * move to the previous unapplied bundle and attempt to unbundle it
+	 * again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < bundles.nr) {
+		struct remote_bundle_info *bundle = bundles.items[cur];
+		if (!bundle->file) {
+			/*
+			 * Not downloaded yet. Try downloading.
+			 *
+			 * Note that bundle->file is non-NULL if a download
+			 * was attempted, even if it failed to download.
+			 */
+			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+				/* Mark as unbundled so we do not retry. */
+				bundle->unbundled = 1;
+
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+				goto move;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+			} else {
+				/*
+				 * Succeeded in unbundle. Retry bundles
+				 * that previously failed to unbundle.
+				 */
+				move_direction = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+move:
+		/* Move in the specified direction and repeat. */
+		cur += move_direction;
+	}
+
+	free(bundles.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
@@ -484,7 +617,15 @@ static int fetch_bundle_list_in_config_format(struct repository *r,
 		goto cleanup;
 	}
 
-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;
 
@@ -626,6 +767,14 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	int result;
 	struct bundle_list global_list;
 
+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);
 
 	/* If a bundle is added to this global list, then it is required. */
@@ -634,7 +783,10 @@ int fetch_bundle_list(struct repository *r, struct bundle_list *list)
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;
 
-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);
 
 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
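
The comparator added to bundle-uri.c spells out each comparison instead of subtracting the two tokens. A plausible reason, assuming creationToken is an unsigned 64-bit field, is that a difference of two such values does not fit the `int` a comparator must return. The sketch below reproduces the idea with the standard `qsort()` in place of Git's `QSORT()` macro; `struct bundle_info` and `sort_newest_first()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for Git's struct remote_bundle_info. */
struct bundle_info {
	const char *id;
	uint64_t creationToken; /* assumed to be unsigned 64-bit */
};

/*
 * Compare two array elements, each a pointer to a bundle_info, so that
 * larger tokens sort first. Explicit comparisons avoid truncating a
 * 64-bit difference into the int return value.
 */
static int compare_token_decreasing(const void *va, const void *vb)
{
	const struct bundle_info *const *a = va;
	const struct bundle_info *const *b = vb;

	if ((*a)->creationToken > (*b)->creationToken)
		return -1;
	if ((*a)->creationToken < (*b)->creationToken)
		return 1;
	return 0;
}

/* Sort an array of bundle pointers in place, newest (largest token) first. */
static void sort_newest_first(struct bundle_info **items, size_t nr)
{
	assert(items != NULL || nr == 0);
	if (nr > 1)
		qsort(items, nr, sizeof(*items), compare_token_decreasing);
}
```

Sorting pointers rather than the structs themselves mirrors the `items` array in `struct bundles_for_sorting`: the bundle records stay where the list's hashmap owns them, and only the temporary pointer array is reordered.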

t/t5558-clone-bundle-uri.sh

Lines changed: 33 additions & 7 deletions
@@ -401,17 +401,43 @@ test_expect_success 'clone bundle list (http, creationToken)' '
 	git -C clone-list-http-2 cat-file --batch-check <oids &&
 
 	cat >expect <<-EOF &&
-	$HTTPD_URL/bundle-1.bundle
-	$HTTPD_URL/bundle-2.bundle
-	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-list
 	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-3.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		--single-branch --branch=base --no-tags \
+		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
 	EOF
 
-	# Since the creationToken heuristic is not yet understood by the
-	# client, the order cannot be verified at this moment. Sort the
-	# list for consistent results.
-	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_remote_https_urls <trace-clone.txt >actual &&
 	test_cmp expect actual
 '
 

t/t5601-clone.sh

Lines changed: 46 additions & 0 deletions
@@ -831,6 +831,52 @@ test_expect_success 'auto-discover multiple bundles from HTTP clone' '
 	grep -f pattern trace.txt
 '
 
+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+	git -c protocol.version=2 \
+		-c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/newest.bundle
+	$HTTPD_URL/new.bundle
+	$HTTPD_URL/everything.bundle
+	EOF
+
+	# We should fetch all bundles in the expected order.
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
 
