Commit 0903d8b

Merge branch 'ds/bundle-uri-4'
Bundle URIs part 4.

* ds/bundle-uri-4:
  clone: unbundle the advertised bundles
  bundle-uri: download bundles from an advertised list
  bundle-uri: allow relative URLs in bundle lists
  strbuf: introduce strbuf_strip_file_from_path()
  bundle-uri: serve bundle.* keys from config
  bundle-uri client: add helper for testing server
  transport: rename got_remote_heads
  bundle-uri client: add boolean transfer.bundleURI setting
  clone: request the 'bundle-uri' command when available
  t: create test harness for 'bundle-uri' command
  protocol v2: add server-side "bundle-uri" skeleton
2 parents 3f2e4c0 + 876094a commit 0903d8b

24 files changed, 1041 additions and 12 deletions

Documentation/config/transfer.txt

Lines changed: 6 additions & 0 deletions
@@ -115,3 +115,9 @@ transfer.unpackLimit::
115115
transfer.advertiseSID::
116116
Boolean. When true, client and server processes will advertise their
117117
unique session IDs to their remote counterpart. Defaults to false.
118+
119+
transfer.bundleURI::
120+
When `true`, local `git clone` commands will request bundle
121+
information from the remote server (if advertised) and download
122+
bundles before continuing the clone through the Git protocol.
123+
Defaults to `false`.

Documentation/gitprotocol-v2.txt

Lines changed: 201 additions & 0 deletions
@@ -578,6 +578,207 @@ and associated requested information, each separated by a single space.
578578

579579
obj-info = obj-id SP obj-size
580580

581+
bundle-uri
582+
~~~~~~~~~~
583+
584+
If the 'bundle-uri' capability is advertised, the server supports the
585+
`bundle-uri` command.
586+
587+
The capability is currently advertised with no value (i.e. not
588+
"bundle-uri=somevalue"); a value may be added in the future for
589+
supporting command-wide extensions. Clients MUST ignore any unknown
590+
capability values and proceed with the `bundle-uri` dialog they
591+
support.
592+
593+
The 'bundle-uri' command is intended to be issued before `fetch` to
594+
get URIs to bundle files (see linkgit:git-bundle[1]) to "seed" and
595+
inform the subsequent `fetch` command.
596+
597+
The client MAY issue `bundle-uri` before or after any other valid
598+
command. To be useful to clients it's expected that it'll be issued
599+
after an `ls-refs` and before `fetch`, but it MAY be issued at any time
600+
in the dialog.
601+
602+
DISCUSSION of bundle-uri
603+
^^^^^^^^^^^^^^^^^^^^^^^^
604+
605+
The intent of the feature is to optimize for server resource consumption
606+
in the common case by changing the common case of fetching a very
607+
large PACK during linkgit:git-clone[1] into a smaller incremental
608+
fetch.
609+
610+
It also allows servers to achieve better caching in combination with
611+
an `uploadpack.packObjectsHook` (see linkgit:git-config[1]).
612+
613+
This works by making new clones or fetches a more predictable and common
614+
negotiation against the tips of recently produced *.bundle file(s).
615+
Servers might even pre-generate the results of such negotiations for
616+
the `uploadpack.packObjectsHook` as new pushes come in.
617+
618+
One way that servers could take advantage of these bundles is that the
619+
server would anticipate that fresh clones will download a known bundle,
620+
followed by catching up to the current state of the repository using ref
621+
tips found in that bundle (or bundles).
622+
623+
PROTOCOL for bundle-uri
624+
^^^^^^^^^^^^^^^^^^^^^^^
625+
626+
A `bundle-uri` request takes no arguments, and as noted above does not
627+
currently advertise a capability value. Both may be added in the
628+
future.
629+
630+
When the client issues a `command=bundle-uri` request, the response is a
631+
list of key-value pairs provided as packet lines with value
632+
`<key>=<value>`. Each `<key>` should be interpreted as a config key from
633+
the `bundle.*` namespace to construct a list of bundles. These keys are
634+
grouped by a `bundle.<id>.` subsection, where each key corresponding to a
635+
given `<id>` contributes attributes to the bundle defined by that `<id>`.
636+
See linkgit:git-config[1] for the specific details of these keys and how
637+
the Git client will interpret their values.
638+
639+
Clients MUST parse each line according to the above format; lines that do
640+
not conform to the format SHOULD be discarded. The user MAY be warned in
641+
such a case.
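The parsing rule above can be sketched in C as follows. This is a minimal illustration, not Git's implementation: `demo_parse_pair()` is a hypothetical helper, and treating an empty key or an empty value as non-conforming is an assumption made for the sketch.

```c
#include <string.h>

/*
 * Hypothetical helper (not part of Git): split one "<key>=<value>"
 * packet-line payload in place at the first '='.
 *
 * Returns 0 and sets *key / *value on success. Returns -1 when the
 * line does not conform, so that the caller can discard it and
 * optionally warn the user.
 */
int demo_parse_pair(char *line, const char **key, const char **value)
{
	char *eq = strchr(line, '=');

	/* Assumption: no '=', an empty key, or an empty value is malformed. */
	if (!eq || eq == line || !eq[1])
		return -1;

	*eq = '\0';        /* terminate the key */
	*key = line;
	*value = eq + 1;   /* the value may itself contain '=' */
	return 0;
}
```

A caller would loop over the received packet lines, skip any line for which `demo_parse_pair()` returns -1, and feed the remaining pairs into its bundle list.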
642+
643+
bundle-uri CLIENT AND SERVER EXPECTATIONS
644+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
645+
646+
URI CONTENTS::
647+
The content at the advertised URIs MUST be one of two types.
648+
+
649+
The advertised URI may contain a bundle file that `git bundle verify`
650+
would accept. I.e. they MUST contain one or more reference tips for
651+
use by the client, MUST indicate prerequisites (if any) with standard
652+
"-" prefixes, and MUST indicate their "object-format", if
653+
applicable.
654+
+
655+
The advertised URI may alternatively contain a plaintext file that `git
656+
config --list` would accept (with the `--file` option). The key-value
657+
pairs in this list are in the `bundle.*` namespace (see
658+
linkgit:git-config[1]).
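The bundle-file requirements under URI CONTENTS correspond to the lines of a bundle header: after the signature line, a header lists optional "@" capability lines (such as the object format in v3 bundles), "-" prefixed prerequisites, and reference tips, and ends with an empty line. A minimal C sketch of classifying those lines follows; the enum and function names are hypothetical and are not taken from Git's bundle code.

```c
/* Hypothetical classification of one bundle header line. */
enum demo_header_line {
	DEMO_HEADER_END,          /* empty line: the header is finished */
	DEMO_HEADER_CAPABILITY,   /* "@object-format=sha256" and similar */
	DEMO_HEADER_PREREQUISITE, /* "-<oid> <comment>" */
	DEMO_HEADER_TIP           /* "<oid> <refname>" */
};

/*
 * Classify a header line (without its trailing newline) by its first
 * character. The signature line ("# v2 git bundle" or
 * "# v3 git bundle") is assumed to have been consumed already.
 */
enum demo_header_line demo_classify_header_line(const char *line)
{
	if (!*line)
		return DEMO_HEADER_END;
	if (*line == '@')
		return DEMO_HEADER_CAPABILITY;
	if (*line == '-')
		return DEMO_HEADER_PREREQUISITE;
	return DEMO_HEADER_TIP;
}
```

A client that only needs the tips and prerequisites can stop reading at `DEMO_HEADER_END`, since the pack data follows the empty line.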
659+
660+
bundle-uri CLIENT ERROR RECOVERY::
661+
A client MUST above all gracefully degrade on errors, whether that
662+
error is because of bad/missing data in the bundle URI(s), because
663+
that client is too dumb to e.g. understand and fully parse out bundle
664+
headers and their prerequisite relationships, or something else.
665+
+
666+
Server operators should feel confident in turning on "bundle-uri" and
667+
not worry if e.g. their CDN goes down that clones or fetches will run
668+
into hard failures. Even if the server's bundle(s) are
669+
incomplete, or bad in some way, the client should still end up with a
670+
functioning repository, just as if it had chosen not to use this
671+
protocol extension.
672+
+
673+
All subsequent discussion on client and server interaction MUST keep
674+
this in mind.
675+
676+
bundle-uri SERVER TO CLIENT::
677+
The ordering of the returned bundle URIs is not significant. Clients
678+
MUST parse their headers to discover their contained OIDs and
679+
prerequisites. A client MUST consider the content of the bundle(s)
680+
themselves and their header as the ultimate source of truth.
681+
+
682+
A server MAY even return bundle(s) that don't have any direct
683+
relationship to the repository being cloned (either through accident,
684+
or intentional "clever" configuration), and expect a client to sort
685+
out what data they'd like from the bundle(s), if any.
686+
687+
bundle-uri CLIENT TO SERVER::
688+
The client SHOULD provide reference tips found in the bundle header(s)
689+
as 'have' lines in any subsequent `fetch` request. A client MAY also
690+
ignore the bundle(s) entirely if doing so is deemed worse for some
691+
reason, e.g. if the bundles can't be downloaded, it doesn't like the
692+
tips it finds etc.
693+
694+
WHEN ADVERTISED BUNDLE(S) REQUIRE NO FURTHER NEGOTIATION::
695+
If after issuing `bundle-uri` and `ls-refs`, and getting the header(s)
696+
of the bundle(s) the client finds that the ref tips it wants can be
697+
retrieved entirely from advertised bundle(s), the client MAY disconnect
698+
from the Git server. The results of such a 'clone' or 'fetch' should be
699+
indistinguishable from the state attained without using bundle-uri.
700+
701+
EARLY CLIENT DISCONNECTIONS AND ERROR RECOVERY::
702+
A client MAY perform an early disconnect while still downloading the
703+
bundle(s) (having streamed and parsed their headers). In such a case
704+
the client MUST gracefully recover from any errors related to
705+
finishing the download and validation of the bundle(s).
706+
+
707+
I.e. a client might need to re-connect and issue a 'fetch' command,
708+
and possibly fall back to not making use of 'bundle-uri' at all.
709+
+
710+
This "MAY" behavior is specified as such (and not a "SHOULD") on the
711+
assumption that a server advertising bundle URIs is more likely than
712+
not to be serving up a relatively large repository, and to be pointing
713+
to URIs that have a good chance of being in working order. A client
714+
MAY e.g. look at the payload size of the bundles as a heuristic to see
715+
if an early disconnect is worth it, should falling back on a full
716+
"fetch" dialog be necessary.
717+
718+
WHEN ADVERTISED BUNDLE(S) REQUIRE FURTHER NEGOTIATION::
719+
A client SHOULD commence a negotiation of a PACK from the server via
720+
the "fetch" command using the OID tips found in advertised bundles,
721+
even if it's still in the process of downloading those bundle(s).
722+
+
723+
This allows for aggressive early disconnects from any interactive
724+
server dialog. The client blindly trusts that the advertised OID tips
725+
are relevant, and issues them as 'have' lines; it then requests any
726+
tips it would like (usually from the "ls-refs" advertisement) via
727+
'want' lines. The server will then compute a (hopefully small) PACK
728+
with the expected difference between the tips from the bundle(s) and
729+
the data requested.
730+
+
731+
The only connection the client then needs to keep active is to the
732+
concurrently downloading static bundle(s). When those and the
733+
incremental PACK are retrieved they should be inflated and
734+
validated. Any errors at this point should be gracefully recovered
735+
from, see above.
736+
737+
bundle-uri PROTOCOL FEATURES
738+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
739+
740+
The client constructs a bundle list from the `<key>=<value>` pairs
741+
provided by the server. These pairs are part of the `bundle.*` namespace
742+
as documented in linkgit:git-config[1]. In this section, we discuss some
743+
of these keys and describe the actions the client will do in response to
744+
this information.
745+
746+
In particular, the `bundle.version` key specifies an integer value. The
747+
only accepted value at the moment is `1`, but if the client sees an
748+
unexpected value here then the client MUST ignore the bundle list.
749+
750+
As long as `bundle.version` is understood, all other unknown keys MAY be
751+
ignored by the client. The server will guarantee compatibility with older
752+
clients, though newer clients may be better able to use the extra keys to
753+
minimize downloads.
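The two rules above (reject the list on an unexpected `bundle.version`, tolerate other unknown keys) can be sketched in C as follows. The struct and function are hypothetical simplifications for illustration and do not mirror Git's `struct bundle_list`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified list state; not Git's struct bundle_list. */
struct demo_bundle_list {
	int version; /* 0 until a valid bundle.version is seen */
	int usable;  /* cleared when the whole list must be ignored */
};

/*
 * Apply one key/value pair to the list. Returns 0 if the pair was
 * applied or safely skipped, -1 if the whole list must be ignored.
 */
int demo_apply_pair(struct demo_bundle_list *list,
		    const char *key, const char *value)
{
	if (!strcmp(key, "bundle.version")) {
		char *end;
		long v = strtol(value, &end, 10);

		/* The only accepted value at the moment is 1. */
		if (end == value || *end || v != 1) {
			list->usable = 0;
			return -1;
		}
		list->version = (int)v;
		return 0;
	}

	/* Any other key this sketch does not know MAY be ignored. */
	return 0;
}
```

Once `demo_apply_pair()` returns -1, a client following this sketch would drop the bundle list and continue with an ordinary `fetch`.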
754+
755+
Any backwards-incompatible addition of per-URI key-value pairs will be
756+
guarded by a new `bundle.version` value or values in 'bundle-uri'
757+
capability advertisement itself, and/or by new future `bundle-uri`
758+
request arguments.
759+
760+
Some example key-value pairs that are not currently implemented but could
761+
be implemented in the future include:
762+
763+
* Add a "hash=<val>" or "size=<bytes>" key to advertise the expected hash or
764+
size of the bundle file.
765+
766+
* Advertise that one or more bundle files are the same (to e.g. have
767+
clients round-robin or otherwise choose one of N possible files).
768+
769+
* An "oid=<OID>" shortcut and "prerequisite=<OID>" shortcut, for
770+
expressing the common case of a bundle with one tip and no
771+
prerequisites, or one tip and one prerequisite.
772+
+
773+
This would allow for optimizing the common case of servers who'd like
774+
to provide one "big bundle" containing only their "main" branch,
775+
and/or incremental updates thereof.
776+
+
777+
A client receiving such a response MAY assume that they can skip
778+
retrieving the header from a bundle at the indicated URI, and thus
779+
save themselves and the server(s) the request(s) needed to inspect the
780+
headers of that bundle or bundles.
781+
581782
GIT
582783
---
583784
Part of the linkgit:git[1] suite

builtin/clone.c

Lines changed: 21 additions & 0 deletions
@@ -1271,6 +1271,27 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
12711271
if (refs)
12721272
mapped_refs = wanted_peer_refs(refs, &remote->fetch);
12731273

1274+
if (!bundle_uri) {
1275+
/*
1276+
* Populate transport->got_remote_bundle_uri and
1277+
* transport->bundles. We might get nothing.
1278+
*/
1279+
transport_get_remote_bundle_uri(transport);
1280+
1281+
if (transport->bundles &&
1282+
hashmap_get_size(&transport->bundles->bundles)) {
1283+
/* At this point, we need the_repository to match the cloned repo. */
1284+
if (repo_init(the_repository, git_dir, work_tree))
1285+
warning(_("failed to initialize the repo, skipping bundle URI"));
1286+
else if (fetch_bundle_list(the_repository,
1287+
transport->bundles))
1288+
warning(_("failed to fetch advertised bundles"));
1289+
} else {
1290+
clear_bundle_list(transport->bundles);
1291+
FREE_AND_NULL(transport->bundles);
1292+
}
1293+
}
1294+
12741295
if (mapped_refs) {
12751296
int hash_algo = hash_algo_by_ptr(transport_get_hash_algo(transport));
12761297

bundle-uri.c

Lines changed: 86 additions & 1 deletion
@@ -7,6 +7,7 @@
77
#include "hashmap.h"
88
#include "pkt-line.h"
99
#include "config.h"
10+
#include "remote.h"
1011

1112
static int compare_bundles(const void *hashmap_cmp_fn_data,
1213
const struct hashmap_entry *he1,
@@ -49,6 +50,7 @@ void clear_bundle_list(struct bundle_list *list)
4950

5051
for_all_bundles_in_list(list, clear_remote_bundle_info, NULL);
5152
hashmap_clear_and_free(&list->bundles, struct remote_bundle_info, ent);
53+
free(list->baseURI);
5254
}
5355

5456
int for_all_bundles_in_list(struct bundle_list *list,
@@ -163,7 +165,7 @@ static int bundle_list_update(const char *key, const char *value,
163165
if (!strcmp(subkey, "uri")) {
164166
if (bundle->uri)
165167
return -1;
166-
bundle->uri = xstrdup(value);
168+
bundle->uri = relative_url(list->baseURI, value, NULL);
167169
return 0;
168170
}
169171

@@ -190,6 +192,18 @@ int bundle_uri_parse_config_format(const char *uri,
190192
.error_action = CONFIG_ERROR_ERROR,
191193
};
192194

195+
if (!list->baseURI) {
196+
struct strbuf baseURI = STRBUF_INIT;
197+
strbuf_addstr(&baseURI, uri);
198+
199+
/*
200+
* If the URI does not end with a trailing slash, then
201+
* remove the filename portion of the path. This is
202+
* important for relative URIs.
203+
*/
204+
strbuf_strip_file_from_path(&baseURI);
205+
list->baseURI = strbuf_detach(&baseURI, NULL);
206+
}
193207
result = git_config_from_file_with_options(config_to_bundle_list,
194208
filename, list,
195209
&opts);
@@ -563,6 +577,77 @@ int fetch_bundle_uri(struct repository *r, const char *uri)
563577
return result;
564578
}
565579

580+
int fetch_bundle_list(struct repository *r, struct bundle_list *list)
581+
{
582+
int result;
583+
struct bundle_list global_list;
584+
585+
init_bundle_list(&global_list);
586+
587+
/* If a bundle is added to this global list, then it is required. */
588+
global_list.mode = BUNDLE_MODE_ALL;
589+
590+
if ((result = download_bundle_list(r, list, &global_list, 0)))
591+
goto cleanup;
592+
593+
result = unbundle_all_bundles(r, &global_list);
594+
595+
cleanup:
596+
for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
597+
clear_bundle_list(&global_list);
598+
return result;
599+
}
600+
601+
/**
602+
* API for serve.c.
603+
*/
604+
605+
int bundle_uri_advertise(struct repository *r, struct strbuf *value UNUSED)
606+
{
607+
static int advertise_bundle_uri = -1;
608+
609+
if (advertise_bundle_uri != -1)
610+
goto cached;
611+
612+
advertise_bundle_uri = 0;
613+
repo_config_get_maybe_bool(r, "uploadpack.advertisebundleuris", &advertise_bundle_uri);
614+
615+
cached:
616+
return advertise_bundle_uri;
617+
}
618+
619+
static int config_to_packet_line(const char *key, const char *value, void *data)
620+
{
621+
struct packet_writer *writer = data;
622+

623+
if (!strncmp(key, "bundle.", 7))
624+
packet_write_fmt(writer->dest_fd, "%s=%s", key, value);
626+
return 0;
627+
}
628+
629+
int bundle_uri_command(struct repository *r,
630+
struct packet_reader *request)
631+
{
632+
struct packet_writer writer;
633+
packet_writer_init(&writer, 1);
634+
635+
while (packet_reader_read(request) == PACKET_READ_NORMAL)
636+
die(_("bundle-uri: unexpected argument: '%s'"), request->line);
637+
if (request->status != PACKET_READ_FLUSH)
638+
die(_("bundle-uri: expected flush after arguments"));
639+
640+
/*
641+
* Send all "bundle.*" config lines to the client as key=value
642+
* packet lines.
643+
*/
644+
git_config(config_to_packet_line, &writer);
645+
646+
packet_writer_flush(&writer);
647+
648+
return 0;
649+
}
650+
566651
/**
567652
* General API for {transport,connect}.c etc.
568653
*/
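The base-URI handling added to `bundle_uri_parse_config_format()` and `bundle_list_update()` above can be illustrated with a standalone C sketch. Both functions below are hypothetical stand-ins: `demo_strip_file_from_path()` approximates the intent of `strbuf_strip_file_from_path()` on a plain character buffer, and `demo_resolve()` is a deliberately simplified substitute for `relative_url()` that only distinguishes absolute URLs from names relative to the base (it does not handle `../` components).

```c
#include <stdio.h>
#include <string.h>

/*
 * Truncate `uri` in place just after its last '/', so that
 * "https://example.com/bundles/list" becomes
 * "https://example.com/bundles/". A URI that already ends in '/' is
 * left unchanged; a string with no '/' becomes empty.
 */
void demo_strip_file_from_path(char *uri)
{
	char *sep = strrchr(uri, '/');

	if (sep)
		sep[1] = '\0';
	else
		uri[0] = '\0';
}

/*
 * Simplified resolution: values containing "://" are treated as
 * absolute and copied as-is; anything else is appended to `base`.
 * Returns 0 on success, -1 if `out` is too small.
 */
int demo_resolve(const char *base, const char *value,
		 char *out, size_t outlen)
{
	int n;

	if (strstr(value, "://"))
		n = snprintf(out, outlen, "%s", value);
	else
		n = snprintf(out, outlen, "%s%s", base, value);

	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

With a bundle list served from the placeholder address `https://example.com/bundles/list`, an entry whose `uri` value is `base.bundle` would resolve to `https://example.com/bundles/base.bundle` under this sketch.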
