
Commit 4c7f956

jonathantanmy authored and gitster committed
fetch-pack: exclude blobs when lazy-fetching trees
A partial clone with missing trees can be obtained using "git clone --filter=tree:none <repo>". In such a repository, when a tree needs to be lazily fetched, any tree or blob it directly or indirectly references is fetched as well, regardless of whether the original command required those objects, or whether the local repository already had some of them.

This is because the fetch protocol, which the lazy fetch uses, does not allow clients to request that only the wanted objects be sent, which would be the ideal solution. This patch implements a partial solution: specify the "blob:none" filter, somewhat reducing the fetch payload.

This change has no effect when lazily fetching blobs, because a wanted blob is named directly in a "want" and is therefore sent regardless of the filter. And if lazily fetching a commit (such repositories are difficult to construct and are not a use case we support very well, but they are possible), referenced commits and trees are still fetched; only the blobs are not fetched.

The necessary code change is made in fetch_pack() instead of somewhere closer to where the "filter" instruction is written to the wire, so that only one part of the code needs to be changed for users of all protocol versions to benefit from this optimization.

Signed-off-by: Jonathan Tan <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent: 12f19a9

3 files changed: 62 additions & 0 deletions

fetch-pack.c

Lines changed: 14 additions & 0 deletions
@@ -1615,6 +1615,20 @@ struct ref *fetch_pack(struct fetch_pack_args *args,
 	if (nr_sought)
 		nr_sought = remove_duplicates_in_refs(sought, nr_sought);
 
+	if (args->no_dependents && !args->filter_options.choice) {
+		/*
+		 * The protocol does not support requesting that only the
+		 * wanted objects be sent, so approximate this by setting a
+		 * "blob:none" filter if no filter is already set. This works
+		 * for all object types: note that wanted blobs will still be
+		 * sent because they are directly specified as a "want".
+		 *
+		 * NEEDSWORK: Add an option in the protocol to request that
+		 * only the wanted objects be sent, and implement it.
+		 */
+		parse_list_objects_filter(&args->filter_options, "blob:none");
+	}
+
 	if (!ref) {
 		packet_flush(fd[1]);
 		die(_("no matching remote head"));

fetch-pack.h

Lines changed: 7 additions & 0 deletions
@@ -43,6 +43,13 @@ struct fetch_pack_args {
 	unsigned from_promisor:1;
 
 	/*
+	 * Attempt to fetch only the wanted objects, and not any objects
+	 * referred to by them. Due to protocol limitations, extraneous
+	 * objects may still be included. (When fetching non-blob
+	 * objects, only blobs are excluded; when fetching a blob, the
+	 * blob itself will still be sent. The client does not need to
+	 * know whether a wanted object is a blob or not.)
+	 *
 	 * If 1, fetch_pack() will also not modify any object flags.
 	 * This allows fetch_pack() to safely be called by any function,
 	 * regardless of which object flags it uses (if any).

t/t0410-partial-clone.sh

Lines changed: 41 additions & 0 deletions
@@ -170,6 +170,47 @@ test_expect_success 'fetching of missing objects' '
 	git verify-pack --verbose "$IDX" | grep "$HASH"
 '
 
+test_expect_success 'fetching of missing blobs works' '
+	rm -rf server repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	git hash-object repo/foo.t >blobhash &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C server config uploadpack.allowanysha1inwant 1 &&
+	git -C server config uploadpack.allowfilter 1 &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+
+	git -C repo cat-file -p $(cat blobhash)
+'
+
+test_expect_success 'fetching of missing trees does not fetch blobs' '
+	rm -rf server repo &&
+	test_create_repo server &&
+	test_commit -C server foo &&
+	git -C server repack -a -d --write-bitmap-index &&
+
+	git clone "file://$(pwd)/server" repo &&
+	git -C repo rev-parse foo^{tree} >treehash &&
+	git hash-object repo/foo.t >blobhash &&
+	rm -rf repo/.git/objects/* &&
+
+	git -C server config uploadpack.allowanysha1inwant 1 &&
+	git -C server config uploadpack.allowfilter 1 &&
+	git -C repo config core.repositoryformatversion 1 &&
+	git -C repo config extensions.partialclone "origin" &&
+	git -C repo cat-file -p $(cat treehash) &&
+
+	# Ensure that the tree, but not the blob, is fetched
+	git -C repo rev-list --objects --missing=print $(cat treehash) >objects &&
+	grep "^$(cat treehash)" objects &&
+	grep "^[?]$(cat blobhash)" objects
+'
+
 test_expect_success 'rev-list stops traversal at missing and promised commit' '
 	rm -rf repo &&
 	test_create_repo repo &&
