
Commit f05da2b

steadmon authored and gitster committed
clone, submodule: pass partial clone filters to submodules
When cloning a repo with `--filter` and with `--recurse-submodules` enabled, the partial clone filter only applies to the top-level repo. This can lead to unexpected bandwidth and disk usage for projects that include large submodules. For example, a user who wants a partial clone of Gerrit might run: `git clone --recurse-submodules --filter=blob:limit=5k https://gerrit.googlesource.com/gerrit`. However, only the superproject would be a partial clone; every submodule would have all of its blobs downloaded regardless of their size. With this change, the same filter can also be applied to submodules, so the expected bandwidth and disk savings apply consistently.

To avoid changing the default behavior, add a new clone flag, `--also-filter-submodules`. When it is set along with `--filter` and `--recurse-submodules`, the filter spec is passed along to git-submodule and git-submodule--helper, so that submodule clones also have the filter applied.

This applies the same filter to the superproject and all submodules. Users who need a different filter per submodule would need to clone with `--no-recurse-submodules` and then manually initialize each submodule with the proper filter (for example with `git submodule update --init --filter=<filter-spec> -- <path>`).

Applying filters to submodules should be safe thanks to Jonathan Tan's recent work [1, 2, 3] eliminating the use of alternates as a method of accessing submodule objects: any access to a submodule object now triggers a lazy fetch from the submodule's promisor remote if the object is missing. This patch is a reworked version of [4], which was written before Jonathan Tan's work.
[1]: 8721e2e (Merge branch 'jt/partial-clone-submodule-1', 2021-07-16)
[2]: 11e5d0a (Merge branch 'jt/grep-wo-submodule-odb-as-alternate', 2021-09-20)
[3]: 162a13b (Merge branch 'jt/no-abuse-alternate-odb-for-submodules', 2021-10-25)
[4]: https://lore.kernel.org/git/52bf9d45b8e2b72ff32aa773f2415bf7b2b86da2.1563322192.git.steadmon@google.com/

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
1 parent 297ca89 commit f05da2b


8 files changed: +175, -8 lines

Documentation/config/clone.txt

Lines changed: 5 additions & 0 deletions
```diff
@@ -6,3 +6,8 @@ clone.defaultRemoteName::
 clone.rejectShallow::
 	Reject to clone a repository if it is a shallow one, can be overridden by
 	passing option `--reject-shallow` in command line. See linkgit:git-clone[1]
+
+clone.filterSubmodules::
+	If a partial clone filter is provided (see `--filter` in
+	linkgit:git-rev-list[1]) and `--recurse-submodules` is used, also apply
+	the filter to submodules.
```
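`git_clone_config()` reads this key with `git_config_bool()`, so the usual boolean spellings apply. The sketch below is a simplified, standalone approximation of those value rules, not Git's implementation. `parse_config_bool` and `eq_nocase` are hypothetical names. The sketch returns -1 for an unparseable value, whereas `git_config_bool()` dies instead, and it handles only plain decimal integers. Git also hands the config callback a lowercased section and variable name, which is why `builtin/clone.c` compares against `clone.filtersubmodules`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive string equality (hypothetical helper). */
static int eq_nocase(const char *a, const char *b)
{
	assert(a && b);
	while (*a && *b) {
		if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;
}

/*
 * Simplified model of Git's boolean config parsing (hypothetical name).
 * Returns 1 (true), 0 (false) or -1 (not a boolean).
 */
static int parse_config_bool(const char *value)
{
	char *end;
	long n;

	if (!value)		/* "filterSubmodules" with no "=": true */
		return 1;
	if (!*value)		/* "filterSubmodules =" (empty): false */
		return 0;
	if (eq_nocase(value, "true") || eq_nocase(value, "yes") ||
	    eq_nocase(value, "on"))
		return 1;
	if (eq_nocase(value, "false") || eq_nocase(value, "no") ||
	    eq_nocase(value, "off"))
		return 0;
	/* Fall back to a plain decimal integer: non-zero means true. */
	n = strtol(value, &end, 10);
	if (end == value || *end)
		return -1;
	return n != 0;
}
```

In practice the setting would typically be enabled with `git config --global clone.filterSubmodules true`.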

Documentation/git-clone.txt

Lines changed: 6 additions & 1 deletion
```diff
@@ -16,7 +16,7 @@ SYNOPSIS
 	  [--depth <depth>] [--[no-]single-branch] [--no-tags]
 	  [--recurse-submodules[=<pathspec>]] [--[no-]shallow-submodules]
 	  [--[no-]remote-submodules] [--jobs <n>] [--sparse] [--[no-]reject-shallow]
-	  [--filter=<filter>] [--] <repository>
+	  [--filter=<filter> [--also-filter-submodules]] [--] <repository>
 	  [<directory>]
 
 DESCRIPTION
@@ -182,6 +182,11 @@ objects from the source repository into a pack in the cloned repository.
 	at least `<size>`. For more details on filter specifications, see
 	the `--filter` option in linkgit:git-rev-list[1].
 
+--also-filter-submodules::
+	Also apply the partial clone filter to any submodules in the repository.
+	Requires `--filter` and `--recurse-submodules`. This can be turned on by
+	default by setting the `clone.filterSubmodules` config option.
+
 --mirror::
 	Set up a mirror of the source repository. This implies `--bare`.
 	Compared to `--bare`, `--mirror` not only maps local branches of the
```
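The "Requires `--filter` and `--recurse-submodules`" rule is enforced only for an explicit `--also-filter-submodules` on the command line; a `clone.filterSubmodules` setting is treated as set-and-forget and never triggers an error. A minimal sketch of that decision, assuming a hypothetical enum and function name and the tri-state convention the patch uses (-1 for unspecified, 0 for `--no-also-filter-submodules`, 1 for `--also-filter-submodules`):

```c
#include <assert.h>

enum filter_submodules_check {
	FILTER_SUBMODULES_OK,
	FILTER_SUBMODULES_NEEDS_FILTER,	/* missing --filter */
	FILTER_SUBMODULES_NEEDS_RECURSE	/* missing --recurse-submodules */
};

/*
 * Hypothetical model of the sanity checks in cmd_clone().
 * option: -1 unspecified, 0 --no-also-filter-submodules,
 *         1 --also-filter-submodules.
 */
static enum filter_submodules_check
check_filter_submodules(int option, int have_filter, int recurse_submodules)
{
	assert(option >= -1 && option <= 1);

	/* Config alone (option == -1) and an explicit "no" are never errors. */
	if (option <= 0)
		return FILTER_SUBMODULES_OK;
	if (!have_filter)
		return FILTER_SUBMODULES_NEEDS_FILTER;
	if (!recurse_submodules)
		return FILTER_SUBMODULES_NEEDS_RECURSE;
	return FILTER_SUBMODULES_OK;
}
```

In Git itself the failing cases call `die(_("the option '%s' requires '%s'"), ...)` with the missing option's name; checking `--filter` before `--recurse-submodules` matches the order of the two checks in the patch.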

Documentation/git-submodule.txt

Lines changed: 5 additions & 1 deletion
```diff
@@ -133,7 +133,7 @@ If you really want to remove a submodule from the repository and commit
 that use linkgit:git-rm[1] instead. See linkgit:gitsubmodules[7] for removal
 options.
 
-update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--[no-]single-branch] [--] [<path>...]::
+update [--init] [--remote] [-N|--no-fetch] [--[no-]recommend-shallow] [-f|--force] [--checkout|--rebase|--merge] [--reference <repository>] [--depth <depth>] [--recursive] [--jobs <n>] [--[no-]single-branch] [--filter <filter spec>] [--] [<path>...]::
 +
 --
 Update the registered submodules to match what the superproject
@@ -177,6 +177,10 @@ submodule with the `--init` option.
 
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and update any nested submodules within.
+
+If `--filter <filter spec>` is specified, the given partial clone filter will be
+applied to the submodule. See linkgit:git-rev-list[1] for details on filter
+specifications.
 --
 set-branch (-b|--branch) <branch> [--] <path>::
 set-branch (-d|--default) [--] <path>::
```
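`--filter` takes the same filter specifications as git-rev-list(1). To illustrate the two most common forms, here is a hypothetical and deliberately partial parser. It understands only `blob:none` and `blob:limit=<n>[kmg]` (with 1024-based suffixes) and rejects everything else, whereas Git also supports forms such as `tree:<depth>` and `combine:...`. It returns a size threshold. Because `blob:limit=<n>` filters out blobs of size at least `<n>`, a threshold of 0 has the same effect as `blob:none`.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, deliberately partial filter-spec parser. Accepts only
 * "blob:none" and "blob:limit=<n>[kmg]". On success stores the size
 * threshold in *limit (blobs of at least that size are filtered out)
 * and returns 0; returns -1 for any other spec.
 */
static int parse_blob_filter(const char *spec, unsigned long *limit)
{
	static const char prefix[] = "blob:limit=";
	unsigned long n, factor = 1;
	char *end;

	assert(spec && limit);
	if (!strcmp(spec, "blob:none")) {
		*limit = 0;	/* same effect as blob:limit=0 */
		return 0;
	}
	if (strncmp(spec, prefix, sizeof(prefix) - 1))
		return -1;
	spec += sizeof(prefix) - 1;
	if (!isdigit((unsigned char)*spec))
		return -1;
	errno = 0;
	n = strtoul(spec, &end, 10);
	if (errno == ERANGE)
		return -1;
	switch (tolower((unsigned char)*end)) {
	case '\0':
		break;
	case 'k':
		factor = 1024UL;
		end++;
		break;
	case 'm':
		factor = 1024UL * 1024;
		end++;
		break;
	case 'g':
		factor = 1024UL * 1024 * 1024;
		end++;
		break;
	default:
		return -1;
	}
	/* Reject trailing junk and results that would overflow. */
	if (*end || n > ULONG_MAX / factor)
		return -1;
	*limit = n * factor;
	return 0;
}
```

For example, `blob:limit=5k` yields a threshold of 5120 bytes, while a malformed spec such as `blob:5k` is rejected.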

builtin/clone.c

Lines changed: 34 additions & 2 deletions
```diff
@@ -71,6 +71,8 @@ static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
 static struct list_objects_filter_options filter_options;
+static int option_filter_submodules = -1; /* unspecified */
+static int config_filter_submodules = -1; /* unspecified */
 static struct string_list server_options = STRING_LIST_INIT_NODUP;
 static int option_remote_submodules;
 
@@ -150,6 +152,8 @@ static struct option builtin_clone_options[] = {
 	OPT_SET_INT('6', "ipv6", &family, N_("use IPv6 addresses only"),
 			TRANSPORT_FAMILY_IPV6),
 	OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
+	OPT_BOOL(0, "also-filter-submodules", &option_filter_submodules,
+		    N_("apply partial clone filters to submodules")),
 	OPT_BOOL(0, "remote-submodules", &option_remote_submodules,
 		    N_("any cloned submodules will use their remote-tracking branch")),
 	OPT_BOOL(0, "sparse", &option_sparse_checkout,
@@ -650,7 +654,7 @@ static int git_sparse_checkout_init(const char *repo)
 	return result;
 }
 
-static int checkout(int submodule_progress)
+static int checkout(int submodule_progress, int filter_submodules)
 {
 	struct object_id oid;
 	char *head;
@@ -729,6 +733,10 @@ static int checkout(int submodule_progress)
 			strvec_push(&args, "--no-fetch");
 		}
 
+		if (filter_submodules && filter_options.choice)
+			strvec_pushf(&args, "--filter=%s",
+				     expand_list_objects_filter_spec(&filter_options));
+
 		if (option_single_branch >= 0)
 			strvec_push(&args, option_single_branch ?
 					       "--single-branch" :
@@ -749,6 +757,8 @@ static int git_clone_config(const char *k, const char *v, void *cb)
 	}
 	if (!strcmp(k, "clone.rejectshallow"))
 		config_reject_shallow = git_config_bool(k, v);
+	if (!strcmp(k, "clone.filtersubmodules"))
+		config_filter_submodules = git_config_bool(k, v);
 
 	return git_default_config(k, v, cb);
 }
@@ -871,6 +881,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	struct remote *remote;
 	int err = 0, complete_refs_before_fetch = 1;
 	int submodule_progress;
+	int filter_submodules = 0;
 
 	struct transport_ls_refs_options transport_ls_refs_options =
 		TRANSPORT_LS_REFS_OPTIONS_INIT;
@@ -1066,6 +1077,27 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	if (option_reject_shallow != -1)
 		reject_shallow = option_reject_shallow;
 
+	/*
+	 * If option_filter_submodules is specified from CLI option,
+	 * ignore config_filter_submodules from git_clone_config.
+	 */
+	if (config_filter_submodules != -1)
+		filter_submodules = config_filter_submodules;
+	if (option_filter_submodules != -1)
+		filter_submodules = option_filter_submodules;
+
+	/*
+	 * Exit if the user seems to be doing something silly with submodule
+	 * filter flags (but not with filter configs, as those should be
+	 * set-and-forget).
+	 */
+	if (option_filter_submodules > 0 && !filter_options.choice)
+		die(_("the option '%s' requires '%s'"),
+		    "--also-filter-submodules", "--filter");
+	if (option_filter_submodules > 0 && !option_recurse_submodules.nr)
+		die(_("the option '%s' requires '%s'"),
+		    "--also-filter-submodules", "--recurse-submodules");
+
 	/*
 	 * apply the remote name provided by --origin only after this second
 	 * call to git_config, to ensure it overrides all config-based values.
@@ -1299,7 +1331,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix)
 	}
 
 	junk_mode = JUNK_LEAVE_REPO;
-	err = checkout(submodule_progress);
+	err = checkout(submodule_progress, filter_submodules);
 
 	free(remote_name);
 	strbuf_release(&reflog_msg);
```
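The precedence rule in `cmd_clone()` (default off, then the config value, then the command-line option on top) can be isolated into a small pure function. This is a sketch with a hypothetical name; in the patch the logic is written inline, and its inputs are the static variables `option_filter_submodules` and `config_filter_submodules`, both of which use -1 for "unspecified".

```c
#include <assert.h>

/*
 * Hypothetical pure-function version of the precedence logic that
 * cmd_clone() applies inline. Both inputs use -1 for "unspecified".
 */
static int resolve_filter_submodules(int option_value, int config_value)
{
	int filter_submodules = 0;	/* default: do not filter submodules */

	assert(option_value >= -1 && config_value >= -1);
	if (config_value != -1)		/* clone.filterSubmodules */
		filter_submodules = config_value;
	if (option_value != -1)		/* --[no-]also-filter-submodules wins */
		filter_submodules = option_value;
	return filter_submodules;
}
```

The resolved value alone forwards nothing: `checkout()` also requires `filter_options.choice`, so `clone.filterSubmodules=true` without `--filter` is a silent no-op rather than an error.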

builtin/submodule--helper.c

Lines changed: 27 additions & 3 deletions
```diff
@@ -20,6 +20,7 @@
 #include "diff.h"
 #include "object-store.h"
 #include "advice.h"
+#include "list-objects-filter-options.h"
 
 #define OPT_QUIET (1 << 0)
 #define OPT_CACHED (1 << 1)
@@ -1630,6 +1631,7 @@ struct module_clone_data {
 	const char *name;
 	const char *url;
 	const char *depth;
+	struct list_objects_filter_options *filter_options;
 	struct string_list reference;
 	unsigned int quiet: 1;
 	unsigned int progress: 1;
@@ -1796,6 +1798,10 @@ static int clone_submodule(struct module_clone_data *clone_data)
 			strvec_push(&cp.args, "--dissociate");
 		if (sm_gitdir && *sm_gitdir)
 			strvec_pushl(&cp.args, "--separate-git-dir", sm_gitdir, NULL);
+		if (clone_data->filter_options && clone_data->filter_options->choice)
+			strvec_pushf(&cp.args, "--filter=%s",
+				     expand_list_objects_filter_spec(
+					     clone_data->filter_options));
 		if (clone_data->single_branch >= 0)
 			strvec_push(&cp.args, clone_data->single_branch ?
 				    "--single-branch" :
@@ -1852,6 +1858,7 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 {
 	int dissociate = 0, quiet = 0, progress = 0, require_init = 0;
 	struct module_clone_data clone_data = MODULE_CLONE_DATA_INIT;
+	struct list_objects_filter_options filter_options;
 
 	struct option module_clone_options[] = {
 		OPT_STRING(0, "prefix", &clone_data.prefix,
@@ -1881,30 +1888,34 @@ static int module_clone(int argc, const char **argv, const char *prefix)
 			   N_("disallow cloning into non-empty directory")),
 		OPT_BOOL(0, "single-branch", &clone_data.single_branch,
 			 N_("clone only one branch, HEAD or --branch")),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 		OPT_END()
 	};
 
 	const char *const git_submodule_helper_usage[] = {
 		N_("git submodule--helper clone [--prefix=<path>] [--quiet] "
 		   "[--reference <repository>] [--name <name>] [--depth <depth>] "
-		   "[--single-branch] "
+		   "[--single-branch] [--filter <filter-spec>]"
 		   "--url <url> --path <path>"),
 		NULL
 	};
 
+	memset(&filter_options, 0, sizeof(filter_options));
 	argc = parse_options(argc, argv, prefix, module_clone_options,
 			     git_submodule_helper_usage, 0);
 
 	clone_data.dissociate = !!dissociate;
 	clone_data.quiet = !!quiet;
 	clone_data.progress = !!progress;
 	clone_data.require_init = !!require_init;
+	clone_data.filter_options = &filter_options;
 
 	if (argc || !clone_data.url || !clone_data.path || !*(clone_data.path))
 		usage_with_options(git_submodule_helper_usage,
 				   module_clone_options);
 
 	clone_submodule(&clone_data);
+	list_objects_filter_release(&filter_options);
 	return 0;
 }
 
@@ -1994,6 +2005,7 @@ struct submodule_update_clone {
 	const char *recursive_prefix;
 	const char *prefix;
 	int single_branch;
+	struct list_objects_filter_options *filter_options;
 
 	/* to be consumed by git-submodule.sh */
 	struct update_clone_data *update_clone;
@@ -2154,6 +2166,9 @@ static int prepare_to_clone_next_submodule(const struct cache_entry *ce,
 		strvec_pushl(&child->args, "--prefix", suc->prefix, NULL);
 	if (suc->recommend_shallow && sub->recommend_shallow == 1)
 		strvec_push(&child->args, "--depth=1");
+	if (suc->filter_options && suc->filter_options->choice)
+		strvec_pushf(&child->args, "--filter=%s",
+			     expand_list_objects_filter_spec(suc->filter_options));
 	if (suc->require_init)
 		strvec_push(&child->args, "--require-init");
 	strvec_pushl(&child->args, "--path", sub->path, NULL);
@@ -2498,6 +2513,8 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	const char *update = NULL;
 	struct pathspec pathspec;
 	struct submodule_update_clone suc = SUBMODULE_UPDATE_CLONE_INIT;
+	struct list_objects_filter_options filter_options;
+	int ret;
 
 	struct option module_update_clone_options[] = {
 		OPT_STRING(0, "prefix", &prefix,
@@ -2528,6 +2545,7 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 			   N_("disallow cloning into non-empty directory")),
 		OPT_BOOL(0, "single-branch", &suc.single_branch,
 			 N_("clone only one branch, HEAD or --branch")),
+		OPT_PARSE_LIST_OBJECTS_FILTER(&filter_options),
 		OPT_END()
 	};
 
@@ -2540,20 +2558,26 @@ static int update_clone(int argc, const char **argv, const char *prefix)
 	update_clone_config_from_gitmodules(&suc.max_jobs);
 	git_config(git_update_clone_config, &suc.max_jobs);
 
+	memset(&filter_options, 0, sizeof(filter_options));
 	argc = parse_options(argc, argv, prefix, module_update_clone_options,
 			     git_submodule_helper_usage, 0);
+	suc.filter_options = &filter_options;
 
 	if (update)
 		if (parse_submodule_update_strategy(update, &suc.update) < 0)
 			die(_("bad value for update parameter"));
 
-	if (module_list_compute(argc, argv, prefix, &pathspec, &suc.list) < 0)
+	if (module_list_compute(argc, argv, prefix, &pathspec, &suc.list) < 0) {
+		list_objects_filter_release(&filter_options);
 		return 1;
+	}
 
 	if (pathspec.nr)
 		suc.warn_if_uninitialized = 1;
 
-	return update_submodules(&suc);
+	ret = update_submodules(&suc);
+	list_objects_filter_release(&filter_options);
+	return ret;
 }
 
 static int run_update_procedure(int argc, const char **argv, const char *prefix)
```
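Each hop in the chain (`git clone`, then `git submodule update`, then `git submodule--helper update-clone`, then `git submodule--helper clone`, then the submodule's own `git clone`) re-serializes the parsed filter with `expand_list_objects_filter_spec()`. It appends `--filter=<spec>` only when `filter_options->choice` is set. The sketch below models that pattern with a hypothetical fixed-capacity `argv_list` standing in for Git's `strvec`. A NULL or empty spec plays the role of an unset `choice`.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for Git's strvec: a NULL-terminated argv. */
struct argv_list {
	char *v[16];
	size_t nr;
};

/* Append a heap-allocated string; the list takes ownership. */
static void argv_push_owned(struct argv_list *a, char *s)
{
	assert(a->nr + 1 < sizeof(a->v) / sizeof(a->v[0]));
	a->v[a->nr++] = s;
	a->v[a->nr] = NULL;
}

/* Append a copy of a fixed argument. */
static void argv_push(struct argv_list *a, const char *s)
{
	char *copy = malloc(strlen(s) + 1);

	if (!copy)
		abort();
	strcpy(copy, s);
	argv_push_owned(a, copy);
}

/*
 * Append "--filter=<spec>" only when a filter was requested; a NULL or
 * empty spec models "filter_options->choice" being unset.
 */
static void push_filter_arg(struct argv_list *a, const char *spec)
{
	static const char opt[] = "--filter=";
	size_t len;
	char *arg;

	if (!spec || !*spec)
		return;
	len = sizeof(opt) + strlen(spec);	/* sizeof(opt) counts the NUL */
	arg = malloc(len);
	if (!arg)
		abort();
	snprintf(arg, len, "%s%s", opt, spec);
	argv_push_owned(a, arg);
}

/* Free every argument and reset the list to empty. */
static void argv_clear(struct argv_list *a)
{
	while (a->nr)
		free(a->v[--a->nr]);
	a->v[0] = NULL;
}
```

Re-serializing the spec at each hop, instead of passing the raw command-line text through, means every process along the way validates the filter with the same option parser before handing it on.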

git-submodule.sh

Lines changed: 16 additions & 1 deletion
```diff
@@ -10,7 +10,7 @@ USAGE="[--quiet] [--cached]
    or: $dashless [--quiet] status [--cached] [--recursive] [--] [<path>...]
    or: $dashless [--quiet] init [--] [<path>...]
    or: $dashless [--quiet] deinit [-f|--force] (--all| [--] <path>...)
-   or: $dashless [--quiet] update [--init] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--[no-]single-branch] [--] [<path>...]
+   or: $dashless [--quiet] update [--init [--filter=<filter-spec>]] [--remote] [-N|--no-fetch] [-f|--force] [--checkout|--merge|--rebase] [--[no-]recommend-shallow] [--reference <repository>] [--recursive] [--[no-]single-branch] [--] [<path>...]
    or: $dashless [--quiet] set-branch (--default|--branch <branch>) [--] <path>
    or: $dashless [--quiet] set-url [--] <path> <newurl>
    or: $dashless [--quiet] summary [--cached|--files] [--summary-limit <n>] [commit] [--] [<path>...]
@@ -49,6 +49,7 @@ dissociate=
 single_branch=
 jobs=
 recommend_shallow=
+filter=
 
 die_if_unmatched ()
 {
@@ -347,6 +348,14 @@ cmd_update()
 		--no-single-branch)
 			single_branch="--no-single-branch"
 			;;
+		--filter)
+			case "$2" in '') usage ;; esac
+			filter="--filter=$2"
+			shift
+			;;
+		--filter=*)
+			filter="$1"
+			;;
 		--)
 			shift
 			break
@@ -361,6 +370,11 @@ cmd_update()
 		shift
 	done
 
+	if test -n "$filter" && test "$init" != "1"
+	then
+		usage
+	fi
+
 	if test -n "$init"
 	then
 		cmd_init "--" "$@" || return
@@ -379,6 +393,7 @@ cmd_update()
 		$single_branch \
 		$recommend_shallow \
 		$jobs \
+		$filter \
 		-- \
 		"$@" || echo "#unmatched" $?
 	} | {
```
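`cmd_update()` accepts both `--filter <spec>` and `--filter=<spec>`, normalizes either form into a single `--filter=...` word for the helper, and calls `usage` if a filter is given while `$init` is not set. For illustration, here are the same rules transcribed into C. The function names are hypothetical; nothing like this exists in Git's C code, since the logic lives in the shell script.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical C transcription of the --filter handling in cmd_update().
 * Looks at argv[0]; on a match returns the filter spec and sets *used to
 * the number of words consumed (1 or 2). Returns NULL with *used == 0 if
 * argv[0] is not a filter option, or NULL with *used == -1 where the
 * script would call usage (separate-word form with a missing/empty spec).
 */
static const char *parse_filter_arg(int argc, const char **argv, int *used)
{
	static const char eq_form[] = "--filter=";

	assert(argc >= 1 && argv && used);
	*used = 0;
	if (!strcmp(argv[0], "--filter")) {
		/* case "$2" in '') usage ;; esac */
		if (argc < 2 || !*argv[1]) {
			*used = -1;
			return NULL;
		}
		*used = 2;
		return argv[1];
	}
	if (!strncmp(argv[0], eq_form, sizeof(eq_form) - 1)) {
		*used = 1;
		return argv[0] + sizeof(eq_form) - 1;
	}
	return NULL;
}

/* Mirrors: if test -n "$filter" && test "$init" != "1"; then usage; fi */
static int filter_allowed(const char *spec, int init)
{
	return !spec || init;
}
```

As in the script, only the separate-word form checks for an empty spec; `--filter=` with nothing after it is passed on unchanged and left for the helper's option parser to reject.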

t/t5617-clone-submodules-remote.sh

Lines changed: 41 additions & 0 deletions
```diff
@@ -28,6 +28,13 @@ test_expect_success 'setup' '
 	)
 '
 
+# bare clone giving "srv.bare" for use as our server.
+test_expect_success 'setup bare clone for server' '
+	git clone --bare "file://$(pwd)/." srv.bare &&
+	git -C srv.bare config --local uploadpack.allowfilter 1 &&
+	git -C srv.bare config --local uploadpack.allowanysha1inwant 1
+'
+
 test_expect_success 'clone with --no-remote-submodules' '
 	test_when_finished "rm -rf super_clone" &&
 	git clone --recurse-submodules --no-remote-submodules "file://$pwd/." super_clone &&
@@ -65,4 +72,38 @@ test_expect_success 'clone with --single-branch' '
 	)
 '
 
+# do basic partial clone from "srv.bare"
+# confirm partial clone was registered in the local config for super and sub.
+test_expect_success 'clone with --filter' '
+	git clone --recurse-submodules \
+		--filter blob:none --also-filter-submodules \
+		"file://$pwd/srv.bare" super_clone &&
+	test_cmp_config -C super_clone true remote.origin.promisor &&
+	test_cmp_config -C super_clone blob:none remote.origin.partialclonefilter &&
+	test_cmp_config -C super_clone/sub true remote.origin.promisor &&
+	test_cmp_config -C super_clone/sub blob:none remote.origin.partialclonefilter
+'
+
+# check that clone.filterSubmodules works (--also-filter-submodules can be
+# omitted)
+test_expect_success 'filters applied with clone.filterSubmodules' '
+	test_config_global clone.filterSubmodules true &&
+	git clone --recurse-submodules --filter blob:none \
+		"file://$pwd/srv.bare" super_clone2 &&
+	test_cmp_config -C super_clone2 true remote.origin.promisor &&
+	test_cmp_config -C super_clone2 blob:none remote.origin.partialclonefilter &&
+	test_cmp_config -C super_clone2/sub true remote.origin.promisor &&
+	test_cmp_config -C super_clone2/sub blob:none remote.origin.partialclonefilter
+'
+
+test_expect_success '--no-also-filter-submodules overrides clone.filterSubmodules=true' '
+	test_config_global clone.filterSubmodules true &&
+	git clone --recurse-submodules --filter blob:none \
+		--no-also-filter-submodules \
+		"file://$pwd/srv.bare" super_clone3 &&
+	test_cmp_config -C super_clone3 true remote.origin.promisor &&
+	test_cmp_config -C super_clone3 blob:none remote.origin.partialclonefilter &&
+	test_cmp_config -C super_clone3/sub false --default false remote.origin.promisor
+'
+
 test_done
```
