Skip to content

Commit b90d9f7

Browse files
chooglengitster
authored andcommitted
fetch: fetch unpopulated, changed submodules
"git fetch --recurse-submodules" only considers populated submodules (i.e. submodules that can be found by iterating the index), which makes "git fetch" behave differently based on which commit is checked out. As a result, even if the user has initialized all submodules correctly, they may not fetch the necessary submodule commits, and commands like "git checkout --recurse-submodules" might fail. Teach "git fetch" to fetch cloned, changed submodules regardless of whether they are populated. This is in addition to the current behavior of fetching populated submodules (which is always attempted regardless of what was fetched in the superproject, or even if nothing was fetched in the superproject). A submodule may be encountered multiple times (via the list of populated submodules or via the list of changed submodules). When this happens, "git fetch" only reads the 'populated copy' and ignores the 'changed copy'. Amend the verify_fetch_result() test helper so that we can assert on which 'copy' is being read. Signed-off-by: Glen Choo <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 5370b91 commit b90d9f7

File tree

6 files changed

+477
-46
lines changed

6 files changed

+477
-46
lines changed

Documentation/fetch-options.txt

Lines changed: 17 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -186,15 +186,23 @@ endif::git-pull[]
186186
ifndef::git-pull[]
187187
--recurse-submodules[=yes|on-demand|no]::
188188
This option controls if and under what conditions new commits of
189-
populated submodules should be fetched too. It can be used as a
190-
boolean option to completely disable recursion when set to 'no' or to
191-
unconditionally recurse into all populated submodules when set to
192-
'yes', which is the default when this option is used without any
193-
value. Use 'on-demand' to only recurse into a populated submodule
194-
when the superproject retrieves a commit that updates the submodule's
195-
reference to a commit that isn't already in the local submodule
196-
clone. By default, 'on-demand' is used, unless
197-
`fetch.recurseSubmodules` is set (see linkgit:git-config[1]).
189+
submodules should be fetched too. When recursing through submodules,
190+
`git fetch` always attempts to fetch "changed" submodules, that is, a
191+
submodule that has commits that are referenced by a newly fetched
192+
superproject commit but are missing in the local submodule clone. A
193+
changed submodule can be fetched as long as it is present locally e.g.
194+
in `$GIT_DIR/modules/` (see linkgit:gitsubmodules[7]); if the upstream
195+
adds a new submodule, that submodule cannot be fetched until it is
196+
cloned e.g. by `git submodule update`.
197+
+
198+
When set to 'on-demand', only changed submodules are fetched. When set
199+
to 'yes', all populated submodules are fetched and submodules that are
200+
both unpopulated and changed are fetched. When set to 'no', submodules
201+
are never fetched.
202+
+
203+
When unspecified, this uses the value of `fetch.recurseSubmodules` if it
204+
is set (see linkgit:git-config[1]), defaulting to 'on-demand' if unset.
205+
When this option is used without any value, it defaults to 'yes'.
198206
endif::git-pull[]
199207

200208
-j::

Documentation/git-fetch.txt

Lines changed: 4 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -287,12 +287,10 @@ include::transfer-data-leaks.txt[]
287287

288288
BUGS
289289
----
290-
Using --recurse-submodules can only fetch new commits in already checked
291-
out submodules right now. When e.g. upstream added a new submodule in the
292-
just fetched commits of the superproject the submodule itself cannot be
293-
fetched, making it impossible to check out that submodule later without
294-
having to do a fetch again. This is expected to be fixed in a future Git
295-
version.
290+
Using --recurse-submodules can only fetch new commits in submodules that are
291+
present locally e.g. in `$GIT_DIR/modules/`. If the upstream adds a new
292+
submodule, that submodule cannot be fetched until it is cloned e.g. by `git
293+
submodule update`. This is expected to be fixed in a future Git version.
296294

297295
SEE ALSO
298296
--------

builtin/fetch.c

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -2178,13 +2178,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix)
21782178
max_children = fetch_parallel_config;
21792179

21802180
add_options_to_argv(&options);
2181-
result = fetch_populated_submodules(the_repository,
2182-
&options,
2183-
submodule_prefix,
2184-
recurse_submodules,
2185-
recurse_submodules_default,
2186-
verbosity < 0,
2187-
max_children);
2181+
result = fetch_submodules(the_repository,
2182+
&options,
2183+
submodule_prefix,
2184+
recurse_submodules,
2185+
recurse_submodules_default,
2186+
verbosity < 0,
2187+
max_children);
21882188
strvec_clear(&options);
21892189
}
21902190

submodule.c

Lines changed: 177 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -808,16 +808,48 @@ static const char *default_name_or_path(const char *path_or_name)
808808

809809
/*
810810
* Holds relevant information for a changed submodule. Used as the .util
811-
* member of the changed submodule string_list_item.
811+
* member of the changed submodule name string_list_item.
812+
*
813+
* (super_oid, path) allows the submodule config to be read from _some_
814+
* .gitmodules file. We store this information the first time we find a
815+
* superproject commit that points to the submodule, but this is
816+
* arbitrary - we can choose any (super_oid, path) that matches the
817+
* submodule's name.
818+
*
819+
* NEEDSWORK: Storing an arbitrary commit is undesirable because we can't
820+
* guarantee that we're reading the commit that the user would expect. A better
821+
* scheme would be to just fetch a submodule by its name. This requires two
822+
* steps:
823+
* - Create a function that behaves like repo_submodule_init(), but accepts a
824+
* submodule name instead of treeish_name and path. This should be easy
825+
* because repo_submodule_init() internally uses the submodule's name.
826+
*
827+
* - Replace most instances of 'struct submodule' (which is the .gitmodules
828+
* config) with just the submodule name. This is OK because we expect
829+
* submodule settings to be stored in .git/config (via "git submodule init"),
830+
* not .gitmodules. This also lets us delete get_non_gitmodules_submodule(),
831+
* which constructs a bogus 'struct submodule' for the sake of giving a
832+
* placeholder name to a gitlink.
812833
*/
813834
struct changed_submodule_data {
835+
/*
836+
* The first superproject commit in the rev walk that points to
837+
* the submodule.
838+
*/
839+
const struct object_id *super_oid;
840+
/*
841+
* Path to the submodule in the superproject commit referenced
842+
* by 'super_oid'.
843+
*/
844+
char *path;
814845
/* The submodule commits that have changed in the rev walk. */
815846
struct oid_array new_commits;
816847
};
817848

818849
static void changed_submodule_data_clear(struct changed_submodule_data *cs_data)
819850
{
820851
oid_array_clear(&cs_data->new_commits);
852+
free(cs_data->path);
821853
}
822854

823855
static void collect_changed_submodules_cb(struct diff_queue_struct *q,
@@ -862,9 +894,14 @@ static void collect_changed_submodules_cb(struct diff_queue_struct *q,
862894
continue;
863895

864896
item = string_list_insert(changed, name);
865-
if (!item->util)
897+
if (item->util)
898+
cs_data = item->util;
899+
else {
866900
item->util = xcalloc(1, sizeof(struct changed_submodule_data));
867-
cs_data = item->util;
901+
cs_data = item->util;
902+
cs_data->super_oid = commit_oid;
903+
cs_data->path = xstrdup(p->two->path);
904+
}
868905
oid_array_append(&cs_data->new_commits, &p->two->oid);
869906
}
870907
}
@@ -1253,14 +1290,36 @@ void check_for_new_submodule_commits(struct object_id *oid)
12531290
oid_array_append(&ref_tips_after_fetch, oid);
12541291
}
12551292

1293+
/*
1294+
* Returns 1 if there is at least one submodule gitdir in
1295+
* $GIT_DIR/modules and 0 otherwise. This follows
1296+
* submodule_name_to_gitdir(), which looks for submodules in
1297+
* $GIT_DIR/modules, not $GIT_COMMON_DIR.
1298+
*
1299+
* A submodule can be moved to $GIT_DIR/modules manually by running "git
1300+
* submodule absorbgitdirs", or it may be initialized there by "git
1301+
* submodule update".
1302+
*/
1303+
static int repo_has_absorbed_submodules(struct repository *r)
1304+
{
1305+
int ret;
1306+
struct strbuf buf = STRBUF_INIT;
1307+
1308+
strbuf_repo_git_path(&buf, r, "modules/");
1309+
ret = file_exists(buf.buf) && !is_empty_dir(buf.buf);
1310+
strbuf_release(&buf);
1311+
return ret;
1312+
}
1313+
12561314
static void calculate_changed_submodule_paths(struct repository *r,
12571315
struct string_list *changed_submodule_names)
12581316
{
12591317
struct strvec argv = STRVEC_INIT;
12601318
struct string_list_item *name;
12611319

1262-
/* No need to check if there are no submodules configured */
1263-
if (!submodule_from_path(r, NULL, NULL))
1320+
/* No need to check if no submodules would be fetched */
1321+
if (!submodule_from_path(r, NULL, NULL) &&
1322+
!repo_has_absorbed_submodules(r))
12641323
return;
12651324

12661325
strvec_push(&argv, "--"); /* argv[0] program name */
@@ -1333,7 +1392,16 @@ int submodule_touches_in_range(struct repository *r,
13331392
}
13341393

13351394
struct submodule_parallel_fetch {
1336-
int count;
1395+
/*
1396+
* The index of the last index entry processed by
1397+
* get_fetch_task_from_index().
1398+
*/
1399+
int index_count;
1400+
/*
1401+
* The index of the last string_list entry processed by
1402+
* get_fetch_task_from_changed().
1403+
*/
1404+
int changed_count;
13371405
struct strvec args;
13381406
struct repository *r;
13391407
const char *prefix;
@@ -1342,7 +1410,16 @@ struct submodule_parallel_fetch {
13421410
int quiet;
13431411
int result;
13441412

1413+
/*
1414+
* Names of submodules that have new commits. Generated by
1415+
* walking the newly fetched superproject commits.
1416+
*/
13451417
struct string_list changed_submodule_names;
1418+
/*
1419+
* Names of submodules that have already been processed. Lets us
1420+
* avoid fetching the same submodule more than once.
1421+
*/
1422+
struct string_list seen_submodule_names;
13461423

13471424
/* Pending fetches by OIDs */
13481425
struct fetch_task **oid_fetch_tasks;
@@ -1353,6 +1430,7 @@ struct submodule_parallel_fetch {
13531430
#define SPF_INIT { \
13541431
.args = STRVEC_INIT, \
13551432
.changed_submodule_names = STRING_LIST_INIT_DUP, \
1433+
.seen_submodule_names = STRING_LIST_INIT_DUP, \
13561434
.submodules_with_errors = STRBUF_INIT, \
13571435
}
13581436

@@ -1390,6 +1468,7 @@ struct fetch_task {
13901468
const struct submodule *sub;
13911469
unsigned free_sub : 1; /* Do we need to free the submodule? */
13921470
const char *default_argv; /* The default fetch mode. */
1471+
struct strvec git_args; /* Args for the child git process. */
13931472

13941473
struct oid_array *commits; /* Ensure these commits are fetched */
13951474
};
@@ -1425,6 +1504,8 @@ static void fetch_task_release(struct fetch_task *p)
14251504
if (p->repo)
14261505
repo_clear(p->repo);
14271506
FREE_AND_NULL(p->repo);
1507+
1508+
strvec_clear(&p->git_args);
14281509
}
14291510

14301511
static struct repository *get_submodule_repo_for(struct repository *r,
@@ -1463,6 +1544,9 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
14631544
task->free_sub = 1;
14641545
}
14651546

1547+
if (string_list_lookup(&spf->seen_submodule_names, task->sub->name))
1548+
goto cleanup;
1549+
14661550
switch (get_fetch_recurse_config(task->sub, spf))
14671551
{
14681552
default:
@@ -1493,10 +1577,12 @@ static struct fetch_task *fetch_task_create(struct submodule_parallel_fetch *spf
14931577
}
14941578

14951579
static struct fetch_task *
1496-
get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
1580+
get_fetch_task_from_index(struct submodule_parallel_fetch *spf,
1581+
struct strbuf *err)
14971582
{
1498-
for (; spf->count < spf->r->index->cache_nr; spf->count++) {
1499-
const struct cache_entry *ce = spf->r->index->cache[spf->count];
1583+
for (; spf->index_count < spf->r->index->cache_nr; spf->index_count++) {
1584+
const struct cache_entry *ce =
1585+
spf->r->index->cache[spf->index_count];
15001586
struct fetch_task *task;
15011587

15021588
if (!S_ISGITLINK(ce->ce_mode))
@@ -1511,7 +1597,7 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
15111597
strbuf_addf(err, _("Fetching submodule %s%s\n"),
15121598
spf->prefix, ce->name);
15131599

1514-
spf->count++;
1600+
spf->index_count++;
15151601
return task;
15161602
} else {
15171603
struct strbuf empty_submodule_path = STRBUF_INIT;
@@ -1539,11 +1625,83 @@ get_fetch_task(struct submodule_parallel_fetch *spf, struct strbuf *err)
15391625
return NULL;
15401626
}
15411627

1628+
static struct fetch_task *
1629+
get_fetch_task_from_changed(struct submodule_parallel_fetch *spf,
1630+
struct strbuf *err)
1631+
{
1632+
for (; spf->changed_count < spf->changed_submodule_names.nr;
1633+
spf->changed_count++) {
1634+
struct string_list_item item =
1635+
spf->changed_submodule_names.items[spf->changed_count];
1636+
struct changed_submodule_data *cs_data = item.util;
1637+
struct fetch_task *task;
1638+
1639+
if (!is_tree_submodule_active(spf->r, cs_data->super_oid,cs_data->path))
1640+
continue;
1641+
1642+
task = fetch_task_create(spf, cs_data->path,
1643+
cs_data->super_oid);
1644+
if (!task)
1645+
continue;
1646+
1647+
if (!task->repo) {
1648+
strbuf_addf(err, _("Could not access submodule '%s' at commit %s\n"),
1649+
cs_data->path,
1650+
find_unique_abbrev(cs_data->super_oid, DEFAULT_ABBREV));
1651+
1652+
fetch_task_release(task);
1653+
free(task);
1654+
continue;
1655+
}
1656+
1657+
if (!spf->quiet)
1658+
strbuf_addf(err,
1659+
_("Fetching submodule %s%s at commit %s\n"),
1660+
spf->prefix, task->sub->path,
1661+
find_unique_abbrev(cs_data->super_oid,
1662+
DEFAULT_ABBREV));
1663+
1664+
spf->changed_count++;
1665+
/*
1666+
* NEEDSWORK: Submodules set/unset a value for
1667+
* core.worktree when they are populated/unpopulated by
1668+
* "git checkout" (and similar commands, see
1669+
* submodule_move_head() and
1670+
* connect_work_tree_and_git_dir()), but if the
1671+
* submodule is unpopulated in another way (e.g. "git
1672+
* rm", "rm -r"), core.worktree will still be set even
1673+
* though the directory doesn't exist, and the child
1674+
* process will crash while trying to chdir into the
1675+
* nonexistent directory.
1676+
*
1677+
* In this case, we know that the submodule has no
1678+
* working tree, so we can work around this by
1679+
* setting "--work-tree=." (--bare does not work because
1680+
* worktree settings take precedence over bare-ness).
1681+
* However, this is not necessarily true in other cases,
1682+
* so a generalized solution is still necessary.
1683+
*
1684+
* Possible solutions:
1685+
* - teach "git [add|rm]" to unset core.worktree and
1686+
* discourage users from removing submodules without
1687+
* using a Git command.
1688+
* - teach submodule child processes to ignore stale
1689+
* core.worktree values.
1690+
*/
1691+
strvec_push(&task->git_args, "--work-tree=.");
1692+
return task;
1693+
}
1694+
return NULL;
1695+
}
1696+
15421697
static int get_next_submodule(struct child_process *cp, struct strbuf *err,
15431698
void *data, void **task_cb)
15441699
{
15451700
struct submodule_parallel_fetch *spf = data;
1546-
struct fetch_task *task = get_fetch_task(spf, err);
1701+
struct fetch_task *task =
1702+
get_fetch_task_from_index(spf, err);
1703+
if (!task)
1704+
task = get_fetch_task_from_changed(spf, err);
15471705

15481706
if (task) {
15491707
struct strbuf submodule_prefix = STRBUF_INIT;
@@ -1553,6 +1711,8 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
15531711
prepare_submodule_repo_env_in_gitdir(&cp->env_array);
15541712
cp->git_cmd = 1;
15551713
strvec_init(&cp->args);
1714+
if (task->git_args.nr)
1715+
strvec_pushv(&cp->args, task->git_args.v);
15561716
strvec_pushv(&cp->args, spf->args.v);
15571717
strvec_push(&cp->args, task->default_argv);
15581718
strvec_push(&cp->args, "--submodule-prefix");
@@ -1564,6 +1724,7 @@ static int get_next_submodule(struct child_process *cp, struct strbuf *err,
15641724
*task_cb = task;
15651725

15661726
strbuf_release(&submodule_prefix);
1727+
string_list_insert(&spf->seen_submodule_names, task->sub->name);
15671728
return 1;
15681729
}
15691730

@@ -1678,11 +1839,11 @@ static int fetch_finish(int retvalue, struct strbuf *err,
16781839
return 0;
16791840
}
16801841

1681-
int fetch_populated_submodules(struct repository *r,
1682-
const struct strvec *options,
1683-
const char *prefix, int command_line_option,
1684-
int default_option,
1685-
int quiet, int max_parallel_jobs)
1842+
int fetch_submodules(struct repository *r,
1843+
const struct strvec *options,
1844+
const char *prefix, int command_line_option,
1845+
int default_option,
1846+
int quiet, int max_parallel_jobs)
16861847
{
16871848
int i;
16881849
struct submodule_parallel_fetch spf = SPF_INIT;

0 commit comments

Comments
 (0)