Skip to content

Commit 078b895

Browse files
grnchgitster
authored andcommitted
fetch-pack: new --stdin option to read refs from stdin
If a remote repo has too many tags (or branches), cloning it over the smart HTTP transport can fail because remote-curl.c puts all the refs from the remote repo on the fetch-pack command line. This can make the command line longer than the global OS command line limit, causing fetch-pack to fail. This is especially a problem on Windows where the command line limit is orders of magnitude shorter than Linux. There are already real repos out there that msysGit cannot clone over smart HTTP due to this problem. Here is an easy way to trigger this problem: git init too-many-refs cd too-many-refs echo bla > bla.txt git add . git commit -m test sha=$(git rev-parse HEAD) tag=$(perl -e 'print "bla" x 30') for i in `seq 50000`; do echo $sha refs/tags/$tag-$i >> .git/packed-refs done Then share this repo over the smart HTTP protocol and try cloning it: $ git clone http://localhost/.../too-many-refs/.git Cloning into 'too-many-refs'... fatal: cannot exec 'fetch-pack': Argument list too long 50k tags is obviously an absurd number, but it is required to demonstrate the problem on Linux because it has a much more generous command line limit. On Windows the clone fails with as little as 500 tags in the above loop, which is getting uncomfortably close to the number of tags you might see in real long lived repos. This is not just theoretical, msysGit is already failing to clone our company repo due to this. It's a large repo converted from CVS, nearly 10 years of history. Four possible solutions were discussed on the Git mailing list (in no particular order): 1) Call fetch-pack multiple times with smaller batches of refs. This was dismissed as inefficient and inelegant. 2) Add option --refs-fd=$n to pass a an fd from where to read the refs. This was rejected because inheriting descriptors other than stdin/stdout/stderr through exec() is apparently problematic on Windows, plus it would require changes to the run-command API to open extra pipes. 3) Add option --refs-from=$tmpfile to pass the refs using a temp file. This was not favored because of the temp file requirement. 4) Add option --stdin to pass the refs on stdin, one per line. In the end this option was chosen as the most efficient and most desirable from scripting perspective. There was however a small complication when using stdin to pass refs to fetch-pack. The --stateless-rpc option to fetch-pack also uses stdin for communication with the remote server. If we are going to sneak refs on stdin line by line, it would have to be done very carefully in the presence of --stateless-rpc, because when reading refs line by line we might read ahead too much data into our buffer and eat some of the remote protocol data which is also coming on stdin. One way to solve this would be to refactor get_remote_heads() in fetch-pack.c to accept a residual buffer from our stdin line parsing above, but this function is used in several places so other callers would be burdened by this residual buffer interface even when most of them don't need it. In the end we settled on the following solution: If --stdin is specified without --stateless-rpc, fetch-pack would read the refs from stdin one per line, in a script friendly format. However if --stdin is specified together with --stateless-rpc, fetch-pack would read the refs from stdin in packetized format (pkt-line) with a flush packet terminating the list of refs. This way we can read the exact number of bytes that we need from stdin, and then get_remote_heads() can continue reading from the same fd without losing a single byte of remote protocol data. This way the --stdin option only loses generality and scriptability when used together with --stateless-rpc, which is not easily scriptable anyway because it also uses pkt-line when talking to the remote server. Signed-off-by: Ivan Todoroski <[email protected]> Signed-off-by: Junio C Hamano <[email protected]>
1 parent 828ea97 commit 078b895

File tree

3 files changed

+52
-1
lines changed

3 files changed

+52
-1
lines changed

Documentation/git-fetch-pack.txt

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,16 @@ OPTIONS
3232
--all::
3333
Fetch all remote refs.
3434

35+
--stdin::
36+
Take the list of refs from stdin, one per line. If there
37+
are refs specified on the command line in addition to this
38+
option, then the refs from stdin are processed after those
39+
on the command line.
40+
+
41+
If '--stateless-rpc' is specified together with this option then
42+
the list of refs must be in packet format (pkt-line). Each ref must
43+
be in a separate packet, and the list must end with a flush packet.
44+
3545
-q::
3646
--quiet::
3747
Pass '-q' flag to 'git unpack-objects'; this makes the

builtin/fetch-pack.c

Lines changed: 41 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,9 @@ static struct fetch_pack_args args = {
2323
};
2424

2525
static const char fetch_pack_usage[] =
26-
"git fetch-pack [--all] [--quiet|-q] [--keep|-k] [--thin] [--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] [--no-progress] [-v] [<host>:]<directory> [<refs>...]";
26+
"git fetch-pack [--all] [--stdin] [--quiet|-q] [--keep|-k] [--thin] "
27+
"[--include-tag] [--upload-pack=<git-upload-pack>] [--depth=<n>] "
28+
"[--no-progress] [-v] [<host>:]<directory> [<refs>...]";
2729

2830
#define COMPLETE (1U << 0)
2931
#define COMMON (1U << 1)
@@ -941,6 +943,10 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
941943
args.fetch_all = 1;
942944
continue;
943945
}
946+
if (!strcmp("--stdin", arg)) {
947+
args.stdin_refs = 1;
948+
continue;
949+
}
944950
if (!strcmp("-v", arg)) {
945951
args.verbose = 1;
946952
continue;
@@ -972,6 +978,40 @@ int cmd_fetch_pack(int argc, const char **argv, const char *prefix)
972978
if (!dest)
973979
usage(fetch_pack_usage);
974980

981+
if (args.stdin_refs) {
982+
/*
983+
* Copy refs from cmdline to new growable list, then
984+
* append the refs from the standard input.
985+
*/
986+
int alloc_heads = nr_heads;
987+
int size = nr_heads * sizeof(*heads);
988+
heads = memcpy(xmalloc(size), heads, size);
989+
if (args.stateless_rpc) {
990+
/* in stateless RPC mode we use pkt-line to read
991+
* from stdin, until we get a flush packet
992+
*/
993+
static char line[1000];
994+
for (;;) {
995+
int n = packet_read_line(0, line, sizeof(line));
996+
if (!n)
997+
break;
998+
if (line[n-1] == '\n')
999+
n--;
1000+
ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
1001+
heads[nr_heads++] = xmemdupz(line, n);
1002+
}
1003+
}
1004+
else {
1005+
/* read from stdin one ref per line, until EOF */
1006+
struct strbuf line = STRBUF_INIT;
1007+
while (strbuf_getline(&line, stdin, '\n') != EOF) {
1008+
ALLOC_GROW(heads, nr_heads + 1, alloc_heads);
1009+
heads[nr_heads++] = strbuf_detach(&line, NULL);
1010+
}
1011+
strbuf_release(&line);
1012+
}
1013+
}
1014+
9751015
if (args.stateless_rpc) {
9761016
conn = NULL;
9771017
fd[0] = 0;

fetch-pack.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -10,6 +10,7 @@ struct fetch_pack_args {
1010
lock_pack:1,
1111
use_thin_pack:1,
1212
fetch_all:1,
13+
stdin_refs:1,
1314
verbose:1,
1415
no_progress:1,
1516
include_tag:1,

0 commit comments

Comments
 (0)