Commit 97be040

peff authored and gitster committed
cat-file: only split on whitespace when %(rest) is used

Commit c334b87 (cat-file: split --batch input lines on whitespace,
2013-07-11) taught `cat-file --batch-check` to split input lines
on the first whitespace, and stash everything after the first
token into the %(rest) output format element. It claimed:

    Object names cannot contain spaces, so any input with
    spaces would have resulted in a "missing" line.

But that is not correct. Refs, object sha1s, and various peeling
suffixes cannot contain spaces, but some object names can. In
particular:

  1. Tree paths like "[<tree>]:path with whitespace"

  2. Reflog specifications like "@{2 days ago}"

  3. Commit searches like "rev^{/grep me}" or ":/grep me"

To remain backwards compatible, we cannot split on whitespace by
default, hence we will ship 1.8.4 with the commit reverted.

Resurrect its attempt but in a weaker form; only do the splitting
when "%(rest)" is used in the output format. Since that element did
not exist at all before c334b87, old scripts cannot be affected.

The existence of object names with spaces does mean that you
cannot reliably do:

  echo ":path with space and other data" |
  git cat-file --batch-check="%(objectname) %(rest)"

as it would split the path and feed only ":path" to get_sha1. But
that command is nonsensical. If you wanted to see "and other data"
in "%(rest)", git cannot possibly know where the filename ends and
the "rest" begins.

It might be more robust to have something like "-z" to separate the
input elements. But this patch is still a reasonable step before
having that. It makes the easy cases easy; people who do not care
about %(rest) do not have to consider it, and the %(rest) code
handles the spaces and newlines of "rev-list --objects" correctly.

Hard cases remain hard but possible (if you might get whitespace in
your input, you do not get to use %(rest) and must split and join
the output yourself using more flexible tools). And most
importantly, it does not preclude us from having different
splitting rules later if a "-z" (or similar) option is added. So we
can make the hard cases easier later, if we choose to.

Signed-off-by: Jeff King <[email protected]>
Signed-off-by: Junio C Hamano <[email protected]>
1 parent 062aeee commit 97be040

3 files changed, 55 additions and 5 deletions

Documentation/git-cat-file.txt

Lines changed: 10 additions & 4 deletions
@@ -86,10 +86,9 @@ BATCH OUTPUT
 ------------
 
 If `--batch` or `--batch-check` is given, `cat-file` will read objects
-from stdin, one per line, and print information about them.
-
-Each line is considered as a whole object name, and is parsed as if
-given to linkgit:git-rev-parse[1].
+from stdin, one per line, and print information about them. By default,
+the whole line is considered as an object, as if it were fed to
+linkgit:git-rev-parse[1].
 
 You can specify the information shown for each object by using a custom
 `<format>`. The `<format>` is copied literally to stdout for each
@@ -110,6 +109,13 @@ newline. The available atoms are:
 	The size, in bytes, that the object takes up on disk. See the
 	note about on-disk sizes in the `CAVEATS` section below.
 
+`rest`::
+	If this atom is used in the output string, input lines are split
+	at the first whitespace boundary. All characters before that
+	whitespace are considered to be the object name; characters
+	after that first run of whitespace (i.e., the "rest" of the
+	line) are output in place of the `%(rest)` atom.
+
 If no format is specified, the default format is `%(objectname)
 %(objecttype) %(objectsize)`.
 

builtin/cat-file.c

Lines changed: 30 additions & 1 deletion
@@ -119,13 +119,21 @@ struct expand_data {
 	enum object_type type;
 	unsigned long size;
 	unsigned long disk_size;
+	const char *rest;
 
 	/*
 	 * If mark_query is true, we do not expand anything, but rather
 	 * just mark the object_info with items we wish to query.
 	 */
 	int mark_query;
 
+	/*
+	 * Whether to split the input on whitespace before feeding it to
+	 * get_sha1; this is decided during the mark_query phase based on
+	 * whether we have a %(rest) token in our format.
+	 */
+	int split_on_whitespace;
+
 	/*
 	 * After a mark_query run, this object_info is set up to be
 	 * passed to sha1_object_info_extended. It will point to the data
@@ -163,6 +171,11 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len,
 			data->info.disk_sizep = &data->disk_size;
 		else
 			strbuf_addf(sb, "%lu", data->disk_size);
+	} else if (is_atom("rest", atom, len)) {
+		if (data->mark_query)
+			data->split_on_whitespace = 1;
+		else if (data->rest)
+			strbuf_addstr(sb, data->rest);
 	} else
 		die("unknown format element: %.*s", len, atom);
 }
@@ -273,7 +286,23 @@ static int batch_objects(struct batch_options *opt)
 	warn_on_object_refname_ambiguity = 0;
 
 	while (strbuf_getline(&buf, stdin, '\n') != EOF) {
-		int error = batch_one_object(buf.buf, opt, &data);
+		int error;
+
+		if (data.split_on_whitespace) {
+			/*
+			 * Split at first whitespace, tying off the beginning
+			 * of the string and saving the remainder (or NULL) in
+			 * data.rest.
+			 */
+			char *p = strpbrk(buf.buf, " \t");
+			if (p) {
+				while (*p && strchr(" \t", *p))
+					*p++ = '\0';
+			}
+			data.rest = p;
+		}
+
+		error = batch_one_object(buf.buf, opt, &data);
 		if (error)
 			return error;
 	}

t/t1006-cat-file.sh

Lines changed: 15 additions & 0 deletions
@@ -78,6 +78,13 @@ $content"
 	echo $sha1 | git cat-file --batch-check="%(objecttype) %(objectname)" >actual &&
 	test_cmp expect actual
     '
+
+    test_expect_success '--batch-check with %(rest)' '
+	echo "$type this is some extra content" >expect &&
+	echo "$sha1 this is some extra content" |
+		git cat-file --batch-check="%(objecttype) %(rest)" >actual &&
+	test_cmp expect actual
+    '
 }
 
 hello_content="Hello World"
@@ -91,6 +98,14 @@ test_expect_success "setup" '
 
 run_tests 'blob' $hello_sha1 $hello_size "$hello_content" "$hello_content"
 
+test_expect_success '--batch-check without %(rest) considers whole line' '
+	echo "$hello_sha1 blob $hello_size" >expect &&
+	git update-index --add --cacheinfo 100644 $hello_sha1 "white space" &&
+	test_when_finished "git update-index --remove \"white space\"" &&
+	echo ":white space" | git cat-file --batch-check >actual &&
+	test_cmp expect actual
+'
+
 tree_sha1=$(git write-tree)
 tree_size=33
 tree_pretty_content="100644 blob $hello_sha1 hello"
